~https://github.com/filecoin-project/ref-fvm/issues/784~
edit: hm. Wrong issue.
On bloom filters, we should revisit that decision from first principles.
From that, I'd say we should:
Where to put them...

I'd prefer to hang them off the block (treat them like receipts); we should talk with the core implementers to see how difficult this would be. For example, we could change `BlockHeader.ParentMessageReceipts` to actually be `BlockHeader.ParentArtifacts` (or something like that), including receipts, events, and anything else we need to stash in the block header. This should be quite doable (even simple) given how few components interact with the receipts.
If not, storing them in an actor isn't the end of the world. However:
Resolution from the discussion today:
Specifically, something like:
```go
type BlockHeader struct {
	Miner address.Address // 0 unique per block/miner
	// ...
	ParentArtifacts cid.Cid
}
```
```go
type ExecutionArtifacts struct { // name TBD
	// A variable-sized bloom filter to quickly tell what events may exist.
	EventBloomFilter []byte
	// An AMT of all events.
	Events cid.Cid
	// A HAMT mapping index keys to indices in the Events AMT.
	EventIndex cid.Cid
}
```
Design rationale:
Drawbacks:
Open Questions:
@raulk we should probably discuss the open questions in standup before continuing here.
Next step: Write up a series of use-cases to better understand the problem.
Use cases include:
We will need to associate the logs with the concrete messages that emitted them. Ethereum does this by embedding the logs in the receipt (including a bloom filter, which I don't know whether it's scoped to the logs in that message or is the cumulative bloom filter up until then; I'd imagine the former). One idea is to have a top-level vector structure collecting all logs from the tipset, with receipts containing bitfields addressing the emitted logs via their index into the vector. However, this makes producing inclusion proofs harder (I think), and it makes the message receipts less useful by themselves.
@Stebalien what are the "index keys" that serve as the keys of the HAMT?
I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.
> what is "index keys" that are the keys of the HAMT?
TBD. We want to make it possible for a light client to get a succinct (and cheap) proof that some event did or did not happen in any given block.
Likely:
But I'm a bit concerned that the HAMT could grow large.
> I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.
Unfortunately, light clients would have to download all messages and receipts (including top-level return values) for that to work. We'd like light clients to be able to download just:
Then, if their event is in the bloom filter:
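The bloom pre-check that gates this flow could look roughly like the sketch below. The filter size and hashing scheme are illustrative assumptions (the actual scheme is TBD above); std's SipHash-based `DefaultHasher` stands in for whatever protocol hash is eventually chosen:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal variable-sized bloom filter, in the spirit of the
/// `EventBloomFilter []byte` field above. Hash choice is a stand-in.
struct Bloom {
    bits: Vec<u8>, // variable-sized bit array
}

impl Bloom {
    fn new(size_bytes: usize) -> Self {
        Bloom { bits: vec![0u8; size_bytes] }
    }

    // Derive k bit positions for a key by hashing (key, i).
    fn positions(&self, key: &[u8], k: u64) -> Vec<usize> {
        (0..k)
            .map(|i| {
                let mut h = DefaultHasher::new();
                key.hash(&mut h);
                i.hash(&mut h);
                (h.finish() as usize) % (self.bits.len() * 8)
            })
            .collect()
    }

    fn insert(&mut self, key: &[u8]) {
        for p in self.positions(key, 3) {
            self.bits[p / 8] |= 1 << (p % 8);
        }
    }

    /// May return a false positive, never a false negative.
    fn may_contain(&self, key: &[u8]) -> bool {
        self.positions(key, 3)
            .iter()
            .all(|&p| self.bits[p / 8] & (1 << (p % 8)) != 0)
    }
}
```

If `may_contain` returns false, the light client can skip downloading the events for that block entirely; only on a hit does it need to fetch and verify further data.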
Concrete proposal:

Define a syscall `log` that takes a set of log topics and a block ID:

```rust
fn log(count: u32, topics: *const u8, value: BlockId)
```

Where:

- `count` is the number of topics (1-4 for now).
- `topics` is a byte slice with length `32*count`. Each topic is an arbitrary 32-byte key `topics[i*32..(i+1)*32]`.
- `value` is a block ID of a value.

Define an event object of the type:
```rust
struct Event {
    actor: ActorID,
    topics: Vec<u8>,
    value: Cid,
}
```
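The syscall constraints above (1-4 topics, exactly 32 bytes each) could be enforced when the event object is constructed. The sketch below restates the struct with stand-in types (`ActorID` as `u64`, `Cid` as raw bytes) purely so it is self-contained; it is not the actual FVM implementation:

```rust
type ActorID = u64;
type Cid = Vec<u8>; // stand-in for a real CID type

struct Event {
    actor: ActorID,
    topics: Vec<u8>, // concatenated 32-byte topics
    value: Cid,
}

impl Event {
    /// Validate the same invariants the `log` syscall describes:
    /// between 1 and 4 topics, each exactly 32 bytes.
    fn new(actor: ActorID, topics: Vec<u8>, value: Cid) -> Result<Self, String> {
        if topics.len() % 32 != 0 {
            return Err("topics must be a multiple of 32 bytes".into());
        }
        let count = topics.len() / 32;
        if !(1..=4).contains(&count) {
            return Err("between 1 and 4 topics required".into());
        }
        Ok(Event { actor, topics, value })
    }

    /// Borrow the i-th 32-byte topic: topics[i*32..(i+1)*32].
    fn topic(&self, i: usize) -> &[u8] {
        &self.topics[i * 32..(i + 1) * 32]
    }
}
```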
When an event is logged, create an event object referencing the `value` block, where:
When creating a message receipt, pack all events into an AMT in-order and include the AMT root in the receipt.
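The receipt-building step above can be sketched as follows. A real implementation would build an IPLD AMT and commit to a CID root; the in-memory array and the single-hash "root" here are loud simplifications standing in for that:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type EventData = Vec<u8>; // stand-in for a serialized event block

/// Toy stand-in for an AMT: events are packed in-order at indices 0..n,
/// and a single hash stands in for the real CID root that would be
/// embedded in the receipt.
fn pack_events(events: &[EventData]) -> (Vec<EventData>, u64) {
    let amt: Vec<EventData> = events.to_vec(); // index i holds the i-th event
    let mut h = DefaultHasher::new();
    for e in &amt {
        e.hash(&mut h);
    }
    (amt, h.finish()) // (AMT contents, "root" for the receipt)
}
```

The key property is that the commitment is deterministic over the in-order packing, so any two parties executing the same message derive the same root.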
Decisions
Notes from sync design meeting + concrete proposals
We are moving the indexes out of the scope of this solution. Right now we want to focus on the simplest, extensible solution that: (a) is not overengineered for what we need now, (b) does not back us into a design corner now without sufficient information, (c) is easily extensible in the future.
For now, we will be storing the raw events only, allowing clients to experiment and generate indexes client side entirely. The schema of an event is as follows:
(see @Stebalien's comment above)
During execution, the Call Manager adds emitted events to the blockstore and populates an AMT tracking the Cids of those event objects.
We extend the `Receipt` chain data structure with a new field:
```rust
pub struct Receipt {
    // existing fields
    exit_code: ExitCode,
    return_data: RawBytes,
    gas_used: i64,
    // new field
    events: Cid,
}
```
When the message is finalized, we return the Receipt with the events field populated.
While the protocol does not mandate this, clients may wish to cache events in a local database for efficient access. With the structure above, it's possible to access events for a given message or all events for a tipset by returning events from all receipts.
At this stage, we do not track logs blooms and we definitely do not track Ethereum formatted blooms (fixed size keccak256 based hashing). The Ethereum JSON-RPC API will need to recreate the bloom filters on demand (or implementations could choose to do something different if they wish to optimise for faster bloom query).
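Recreating an Ethereum-style bloom on demand would follow Ethereum's fixed rule: for each item, set three bits, each chosen from a big-endian byte pair of the item's keccak256 hash modulo 2048. The sketch below keeps that bit-selection shape but substitutes std's `DefaultHasher` for keccak256, which a real implementation must not do:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Set the three bloom bits for one item. Each bit index comes from a
/// big-endian byte pair of the item's hash, modulo 2048 (the Ethereum
/// rule); DefaultHasher stands in for keccak256 here.
fn bloom_add(bloom: &mut [u8; 256], item: &[u8]) {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    let digest = h.finish().to_be_bytes(); // 8 bytes; keccak256 gives 32
    for pair in 0..3 {
        let idx =
            u16::from_be_bytes([digest[2 * pair], digest[2 * pair + 1]]) as usize % 2048;
        bloom[idx / 8] |= 1 << (idx % 8);
    }
}

/// Rebuild a 2048-bit bloom from a block's events on demand. A real
/// implementation would add the emitting address and each topic
/// separately; this sketch hashes each serialized event as one item.
fn bloom_from_events(events: &[Vec<u8>]) -> [u8; 256] {
    let mut bloom = [0u8; 256];
    for e in events {
        bloom_add(&mut bloom, e);
    }
    bloom
}
```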
Draft FIP at https://github.com/filecoin-project/FIPs/pull/483.
We can consider the technical design phase to have finished, culminating with the FIP draft at https://github.com/filecoin-project/FIPs/pull/483. Closing this issue.
Context
The Ethereum blockchain has the concept of logs, which are events emitted from smart contracts during execution. Logs contain arbitrary data, and are annotated with zero to four 32-byte topics depending on the opcode used (LOG0..LOG4). The fields from logs (topics, data, emitting address) are added to a 2048-bit bloom filter which is then incorporated into the block header.
The bloom filter is important because it is used by the filtering JSON-RPC methods (`eth_getLogs`, `eth_getFilterLogs`, `eth_getFilterChanges`), either in a streaming or polling fashion. Filter support implies tracking state at the node level.

AFAIK logs in Ethereum are not part of the world state, i.e. they are not stored in the state tree (we need to double-check this). They are just emitted during execution, and consensus is arrived at through the bloom filter, gas used, and other outputs.
Requirements
The EVM compatibility in Filecoin will need to support Ethereum logs at the protocol level and the JSON-RPC level. We should avoid overfitting to Ethereum's needs -- this feature should be available to native actors too, and should be generally usable and accessible.
Possible design direction
At this stage, we do not plan on introducing modifications to the chain data structures, so populating an aggregation of logs in block headers is a no-go. That leaves us with three options, among them a LogActor exposing `GetLogsBloom(height)` to return the bloom. We'd need to add a cron job to prune LogActor entries and limit them to the current finality. Getting the logs would require re-execution and introspection of call parameters through execution traces.

Light client operation
In Ethereum, light clients monitor block headers containing event bloom filters to determine whether they want to act on a block. Since Filecoin does not include the logs blooms in a chain structure, Filecoin light clients would operate by receiving the current bloom from the system actor, accompanied by a Merkle inclusion proof.
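Verifying such a Merkle inclusion proof could look like the sketch below. The binary-tree proof shape and the hash (again `DefaultHasher` standing in for the protocol's real hash function) are assumptions for illustration only:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hashes; a real protocol would use a cryptographic hash.
fn leaf_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Walk from leaf to root. Each proof step supplies the sibling's hash
/// and whether that sibling sits on the left.
fn verify_inclusion(root: u64, leaf: &[u8], proof: &[(u64, bool)]) -> bool {
    let mut acc = leaf_hash(leaf);
    for &(sibling, sibling_is_left) in proof {
        acc = if sibling_is_left {
            node_hash(sibling, acc)
        } else {
            node_hash(acc, sibling)
        };
    }
    acc == root
}
```

A light client holding only the state root (here, the toy `root`) can thus check the bloom it received against the chain without downloading messages or receipts.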