filecoin-project / ref-fvm

Reference implementation of the Filecoin Virtual Machine
https://fvm.filecoin.io/

Technical design: Logs and events #728

Closed raulk closed 1 year ago

raulk commented 2 years ago

Context

The Ethereum blockchain has the concept of logs, which are events emitted from smart contracts during execution. Logs contain arbitrary data and are annotated with zero to four 32-byte topics depending on the opcode used (LOG0..LOG4). The fields from logs (topics, data, emitting address) are added to a 2048-bit bloom filter, which is then incorporated into the block header.

The bloom filter is important because it is used by:

  1. light clients and wallets to quickly evaluate if a block is of interest depending on what they are looking for.
  2. full nodes to service log-related JSON-RPC queries (eth_getLogs, eth_getFilterLogs, eth_getFilterChanges); either in a streaming or polling fashion. Filter support implies tracking state at the node level.
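As a sketch of the mechanism both consumers rely on, here is a minimal bloom filter in Rust. The hashing is a stand-in (std's DefaultHasher with per-position seeds), not Ethereum's actual scheme, which derives 3 bit positions from a keccak256 hash of each topic/address:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BLOOM_BITS: usize = 2048; // Ethereum's logs bloom is 2048 bits

struct Bloom([u8; BLOOM_BITS / 8]);

impl Bloom {
    fn new() -> Self {
        Bloom([0u8; BLOOM_BITS / 8])
    }

    // Derive 3 bit positions per item, as Ethereum does; the seeded
    // std hasher here is a stand-in for keccak256-derived positions.
    fn bit_positions(item: &[u8]) -> [usize; 3] {
        let mut positions = [0usize; 3];
        for (seed, pos) in positions.iter_mut().enumerate() {
            let mut h = DefaultHasher::new();
            (seed as u64).hash(&mut h);
            item.hash(&mut h);
            *pos = (h.finish() as usize) % BLOOM_BITS;
        }
        positions
    }

    fn insert(&mut self, item: &[u8]) {
        for b in Self::bit_positions(item) {
            self.0[b / 8] |= 1 << (b % 8);
        }
    }

    // May return false positives, never false negatives: a light
    // client can safely skip a block when this returns false.
    fn maybe_contains(&self, item: &[u8]) -> bool {
        Self::bit_positions(item)
            .iter()
            .all(|&b| self.0[b / 8] & (1 << (b % 8)) != 0)
    }
}

fn main() {
    let mut bloom = Bloom::new();
    bloom.insert(b"Transfer(address,address,uint256)");
    assert!(bloom.maybe_contains(b"Transfer(address,address,uint256)"));
}
```

The one-sided error is what makes this cheap check useful: a negative answer lets a client discard the block without fetching anything else.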

AFAIK logs in Ethereum are not part of the world state, i.e. they are not stored in the state tree (we need to double-check this). They are just emitted during execution, and consensus is reached through the bloom filter, gas used, and other outputs.

Requirements

The EVM compatibility in Filecoin will need to support Ethereum logs at the protocol level and at the JSON-RPC level. We should avoid overfitting to Ethereum's needs -- this feature should be available to native actors too, and should be generally usable and accessible.

Possible design direction

At this stage, we do not plan on introducing modifications to the chain data structures, so populating an aggregation of logs in block headers is a no-go. That leaves us with three options:

Light client operation

In Ethereum, light clients monitor block headers containing event bloom filters to determine whether they want to act on a block. Since Filecoin does not include the logs bloom in a chain structure, Filecoin light clients would operate by fetching the current bloom from the system actor, accompanied by a Merkle inclusion proof.

Stebalien commented 2 years ago

~https://github.com/filecoin-project/ref-fvm/issues/784~

edit: hm. Wrong issue.

Stebalien commented 2 years ago

On bloom filters, we should revisit that decision from first principles.

From that, I'd say we should:

  1. Consider storing events (or at least the keys) in a HAMT (reset every epoch). Clients can download only the parts of the HAMT that they need.
  2. If we still need a bloom filter (likely easier for quick light-client checks), we should probably make the size dynamic depending on the number of events. This isn't something we can reasonably do if we put it into the block header itself, but it's something we can do if we put it in the state-tree.
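For point 2, the usual sizing formulas give a feel for how a dynamic filter would scale with event count; a sketch (the 1% false-positive target below is an assumed tuning knob, nothing decided here):

```rust
// Standard bloom-filter sizing: m = -n·ln(p) / (ln 2)^2 bits and
// k = (m/n)·ln 2 hash functions for n items at false-positive rate p.
fn bloom_params(n_events: usize, fp_rate: f64) -> (usize, u32) {
    let n = n_events.max(1) as f64;
    let ln2 = std::f64::consts::LN_2;
    let m_bits = (-(n * fp_rate.ln()) / (ln2 * ln2)).ceil() as usize;
    let k_hashes = ((m_bits as f64 / n) * ln2).round().max(1.0) as u32;
    (m_bits, k_hashes)
}

fn main() {
    // An epoch with 100 events sized for a 1% false-positive rate.
    let (m, k) = bloom_params(100, 0.01);
    assert_eq!((m, k), (959, 7));
}
```

Because m grows linearly with n, a busy epoch gets a proportionally larger filter instead of the fixed 2048 bits Ethereum commits to in the header.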

Where to put them...

I'd prefer to hang them off the block (treat them like receipts); we should talk with the core implementers to see how difficult this would be. For example, we could change BlockHeader.ParentMessageReceipts to actually be BlockHeader.ParentArtifacts (or something like that), including receipts, events, and anything else we need to stash in the block header. This should be quite doable (even simple), given that few components interact with the receipts.

If not, storing them in an actor isn't the end of the world. However:

  1. I'd just clear the list on every epoch.
  2. I wouldn't make the events available to other actors, that's not really what these are for.
Stebalien commented 2 years ago

Resolution from the discussion today:

Specifically, something like:

type BlockHeader struct {
    Miner address.Address // 0 unique per block/miner
    // ...
    ParentArtifacts cid.Cid
}

type ExecutionArtifacts struct { // name TBD
    // A variable-sized bloom filter to quickly tell what events may exist.
    EventBloomFilter []byte

    // An AMT of all events.
    Events cid.Cid

    // A HAMT indexing events mapping index keys to indices in the Events AMT.
    EventIndex cid.Cid
}

Design rationale:

Drawbacks:

Open Questions:

Stebalien commented 2 years ago

@raulk we should probably discuss the open questions in standup before continuing here.

Stebalien commented 2 years ago

Next step: Write up a series of use-cases to better understand the problem.

raulk commented 2 years ago

Use cases include:

raulk commented 2 years ago

We will need to associate the logs with the concrete messages that emitted them. Ethereum does this by embedding the logs in the receipt (including a bloom filter; I don't know whether it's scoped to the logs in that message or is the cumulative bloom filter up to that point -- I'd imagine the former). One idea is to have a top-level vector structure collecting all logs from the tipset, with receipts containing bitfields that address the emitted logs via their index into the vector. However, this makes producing inclusion proofs harder (I think), and it makes the message receipts less useful by themselves.
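The bitfield-into-vector idea could look like this sketch (a plain bool slice stands in for Filecoin's real RLE+ bitfields):

```rust
// The tipset carries one flat vector of all logs, and each receipt
// carries a bitfield selecting the entries its message emitted.
fn logs_for_receipt<'a, T>(all_logs: &'a [T], bitfield: &[bool]) -> Vec<&'a T> {
    all_logs
        .iter()
        .zip(bitfield)
        .filter_map(|(log, &set)| if set { Some(log) } else { None })
        .collect()
}

fn main() {
    let tipset_logs = ["transfer", "approval", "mint"];
    // This message emitted logs 0 and 2 of the tipset-wide vector.
    let bitfield = [true, false, true];
    let mine = logs_for_receipt(&tipset_logs, &bitfield);
    assert_eq!(mine, vec![&"transfer", &"mint"]);
}
```

The indirection is also where the proof difficulty comes from: proving one message's logs means proving both its receipt's bitfield and the selected entries of the shared vector.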

anorth commented 2 years ago

@Stebalien what is "index keys" that are the keys of the HAMT?

I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.

Stebalien commented 2 years ago

what is "index keys" that are the keys of the HAMT?

TBD. We want to make it possible for a light client to get a succinct (and cheap) proof that some event did or did not happen in any given block.

Likely:

But I'm a bit concerned that the HAMT could grow large.

I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.

Unfortunately, light clients would have to download all messages and receipts (including top-level return values) for that to work. We'd like light clients to be able to download just:

Then, if their event is in the bloom filter:

Stebalien commented 2 years ago

Concrete proposal:

  1. Introduce a new log syscall that takes a set of log topics and a block ID.
  2. Do NOT index anything (yet). Indexing will be handled in a followup FIP.
fn log(count: u32, topics: *const u8, value: BlockId)

Where:

Define an event object of the type:

struct Event {
    actor: ActorID,
    topics: Vec<u8>,
    value: Cid,
}

When an event is logged:

  1. Make a CID of the value block where:
    1. The CID is "inline" if the length of the value is <= 32 bytes.
    2. Otherwise, we hash with blake2b.
  2. Record an event object with the caller's ActorID, the specified topics, and the value CID.

When creating a message receipt, pack all events into an AMT in-order and include the AMT root in the receipt.
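The value-CID rule in step 1 might look like the sketch below. `ValueCid` is a simplified stand-in for the real cid crate type, and the hash function is injected so the sketch stays dependency-free; the actual implementation would produce proper multihash/CID encodings:

```rust
// The value CID is "inline" (an identity multihash carrying the bytes
// themselves) when the value fits in 32 bytes; otherwise the bytes are
// hashed with blake2b-256.
enum ValueCid {
    Inline(Vec<u8>),  // identity multihash: the value itself
    Hashed([u8; 32]), // blake2b-256 digest of the value
}

fn value_cid(value: &[u8], blake2b_256: impl Fn(&[u8]) -> [u8; 32]) -> ValueCid {
    if value.len() <= 32 {
        ValueCid::Inline(value.to_vec())
    } else {
        ValueCid::Hashed(blake2b_256(value))
    }
}

fn main() {
    // Stand-in hash for the demo; the real rule uses blake2b-256.
    let fake_hash = |bytes: &[u8]| -> [u8; 32] {
        let mut digest = [0u8; 32];
        digest[0] = bytes.len() as u8;
        digest
    };
    match value_cid(b"short value", fake_hash) {
        ValueCid::Inline(v) => assert_eq!(v, b"short value"),
        ValueCid::Hashed(_) => unreachable!("values <= 32 bytes are inline"),
    }
    assert!(matches!(value_cid(&[0u8; 64], fake_hash), ValueCid::Hashed(_)));
}
```

Inlining small values keeps the common case (short event payloads) free of an extra blockstore indirection.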

Decisions

raulk commented 2 years ago

Notes from sync design meeting + concrete proposals

Descoping indices

We are moving the indexes out of the scope of this solution. Right now we want to focus on the simplest, extensible solution that: (a) is not overengineered for what we need now, (b) does not back us into a design corner now without sufficient information, (c) is easily extensible in the future.

Storing raw events

For now, we will be storing the raw events only, allowing clients to experiment and generate indexes client side entirely. The schema of an event is as follows:

(see @Stebalien's comment above)

During execution, the Call Manager adds emitted events to the blockstore and populates an AMT tracking the CIDs of those event objects.

Commitment on chain

We extend the Receipt chain data structure with a new field:

pub struct Receipt {
    // existing fields
    exit_code: ExitCode,
    return_data: RawBytes,
    gas_used: i64,
    // new field
    events: Cid,
}

When the message is finalized, we return the Receipt with the events field populated.

Patterns of access

While the protocol does not mandate this, clients may wish to cache events in a local database for efficient access. With the structure above, it's possible to access events for a given message or all events for a tipset by returning events from all receipts.
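A sketch of that access pattern, with simplified stand-in types (a real client would resolve each receipt's `events` CID to an AMT in the blockstore rather than hold a Vec):

```rust
// With events hung off each receipt, "all events for a tipset" is just
// the concatenation of per-receipt events in execution order.
#[derive(Clone, Debug, PartialEq)]
struct Event {
    actor: u64,     // emitting ActorID
    topics: Vec<u8>,
    value: Vec<u8>, // decoded value block
}

struct Receipt {
    events: Vec<Event>, // stand-in for the events CID in the receipt
}

// All events in a tipset, ordered by message execution order.
fn tipset_events(receipts: &[Receipt]) -> Vec<Event> {
    receipts
        .iter()
        .flat_map(|r| r.events.iter().cloned())
        .collect()
}

fn main() {
    let receipts = vec![
        Receipt { events: vec![Event { actor: 100, topics: vec![1], value: vec![] }] },
        Receipt { events: vec![] }, // a message that emitted nothing
        Receipt { events: vec![Event { actor: 101, topics: vec![2], value: vec![] }] },
    ];
    let all = tipset_events(&receipts);
    assert_eq!(all.len(), 2);
    assert_eq!(all[1].actor, 101);
}
```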

Ethereum JSON-RPC compatibility

At this stage, we do not track logs blooms, and we definitely do not track Ethereum-formatted blooms (fixed-size, keccak256-based hashing). The Ethereum JSON-RPC API will need to recreate the bloom filters on demand (or implementations could choose to do something different if they wish to optimise for faster bloom queries).

raulk commented 1 year ago

Draft FIP at https://github.com/filecoin-project/FIPs/pull/483.

raulk commented 1 year ago

We can consider the technical design phase to have finished, culminating with the FIP draft at https://github.com/filecoin-project/FIPs/pull/483. Closing this issue.