filecoin-project / specs

The Filecoin protocol specification
https://spec.filecoin.io

IPLD/state store DAG semantics, views, garbage collection #760

Open anorth opened 4 years ago

anorth commented 4 years ago

The current IPLD store interface appears simply as a Get/Put-style object store, which is misleading about the desired semantics. This issue documents my understanding, discussed with others, ahead of expression in the spec.

The state visible to and operated on by the VM is semantically a connected DAG, with a distinguished root: the root CID written into a block header. Only objects reachable in a traversal from this root, following CID links, are accessible in the store. The Get and Put methods are intended to be used to implement IPLD data structures such as the HAMT and AMT, not as a general object or K/V store. A better abstraction in the future could make this clearer.
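As an illustration of the reachability rule above, here is a minimal Go sketch. It is not taken from the spec or from any implementation: the `CID`, `Node`, `Store`, `Reachable`, and `GetFromRoot` names are hypothetical, and the CID is simply a hex SHA-256 digest rather than a real multihash-based CID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CID is a stand-in for a real content identifier (assumption: hex SHA-256).
type CID string

// Node is a toy IPLD-like object: a payload plus links to other objects.
type Node struct {
	Data  string
	Links []CID
}

// cidOf derives a deterministic identifier from the node's content.
func cidOf(n Node) CID {
	h := sha256.New()
	h.Write([]byte(n.Data))
	for _, l := range n.Links {
		h.Write([]byte{0})
		h.Write([]byte(l))
	}
	return CID(hex.EncodeToString(h.Sum(nil)))
}

// Store is the flat Get/Put object store that backs the DAG.
type Store map[CID]Node

// Put stores a node under its content-derived CID and returns that CID.
func (s Store) Put(n Node) CID {
	c := cidOf(n)
	s[c] = n
	return c
}

// Reachable returns every CID that resolves in the store and can be reached
// from root by following links. Links that do not resolve are skipped.
func Reachable(s Store, root CID) map[CID]bool {
	seen := map[CID]bool{}
	stack := []CID{root}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[c] {
			continue
		}
		n, ok := s[c]
		if !ok {
			continue // a CID that is not expected to resolve in this store
		}
		seen[c] = true
		stack = append(stack, n.Links...)
	}
	return seen
}

// GetFromRoot exposes only the objects that are part of the DAG under root.
// It recomputes reachability on every call, which is fine for a sketch only.
func GetFromRoot(s Store, root, c CID) (Node, bool) {
	if !Reachable(s, root)[c] {
		return Node{}, false
	}
	return s[c], true
}

func main() {
	s := Store{}
	leaf := s.Put(Node{Data: "leaf"})
	root := s.Put(Node{Data: "root", Links: []CID{leaf}})
	orphan := s.Put(Node{Data: "orphan"}) // stored, but never linked from root

	_, okLeaf := GetFromRoot(s, root, leaf)
	_, okOrphan := GetFromRoot(s, root, orphan)
	fmt.Println(okLeaf, okOrphan) // prints "true false"
}
```

The orphan object is physically present in the underlying map, but it is not visible through the root-relative view, which is the distinction this issue draws between the store's interface and its intended semantics.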

Transaction scope

The logical lifetime of a value Put() in the store, but not yet referenced transitively from the root, is still TBD (i.e. transaction scope). Garbage collection should remove unreferenced objects after that lifetime (logically, if not physically).

The shortest reasonable lifetime would be until the next state handle release, and I propose we do this. This would prevent the store being used as a communication channel between actors unless the CID of the object was also written to actor state, since actors cannot send messages while the state handle is held.

(A longer lifetime could be the current method invocation, or the current top-level message invocation. This would permit the state store to be used as a short-term communication channel between actors, which could exchange CIDs in messages pointing to objects placed in the state store. The need for this is unclear, though, and doing so would prevent future improvements to the Runtime API to make the tree nature more clear. Efficient inter-actor communication could be achieved more directly with some other mechanism in the future.)
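The shortest lifetime proposed above (until the next state handle release) could look roughly like the following Go sketch. Everything here is hypothetical: `Store`, `Put`, `Get`, and `Release` are illustrative names rather than the Runtime API, and CIDs are supplied by the caller instead of being derived from content. The sketch covers only the transaction scope of newly Put objects; it does not collect previously committed objects that a new root no longer references.

```go
package main

import "fmt"

// CID is a stand-in identifier (assumption: supplied by the caller here).
type CID string

// Node is a toy object that only carries links to other objects.
type Node struct {
	Links []CID
}

// Store separates committed objects from objects Put while the current
// state handle is held.
type Store struct {
	committed map[CID]Node
	pending   map[CID]Node
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{committed: map[CID]Node{}, pending: map[CID]Node{}}
}

// Put stages an object; it survives only if linked from the root at release.
func (s *Store) Put(c CID, n Node) {
	s.pending[c] = n
}

// Get looks in the pending set first, then in the committed set.
func (s *Store) Get(c CID) (Node, bool) {
	if n, ok := s.pending[c]; ok {
		return n, true
	}
	n, ok := s.committed[c]
	return n, ok
}

// Release ends the state handle. Pending objects reachable from newRoot are
// committed; every other pending object is dropped (logical garbage collection).
func (s *Store) Release(newRoot CID) {
	stack := []CID{newRoot}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		n, ok := s.pending[c]
		if !ok {
			continue // already committed, already promoted, or not in this store
		}
		s.committed[c] = n
		delete(s.pending, c)
		stack = append(stack, n.Links...)
	}
	s.pending = map[CID]Node{} // discard whatever was never linked
}

func main() {
	s := NewStore()
	s.Put("child", Node{})
	s.Put("root", Node{Links: []CID{"child"}})
	s.Put("scratch", Node{}) // never linked from the root

	s.Release("root")

	_, okChild := s.Get("child")
	_, okScratch := s.Get("scratch")
	fmt.Println(okChild, okScratch) // prints "true false"
}
```

Because the unlinked object disappears at release, and an actor cannot send messages while it holds the handle, the store cannot carry an object to another actor unless its CID was also written into actor state.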

Non-urgency of views and garbage collection

The view and garbage collection semantics outlined above are not urgent for implementation because, given our control of the built-in actor code, we can ensure that the semantics are indistinguishable from having no views, transactions, or garbage collection. I believe that a sufficient constraint to achieve this is:

This ensures that actors never attempt to retrieve an object which might have been gc'd (barring bugs). Note that there are CIDs in the state tree which are not expected to resolve: un/sealed sector CIDs, Piece CIDs in deals, etc. No actor should ever attempt to retrieve these from the state tree.

Implementers will probably add some kind of garbage collection for performance reasons, but initially have some freedom in semantics.

cc @jbenet @whyrusleeping @icorderi @sternhenri

wadealexc commented 4 years ago

Is actor state distinct from the state tree accessed through Get and Put?

I know an actor can Get things another actor Puts (as long as they have the CID). Does an actor's state have a root CID, and could another actor use that CID to Get the actor's state?

anorth commented 4 years ago

Pragmatically, they are in the same store and so we should define it that way. So yes, if an actor transmitted its state head CID to another actor, that actor could load and access it.