filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io

De-tangle syncer and store #2128

Closed: ZenGround0 closed this issue 3 years ago

ZenGround0 commented 5 years ago

Description

From @whyrusleeping in #2067: "that object [the default store] is a tangle. It (and the syncer) appears to be doing the same things multiple times in several different overlapping ways. For example, the logic in the syncer's 'maybe get blocks from network' is really weird: we first try to get things from the chain store, which pulls from our local disk. Then we try to get from the 'offline' cbor store, which pulls from our local disk, then from an offline exchange, which pulls from the local disk (this should get fixed; there's no need for this offline thing, the blockservice should just learn to deal with a nil exchange). Then it fetches from the online cborstore, which checks the local disk, then asks (everyone) over bitswap."
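To make the redundancy concrete, here is a minimal sketch of that cascade. Every name in it is hypothetical, and the "layers" are reduced to plain lookup functions:

```go
package main

import (
	"errors"
	"fmt"
)

// lookup stands in for one layer of the real lookup path.
type lookup func(c string) ([]byte, error)

// maybeGetBlocks sketches the cascade: layers are tried in order, and in the
// current code the first four all read the same local disk before bitswap.
func maybeGetBlocks(c string, layers []lookup) ([]byte, error) {
	for _, l := range layers {
		if data, err := l(c); err == nil {
			return data, nil
		}
	}
	return nil, errors.New("block not found")
}

func main() {
	disk := map[string][]byte{"exampleCid": []byte("header bytes")}
	fromDisk := func(c string) ([]byte, error) {
		if d, ok := disk[c]; ok {
			return d, nil
		}
		return nil, errors.New("local miss")
	}
	fromBitswap := func(c string) ([]byte, error) {
		return nil, errors.New("would ask the network")
	}

	// chain store, offline cbor store, offline exchange, online cborstore's
	// local half: four redundant disk reads before the one real network ask.
	layers := []lookup{fromDisk, fromDisk, fromDisk, fromDisk, fromBitswap}

	data, err := maybeGetBlocks("exampleCid", layers)
	fmt.Println(string(data), err)
}
```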

Acceptance criteria

Clean up this mess

Risks + pitfalls

Where to begin

anorth commented 5 years ago

Another effect of the tangling is that it's hard to insert validation logic for blocks received over the network, without redundantly validating blocks already in the store.

ZenGround0 commented 5 years ago

Using this to track technical debt in syncer tests currently. Setup is too laborious and needs rethinking. The shared dependency of fetcher and store on global IPLD store seems like it can't be helped but is a headache for testing and at least needs to be done better than right now. In the current state the test fetcher is not backed by a properly exposed or interfaced IPLD store so it can't be shared with the chain store.

anorth commented 5 years ago

As part of this we should probably be able to get rid of the blockservice that's backed only by an "offline" exchange, replacing use of that code with a thin wrapper.
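For illustration, the thin wrapper could look something like the sketch below. The interface is a simplified stand-in, not go-blockservice's real API:

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("block not found locally")

// blockGetter is the slice of blockservice behavior callers actually need.
type blockGetter interface {
	GetBlock(c string) ([]byte, error)
}

// offlineBlocks serves reads straight from a local store (a map here),
// with no exchange round-trip: a local miss is simply a miss.
type offlineBlocks struct {
	bs map[string][]byte
}

func (o offlineBlocks) GetBlock(c string) ([]byte, error) {
	if b, ok := o.bs[c]; ok {
		return b, nil
	}
	return nil, errNotFound
}

func main() {
	var g blockGetter = offlineBlocks{bs: map[string][]byte{"cid-1": []byte("data")}}
	b, err := g.GetBlock("cid-1")
	fmt.Println(string(b), err)
}
```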

anorth commented 5 years ago

We have cleaned up a few things.

The tangling is still real and confusing, though. The chain store interface does not expose blocks, only the tipsets for which the syncer has computed and provided the state root. When the syncer invokes the fetcher, the fetcher directly manipulates the IPLD store underlying the chain store, but in a way that is completely opaque to the syncer, which can't even inspect the store to check what was written. The fetcher also returns tipsets in an arbitrarily large slice, which the syncer uses to compute state but then discards.

The syncer will never (and cannot) inspect the blocks the fetcher wrote to the store, but there's an assumption that they have been persisted such that store.Load will be able to rebuild the chain from whatever head the syncer passes to setHead() (the store doesn't check that it actually has those blocks).
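One way to see the gap: the store could cheaply enforce that assumption at setHead time by walking the parent links before accepting the new head. A minimal sketch with hypothetical toy types (real chains have multi-parent tipsets; this is not the go-filecoin store):

```go
package main

import "fmt"

// header is a toy block header: just a CID and a parent link.
type header struct {
	cid    string
	parent string // empty for genesis
}

type chainStore struct {
	blocks map[string]header
	head   string
}

// setHead walks back to genesis and refuses heads whose ancestry is not
// fully persisted, instead of letting store.Load fail later.
func (s *chainStore) setHead(c string) error {
	for cur := c; cur != ""; {
		h, ok := s.blocks[cur]
		if !ok {
			return fmt.Errorf("setHead(%s): missing ancestor %s", c, cur)
		}
		cur = h.parent
	}
	s.head = c
	return nil
}

func main() {
	s := &chainStore{blocks: map[string]header{
		"gen": {cid: "gen"},
		"b1":  {cid: "b1", parent: "gen"},
	}}
	fmt.Println(s.setHead("b1"))       // ok
	fmt.Println(s.setHead("orphaned")) // error: never fetched
}
```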

This complex arrangement is non-intuitive and makes complex tests hard to construct. It's weird that the syncer can't inspect the block store. It's weird that the fetcher writes directly to the store as a side effect, but the syncer can't actually use that fact and the blocks are still passed in a giant slice. I think we should either:

- stop the fetcher writing to the store, use the return/callback values, and have the syncer commit blocks, or
- require the fetcher to write to the store and have the syncer read the blocks from it rather than a return/callback value.

ZenGround0 commented 5 years ago

> stop the fetcher writing to the store, use the return/callback values, and have the syncer commit blocks

In terms of go-filecoin code structure, SGTM. It will be simple to modify the store to put tipset blocks into the global IPLD store; PutTipSetAndState is a very logical place for persisting tipsets.

In terms of interacting with IPFS dependencies, this might be a tough road. It is a well-entrenched pattern that networked, content-addressed data services persist the data they fetch. We could use a separate (in-memory?) store for that, but that implies useless copying we probably don't want. Perhaps graphsync will help things.
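Setting the IPFS question aside, a sketch of that PutTipSetAndState commit point. All names and signatures here are illustrative, not the actual store API:

```go
package chain

// ipldStore is a stand-in for the global IPLD store's write path.
type ipldStore interface {
	Put(cid string, data []byte) error
}

// tipSet is a toy tipset: serialized block headers keyed by CID.
type tipSet struct {
	key    string
	blocks map[string][]byte
}

type store struct {
	ipld       ipldStore
	stateRoots map[string]string // tipset key -> state root CID
}

// PutTipSetAndState persists the tipset's blocks and records its state root,
// making it the single point where validated chain data is committed.
func (s *store) PutTipSetAndState(ts tipSet, stateRoot string) error {
	for c, data := range ts.blocks {
		if err := s.ipld.Put(c, data); err != nil {
			return err
		}
	}
	s.stateRoots[ts.key] = stateRoot
	return nil
}
```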

> require the fetcher to write to the store and have the syncer read the blocks from it rather than a return/callback value

Also SGTM, assuming the fetcher returns CIDs. I think we eventually need something like this to avoid storing huge slices in memory, and this way the blockstore does all of our caching for us. Keep in mind that we recently moved the global IPLD store out of the syncer, and this would require putting it back.
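Under that assumption (the fetcher returns CIDs and has already written the blocks), the syncer's loop could stream tipsets from the store instead of holding them all. A hedged sketch with simplified stand-in interfaces:

```go
package chain

// tipSet is a placeholder for the real tipset type.
type tipSet struct{ key string }

// fetcher writes fetched blocks into the shared store as a side effect and
// returns only tipset keys, oldest first.
type fetcher interface {
	FetchTipSetKeys(target string) ([]string, error)
}

// tipSetReader loads tipsets back out of the store on demand.
type tipSetReader interface {
	GetTipSet(key string) (tipSet, error)
}

// syncTo processes one tipset at a time; nothing keeps the whole chain
// segment in memory, and the blockstore's caching does the rest.
func syncTo(f fetcher, r tipSetReader, target string) error {
	keys, err := f.FetchTipSetKeys(target)
	if err != nil {
		return err
	}
	for _, k := range keys {
		ts, err := r.GetTipSet(k)
		if err != nil {
			return err
		}
		_ = ts // compute and validate state here, then let ts be collected
	}
	return nil
}
```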

frrist commented 5 years ago

> stop the fetcher writing to the store, use the return/callback values, and have the syncer commit blocks, or

At the risk of repeating something we just removed: what if we defined an ephemeral in-memory store and passed it to the fetcher for each run it made? The fetcher could write the data to this temporary store and then pass it to the syncer for inspection. The temporary store could then be merged into the on-disk store if it satisfies the requirements of the syncer -- basically, try to make this transactional.

This would allow us to separate invalid chains from the actual chain store, but it would require re-working (possibly too many) things around how we construct the syncer and fetcher.
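A minimal sketch of that transactional shape, with maps standing in for the stores (hypothetical code, not a proposal for exact APIs):

```go
package chain

// memStore is a throwaway staging area handed to the fetcher for one run.
type memStore map[string][]byte

// fetchThenCommit runs a fetch into the staging store, validates what landed
// there, and only then merges into the on-disk store.
func fetchThenCommit(
	fetch func(into memStore) error,
	validate func(staged memStore) error,
	disk map[string][]byte,
) error {
	staged := memStore{}
	if err := fetch(staged); err != nil {
		return err
	}
	if err := validate(staged); err != nil {
		return err // an invalid chain never touches the real chain store
	}
	// The commit: this copy is the cost of the extra store noted above.
	for c, b := range staged {
		disk[c] = b
	}
	return nil
}
```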

anorth commented 5 years ago

> Keep in mind that we recently moved the global IPLD store out of the syncer, and this would require putting it back.

I imagine expanding the chain store interface, which already wraps the IPLD store, to provide the blocks, rather than passing the two stores separately again.
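Something like the following shape, for concreteness (all signatures are invented; the real interface would differ):

```go
package chain

// tipSet is a placeholder for the real tipset type.
type tipSet struct{ key string }

// ChainReadWriter sketches a chain store that keeps its tipset-level duties
// and additionally exposes block access over the IPLD store it already wraps.
type ChainReadWriter interface {
	// Tipset-level, validated chain data (the existing responsibility).
	GetTipSetAndState(tsKey string) (ts tipSet, stateRoot string, err error)
	PutTipSetAndState(ts tipSet, stateRoot string) error
	SetHead(tsKey string) error

	// Block-level access (the proposed addition), so the syncer and fetcher
	// share one dependency instead of two separately passed stores.
	GetBlock(c string) ([]byte, error)
	HasBlock(c string) (bool, error)
}
```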

I concur that the second path is probably better, at least if we can take it while avoiding too much unnecessary de/serialization.

ZenGround0 commented 5 years ago

> I imagine expanding the chain store interface, which already wraps the IPLD store, to provide the blocks, rather than passing the two stores separately again.

We also recently moved block accessor methods out of the chain store API, which seemed like the right direction: scoping down the store's interface.

However, it is worse than just adding back something we removed. Before, the blocks returned from the chain store's GetBlocks method were already validated. The change proposed above requires that the chain store become the place to go for both validated tipsets and unvalidated blocks. This is convenient for the syncer caller, but it expands the store's behavior significantly, eroding the store's single responsibility as the source of validated chain data.

anorth commented 5 years ago

It is a pity that the exchange-writes-directly-to-store pattern makes it difficult for us to validate blocks before they are persisted. However, I'm not convinced the store's responsibility for validated data should necessarily be taken this far. We need strong guarantees about the head and everything reachable from it, but that single action of updating the head could be the point of trust.

We can work around this by intercepting the store that the fetcher writes to, delaying actual writeback to the real store until after validation. I'm not sure it's worth it, but I'm open to options as we go.
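For concreteness, the interception could be a wrapper satisfying whatever store interface the fetcher already takes, buffering puts until a post-validation flush. A hypothetical sketch, with maps standing in for the stores:

```go
package chain

// bufferedStore wraps the backing store the fetcher would normally write to;
// puts land in a pending buffer, and nothing reaches the backing store until
// Flush is called after validation.
type bufferedStore struct {
	backing map[string][]byte
	pending map[string][]byte
}

func (b *bufferedStore) Put(c string, data []byte) error {
	b.pending[c] = data // intercepted: the backing store is untouched
	return nil
}

// Get reads through the buffer so the fetcher still sees its own writes.
func (b *bufferedStore) Get(c string) ([]byte, error) {
	if d, ok := b.pending[c]; ok {
		return d, nil
	}
	if d, ok := b.backing[c]; ok {
		return d, nil
	}
	return nil, errNotFound{c}
}

// Flush commits the buffered blocks once validation has passed; dropping the
// wrapper instead discards an invalid fetch with no cleanup needed.
func (b *bufferedStore) Flush() {
	for c, d := range b.pending {
		b.backing[c] = d
	}
	b.pending = map[string][]byte{}
}

type errNotFound struct{ c string }

func (e errNotFound) Error() string { return "block not found: " + e.c }
```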

frrist commented 5 years ago

Running through the requirements to get https://github.com/filecoin-project/go-filecoin/issues/3443 working made me think of this issue. Storing everything in a datastore keyed on things other than CIDs does not play nicely with dependencies from IPFS land.

I'd like (love) to play nicely with IPFS deps. I would expect part of the de-tangling to include adding a blockstore (go-ipfs-blockstore) to the chain store, used for persisting block headers, messages, and receipts. I would expect the concept of a "TipSet" to be abstracted away when persisting data to the blockstore.
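A small sketch of that CID-keyed shape, using go-cid and go-block-format, with a plain map standing in for go-ipfs-blockstore (whose interface has varied across versions, so treat this as illustrative):

```go
package main

import (
	"fmt"

	blocks "github.com/ipfs/go-block-format"
	cid "github.com/ipfs/go-cid"
)

// put persists any serialized chain object (header, message, receipt) keyed
// purely by its CID; nothing at this layer knows what a "TipSet" is.
func put(bs map[cid.Cid]blocks.Block, raw []byte) cid.Cid {
	b := blocks.NewBlock(raw) // the CID is derived from the bytes themselves
	bs[b.Cid()] = b
	return b.Cid()
}

func main() {
	bs := map[cid.Cid]blocks.Block{}
	headerCid := put(bs, []byte("serialized block header"))
	msgCid := put(bs, []byte("serialized message"))
	fmt.Println(headerCid, msgCid)
}
```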

ZenGround0 commented 5 years ago

> Storing everything in a datastore keyed on things other than CIDs does not play nicely with dependencies from IPFS land.

This makes sense for the tsKey-to-state-root mapping, particularly in the future when we want to send state snapshots to light clients. But please note that the head key is inherently mutable, so it fundamentally doesn't fit the content-addressed IPFS model.
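Sketched, that split looks roughly like this (key name and types hypothetical): immutable, content-addressed data in the blockstore; the one mutable pointer under a plain datastore key:

```go
package chain

// headKey is a hypothetical datastore key for the mutable chain head.
const headKey = "/chain/head"

// chainState separates write-once, CID-keyed content from mutable metadata.
type chainState struct {
	blockstore map[string][]byte // CID -> bytes: content-addressed, append-only
	datastore  map[string]string // plain keys -> values: mutable
}

// setHead is the single mutation; all block data stays immutable.
func (c *chainState) setHead(tsKey string) {
	c.datastore[headKey] = tsKey
}

func (c *chainState) head() string {
	return c.datastore[headKey]
}
```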

> I would expect the concept of a "TipSet" to be abstracted away when persisting data to the blockstore.

In my head this is kind of the point of the chain store as opposed to the node's IPLD storage (a.k.a. the wrapper around the global blockstore). In the chain store we track how blocks are arranged and what their validated state looks like once processed (the tipset level). The node's IPLD storage handles the actual data (the block level).

> I would expect part of the de-tangling to include adding a blockstore (go-ipfs-blockstore) to the chain store, used for persisting block headers, messages, and receipts.

I am quite frustrated that we keep going in circles on this. Part of the de-tangling work was to remove block data accesses from the chain store to simplify its interface. While I am in the minority here, I don't think the suggested change has much to do with the syncer-and-store entanglement this issue was originally filed for. I hold a strong opinion that adding more and more to the interface makes things worse, not better, but it seems like we keep talking past each other on this issue.

I would really like us to hash out a vision for the chain store / fetcher / syncer boundary so that we can front-load all of our disagreements and gain enough shared context to drive this code home smoothly.