filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io
Other
2.05k stars 459 forks source link

Chain Manager Refactor Notes #797

Closed ZenGround0 closed 6 years ago

ZenGround0 commented 6 years ago

@dignifiedquire here are thoughts I want to sync up about before I continue with this

The Store

  1. I don’t like the name Store that much. I like ‘BlockTree’ because that is what this is, but it’s not ideal because we already use Tree and Block too much. Maybe ‘Chain’? I guess chain.Store as you have now isn’t so bad.

  2. I really like how store relies on a block service. I guess that is the obvious choice but I like it.

  3. Now is a good time to stop and think more about @aboudman’s suggestion that we store the leaf cids of the entire block tree on shutdown, and regenerate on startup so that we can make the Store truly persist between startups. Alternatively the question is, should all blocks of the Store persist between startups? Here’s the discussion: https://github.com/filecoin-project/go-filecoin/issues/614. I can see both sides but @aboodman’s argument was convincing to me. If we do this we’ll need to think about pruning probably. Maybe this is a NICE TO HAVE or something further down the road.

  4. We need to develop while keeping in mind that adding to the Store is a highly trusted operation. One nice thing about the State function in the old world is that you only need to trust State (and its caches which admittedly sucks). Now you need to trust isValid(same code as State basically), Store.Put, and the Syncer call that runs put only after isValid returns true. This isn’t necessarily worse, Store is a cleaner way to store all of the caches that the chain manager was implicitly trusting would only be set by State. We need to be clear about this though and remember we are designing to make it really hard for programmers to let bad things in the Store by mistake.

  5. I’m not sure if SetHead or Head should be on the store at all. Why not have consensus store the current best head and handle new head events? My position makes more sense if its taken alongside the idea presented in 3 that the store should track all leaves (or all leaves of a certain height /weight). If we are only storing the best chain then I am more sympathetic to keeping a Head on the store.

  6. The store’s various caches and their policies seem like the core of Store implementation. Just thinking out loud here, one thing to do is try to guarantee that the “frontier” of the block tree, i.e. the heads of all valid chains seen, are kept in memory, this way we can validate really fast. Just reiterating what we already know, we also want to organize blocks into tipsets and store an index of these tipsets. Right now we index by height and parent hash, we should think about all of this from first principles to make sure the indexing makes sense. Now that things are getting deleted, we should think about how indexes are regenerated when valid but evicted blocks are fetched after a cache miss.

The Syncer

  1. Syncer HandleBlock call should be scoped to a single block, tipsets don’t propagate, blocks do, because miners mine individual blocks

  2. Syncer HandleBlock and state/isValidBlock need some adjustments. The fundamental issue is that the block handling code needs to be able to handle valid chains with more than 1 block that has not yet been checked to be in the store. As written in the ProcessBlock pseudocode, a syncer that was two blocks behind could mistakenly find a valid head invalid.

  3. When making these adjustments we have some options. We could prefetch all blocks that the Store does not have and run isValidBlock (i.e. the state function but no recursion) one at a time, building up the new chain. This is familiar to me because it is how things worked before the state function refactor. @aboudman and I moved to the state function factoring primarily because we found it made things simpler to reason about. I touched on this above, the state/isvalid function only works under the precondition that the Store has valid blocks. However this seems like maybe it is now the simplest thing to do now that we are refactoring into a Syncer, and is definitely the quickest way to modify the pseudocode to work.

  4. On the opposite side of the spectrum we could use the current state function directly on the head of new blocks as we do now. This is a problem because it opens up a DOS vector and is just less efficient because state is a recursive function with call stack overhead and stack overflow risk. We would need to do caching controlled by the state function and these caches would look a lot like the Store. See discussion at the bottom of https://github.com/filecoin-project/go-filecoin/issues/605.

  5. In the last comment of https://github.com/filecoin-project/go-filecoin/issues/605 I mused over a potential intermediate solution. It somewhat kept simplicity in the validation logic while allowing for better resource management. In reality it might just make everything less explicit and harder to understand because the resource management is transparent to the reader of the validation logic. Also this solution or (10) probably require slightly more complex interaction between the Store and the Consensus component’s State function. I’m ok with solution (9) because hopefully the refactoring itself will make all of this easier to follow, just wanted to put all the past thinking out in the open.

  6. The Syncer is a great place to cache bad blocks and enforce the “punctuality” limit where node’s reject chains that extend too far back in the past (which shores up another DOS vector)

Thanks for reading! Let’s sync up as soon as you can this week so we can move this along with maximal shared understanding.

dignifiedquire commented 6 years ago

Folding into #708