filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io

Garbage collect piece data from blockstore after staging #3062

Closed. anorth closed this issue 4 years ago

anorth commented 5 years ago

Description

Piece data is transmitted from a deal client to a miner over bitswap. In the current implementation, this means the data is written to a bitswap blockstore at both ends. The miner then stages this data for sealing, after which the copy in the blockstore is redundant. Nothing ever cleans up this blockstore, so it consumes disk space without bound.

For a miner receiving a lot of deals, this is a big problem: the redundant data in the blockstore exceeds the size of their sealed data.

Client data should be removed from the blockstore after it is staged to the sector builder.

See a report in slack here.
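A minimal sketch of what the cleanup could look like, assuming go-ipld-format's NodeGetter for walking the piece's unixfsv1 DAG and a blockstore exposing a DeleteBlock method; the package and helper names are illustrative, not existing venus code:

```go
package piecegc

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// blockDeleter is the minimal blockstore surface this sketch needs;
// go-ipfs-blockstore's Blockstore has a DeleteBlock method of roughly this
// shape (assumed here, check the version actually in use).
type blockDeleter interface {
	DeleteBlock(cid.Cid) error
}

// RemovePieceDAG walks the DAG rooted at pieceRoot and deletes every
// reachable block from the store. It does NOT check whether a block is also
// referenced by chain or state data living in the same store, so it is only
// safe with a dedicated piece store (or additional reference checks).
func RemovePieceDAG(ctx context.Context, ng ipld.NodeGetter, bs blockDeleter, pieceRoot cid.Cid) error {
	seen := make(map[cid.Cid]struct{})
	queue := []cid.Cid{pieceRoot}

	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		if _, ok := seen[c]; ok {
			continue
		}
		seen[c] = struct{}{}

		nd, err := ng.Get(ctx, c)
		if err != nil {
			return err
		}
		for _, l := range nd.Links() {
			queue = append(queue, l.Cid)
		}
		// Delete only after enumerating children, so the node is still readable.
		if err := bs.DeleteBlock(c); err != nil {
			return err
		}
	}
	return nil
}
```

The main caveat, noted under risks below, is that this store is shared with chain and state data, so blindly deleting every reachable block is only safe once piece data has its own store or extra reference checks.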

Acceptance criteria

Risks + pitfalls

This blockstore is the same one that stores blockchain blocks (also exchanged over bitswap) and the state tree. Identifying the redundant client data may be challenging given its breakdown into a unixfsv1 DAG. Though the blocks and state also need garbage collection (#2634, #2637), that's less pressing than, and beyond the scope of, this issue.

A possible way out is that the blockchain blocks will soon move to graphsync for exchange, so client data will be the only use of bitswap, and it could use a dedicated store.
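To illustrate the dedicated-store idea, a sketch using go-datastore's namespace wrapper; the package, function name, and the "/pieces" prefix are illustrative rather than existing venus wiring:

```go
package piecegc

import (
	"github.com/ipfs/go-datastore"
	"github.com/ipfs/go-datastore/namespace"
)

// NewPieceDatastore carves a dedicated keyspace for piece data out of the
// node's datastore, so piece blocks can be cleared (or dropped wholesale)
// without touching chain blocks or state. The prefix is illustrative only.
func NewPieceDatastore(base datastore.Datastore) datastore.Datastore {
	return namespace.Wrap(base, datastore.NewKey("/pieces"))
}
```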

Where to begin

The data is fetched and staged in processStorageDeal in storage/miner.go.

cc @ZenGround0 @laser @acruikshank

anorth commented 5 years ago

See also #2345, which proposes a CLI invocation for the client (but doesn't address the miner side).

ridewindx commented 5 years ago

@anorth

ridewindx commented 5 years ago

@anorth I can help implement this. Is that OK?

anorth commented 5 years ago

There is work in progress for this (currently assigned to @ingar). I believe the result will end up with a separate datastore for the piece data, which will move it out of scope of the dag commands. Your third suggestion is somewhat complex; we might instead just empty the datastore entirely while no pieces are being staged.
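Sketching that last idea: once piece data lives in its own datastore, emptying it wholesale while nothing is being fetched or staged is straightforward. The helper below assumes the pre-context go-datastore query API and is illustrative only:

```go
package piecegc

import (
	"github.com/ipfs/go-datastore"
	dsq "github.com/ipfs/go-datastore/query"
)

// WipePieceStore deletes every entry in a datastore dedicated to piece data.
// Only call this while no deals are transferring or staging pieces.
func WipePieceStore(ds datastore.Datastore) error {
	res, err := ds.Query(dsq.Query{KeysOnly: true})
	if err != nil {
		return err
	}
	defer res.Close()

	entries, err := res.Rest()
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := ds.Delete(datastore.NewKey(e.Key)); err != nil {
			return err
		}
	}
	return nil
}
```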

anorth commented 4 years ago

This will be made redundant by the shared storage market component.