Closed: anorth closed this issue 4 years ago.
See also #2345, which proposes a CLI invocation for the client (but doesn't address the miner side).
@anorth For go-filecoin dag remove <pieceid>, it should find all nodes under the top node, and then check whether these CIDs are used by other pieces. I can help to implement it, if that's OK?
There is work in progress for this (currently assigned to @ingar). I believe the result will be a separate datastore for the piece data, which will move it out of the scope of the dag commands. Your third suggestion is somewhat complex; instead, we might try to just empty the datastore entirely whenever no pieces are being staged.
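The simpler alternative, emptying a dedicated piece datastore whenever nothing is being staged, could look roughly like the sketch below. It assumes piece data lives in its own store, separate from chain blocks and state; the pieceStore type and its methods are hypothetical and only illustrate the counting idea.

```go
package main

import (
	"fmt"
	"sync"
)

// pieceStore is a hypothetical dedicated store for piece data, kept separate
// from the blockstore that holds chain blocks and the state tree.
type pieceStore struct {
	mu      sync.Mutex
	blocks  map[string][]byte
	staging int // number of pieces currently in flight or being staged
}

func newPieceStore() *pieceStore {
	return &pieceStore{blocks: map[string][]byte{}}
}

// Put stores one block of piece data under its CID.
func (s *pieceStore) Put(cid string, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocks[cid] = data
}

// BeginStaging records that a piece's data must be kept.
func (s *pieceStore) BeginStaging() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.staging++
}

// EndStaging records that a piece has been staged and, if no other piece is
// still in progress, drops every block in the store at once.
func (s *pieceStore) EndStaging() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.staging > 0 {
		s.staging--
	}
	if s.staging == 0 {
		s.blocks = map[string][]byte{}
	}
}

// Len reports how many blocks are currently held.
func (s *pieceStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.blocks)
}

func main() {
	s := newPieceStore()
	s.BeginStaging()
	s.Put("a1", []byte("x"))
	s.BeginStaging()
	s.Put("b1", []byte("y"))
	s.EndStaging()
	fmt.Println(s.Len()) // 2: one piece is still in progress
	s.EndStaging()
	fmt.Println(s.Len()) // 0: store emptied once idle
}
```

The counter has to be raised when a piece's transfer starts, not only when staging starts, otherwise a wipe could delete data that is still arriving. On a busy miner the store may rarely be idle, which is the trade-off against per-piece deletion.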
This will be made redundant by the shared storage market component.
Description
Piece data is transmitted from a deal client to a miner over bitswap. In the implementation, this means that the data is written to a bitswap blockstore at both ends. The miner then stages this data for sealing, after which the data in the blockstore is redundant. Nothing ever cleans up this blockstore, so it consumes disk space without bound.
For a miner receiving a lot of deals, this is a serious problem: the redundant data in the blockstore exceeds the miner's sealed data.
Client data should be removed from the blockstore after it is staged to the sector builder.
See a report in Slack here.
Acceptance criteria
Risks + pitfalls
This blockstore is the same one that stores blockchain blocks (also exchanged over bitswap) and the state tree. Identifying the redundant client data may be challenging, given that it is broken down into a unixfsv1 DAG. The blockchain blocks and state also need garbage collection (#2634, #2637), but that's less pressing than, and beyond the scope of, this issue.
A possible way out is that the blockchain blocks will soon move to graphsync for exchange, so client data will be the only use of bitswap, and it could use a dedicated store.
Where to begin
The data is fetched and staged in processStorageDeal in storage/miner.go.
cc @ZenGround0 @laser @acruikshank