Open th7nder opened 3 weeks ago
More details:
But how do we know we should retry it? When cancelling, we need to place some state in the database that indicates that `AddPiece` with certain parameters was cancelled. Or better yet, add it at the beginning and remove it when `add_piece` finishes.
Also, is `AddPiece` idempotent? If retrying is fine, it indicates that it may be. (https://github.com/eigerco/polka-storage/pull/483/files#r1824853789)
You're right, `AddPiece` is not idempotent at this stage, as it'll create a new sector each time it is called. It is cancellation safe though, because when interrupted it does not leave the state corrupted. It won't add the deal to a sector, the SP will be slashed, and that's the end of the story.
About retries, right... We don't have a mechanism for that yet. We'd need to store the state of the piece somewhere (modify the deal storage, because `AddPiece` can be called again solely based on published deal data) and have some scanner, periodic or run on startup, that checks whether something was in progress but not finished, cleans it up, and calls `AddPiece` again.
Maybe this can be done as part of #493? (https://github.com/eigerco/polka-storage/pull/483/files#r1825948465)
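To make the "add it at the beginning, remove it when `add_piece` finishes" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `PieceKey`, `PieceJournal` and their methods are hypothetical names, not types from the codebase, and the in-memory `BTreeSet` only stands in for a persistent store.

```rust
use std::collections::BTreeSet;

/// Hypothetical key identifying one `AddPiece` request (illustration only).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PieceKey {
    pub deal_id: u64,
    pub piece_cid: String,
}

/// In-memory stand-in for a persistent store; a real version would need to
/// survive a node restart.
#[derive(Debug, Default)]
pub struct PieceJournal {
    in_progress: BTreeSet<PieceKey>,
}

impl PieceJournal {
    /// Record the intent *before* the work starts.
    pub fn begin(&mut self, key: PieceKey) {
        self.in_progress.insert(key);
    }

    /// Remove the marker once adding the piece has finished.
    pub fn finish(&mut self, key: &PieceKey) {
        self.in_progress.remove(key);
    }

    /// Scan for leftovers: anything still marked was interrupted.
    pub fn unfinished(&self) -> Vec<PieceKey> {
        self.in_progress.iter().cloned().collect()
    }
}

/// Wrap the actual work so that the marker is cleared only on success.
/// On error (or if the work never returns) the marker stays for the scanner.
pub fn add_piece_tracked<E>(
    journal: &mut PieceJournal,
    key: PieceKey,
    work: impl FnOnce() -> Result<(), E>,
) -> Result<(), E> {
    journal.begin(key.clone());
    work()?;
    journal.finish(&key);
    Ok(())
}
```

With this shape, a cancellation needs no special handling: an interrupted call simply never reaches `finish`, so its key is still returned by `unfinished()` on the next scan.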
As a takeaway, we'll need to track the pieces in some persistent storage (like RocksDB) and perform checks and retries.
About the retries, it depends on where the error happened. If it was a disk failure, a retry might not be worth it.
_Originally posted by @jmg-duarte in https://github.com/eigerco/polka-storage/pull/483#discussion_r1824096430_
E.g. when a piece was being added and the operation was cancelled (node shutdown), on boot we need to find the pieces whose addition was cancelled and retry them.
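As a rough sketch of the boot-time decision, assuming the reason for each interruption is recorded alongside the piece: the `AddPieceFailure` and `RecoveryAction` enums and the `plan_recovery` function below are hypothetical, and the retry policy shown (retry cancellations, only clean up after disk failures) is one possible reading of the discussion, not a settled design.

```rust
/// Hypothetical failure classes; the real error type may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddPieceFailure {
    /// The operation was interrupted, e.g. by a node shutdown.
    Cancelled,
    /// The underlying disk failed; retrying may not be worth it.
    DiskFailure,
}

/// What the boot-time check decides to do with an unfinished piece.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryAction {
    /// Call `AddPiece` again for this deal.
    Retry,
    /// Only clean up the leftover state; do not retry automatically.
    CleanUp,
}

/// Assumed policy: retry cancellations, do not retry disk failures.
pub fn recovery_action(failure: AddPieceFailure) -> RecoveryAction {
    match failure {
        AddPieceFailure::Cancelled => RecoveryAction::Retry,
        AddPieceFailure::DiskFailure => RecoveryAction::CleanUp,
    }
}

/// Map every unfinished piece (identified here by a deal id) to an action.
pub fn plan_recovery(unfinished: &[(u64, AddPieceFailure)]) -> Vec<(u64, RecoveryAction)> {
    unfinished
        .iter()
        .map(|&(deal_id, failure)| (deal_id, recovery_action(failure)))
        .collect()
}
```

Since `AddPiece` currently creates a new sector on every call, any `Retry` here would still need the cleanup step first so that a retried piece does not end up in two sectors.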