Closed: hannahhoward closed this issue 2 years ago
@hannahhoward Thanks for the updates!
Though this doesn't sound like a lotus issue yet — most of the fix will land in go-graphsync (gs) and go-data-transfer (gdt) and then reach lotus through dependency updates. So would you mind, or would it be proper, for me to transfer this to our lotus discussions so the community can track the investigation progress there?
Several storage providers have reported that transfers can sometimes get stuck at "zero bytes" on their miners. Working with a test miner and estuary, I have so far identified the following sequences that seem to produce these "zero byte transfers":
Never got to top of queue error

a. The provider receives and accepts a data transfer request
b. It queues the outgoing graphsync request
c. Due to the limit on simultaneous outgoing requests (20 by default, I think), the graphsync request is not sent for several hours
d. Before the request reaches the top of the queue, the markets node restarts
e. Upon restart, the deal enters "storage provider await restart". Nothing else happens.
f. Estuary will eventually hit its 24 hour accept timeout and cancel the deal; the provider may or may not receive the timeout

Proposed solution: fix https://github.com/ipfs/go-graphsync/issues/314 and https://github.com/filecoin-project/go-data-transfer/issues/288
Failed while restarting

a. A data transfer is in progress when the provider restarts, going offline
b. The provider is offline long enough for Estuary to restart the transfer enough times to decide to fail it
c. The provider comes back online and enters StorageProviderAwaitingRestart, but never hears anything else because the transfer has already been cancelled on the Estuary side
Proposed solution: Currently, the provider neither monitors for restarts nor attempts them. The thinking behind this is that the client may not be reachable if it is not already connected. However, this has the side effect that the provider never notices transfers that have essentially failed. A better approach would be to attempt restarts but not fail the transfer if they don't go through. Moreover, it makes sense for AwaitRestart to have a timeout.