Fetch failure due to lack of disk space

Describe the bug

WARN    rpc go-jsonrpc@v0.1.2-0.20200822201400-474f4fdccc52/handler.go:241  error in RPC call to 'Filecoin.Fetch': allocate local sector for fetching:
    github.com/filecoin-project/lotus/extern/sector-storage/stores.(*Remote).AcquireSector
        /home/downloads/lotus/extern/sector-storage/stores/remote.go:111
  - couldn't find a suitable path for a sector:
    github.com/filecoin-project/lotus/extern/sector-storage/stores.(*Local).AcquireSector
        /home/downloads/lotus/extern/sector-storage/stores/local.go:402

If a worker happens to get more sectors than it can handle, it eventually runs out of disk space. When that happens, all sectors that belong to the worker fail to finalize because they can't be fetched.

Usually when this happens, I can find some files that the worker should no longer have.

[Screenshots: report_fetch, report_fetch_2]

This time, I found 682G worth of cache files in the cache/fetching directory. The sectors they belong to are all in Proving state, so these files should have been deleted.

To Reproduce

A worker dedicated to PC2 takes new sectors while it still has sectors in the WaitSeed, Committing, and Finalize states.
Files for some sectors that are now in Proving state don't get deleted from the worker properly.
The worker runs out of disk space.
The sectors it holds can't be finalized.
From that point on, the worker does nothing but output errors.

Expected behavior

Any obsolete sector files should always be deleted from the worker.
If the worker can't do anything due to lack of disk space, it should try to move some sectors to the miner or other workers to make some space.
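As a rough illustration of the kind of guard I have in mind, a worker could check free space before accepting more work. This is only a sketch and not how lotus currently behaves: `freeBytes` and `hasRoom` are hypothetical helpers, the 32 GiB figure is an arbitrary example value, and the code relies on `syscall.Statfs`, so it assumes Linux.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem containing path. Assumes Linux (uses statfs).
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// hasRoom is a hypothetical guard: it reports whether at least `need`
// bytes are free at path.
func hasRoom(path string, need uint64) (bool, error) {
	free, err := freeBytes(path)
	if err != nil {
		return false, err
	}
	return free >= need, nil
}

func main() {
	// Arbitrary example threshold, not the real footprint of a sector.
	const need = 32 << 30
	ok, err := hasRoom(".", need)
	if err != nil {
		fmt.Fprintln(os.Stderr, "statfs failed:", err)
		os.Exit(1)
	}
	if ok {
		fmt.Println("enough free space to take another sector")
	} else {
		fmt.Println("low on space: refuse new sectors and clean up or move files first")
	}
}
```

A check like this would not remove the stale files, but it would stop a worker from taking sectors it has no room to finish.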

Version (run lotus version): Tag v0.8.0

Additional context

I think this is happening because I simply don't have enough disk space on each worker. In theory it should be enough, but sometimes some workers get more sectors than others and run out of disk space.

filecoin-project / lotus

Fetch failure due to lack of disk space #4069