race conditions in fsspec cache?

fsspec / filesystem_spec

A specification that python filesystems should adhere to.

BSD 3-Clause "New" or "Revised" License

1.01k stars 356 forks source link

race conditions in fsspec cache? #639

Open nbren12 opened 3 years ago

nbren12 commented 3 years ago

I recently had some strange data corruption errors happen when using a cached fsspec filesystem from parallel processes. Seemed like a race condition. Is the cached file system thread/process safe?

According this this stackoverflow answer os.rename can ensure that parallel writes do not corrupt a single file.

martindurant commented 3 years ago

The cached filesystem is largely untested from multiple processes. I would probably expect multi-thread to be OK, though. The current cache implementations might not work well, in general, for large loads. There is an idea elsewhere to try to decouple the logical layer (what gets cached, when) from the storage end (dealing with the disk or other backend). I could use help planning these.

Note that the cache metadata is saved using shutil.move, which indeed is the same as os.rename where the paths are on the same device. This should probably be used for the files themselves - which would perhaps best be accomplished by using LocalFileSystem.open(, autocommit=False) or transactions.

nbren12 commented 3 years ago

The cached filesystem is largely untested from multiple processes.

Yah, I just started playing around with it. It seems like a potentially powerful feature, but as they say: "there are two hard problems in CS..."

This should probably be used for the files themselves - which would perhaps best be accomplished by using LocalFileSystem.open(, autocommit=False) or transactions.

Refactors aside, I think this would resolve this issue.

martindurant commented 3 years ago

Essentially, the cachers call fs.get(remotepaths, localpaths) in just a couple of places, so it would be simple to use a temporary location until downloading has completed. I don't think there is any foolproof way to ensure that the temporary file is on the same device as the eventual target, except to put them in the same directory (e.g., "{localpath}.temp.UUID").

nbren12 commented 3 years ago

except to put them in the same directory (e.g., "{localpath}.temp.UUID")

this sounds like a good idea. It would also probably not work reliably on NFS.