fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
1.02k stars 358 forks source link

`FSMap.setitems` fails when `FSMap.__setitem__` succeeds (nested directory on local filesystem). #1041

Open d70-t opened 2 years ago

d70-t commented 2 years ago

When using a mapper on a local filesystem, it's possible to create nested subdirectory structures using FSMap as in:

import fsspec
import tempfile

with tempfile.TemporaryDirectory() as d:
    m = fsspec.get_mapper(d)
    m["foo/bar"] = b"baz"

however, trying the same thing using setitems fails:

import fsspec
import tempfile

with tempfile.TemporaryDirectory() as d:
    m = fsspec.get_mapper(d)
    m.setitems({"foo/bar": b"baz"})

I'd expect the latter to work just as well as the former, however, it raises:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4msurdfc/foo/bar'

This is likely because there's no mkdirs in setitems, while there is one in __setitem__. I don't know which place would be the best to put a mkdirs-like call when setting multiple items at once.

martindurant commented 2 years ago

The following works:

with tempfile.TemporaryDirectory() as d:
    m = fsspec.get_mapper(d, auto_mkdir=True)
    m.setitems({"foo/bar": b"baz"})

So I'm not sure whether the mapper ought to be creating directories or not :)

d70-t commented 2 years ago

Cool :+1: I didn't know about this option. But still, shouldn't be both of the two consistent (i.e. either both working or both raising?)

martindurant commented 2 years ago

shouldn't be both of the two consistent

Yes, I'm just not sure which one is right. For object stores, it doesn't matter as there are no directories.

martindurant commented 2 years ago

(or indeed, whether auto_mkdir should be True by default, which has also come up before)

d70-t commented 2 years ago

I'd argue, if I use the mapper API, I want to treat the filesystem as if it were an object store, thus I'd expect an object-store like behavior.

But if auto_mkdir is supposed to intentionally disable the creation of intermediate directories, then of course this still shouldn't happen...

Maybe auto_mkdir should be something like None by default, with the following semantics:

martindurant commented 2 years ago

That sounds reasonable, but tricky to implement for the single backend, I think.

d70-t commented 2 years ago

True.