fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License

Recursive get does not put the files in the right subdirs #1741

Open MitchellAcoustics opened 4 weeks ago

MitchellAcoustics commented 4 weeks ago

Thanks for the package, it's very helpful!

I'm having an issue using the get command to recursively copy a directory, its subdirs, and the files they contain from a GitHub repo to a local folder. fsspec finds all of the files with their correct paths and lists them with find, but when I run get, it creates the relevant subdirs and then copies the files into the parent directory instead.

I'm trying to copy the contents and structure of this folder from GitHub to a local folder: https://github.com/MitchellAcoustics/JASAEL-HowToAnalyseQuantiativeSoundscapeData/tree/main/_freeze/paper

Running:

import fsspec

fs = fsspec.filesystem("github", org="MitchellAcoustics", repo="JASAEL-HowToAnalyseQuantiativeSoundscapeData", ref="main")
fs.find("_freeze/paper", withdirs=True)

finds everything just fine:

['_freeze/paper',
 '_freeze/paper/execute-results',
 '_freeze/paper/execute-results/html.json',
 '_freeze/paper/execute-results/tex.json',
 '_freeze/paper/figure-html',
 '_freeze/paper/figure-html/fig-circ-output-1.png',
 '_freeze/paper/figure-html/fig-circ-output-2.png',
 '_freeze/paper/figure-html/fig-circ-output-3.png',
 '_freeze/paper/figure-html/fig-circ-output-4.png',
 '_freeze/paper/figure-pdf',
 '_freeze/paper/figure-pdf/fig-circ-output-1.pdf',
 '_freeze/paper/figure-pdf/fig-circ-output-2.pdf',
 '_freeze/paper/figure-pdf/fig-circ-output-3.pdf',
 '_freeze/paper/figure-pdf/fig-circ-output-4.pdf']

But running get:

fs.get(fs.ls("_freeze/paper"), "~/Documents/Trials/embedded_paper/", recursive=True) # or with fs.get(fs.find(..., withdirs=True), ...)

results in this:

embedded_paper
├── execute-results
├── fig-circ-output-1.pdf
├── fig-circ-output-1.png
├── fig-circ-output-2.pdf
├── fig-circ-output-2.png
├── fig-circ-output-3.pdf
├── fig-circ-output-3.png
├── fig-circ-output-4.pdf
├── fig-circ-output-4.png
├── figure-html
├── figure-pdf
├── html.json
└── tex.json

The subdirs (execute-results, figure-html, figure-pdf) are created but left empty, and what should have been put in them are just copied into the main directory.

Is this a bug, or should I be doing something else to make sure the files are placed in the correct subdirectories?

martindurant commented 4 weeks ago

This is functioning correctly, with behaviour copied from command-line cp. If you supply a list of concrete paths (files), they will all appear at the root level of the target directory. To copy the directory tree, supply the root path with recursive=True, and fsspec will find all the paths for you:

fs.get("_freeze/paper", "~/Documents/Trials/embedded_paper/", recursive=True)

"what should have been put in them are just copied into the main directory"

This does sound odd.

(It's worth noting that git clone or the ZIP download may be faster for this particular operation.)
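
For readers who want to see why the list form flattens the output, here is a minimal standard-library sketch of the two destination rules described above. It is only a model of the cp-like behaviour, not fsspec's actual code: the helper names flattened_targets and tree_targets are hypothetical, and the sketch ignores details such as trailing slashes or whether the target already exists, which may affect whether the root's own name appears under the target.

```python
import posixpath


def flattened_targets(paths, target):
    """List-of-paths case (hypothetical helper): each path lands
    directly in the target directory, keyed by its basename only."""
    return {p: posixpath.join(target, posixpath.basename(p)) for p in paths}


def tree_targets(root, paths, target):
    """Single-root case (hypothetical helper): each path keeps its
    position relative to the root, so subdirectories are preserved."""
    return {p: posixpath.join(target, posixpath.relpath(p, root)) for p in paths}


# A few of the files listed by fs.find in the report above.
files = [
    "_freeze/paper/execute-results/html.json",
    "_freeze/paper/figure-html/fig-circ-output-1.png",
    "_freeze/paper/figure-pdf/fig-circ-output-1.pdf",
]

flat = flattened_targets(files, "embedded_paper")
tree = tree_targets("_freeze/paper", files, "embedded_paper")

for src in files:
    print(src)
    print("  list of paths ->", flat[src])
    print("  root path     ->", tree[src])
```

Under this model, passing the output of fs.ls or fs.find corresponds to the first helper, which matches the flattened layout in the report, while passing "_freeze/paper" itself corresponds to the second.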

MitchellAcoustics commented 3 weeks ago

Ah, thank you! Both of your suggestions were very helpful. I think I was including the fs.ls(...) call because it was part of the original suggestion I saw to use fsspec. The documentation for .get confused me a bit about how to specify the path, but of course it's very simple!

But yes, as you said, directly using git was much faster.