fatiando / pooch

A friend to fetch your data files
https://www.fatiando.org/pooch

pooch.make_registry does not handle spaces in filenames correctly #369

Open mscheltienne opened 1 year ago

mscheltienne commented 1 year ago

I'm running v1.7.0, and it seems this is not fully resolved for file names that contain a space, for example "LICENSE (copy)".

Running pooch.make_registry(DATASET, output=REGISTRY, recursive=True), where DATASET is the path to the folder containing LICENSE (copy) and other files, produces this registry:

LICENSE 1f7e3edb3fc584df7f618e17cef4befa18232b99302dd8d55e519ab3d9e028b4
LICENSE (copy) 1f7e3edb3fc584df7f618e17cef4befa18232b99302dd8d55e519ab3d9e028b4
ssp/ssp_0_230120.fif b34bd4c97e8e32854c40dd01025bb5f236f6def3cf77fbb16b8e4c0f0c3e69d1
ssp/ssp_0_ias_230120.fif ba8d51427be963268884f6dc2b994b494268e1e80648aaf8cffca2a01240cb19
ssp/ssp_60_230120.fif 9465dcabfb9808b13eced54a644b9104f3d5d8b348bb4315f76dd572eff35ef2
ssp/ssp_60_ias_230120.fif 14f55a49ae8fba62c868fb117daf902f32981cecf7bc8675a4e21e44018d9ce8
ssp/ssp_68_230120.fif c6b5f56a30582b930e5c83f4404c82579929fde8cc650d062dc33429e95ebc4f
ssp/ssp_68_ias_230120.fif 379ea05dc48869c57368d354f17e65e943d555322d63b3b5caba3ae7221374cf
version.txt e2556a181068db2c7e3b2b127de33540448820fb1e97da29239833b6a8e09764

Loading that file with fetcher.load_registry(REGISTRY), where fetcher is an instance of Pooch, yields:

fetcher.load_registry(REGISTRY)
fetcher.registry

>>>
{'LICENSE': '(copy)',
 'ssp/ssp_0_230120.fif': 'b34bd4c97e8e32854c40dd01025bb5f236f6def3cf77fbb16b8e4c0f0c3e69d1',
 'ssp/ssp_0_ias_230120.fif': 'ba8d51427be963268884f6dc2b994b494268e1e80648aaf8cffca2a01240cb19',
 'ssp/ssp_60_230120.fif': '9465dcabfb9808b13eced54a644b9104f3d5d8b348bb4315f76dd572eff35ef2',
 'ssp/ssp_60_ias_230120.fif': '14f55a49ae8fba62c868fb117daf902f32981cecf7bc8675a4e21e44018d9ce8',
 'ssp/ssp_68_230120.fif': 'c6b5f56a30582b930e5c83f4404c82579929fde8cc650d062dc33429e95ebc4f',
 'ssp/ssp_68_ias_230120.fif': '379ea05dc48869c57368d354f17e65e943d555322d63b3b5caba3ae7221374cf',
 'version.txt': 'e2556a181068db2c7e3b2b127de33540448820fb1e97da29239833b6a8e09764'}

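Which is wrong: the entry for "LICENSE (copy)" is missing, and the hash stored for "LICENSE" has been replaced by the string "(copy)".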
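A minimal sketch of how a result like this can arise, assuming the registry reader tokenizes each line with shlex.split. That is an assumption inferred from the shlex.quote fix suggested in this thread, not something taken from the pooch source. parse_registry_lines is a hypothetical helper for illustration, not a pooch function:

```python
import shlex

HASH = "1f7e3edb3fc584df7f618e17cef4befa18232b99302dd8d55e519ab3d9e028b4"


def parse_registry_lines(lines):
    """Hypothetical stand-in for a registry reader (not pooch's actual code)."""
    registry = {}
    for line in lines:
        elements = shlex.split(line)
        if not elements:
            continue  # skip blank lines
        # Assumption: the first token is the file name and the second is the
        # checksum; any third token is ignored in this sketch.
        registry[elements[0]] = elements[1]
    return registry


lines = [
    f"LICENSE {HASH}",
    f"LICENSE (copy) {HASH}",  # unquoted file name containing a space
]

# The unquoted name is split at the space, so "(copy)" lands in the
# checksum position and the real hash becomes a third token.
tokens = shlex.split(lines[1])
print(tokens)

# The second line overwrites the correct "LICENSE" entry.
registry = parse_registry_lines(lines)
print(registry)
```

Under that assumption, the second line is read as a file named LICENSE with checksum (copy), which matches the registry shown above.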

Originally posted by @mscheltienne in https://github.com/fatiando/pooch/issues/315#issuecomment-1655320564

leouieda commented 11 months ago

Thanks for reporting @mscheltienne! We completely forgot to update the registry writing logic after updating the reading code. Sorry about that.

The fix should be relatively simple: use shlex.quote on the file name when we write it to the registry file. We'd also need to add a test for this case, one that currently fails, to prevent the issue from coming back.
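A rough sketch of what that could look like, using only the standard library. format_registry_line and parse_registry_line are hypothetical helpers written for this illustration; they are not pooch functions, and the actual change inside make_registry may look different:

```python
import shlex


def format_registry_line(file_name, checksum):
    """Hypothetical writer: quote the file name so spaces survive a round trip."""
    return f"{shlex.quote(file_name)} {checksum}"


def parse_registry_line(line):
    """Hypothetical reader: assumes lines are tokenized with shlex.split."""
    elements = shlex.split(line)
    return elements[0], elements[1]


def test_file_name_with_space_round_trips():
    """Regression-test sketch: a name with a space must come back unchanged."""
    checksum = "1f7e3edb3fc584df7f618e17cef4befa18232b99302dd8d55e519ab3d9e028b4"
    line = format_registry_line("LICENSE (copy)", checksum)
    assert parse_registry_line(line) == ("LICENSE (copy)", checksum)


# Names without special characters are written unchanged, so existing
# registry entries would keep their current form.
plain = format_registry_line("ssp/ssp_0_230120.fif", "abc123")
quoted = format_registry_line("LICENSE (copy)", "abc123")
print(plain)
print(quoted)

test_file_name_with_space_round_trips()
```

Because shlex.quote only adds quotes when a name contains characters outside its safe set, registries for file names without spaces should not change under this approach.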