drivendataorg / cloudpathlib

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
https://cloudpathlib.drivendata.org
MIT License

glob does not work on Python < 3.7.6 #210

Closed. remi-braun closed this issue 2 years ago

remi-braun commented 2 years ago

Hello,

I tried your promising 0.7.0 version this morning; however, the glob function now breaks for me:

from cloudpathlib import S3Path
directory = S3Path('s3://sertit-extracteo-ci/water')
directory.glob('*S2*_MSIL2A*')

throws:

self = S3Path('s3://sertit-extracteo-ci/water'), pattern = '*S2*_MSIL2A*'

    def glob(self, pattern):
        self._glob_checks(pattern)

        pattern_parts = PurePosixPath(pattern).parts
>       selector = _make_selector(tuple(pattern_parts), _posix_flavour)
E       TypeError: _make_selector() takes 1 positional argument but 2 were given

/usr/local/lib/python3.7/dist-packages/cloudpathlib/cloudpath.py:344: TypeError

I'm running Debian in Docker, and cloudpathlib is installed through pip.

jayqi commented 2 years ago

Hi @remi-braun, what version of Python 3.7 are you using? I suspect that there was a change to pathlib internals and that this error is caused by an incompatibility. I'm not certain from a quick look at the python/cpython source code, but I think 3.7.5+ or 3.7.6+ may resolve this. Our latest tests ran on 3.7.12, so I would expect that version, at least, to work.


Noting for future debugging that this is the commit that changed the _make_selector signature: https://github.com/python/cpython/commit/175abccbbfccb2f6489dc5c73f4630c1b25ce504#
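One way to tolerate a signature change like this is to check how many positional parameters the callee accepts before calling it. The sketch below is only illustrative: `call_compat`, `selector_old`, and `selector_new` are hypothetical stand-ins, not pathlib's or cloudpathlib's actual code, and it assumes the extra argument can simply be dropped for the older signature.

```python
import inspect


def call_compat(func, *args):
    """Call func with only as many positional args as its signature accepts."""
    params = inspect.signature(func).parameters.values()
    # A *args parameter accepts everything, so pass all arguments through.
    if any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params):
        return func(*args)
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    accepted = sum(1 for p in params if p.kind in positional_kinds)
    return func(*args[:accepted])


# Hypothetical stand-ins for the two signatures (not the real pathlib helpers).
def selector_old(pattern_parts):
    return ("old", pattern_parts)


def selector_new(pattern_parts, flavour):
    return ("new", pattern_parts, flavour)
```

With these stand-ins, `call_compat(selector_old, parts, flavour)` drops the second argument, while `call_compat(selector_new, parts, flavour)` passes both. Whether dropping the argument is safe for the real internals is an assumption that would need checking against each Python version.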

remi-braun commented 2 years ago

Yes, you're right. I am using version 3.7.3, installed from the command line via apt-get (python3.7). Do you know how to install a newer version of Python 3.7 on Debian?

jayqi commented 2 years ago

I'm not a power user of Linux package managers, but some things to try or consider:

Otherwise, I don't really know how to help off the top of my head. In any case, thanks for bringing this error to our attention. We'll definitely want to either support these older versions or put in version minimums.
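As a rough illustration of what a version minimum could look like at runtime, here is a minimal sketch. The `check_python` helper and the `MIN_PYTHON` value are assumptions made for this example; the thread only suspects that 3.7.5+ or 3.7.6+ is required, and this is not how cloudpathlib actually enforces its supported versions.

```python
import sys

# Assumed minimum; the thread only suspects that 3.7.5+ or 3.7.6+ is needed.
MIN_PYTHON = (3, 7, 6)


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    # Compare only (major, minor, micro); default to the running interpreter.
    current = tuple(version_info or sys.version_info)[:3]
    if current < minimum:
        raise RuntimeError(
            "glob requires Python {}+, found {}".format(
                ".".join(map(str, minimum)), ".".join(map(str, current))
            )
        )
    return current
```

Calling `check_python((3, 7, 3))` raises a `RuntimeError` with a readable message instead of the opaque `TypeError` from pathlib internals, while `check_python((3, 7, 12))` passes.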

remi-braun commented 2 years ago

Yes, sadly :( 3.7.3 is the last maintained 3.7 version for Debian. I had discarded the official Python images for path-related reasons, but I think I will try them again.

remi-braun commented 2 years ago

Switching to 3.7.12 solves this 👍

cedricdonie commented 2 years ago

I get the same error for GSPath with version 0.6.5 on Python 3.6.15 (Ubuntu 20.04.4 LTS), using the following minimal working example. Interestingly, the error appears only when iterating through the generator.

>>> from cloudpathlib import GSPath
>>> p = GSPath("gs://blablabla/bla")
>>> p
GSPath('gs://blablabla/bla')
>>> p.glob("*.txt")
<generator object CloudPath.glob at 0x7f1ed54af888>
>>> list(p.glob("*.txt"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cedric/.venvs/mjff-ldopa/lib/python3.6/site-packages/cloudpathlib/cloudpath.py", line 344, in glob
    selector = _make_selector(tuple(pattern_parts), _posix_flavour)
TypeError: _make_selector() takes 1 positional argument but 2 were given

This is fixed if I instead run pip install git+https://github.com/drivendataorg/cloudpathlib.git@cb031d3987ffdd9b751169b01f44ac793f1d847b to get the fix from https://github.com/drivendataorg/cloudpathlib/pull/202.
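The reason the error only appears on iteration is that the body of a generator function does not run until the generator is consumed. A minimal sketch of that behaviour follows; `lazy_glob` is a hypothetical stand-in that simulates the failure, not cloudpathlib's glob.

```python
def lazy_glob(pattern):
    # Hypothetical stand-in for a generator-based glob: the failing call
    # sits in the body, which does not run until iteration starts.
    raise TypeError("simulated failure inside the generator body")
    yield pattern  # unreachable, but makes this a generator function


gen = lazy_glob("*.txt")  # no error: only a generator object is created

try:
    list(gen)  # the body runs here, so the TypeError surfaces now
except TypeError as exc:
    error_message = str(exc)
```

This matches the session above, where `p.glob("*.txt")` returns a generator object without complaint and `list(p.glob("*.txt"))` raises.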

pjbull commented 2 years ago

Thanks for reporting @cedricdonie. Did you try the latest version on PyPI as well (0.9.0)? That also has the fix.

That said, we no longer support Python 3.6 since it is end-of-life.

pjbull commented 2 years ago

This is a won't fix. The glob logic is too complicated to try to fix this just for these old versions of Python. Plus, it looks like current Debian stable and Ubuntu LTS releases have compatible Python versions.

anishazaveri commented 1 year ago

Hi,

I have the same issue, but I'm using Python 3.8.

cp = CloudPath("s3://compteam/anisha_zaveri/2022_09_06_ab_coupled_vae/out/fitted_models/04ffd59445a14ab0952aa4dcfdc0c6e6/", client=client)
list(cp.glob('models/*.ckpt'))

yields

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [37], in <cell line: 2>()
      1 cp = CloudPath("s3://compteam/anisha_zaveri/2022_09_06_ab_coupled_vae/out/fitted_models/04ffd59445a14ab0952aa4dcfdc0c6e6/", client=client)
----> 2 list(cp.glob('models/*.ckpt'))

File ~/anaconda3/envs/coupled_vae/lib/python3.8/site-packages/cloudpathlib/cloudpath.py:353, in CloudPath.glob(self, pattern)
    350 self._glob_checks(pattern)
    352 pattern_parts = PurePosixPath(pattern).parts
--> 353 selector = _make_selector(tuple(pattern_parts), _posix_flavour)
    355 yield from self._glob(selector)

TypeError: _make_selector() takes 1 positional argument but 2 were given

cloudpathlib version = 0.10.0
Python version = 3.8.0

pjbull commented 1 year ago

@anishazaveri Can you upgrade to a more recent release of 3.8.x? I believe 3.8.16 is the latest, but anything reasonably recent should work (I believe even 3.8.1 or 3.8.2 will work, but you likely want the security patches in more recent versions).