Closed shobsi closed 7 months ago
duplicate: https://github.com/fsspec/s3fs/issues/862
https://github.com/fsspec/filesystem_spec/pull/1551 allows you to pass expand=True to force open() to return the first matching file when you give it a glob string and expect only one file. This matches the old behaviour, which was unintended.
@Skylion007, it seems like your use case of paths you do NOT want expanded may be in the minority. I may change the default value of expand= to match the previous code path, and then in your code you would need to pass expand=False explicitly. Thoughts?
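For reference, a minimal sketch of the two settings being discussed. It assumes an installed fsspec release that includes the PR above and accepts the expand keyword on fsspec.open; the temporary directory and file names are made up for illustration.

```python
import os
import tempfile

import fsspec

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names: one contains glob characters, the other is a
    # file that the same string matches when it is read as a glob pattern.
    literal = os.path.join(tmp, "file[train].csv")
    lookalike = os.path.join(tmp, "filet.csv")
    with open(literal, "w") as f:
        f.write("literal")
    with open(lookalike, "w") as f:
        f.write("lookalike")

    # expand=False: the path is used verbatim, brackets included.
    with fsspec.open(literal, "r", expand=False) as f:
        verbatim = f.read()

    # expand=True: the path is treated as a glob and the first match is opened.
    with fsspec.open(literal, "r", expand=True) as f:
        expanded = f.read()

print(verbatim, expanded)
```

The same path string opens two different files depending on the flag, which is why the choice of default matters.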
Just to add another datapoint to this: I have been impacted by this change when reading within duckdb. A way to globally revert to the previous behavior would be greatly appreciated.
This is actually a major footgun in pandas right now. If I understand correctly, this glob handling behaves differently on local filesystems than it does with fsspec.open (or at least with other libraries like dask). The previous behavior was undocumented and a bug; it also meant you simply couldn't open certain files. I know this is a breaking change, but it breaks behavior that really shouldn't be supported. fsspec.open is supposed to be mostly a drop-in replacement for builtins.open, and therefore they should share similar semantics.
If pandas wants to support this by default, they are welcome to change their use of the APIs to opt in to this behavior.
I think it's fair to say that pandas will make no changes on our behalf; it's up to us to decide what the most expected and most useful behaviours are.
Yeah, I would say the special glob characters are actually quite common in file names: file[train].csv is a common convention, for instance.
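To make the ambiguity concrete, here is a small standard-library sketch (not fsspec itself) of how a name like file[train].csv reads as a glob pattern versus as a literal path. The second file name, filet.csv, is invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create the bracketed file plus a file the bracket pattern would match.
    for name in ("file[train].csv", "filet.csv"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(name)

    path = os.path.join(tmp, "file[train].csv")

    # Read as a pattern, [train] is a character class: it matches exactly one
    # of t, r, a, i, n, so the bracketed file itself is NOT matched.
    pattern_matches = [os.path.basename(p) for p in glob.glob(path)]

    # glob.escape() neutralises the special characters, so only the literal
    # file name matches.
    literal_matches = [os.path.basename(p) for p in glob.glob(glob.escape(path))]

print(pattern_matches)
print(literal_matches)
```

Read as a pattern, the name never matches the file it was meant to refer to, so a caller who wants that file needs some way to say the path is literal.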
https://github.com/googleapis/python-bigquery-dataframes has a dependency on fsspec. We started noticing our tests are failing since version 2024.3.1:
This is what the initial environment looks like
pandas.read_csv works as expected
install gcsfs version 2024.3.1
rerun the same command, it fails now