allenai / cached_path

A file utility for accessing both local and remote files through a unified interface.
https://cached-path.readthedocs.io/
Apache License 2.0
36 stars 12 forks source link

Make `extract_archive=True` the default #214

Open bryant1410 opened 9 months ago

bryant1410 commented 9 months ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

cached_path is great because I can take a file path or a URL, seamlessly. However, if the file referenced by a URL is a file inside an archive file (e.g., a ZIP file), I have to pass extract_archive=True. I have to do this for every function call that may potentially receive a file inside another one, which hurts the "seamlessly" aspect.

At the same time, extract_archive=True because the same as if it was False if the URL has no exclamation sign ("!") in it.

Describe the solution you'd like

I believe we should switch to extract_archive=True by default, as it behaves to automatically detect if it's an archive or not. URLs won't have this symbol to indicate anything else, as it'd otherwise be URL-encoded ("%21").

I have used this behavior for a while on my own fork, and it works without any issues.

Describe alternatives you've considered

I haven't considered other alternatives.

dirkgr commented 8 months ago

This sounds good to me. @epwalsh, any concerns?

epwalsh commented 8 months ago

Yea I'm okay with this!

bryant1410 commented 8 months ago

Okay! I can put together a PR within a few days.