allenai / cached_path

A file utility for accessing both local and remote files through a unified interface.
https://cached-path.readthedocs.io/
Apache License 2.0

feature: partially download ZIPs #240

Open jeswr opened 2 months ago

jeswr commented 2 months ago

Is your feature request related to a problem? Please describe. I don't want to spend the network bandwidth required to download an entire zip when I only need to access one file from it.

Describe the solution you'd like: A new kwarg on the cached_path function that downloads only the requested file from a zip archive, for example:

cached_path("https://example.org/file.zip!file/path.txt", extract_archive=True, single_file=True)

Describe alternatives you've considered: Using remoteZip, which "provides a way to access single members of a zip file archive" but does not have the file caching functionality of this library out of the box. I would suggest that remoteZip could be used under the hood to implement this feature.
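
To make the suggestion concrete, here is a rough sketch of how a single member could be fetched and cached. It is not cached_path's implementation: the helper names (split_archive_url, fetch_single_member), the open_zip parameter, and the cache layout are all hypothetical. It assumes the third-party remotezip package is installed and that the server supports HTTP range requests.

```python
import hashlib
from pathlib import Path
from typing import Callable, Tuple


def split_archive_url(url_with_member: str) -> Tuple[str, str]:
    """Split 'https://host/file.zip!inner/path.txt' into (archive_url, member)."""
    archive_url, sep, member = url_with_member.rpartition("!")
    if not sep or not archive_url or not member:
        raise ValueError(f"expected '<archive-url>!<member>', got {url_with_member!r}")
    return archive_url, member


def _open_remote_zip(archive_url: str):
    # Imported lazily so the rest of the sketch works without the dependency.
    from remotezip import RemoteZip  # third-party: pip install remotezip

    # RemoteZip subclasses zipfile.ZipFile and fetches byte ranges on demand.
    return RemoteZip(archive_url)


def fetch_single_member(
    url_with_member: str,
    cache_dir: str,
    open_zip: Callable = _open_remote_zip,  # hypothetical hook, handy for testing
) -> Path:
    """Return a local path to one archive member, downloading it only once."""
    archive_url, member = split_archive_url(url_with_member)

    # Hypothetical cache layout: one directory per (archive, member) pair.
    key = hashlib.sha256(url_with_member.encode("utf-8")).hexdigest()
    target = Path(cache_dir) / key / Path(member).name
    if target.exists():
        return target  # cache hit, no network access

    target.parent.mkdir(parents=True, exist_ok=True)
    with open_zip(archive_url) as archive:
        data = archive.read(member)  # only this member's bytes are requested

    # Write to a temporary name first so a partial file is never seen as cached.
    tmp = target.parent / (target.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

With something like this in place, fetch_single_member("https://example.org/file.zip!file/path.txt", "/tmp/cache") would return the cached copy of path.txt. Cache invalidation (for example via ETags, as cached_path does for whole files) is left out of the sketch.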