epwalsh / rust-cached-path

🦀 Rust utility for accessing both local and remote files through a unified interface
Apache License 2.0

Determination of archive format #68

Open eggyal opened 1 year ago

eggyal commented 1 year ago

I see that cached-path currently determines how to extract an archive according to its filename extension:

https://github.com/epwalsh/rust-cached-path/blob/db8cafb061ec1ff561747026f5db4317bfbaff7d/src/archives.rs#L17-L23

The problem I have is that some archives do not use the expected extension (in my case, gzipped tarballs use .tgz rather than .tar.gz). While this could be addressed by expanding or customising the extension list used by cached-path, perhaps it's also an opportunity to consider some alternative approaches.

Personally, I feel that HTTP headers would be best (where available, which is obviously not the case for local resources), perhaps falling back to magic-number detection and/or file extensions if no other option is available.
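To make the proposed fallback order concrete, here is a minimal sketch using only the standard library. Every name in it (`Detected`, `from_content_type`, `from_magic`, `from_extension`, `detect`) is hypothetical and not part of cached-path. It assumes the caller has already obtained the `Content-Type` header value (if any) and the first few bytes of the file. Note that a gzip content type or magic number only identifies a gzip stream; whether it wraps a tarball would still have to be checked after decompression.

```rust
/// Hypothetical result type for this sketch; not the crate's own enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Detected {
    /// A gzip stream (possibly, but not necessarily, a tarball).
    Gzip,
    Zip,
}

/// Map a `Content-Type` header value to a format, ignoring parameters and case.
pub fn from_content_type(value: &str) -> Option<Detected> {
    // Drop parameters such as "; charset=binary" and normalise case.
    let mime = value.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
    match mime.as_str() {
        "application/gzip" | "application/x-gzip" => Some(Detected::Gzip),
        "application/zip" => Some(Detected::Zip),
        _ => None,
    }
}

/// Inspect the leading bytes of the file for well-known magic numbers.
pub fn from_magic(prefix: &[u8]) -> Option<Detected> {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        // gzip streams begin with the bytes 1f 8b.
        Some(Detected::Gzip)
    } else if prefix.starts_with(b"PK\x03\x04") {
        // Local file header signature of a non-empty zip archive.
        Some(Detected::Zip)
    } else {
        None
    }
}

/// Last resort: guess from the file name, case-insensitively.
pub fn from_extension(name: &str) -> Option<Detected> {
    let lower = name.to_ascii_lowercase();
    if lower.ends_with(".tar.gz") || lower.ends_with(".tgz") {
        Some(Detected::Gzip)
    } else if lower.ends_with(".zip") {
        Some(Detected::Zip)
    } else {
        None
    }
}

/// Try the header first, then magic bytes, then the extension.
pub fn detect(content_type: Option<&str>, prefix: &[u8], name: &str) -> Option<Detected> {
    content_type
        .and_then(from_content_type)
        .or_else(|| from_magic(prefix))
        .or_else(|| from_extension(name))
}
```

A generic header such as `application/octet-stream` falls through to the magic-number check here, which seems like the more forgiving behaviour for servers that do not set a specific type.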

Happy to submit a PR with whatever approach you feel is most suitable for this library, even if that only means adding .tgz to the existing extension list.
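For reference, the smallest version of that change might look something like the sketch below. The `ArchiveFormat` enum and `format_from_name` function are hypothetical stand-ins written for this comment; they are not copied from `src/archives.rs`, whose actual types may differ.

```rust
/// Hypothetical archive formats for this sketch; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Guess the format from a file name, case-insensitively.
/// Both `.tar.gz` and the shorter `.tgz` map to a gzipped tarball.
pub fn format_from_name(name: &str) -> Option<ArchiveFormat> {
    let lower = name.to_ascii_lowercase();
    if lower.ends_with(".tar.gz") || lower.ends_with(".tgz") {
        Some(ArchiveFormat::TarGz)
    } else if lower.ends_with(".zip") {
        Some(ArchiveFormat::Zip)
    } else {
        None
    }
}
```

This only looks at the end of the string, so a URL carrying a query string after the file name would not match without first stripping it.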

epwalsh commented 1 year ago

Hey @eggyal, I would definitely accept a PR for this. I like the idea of using HTTP headers, so I think that should be the first priority. It would also be nice to allow the user to directly specify the format, so if that's straightforward enough to do in the same PR, please go ahead. I'm not opposed to detection "magic" as a fallback as well... that could always be an optional feature of this crate.
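One possible shape for the user-specified format is an option that, when set, takes precedence over any detection. The sketch below is only an illustration: `ExtractOptions`, `with_format`, and `resolve` are hypothetical names, not existing cached-path API, and the detection step is passed in as a closure so that it only runs when no explicit format was given.

```rust
/// Hypothetical archive formats for this sketch; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Hypothetical options struct carrying an optional explicit format.
#[derive(Debug, Default)]
pub struct ExtractOptions {
    /// When set, detection is skipped entirely.
    pub format: Option<ArchiveFormat>,
}

impl ExtractOptions {
    /// Builder-style setter for the explicit format.
    pub fn with_format(mut self, format: ArchiveFormat) -> Self {
        self.format = Some(format);
        self
    }

    /// Use the explicit format if present; otherwise call `detect` lazily.
    pub fn resolve(
        &self,
        detect: impl FnOnce() -> Option<ArchiveFormat>,
    ) -> Option<ArchiveFormat> {
        self.format.or_else(detect)
    }
}
```

Keeping detection behind a closure would also make it easy to swap in header-based or magic-number detection later, or to compile the latter out behind an optional feature.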