Closed rdbisme closed 1 year ago
Hi @rdbisme, thanks for making this PR!
Could you please add a test for the Windows path handling fix that you created? The would be failing with the current main
branch and successful with your branch.
I like the addition of the zip file system. Zip is a good example of where nested file systems are useful. In your PR, you are assuming a local filesystem as underlying filesystem. However, the zip files could also be stored on HTTP, S3, etc. It would be nice to come up with a solution that also works for these. We touched on this also in #60. Do you have any thoughts on that?
(maybe it would be easier just to expose the pathlib.Path protocol without subclassing?)
I think subclassing is very useful, because it makes typing and isinstance
easier. I am open suggestions for how to make the object creation easier to understand, though.
Would your implementation also work with nested zip files, e.g. zip://path/to/first.zip/another.zip/actual.file
?
@normanrz The current tests were already failing on Windows without the UPath._sanitize_path
addition.
Would your implementation also work with nested zip files, e.g. zip://path/to/first.zip/another.zip/actual.file?
I'll add a test to check this
I like the addition of the zip file system. Zip is a good example of where nested file systems are useful. In your PR, you are assuming a local filesystem as underlying filesystem. However, the zip files could also be stored on HTTP, S3, etc. It would be nice to come up with a solution that also works for these. We touched on this also in https://github.com/fsspec/universal_pathlib/pull/60. Do you have any thoughts on that?
Could we have this merged and maybe fix the nested usage in another PR if it doesn't work?
The current tests were already failing on Windows without the
UPath._sanitize_path
addition.
Not sure I understand. The tests run in our CI just fine, also on Windows. See https://github.com/fsspec/universal_pathlib/runs/7010368282?check_suite_focus=true
I like the addition of the zip file system. Zip is a good example of where nested file systems are useful. In your PR, you are assuming a local filesystem as underlying filesystem. However, the zip files could also be stored on HTTP, S3, etc. It would be nice to come up with a solution that also works for these. We touched on this also in #60. Do you have any thoughts on that?
Could we have this merged and maybe fix the nested usage in another PR if it doesn't work?
For the sake of stability, I'd rather not merge something that will needs to be changed with breaking changes shortly after. For example, we could choose to introduce a new syntax for s3 + zip paths (e.g. zip://actual.file::s3://bucket/path/to/file.zip
as in the fsspec
package). That would be quite different from the implementation that you are proposing and would require breaking changes that affect users.
The current tests were already failing on Windows without the
UPath._sanitize_path
addition.Not sure I understand. The tests run in our CI just fine, also on Windows. See https://github.com/fsspec/universal_pathlib/runs/7010368282?check_suite_focus=true
What I mean is that the tests I've added to test the ZipFileSystem
implementation were failing on my Windows machine (running under an MSYS
environment, i.e. Git for Windows
and similar environment). Let me check again if running them in Powershell would work instead.
For the sake of stability, I'd rather not merge something that will needs to be changed with breaking changes shortly after. For example, we could choose to introduce a new syntax for s3 + zip paths (e.g. zip://actual.file::s3://bucket/path/to/file.zip as in the fsspec package). That would be quite different from the implementation that you are proposing and would require breaking changes that affect users.
Does fsspec
support nested filesystems?
For the sake of stability, I'd rather not merge something that will needs to be changed with breaking changes shortly after. For example, we could choose to introduce a new syntax for s3 + zip paths (e.g. zip://actual.file::s3://bucket/path/to/file.zip as in the fsspec package). That would be quite different from the implementation that you are proposing and would require breaking changes that affect users.
Does
fsspec
support nested filesystems?
I haven't tried it, but the docs say so. I don't think it is trivial to implement that for UPath, though.
For example, what would UPath("zip://folder::s3://bucket/path/to/file.zip") / "actual.file"
resolve to? UPath("zip://folder/actual.file::s3://bucket/path/to/file.zip")
or UPath("zip://folder::s3://bucket/path/to/file.zip/actual.file")
.
Also, how can nested paths be constructed in a pathlib-y way? Perhaps, UPath("s3://bucket/path/to/file.zip") / "zip://folder/actual.file"
This should add support for a zip file
zip://path_to_my_file.zip
.I'm not sure if there's an easy way to rewrite correctly the paths, but the
UPath
object creation is quite difficult to understand (maybe it would be easier just to expose thepathlib.Path
protocol without subclassing?)There's also a fix for Windows path of the type
C:\\Users\\user\\file.zip
. They were handled incorrectly.