Closed joouha closed 9 months ago
Hi @joouha
Currently we throw away query and fragments here: https://github.com/fsspec/universal_pathlib/blob/5d844b274d058819af6f900cd190688251a15a3b/upath/core.py#L179
But since only the first argument to UPath is parsed, there's one way to construct the path right now.
>>> upath.UPath("http://example.com/a")
HTTPPath('http://example.com/a')
>>> upath.UPath("http://example.com/", "a?q=123")
HTTPPath('http://example.com/a?q=123')
but joining on this doesn't work correctly, since it just treats it like a filename.
I think the correct way to fix this is adding query and fragment in https://github.com/fsspec/universal_pathlib/blob/5d844b274d058819af6f900cd190688251a15a3b/upath/core.py#L213 if they are present.
And then there needs to be some code that updates query and fragments whenever we join paths. (to do joins correctly this will also require fixing #88)
Would you like to work on a PR? I'll only have time to work on this in about 2-3 weeks from now.
Cheers, Andreas
@ap-- I have an urgent use case for this.
We have a commercial project that tracks different versions of S3 objects using a query parameter::
"s3://udp-spec/manual/force/README.md?versionId=B0zNSMELW__87yfYtZcfcdBw3qxHkFhm"
I know s3.open
can handle it:
fo_old_version = s3.open('versioned_bucket/object', version_id='SOMEVERSIONID')
The naive solution is to pass any query parameters as kwargs to open, but I don't know if that might cause problems for others.
Should I just create a PR against this issue solve my use case, then work with you to figure out how to generalize it?
Or would it be wiser to define a general mechanism for passing "open" parameters, and manually extract the query on my side first?
Hi @drernie,
The issue you are linking to seems to be in a private repository.
upath should already allow you to do this, if you are willing to handle the query parameter yourself:
from upath import UPath
pth = UPath("s3://udp-spec/manual/force/README.md", version_aware=True)
with pth.open(version_id="B0zNSMELW__87yfYtZcfcdBw3qxHkFhm") as f:
data = f.read()
Let me know if that helps!
I'm currently working on a larger refactor to address a few other important issues, and would then address a common way for handling query parameters. If you'd like to contribute, I would ping you once that is ready.
Cheers, Andreas
Ouch. I just realized that
UPath("s3://udp-spec/manual/force/README.md?versionId=B0zNSMELW__87yfYtZcfcdBw3qxHkFhm")
drops the query string.
This seems to imply I have to manually track the version ID everywhere. Is that correct, or is there a way to attach attributes to the UPath?
Did this get somehow addressed in #125 ?
When opening a resource using universal pathlib, the query component of the URI is dropped, resulting in the wrong resource being opened.
For example, running the following:
makes a GET request for
https://duckduckgo.com/
without the query parameter, giving unexpected output.