Closed bbockelm closed 5 months ago
Forgot to mention -- here's the xrootd configuration in use:
s3.begin
s3.path_name /test
s3.bucket_name genome-browser
s3.service_name s3.amazonaws.com
s3.region us-east-1
s3.service_url https://s3.us-east-1.amazonaws.com
s3.url_style virtual
s3.end
Further testing shows this also breaks more significantly. With a minimal config of:
all.export /
xrd.protocol http:8443 libXrdHttp.so
ofs.osslib /workspaces/pelican_xrootd_s3/xrootd-s3-http/build/libXrdS3.so
xrootd.async off
s3.url_style path
s3.begin
s3.path_name /aws-opendata
s3.service_name s3
s3.region us-east-1
s3.service_url https://s3.us-east-1.amazonaws.com
s3.end
The first GET test I usually run results in a segfault. From Curl:
curl -v http://`hostname`:8443/aws-opendata/noaa-wod-pds/MD5SUMS
* Trying 172.17.0.4:8443...
* Connected to 0f6eebf9123e (172.17.0.4) port 8443 (#0)
> GET /aws-opendata/noaa-wod-pds/MD5SUMS HTTP/1.1
> Host: 0f6eebf9123e:8443
> User-Agent: curl/7.76.1
> Accept: */*
>
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
And from the server:
s3_Stat: Stat'ing path /aws-opendata/noaa-wod-pds/MD5SUMS
240524 20:07:02 68098 s3_SendRequest: Sending HTTP request https://.s3.us-east-1.amazonaws.com/?delimiter=%2F&list-type=2&max-keys=1000&prefix=noaa-wod-pds%2FMD5SUMS
240524 20:07:02 68098 s3_Stat: Failed to stat path /aws-opendata/noaa-wod-pds/MD5SUMS; response code 0
240524 20:07:02 68098 ofs_stat: unknown.1:29@0f6eebf9123e Unable to locate /aws-opendata/noaa-wod-pds/MD5SUMS; input/output error
240524 20:07:02 68098 http_Req: XrdHttpReq::Error
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: PostProcessHTTPReq req: 2 reqstate: 0 final_:False
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: PostProcessHTTPReq mapping Xrd error [3007] to status code [500]
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Protocol: Process. lp:(nil) reqstate: 0
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: No checksum requested; skipping to request state 2
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Protocol: Process is exiting rc:0
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e ofs_open: 0-600 (600) fn=/aws-opendata/noaa-wod-pds/MD5SUMS
240524 20:07:02 68098 s3_S3File::Open: Opening file /aws-opendata/noaa-wod-pds/MD5SUMS
Segmentation fault (core dumped)
Are the new URL queries supposed to be added on a basic GET like this? It also looks like the URL generated in s3_SendRequest
isn't coming out correctly, because exporting an entire S3 endpoint requires not setting a bucket.
Yup - my testing was all for specifying a bucket in the s3.begin ... s3.end
block. I can go back and test that with your reproducer.
Can you confirm that if you specify a single bucket it works though?
Hardcoding the bucket with the config:
s3.url_style path
s3.begin
s3.path_name /aws-opendata
s3.bucket_name noaa-wod-pds
s3.service_name s3
s3.region us-east-1
s3.service_url https://s3.us-east-1.amazonaws.com
s3.end
still produces a segfault when I curl -v http://$HOSTNAME:8443/aws-opendata/MD5SUMS
. I'm also noticing that s3_SendRequest: Sending HTTP request https://noaa-wod-pds.s3.us-east-1.amazonaws.com/?delimiter=%2F&list-type=2&max-keys=1000&prefix=noaa-wod-pds%2FMD5SUMS
is constructing a virtual bucket URL instead of the configured path-style url.
I'm back from vacation. Let me know if I can help with any of this, but I don't want to duplicate work.
@rw2, definitely! Do you want to see if you can start stitching up some of the missing pieces to get this working? We have a lot of interest in this right now.
ok, working on it now. I'll ping you on slack if I have questions.
@rw2 - any updates on this? I'd like to get the directory listing wrapped into this month's release, which nominally happens on Thursday.
Last Thu/Fri: Wouldn't compile for me, which was weird. There must be something different in our environments. Fixed compile. Reproduced it. Got rid of the segfault, but it was a symptom, so the file didn't download. Couldn't work on it yesterday, back at it again this morning.
@jhiemstrawisc -- ready for re-review!
Beyond fixing the original branch (and some patches from @rw2), this also adds unit tests for the listing functionality and pre-commit settings (so I stop getting caught by the clang-format
linter)!
Implements the
ListObjectsV2
query against the S3 endpoint, allowing XRootD to interpret a directory-like structure in S3 as a XRootD directory.This has been lightly tested against public S3 buckets using
curl
queries like this:where the query file is as follows:
It is the first use of query parameters in the signature creation code -- that might need some attention. Also did some modest refactoring around how the requests are passed (mostly to avoid repeating all the configuration strings). Needs to have path-style URLs tested and simple GET/PUT of data to look for regressions.