PelicanPlatform / pelican

The Pelican Platform for creating data federations
https://pelicanplatform.org/
Apache License 2.0

Move s3 path style into s3.begin/s3.end blocks #1601

Closed jhiemstrawisc closed 3 days ago

jhiemstrawisc commented 1 week ago

It looks like the path style directive is scoped to each s3.begin/s3.end block, which is why the previous single declaration failed for multiple buckets. Technically it would succeed if all the buckets were supposed to use virtual-style URLs (the default), but in the path-style case only the first bucket inherited the value and the other buckets reverted to the default.

This commit scopes the path style to each origin export to guarantee that each request URL generated by XRootD follows the same bucket convention.

The easiest way to test this is probably to start an Origin with full debug logging at the origin/xrootd level and point it at two public AWS buckets. Here's the origin config I used:

Logging:
  Level: debug
  Origin:
    Xrootd: trace
    Http: debug
Origin:
  S3UrlStyle: "path"
  S3Region: "us-east-1"
  S3ServiceUrl:  https://s3.us-east-1.amazonaws.com
  StorageType: "s3"
  Exports:
    - FederationPrefix: "/aws-opendata/noaa-wod-pds"
      S3Bucket: "noaa-wod-pds"
      Capabilities: ["PublicReads", "Listings", "DirectReads"]
    - FederationPrefix: "/aws-opendata/genome-browser"
      S3Bucket: "genome-browser"
      Capabilities: ["PublicReads", "Listings", "DirectReads"]

From there, curl one object from each namespace. Two objects for testing are:

/noaa-wod-pds/MD5SUMS
/genome-browser/htdocs/.welcome.msg

You can then search through the origin log for the generated URLs. Prior to this commit, you'd find that the object request for the first listed bucket (in this case noaa-wod-pds) generates the path-style URL https://s3.us-east-1.amazonaws.com/noaa-wod-pds/MD5SUMS, while object requests for any subsequent bucket revert to virtual-style, e.g. https://genome-browser.s3.us-east-1.amazonaws.com/htdocs/.welcome.msg
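If grepping the log by eye gets tedious, something like the following could help sort the URLs by style. This is only a rough sketch: it assumes the generated URLs appear verbatim somewhere in each log line (the exact log format is not assumed), and the names `classify` and `scan` are made up for illustration. The service host is taken from the S3ServiceUrl in the config above.

```python
import re
from urllib.parse import urlsplit

# Host portion of Origin.S3ServiceUrl from the test config above.
SERVICE_HOST = "s3.us-east-1.amazonaws.com"

# Loose URL matcher; assumes URLs in the log are delimited by whitespace or quotes.
URL_RE = re.compile(r"https://[^\s\"'<>]+")


def classify(url, service_host=SERVICE_HOST):
    """Return 'path', 'virtual', or 'other' for a single URL (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    if host == service_host:
        # Bucket is carried in the URL path: https://<service>/<bucket>/<object>
        return "path"
    if host.endswith("." + service_host):
        # Bucket is carried in the hostname: https://<bucket>.<service>/<object>
        return "virtual"
    return "other"


def scan(log_text, service_host=SERVICE_HOST):
    """Return (url, style) pairs for every https URL found in the log text."""
    urls = (u.rstrip(".,;)") for u in URL_RE.findall(log_text))
    return [(u, classify(u, service_host)) for u in urls]
```

Calling `scan(open("origin.log").read())` (the file name is a placeholder) and filtering out the "other" entries should show "path" for every S3 request once the fix is in place.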

After the change, both of these URLs should have the form https://s3.us-east-1.amazonaws.com/<bucket>/<object>
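For reference, the difference between the two conventions can be sketched in a few lines. This is not the code XRootD or Pelican uses; `s3_object_url` is a hypothetical helper that only illustrates where the bucket name ends up in each style, and it does no URL-encoding of the object key.

```python
from urllib.parse import urlsplit, urlunsplit


def s3_object_url(service_url, bucket, key, style="virtual"):
    """Build an object URL in either S3 addressing style (illustrative only)."""
    parts = urlsplit(service_url)
    key = key.lstrip("/")
    if style == "path":
        # Path style: bucket is the first path component.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    if style == "virtual":
        # Virtual style: bucket is prepended to the service hostname.
        return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))
    raise ValueError(f"unknown style: {style!r}")


service = "https://s3.us-east-1.amazonaws.com"
# What both exports should produce after the change:
print(s3_object_url(service, "noaa-wod-pds", "MD5SUMS", style="path"))
print(s3_object_url(service, "genome-browser", "htdocs/.welcome.msg", style="path"))
# What the second export produced before the change:
print(s3_object_url(service, "genome-browser", "htdocs/.welcome.msg", style="virtual"))
```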

Closes #1561