catalyst-cooperative / pudl-catalog

An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://catalyst.coop/pudl/
MIT License

Make anonymous public data access work #5

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

Right now none of the truly public data access methods seems to be functional, which defeats most of the purpose of publishing the data catalog. Several issues have come up:

Ideally we would be able to provide public access both via gcs:// (which seems to provide much more "filesystem" like access) and over https:// (which has much better support in generic download tools for the less cloud-literate).
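For reference, here is a rough sketch of how the two access paths relate. It assumes the objects are publicly readable and served from the standard `https://storage.googleapis.com/<bucket>/<object>` endpoint. The helper name `gcs_to_https` and the bucket name in the usage comment are hypothetical, not something that exists in this repo.

```python
from urllib.parse import quote


def gcs_to_https(url: str) -> str:
    """Map a gcs:// or gs:// URL to the equivalent public HTTPS URL.

    Hypothetical helper for illustration; assumes the object is publicly
    readable through the storage.googleapis.com endpoint.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in ("gcs", "gs"):
        raise ValueError(f"Not a GCS URL: {url!r}")
    bucket, _, obj = rest.partition("/")
    if not bucket or not obj:
        raise ValueError(f"Expected gcs://<bucket>/<object>, got {url!r}")
    # Percent-encode the object name but keep '/' as the path separator.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


# Usage (bucket and object names are made up):
# gcs_to_https("gcs://example-bucket/dir/table.parquet")
```

The HTTPS form works with generic tools like `curl` or a browser, but it addresses one object at a time; listing and globbing need the `gcs://` filesystem interface.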

Need to understand the intended patterns of usage with public cloud accessible data, and how to make the public resource as functional / convenient as it can be.

May also need to understand better how to limit the risk of a bajillion downloads costing us on data egress fees, which might mean going requester-pays.

zaneselvans commented 2 years ago

It turns out that adding {"token": "anon"} to the storage_options, and making the bucket objects readable and the bucket itself listable, was enough.
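A minimal sketch of what that looks like from the user side, assuming `pandas`, `pyarrow` and `gcsfs` are installed. The function names and the bucket path in the usage comment are hypothetical; only the `{"token": "anon"}` option comes from the fix above.

```python
def anon_storage_options() -> dict:
    """Storage options that tell gcsfs to skip credential lookup."""
    return {"token": "anon"}


def read_public_parquet(url: str, **kwargs):
    """Read a publicly readable Parquet file from GCS without credentials.

    Illustrative helper only; pandas is imported lazily so that the
    options helper above works even where pandas is not installed.
    """
    import pandas as pd

    # pandas forwards storage_options to fsspec/gcsfs for gcs:// URLs.
    return pd.read_parquet(url, storage_options=anon_storage_options(), **kwargs)


# Usage (hypothetical bucket and object path):
# df = read_public_parquet("gcs://example-bucket/dir/table.parquet")
```

The same options can be passed straight to `gcsfs.GCSFileSystem(token="anon")` when filesystem-style listing is needed, which is where the bucket itself being listable matters.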