JuliaCloud / GoogleCloud.jl

Google Cloud APIs for Julia

Problems accessing publicly available Google Cloud Storage buckets. #36

Open Mobius1D opened 3 years ago

Mobius1D commented 3 years ago

Hello everyone,

I am a contributor to ReinforcementLearning.jl, where we are working on better support for offline reinforcement learning. One of the main goals is to make publicly available datasets readily accessible natively, and for that purpose I am creating a subpackage called RLDatasets.jl.

Some of the datasets are available publicly in GCS but I am not sure how to access public GCS buckets using the given API.

For instance, we would be accessing this dataset's GCP bucket. Since I am relatively new to Google Cloud Storage, it would be great if someone could shed light on this using that bucket as an example.

Thanks in advance.

mhudecheck commented 2 years ago

Hi Mobius1D,

This is already possible. As long as you have permissions to access the bucket, you should be able to use GoogleCloud.jl the same way you would with a private bucket. See Sentinel.jl for an example.

using GoogleCloud
using JSON

# Set Credentials - `credentials` is assumed to be the path to your service-account JSON key file
creds = JSONCredentials(credentials)
session = GoogleSession(creds, ["devstorage.full_control"])
set_session!(storage, session)

# Set Bucket - We'll use Google's Sentinel 2 repository for now. See Sentinel.jl for how this works in action.
bucketName = "gcp-public-data-sentinel-2"

# Get File List - You can set prefix = "folder/.../..." if you only want to retrieve files under a directory
rawFileList = GoogleCloud.storage(:Object, :list, bucketName; prefix="")
# Copy the raw response into a buffer, read it back as a String, then parse the JSON
io = IOBuffer()
write(io, rawFileList)
fileList = String(take!(io))
fileList = JSON.parse(fileList)

You can then download individual files by iterating through the file list, reading each entry's file["name"] key (if I remember correctly), and passing it to GoogleCloud.storage(:Object, :get, bucketName, file["name"]). The IO handling is the same as above.
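
For what it's worth, the download loop described above could look roughly like the sketch below. It is not taken from GoogleCloud.jl or Sentinel.jl. It assumes the parsed listing follows the GCS JSON API shape (an "items" array whose entries each carry a "name" key). The helper names object_names and download_all are made up for illustration, and fetch is a stand-in for the GoogleCloud.storage(:Object, :get, ...) call, so the logic can be tried without credentials.

```julia
# Collect object names from a parsed Objects.list response.
# Assumption: the listing has an "items" array whose entries each have a
# "name" key. "items" can be absent when nothing matches, hence the default.
object_names(listing::AbstractDict) =
    String[item["name"] for item in get(listing, "items", [])]

# Download every object in `listing` into `destdir`.
# `fetch` is a caller-supplied function mapping an object name to its contents.
# With a live session it would wrap the call mentioned above, e.g.
#   name -> GoogleCloud.storage(:Object, :get, bucketName, name)
function download_all(fetch::Function, listing::AbstractDict, destdir::AbstractString)
    paths = String[]
    for name in object_names(listing)
        endswith(name, "/") && continue   # skip "folder" placeholder objects
        path = joinpath(destdir, name)
        mkpath(dirname(path))             # object names may contain "/"
        write(path, fetch(name))
        push!(paths, path)
    end
    return paths
end
```

With a session set up as in the snippet above, this would be called as download_all(name -> GoogleCloud.storage(:Object, :get, bucketName, name), fileList, "data"). Large buckets return the listing in pages, so a single listing may not cover every object.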