robinatw closed this 6 months ago
You can use boto3/botocore to access cellpainting-gallery
anonymously as follows:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED), region_name='us-east-1')  # set your region
Small "folders" (prefixes) can then be listed directly; a single call returns at most 1,000 keys:
s3.list_objects_v2(
    Bucket='cellpainting-gallery',
    Prefix='cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/illum/BR00116991/',
)
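The call above returns a plain dict, so pulling out file names and sizes is a matter of reading its keys. Here is a minimal sketch of that step; `split_listing` is a hypothetical helper name (not part of boto3), and it assumes the response follows the documented `list_objects_v2` layout (`Contents`, plus `CommonPrefixes` when a `Delimiter` is passed).

```python
from typing import Dict, List, Tuple


def split_listing(response: Dict) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Return ([(key, size_in_bytes), ...], [sub_prefix, ...]) from one response.

    Hypothetical helper for illustration; it only reads the response dict.
    """
    # "Contents" is absent when no objects match the prefix.
    files = [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]
    # "CommonPrefixes" only appears when the request passed a Delimiter.
    subfolders = [cp["Prefix"] for cp in response.get("CommonPrefixes", [])]
    return files, subfolders


# Example use with the anonymous client created earlier (not run here):
# response = s3.list_objects_v2(
#     Bucket='cellpainting-gallery',
#     Prefix='cpg0000-jump-pilot/',
#     Delimiter='/',  # group deeper keys into "subfolders"
# )
# files, subfolders = split_listing(response)
```

If `response.get("IsTruncated")` is true, the prefix holds more keys than one call returns, and the listing is incomplete.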
For large "folders" (prefixes), you'll need to use a paginator, such as:
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='cellpainting-gallery', Prefix='cpg0000-jump-pilot/')
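To get file sizes recursively for a whole dataset, the pages from the paginator can be summed as they arrive, with nothing downloaded. The following is a rough sketch rather than a definitive implementation: `summarize_pages` and `prefix_size` are hypothetical names, and the region is assumed to be `us-east-1` as in the client setup above.

```python
from typing import Dict, Iterable, Tuple


def summarize_pages(pages: Iterable[Dict]) -> Tuple[int, int]:
    """Return (object_count, total_bytes) for an iterable of list_objects_v2 pages."""
    count = 0
    total_bytes = 0
    for page in pages:
        # A page with no matching objects has no "Contents" key at all.
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes


def prefix_size(bucket: str, prefix: str) -> Tuple[int, int]:
    """List every key under `prefix` anonymously and sum the sizes."""
    # Imported lazily so summarize_pages can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    s3 = boto3.client(
        "s3",
        config=Config(signature_version=UNSIGNED),
        region_name="us-east-1",  # assumption: same region as the example above
    )
    paginator = s3.get_paginator("list_objects_v2")
    return summarize_pages(paginator.paginate(Bucket=bucket, Prefix=prefix))


# Example (makes many list requests for a large dataset, so it can take a while):
# count, total_bytes = prefix_size("cellpainting-gallery", "cpg0000-jump-pilot/")
# print(f"{count} objects, {total_bytes / 1e12:.2f} TB")
```

Only listing metadata is transferred, so local storage is not a constraint for this step.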
We now have https://github.com/broadinstitute/cpg/tree/main/cpgdata which will make indexing and finding files dramatically easier. I'll close this out now.
Requirement
Dear Community,
This is Robin from Novo Nordisk, a pharmaceutical company.
First, thank you for your contributions to the cellpainting-gallery project.
As cellpainting-gallery is a public S3 bucket, we'd like to do some data analysis based on the data you have on AWS S3.
I can list datasets with the AWS CLI:
aws s3 ls --no-sign-request s3://cellpainting-gallery/cpg0000-jump-pilot/
It seems the AWS CLI allows anonymous access to the S3 bucket, but when I open the bucket in a browser, I get Access Denied:
http://s3.amazonaws.com/cellpainting-gallery/cpg0000-jump-pilot
We want to do this in a smart way: we'd like to analyze the data directly in the S3 bucket instead of downloading the datasets locally first.
Because local storage space is limited, I'd like to know if it's possible to access the cellpainting-gallery datasets via the S3 REST API or the S3 SDK for Python (such as boto3).
Taking the dataset cellpainting-gallery/cpg0000-jump-pilot as an example, I'd like to get the file sizes recursively across the whole dataset.
What prerequisites do I need?
It would be great if you could tell me the detailed steps to implement this.
Best Regards, Robin