broadinstitute / cellpainting-gallery

Cell Painting Gallery
https://broadinstitute.github.io/cellpainting-gallery/
MIT License

Add dataset size to README #50

ErinWeisbart closed 7 months ago

ErinWeisbart commented 11 months ago

Would be nice to have approximate size of datasets (maybe list image and numerical data sizes separately) in the Available Datasets table in the README so folks wanting to use the datasets have some idea of what they are getting themselves into...

ErinWeisbart commented 11 months ago

@shntnu do you agree with this? (or at least not disagree?) do you have dataset sizes somewhere? (do we have a console view of size by prefix?)

shntnu commented 10 months ago

> @shntnu do you agree with this? (or at least not disagree?)

I agree but it might be a bit of a lift (see below)

> do you have dataset sizes somewhere? (do we have a console view of size by prefix?)

We have this https://broad.io/cpgdash which is configured using this https://github.com/jump-cellpainting/cellpainting-gallery-config/blob/f907ef931bb7b6e13400447f3e4244c7a0eb56e3/dashboard/dashboard_stack.py

IIRC I couldn't find a metric that would report total size or number of files. But I didn't poke around much either.
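For reference, total size and number of files per prefix can also be computed by listing the bucket directly instead of relying on a dashboard metric. The following is only a minimal sketch, not something used in this thread: the function names are hypothetical, the grouping depth is an arbitrary choice, and anonymous (unsigned) listing is an assumption about how the bucket is configured. Listing a very large bucket this way is slow and issues many LIST requests.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_pages(pages: Iterable[dict], depth: int = 1) -> Dict[str, Tuple[int, int]]:
    """Aggregate (object_count, total_bytes) per key prefix.

    `pages` is any iterable of list_objects_v2 response pages, i.e. dicts
    that may contain a "Contents" list of {"Key": ..., "Size": ...} entries.
    `depth` is the number of leading "directory" components to group by.
    """
    totals = defaultdict(lambda: [0, 0])
    for page in pages:
        for obj in page.get("Contents", []):
            directories = obj["Key"].split("/")[:-1]  # drop the file name
            prefix = "/".join(directories[:depth]) or "(root)"
            totals[prefix][0] += 1
            totals[prefix][1] += obj["Size"]
    return {prefix: (count, size) for prefix, (count, size) in totals.items()}


def list_bucket_pages(bucket: str, prefix: str = ""):
    """Yield list_objects_v2 pages for a bucket (hypothetical helper).

    Assumes the bucket permits anonymous listing; drop the unsigned
    config to use regular AWS credentials instead.
    """
    import boto3  # imported lazily so summarize_pages works without boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    return paginator.paginate(Bucket=bucket, Prefix=prefix)
```

A call such as `summarize_pages(list_bucket_pages("my-bucket", "some-dataset/"), depth=2)` (bucket and prefix names are placeholders) would return a mapping from each second-level prefix to its object count and size in bytes.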

ErinWeisbart commented 10 months ago

It's pretty simple to filter request metrics by prefix (see AWS docs) in case we want get/put/list etc. metrics.
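As an illustration of filtering request metrics by prefix, here is a hedged sketch using boto3's `put_bucket_metrics_configuration`. The helper names and the way the configuration ID is derived from the prefix are assumptions made for this example, not the setup used for the gallery; the call needs credentials with permission to change the bucket's metrics configuration.

```python
import re


def build_prefix_metrics_config(prefix: str) -> dict:
    """Build an S3 request-metrics configuration filtered to one prefix.

    The ID scheme (prefix with separators replaced by "-", truncated to
    64 characters) is a hypothetical convention for this sketch.
    """
    config_id = re.sub(r"[^A-Za-z0-9_.-]+", "-", prefix).strip("-")[:64]
    return {"Id": config_id or "whole-bucket", "Filter": {"Prefix": prefix}}


def enable_request_metrics(bucket: str, prefix: str) -> dict:
    """Apply the configuration to a bucket (hypothetical helper)."""
    import boto3  # imported lazily; only needed when actually applying

    config = build_prefix_metrics_config(prefix)
    boto3.client("s3").put_bucket_metrics_configuration(
        Bucket=bucket,
        Id=config["Id"],
        MetricsConfiguration=config,
    )
    return config
```

Once such a configuration exists, the get/put/list request metrics for that prefix are reported in CloudWatch under the configuration's ID.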

I think I found how to use Storage Lens to give us per-prefix sizes (see AWS docs). It takes 48 hrs for it to report, so hopefully I did it correctly and I'll add sizes next week :)

shntnu commented 10 months ago

Very cool!

I don't know how easy it is to do that using CDK (which would let us set it up easily for every prefix), but this is good for now.

AnneCarpenter commented 7 months ago

bump! Would love to have this info public, even if it's just an estimate.

ErinWeisbart commented 7 months ago

https://github.com/broadinstitute/cellpainting-gallery/pull/52 will address this