ErinWeisbart closed this 7 months ago
@shntnu do you agree with this? (or at least not disagree?) do you have dataset sizes somewhere? (do we have a console view of size by prefix?)
> @shntnu do you agree with this? (or at least not disagree?)
I agree but it might be a bit of a lift (see below)
> do you have dataset sizes somewhere? (do we have a console view of size by prefix?)
We have this https://broad.io/cpgdash which is configured using this https://github.com/jump-cellpainting/cellpainting-gallery-config/blob/f907ef931bb7b6e13400447f3e4244c7a0eb56e3/dashboard/dashboard_stack.py
IIRC I couldn't find a metric that would report total size or number of files. But I didn't poke around much either.
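If no ready-made metric turns up, one fallback is to list the objects under a prefix and add up their sizes. The snippet below is a rough sketch of that idea, not something taken from the dashboard config. The bucket name `cellpainting-gallery`, the example prefix, anonymous (unsigned) access, and the `<dataset>/<source>/images` vs `<dataset>/<source>/workspace` layout used to split image and numerical data are all assumptions here, and `iter_objects` / `summarize_sizes` are hypothetical helper names. Listing a very large prefix this way can take a long time, so it is probably only suitable for a rough estimate.

```python
from collections import defaultdict


def iter_objects(bucket, prefix=""):
    """Yield S3 object records (dicts with 'Key' and 'Size') under a prefix.

    Assumes the bucket allows anonymous listing; requires boto3.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no 'Contents' key.
        yield from page.get("Contents", [])


def summarize_sizes(objects, depth=1):
    """Total bytes and file count, grouped by the first `depth` key segments."""
    totals = defaultdict(lambda: {"bytes": 0, "files": 0})
    for obj in objects:
        group = "/".join(obj["Key"].split("/")[:depth])
        totals[group]["bytes"] += obj["Size"]
        totals[group]["files"] += 1
    return dict(totals)


# Hypothetical usage (bucket name and prefix are assumptions):
# sizes = summarize_sizes(iter_objects("cellpainting-gallery", "cpg0000/"), depth=3)
# for group, t in sorted(sizes.items()):
#     print(f"{group}\t{t['files']} files\t{t['bytes'] / 1e12:.2f} TB")
```

With `depth=1` this gives one total per top-level prefix; with `depth=3`, and assuming the layout above, image and numerical data would land in separate groups.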
Very cool!
I don't know how easy it is to do that using CDK (because then we can do it easily for every prefix) but this is good for now
bump! Would love to have this info public, even if it's just an estimate.
https://github.com/broadinstitute/cellpainting-gallery/pull/52 will address this
Would be nice to have approximate size of datasets (maybe list image and numerical data sizes separately) in the Available Datasets table in the README so folks wanting to use the datasets have some idea of what they are getting themselves into...