carpenterlab / 2017_Goldsborough_MLCB

CytoGAN: Generative Modeling of Cell Images
https://www.biorxiv.org/content/10.1101/227645v1

Access to dataset used in the paper #1

Open shntnu opened 2 years ago

shntnu commented 2 years ago

Alessandro Palma asked:

Since I am dealing with generative modeling, I am mostly using the CytoGAN paper as a reference, together with other publications employing the dataset. I also tried to use CellProfiler to perform cell segmentation on the dataset, but I wasn’t able to make it work remotely on my servers. For now, I am working with 96x96x3 crops of the plates, which can contain more than one cell (not an ideal condition). I was therefore wondering if the CytoGAN dataset is available anywhere online for direct usage.

Alessandro, I suppose you are already familiar with the primary resource, https://bbbc.broadinstitute.org/BBBC021, and that you are looking for the processed version?

allepalma commented 2 years ago

Hi,

Thanks very much for your fast answer. Yes, exactly. What I am looking for is a segmentation mask for the cells in BBBC021. I have tried to get the CellProfiler projects in the repository to work, but the setup does not seem to match the names of the downloaded files. I also could not manage to use the software on remote servers. For now I am using 96x96x3 crops as in the uploaded image. However, I would definitely benefit from exact single-cell outlines like the ones in the CytoGAN paper.

Is there any public binary mask for BBBC021?

Thank you again for your support and for the great datasets!

image

shntnu commented 2 years ago

Sounds good

These files are available internally at s3://imaging-platform/projects/dp_treatment-classification_az/

We will need to move them to a publicly accessible location and then ensure that all the contents can be made available as is.

I can't promise a fast turnaround but I'll keep this on my list.

Update: we should move it to s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/deep_learning

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/dp-project/  --max_workers 8 --logfile dp-project_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/dp-project/ 
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/deep_learning

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

Similarly

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/analysis/ljosa_2013/  --max_workers 8 --logfile analysis_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/analysis/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/analysis/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/load_data_csv/ljosa_2013/ --max_workers 8 --logfile load_data_csv_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/load_data_csv/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/load_data_csv/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/backend/ljosa_2013/ --max_workers 8 --logfile backend_csv_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/backend/ljosa_2013/
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/workspace/backend/ljosa_2013/

aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source} \
  ${destination}  

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l

python3 restore_intelligent.py imaging-platform projects/dp_treatment-classification_az/workspace/images --max_workers 8 --logfile images_log.csv

source=s3://imaging-platform/projects/dp_treatment-classification_az/workspace/images
destination=s3://cellpainting-gallery/cpg0010-caie-drugresponse/broad-az/images

parallel \
  --dry-run \
  aws s3 sync \
  --quiet \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --request-payer requester \
  --metadata-directive REPLACE \
  ${source}/Week{1}/ \
  ${destination}/Week{1}/images/ ::: 1 2 3 4 5 6 7 8 9 10

aws s3 ls --recursive $source|wc -l

aws s3 ls --recursive $destination|wc -l
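
The paired `aws s3 ls --recursive ... | wc -l` calls that close each block above appear to serve as a manual check that the copy is complete. A minimal sketch of automating that comparison is below. The function names `count_objects` and `check_counts` are hypothetical and not part of the original notes, and a matching object count only suggests, rather than proves, that the two prefixes hold the same contents.

```shell
# Hypothetical helpers (not from the original notes).

# Count the objects listed under an S3 prefix, as the notes do by hand.
# tr strips the padding that some wc implementations add.
count_objects() {
  aws s3 ls --recursive "$1" | wc -l | tr -d ' '
}

# Compare two counts; return non-zero if they differ.
check_counts() {
  local src_n=$1 dst_n=$2
  if [ "$src_n" -eq "$dst_n" ]; then
    echo "OK: $src_n objects on both sides"
  else
    echo "MISMATCH: source=$src_n destination=$dst_n" >&2
    return 1
  fi
}

# Example call (needs the AWS CLI and credentials for both buckets):
#   check_counts "$(count_objects "$source")" "$(count_objects "$destination")"
```

For the images block, the destination inserts an extra `images/` level under each `Week` prefix, but the total object count should still be comparable.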

For our notes:

This should be the same (processed version of the) dataset that we used in all of these publications:

  1. Caicedo JC, McQuin C, Goodman A, Singh S, Carpenter AE (2018). Weakly Supervised Learning of Single-Cell Feature Embeddings. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 9309–9318. PMID: 30918435. PMCID: PMC6432648. (Conference paper)
  2. Goldsborough P, Pawlowski N, Caicedo JC, Singh S, Carpenter AE (2017). CytoGAN: Generative Modeling of Cell Images. Workshop on Machine Learning in Computational Biology, Neural Information Processing Systems (NeurIPS). bioRxiv, p. 227645. (Conference paper)
  3. Pawlowski N, Caicedo JC, Singh S, Carpenter AE, Storkey A (2016). Automating Morphological Profiling with Generic Deep Convolutional Networks. Neural Information Processing Systems (NeurIPS) MLCB Workshop 2016. PMCID: N/A. (Conference paper)

allepalma commented 2 years ago

Thank you again for your help!

shntnu commented 2 years ago

@allepalma The files are now available at s3://cellpainting-gallery/cpg0010-caie-drugresponse/

Unfortunately, the documentation is pretty sparse, so you'll need to figure out the structure yourself. Please add notes to this issue if you work out anything that would help future users.
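
For future readers, one way to start exploring the layout is to list the prefix with the AWS CLI. This is only a sketch: it assumes the `cellpainting-gallery` bucket allows anonymous reads (hence `--no-sign-request`), the helper name `gallery_ls` is hypothetical, and the subprefixes mentioned in the comments are the copy destinations named earlier in this issue, which may not match the final layout.

```shell
# Hypothetical helper; assumes anonymous read access to the bucket.
gallery_root="s3://cellpainting-gallery/cpg0010-caie-drugresponse"

gallery_ls() {
  # $1 is an optional key prefix below the dataset root, e.g. "workspace/".
  aws s3 ls --no-sign-request "${gallery_root}/${1:-}"
}

# Example calls (need network access and the AWS CLI):
#   gallery_ls               # top-level prefixes
#   gallery_ls workspace/    # e.g. analysis, backend, load_data_csv, deep_learning
```

If anonymous access turns out not to be enabled, dropping `--no-sign-request` and using regular AWS credentials would be the fallback.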