Closed ErinWeisbart closed 10 months ago
I think I have cmQTL completely transferred to cpg. I am unable to run data-validation. I'm assuming that it is not worth the time for me to dig too deeply into it given that the validation script is currently written to look for JUMP-specific naming/structure? I have looked carefully at what has transferred and confirmed that everything that was in the imaging-platform bucket has moved to the cellpainting-gallery bucket and I moved all the profiles I found in https://github.com/broadinstitute/cmQTL to the gallery as well.
The one thing that I will flag is that batch `PILOT_1` does have images, analysis files, and backends, but I was unable to find profiles for it. In the cmQTL repo, the plates listed in the folder for pilot data are those from `2019_05_13_Batch2`, NOT `PILOT_1`.
@shntnu Thoughts?
I agree – no need to run data validation at this stage because the code is looking for JUMP-specific columns in the metadata (should be a simple fix to generalize at this point but let's punt on that).
Regarding `PILOT_1`, the profiles are in `backend`:
```
aws s3 ls s3://cellpainting-gallery/cpg0022-cmqtl/broad/workspace/backend/PILOT_1/BR00098071/
2022-11-18 15:10:03 18679262 BR00098071.csv
2022-11-18 15:10:04 19352133632 BR00098071.sqlite
2022-11-18 15:14:10 18682116 BR00098071_augmented.csv
2022-11-18 15:14:11 18790045 BR00098071_augmented.gct
2022-11-18 15:14:13 19194075 BR00098071_normalized.csv
2022-11-18 15:14:14 19302004 BR00098071_normalized.gct
2022-11-18 15:14:16 3340077 BR00098071_normalized_variable_selected.csv
2022-11-18 15:14:16 3359822 BR00098071_normalized_variable_selected.gct
```
Turns out this is also the case for possibly all other batches:
```
aws s3 ls s3://cellpainting-gallery/cpg0022-cmqtl/broad/workspace/backend/2019_05_13_Batch2/BR00103267/
2022-11-18 14:41:54 31142861 BR00103267.csv
2022-11-18 14:41:54 13099298816 BR00103267.sqlite
2022-11-18 14:42:40 31157110 BR00103267_augmented.csv
2022-11-18 14:42:41 27883812 BR00103267_colony.csv
2022-11-18 14:42:41 27896716 BR00103267_colony_augmented.csv
2022-11-18 14:42:41 28601118 BR00103267_colony_normalized.csv
2022-11-18 14:42:41 4024618 BR00103267_colony_normalized_variable_selected.csv
2022-11-18 14:42:42 31036454 BR00103267_isolated.csv
2022-11-18 14:42:42 31050703 BR00103267_isolated_augmented.csv
2022-11-18 14:42:42 31923521 BR00103267_isolated_normalized.csv
2022-11-18 14:42:42 4464820 BR00103267_isolated_normalized_variable_selected.csv
2022-11-18 14:42:42 31989142 BR00103267_normalized.csv
2022-11-18 14:42:43 4480421 BR00103267_normalized_variable_selected.csv
```

```
aws s3 ls s3://cellpainting-gallery/cpg0022-cmqtl/broad/workspace/profiles/2019_05_13_Batch2/BR00103267
2022-11-30 17:45:29 4024618 BR00103267_colony_normalized_variable_selected.csv
2022-11-30 17:45:29 4464820 BR00103267_isolated_normalized_variable_selected.csv
2022-11-30 17:45:29 4480421 BR00103267_normalized_variable_selected.csv
2022-11-30 17:45:29 588702622 BR00103267_single_cell_colony_profiles.tsv.gz
2022-11-30 17:45:29 157327762 BR00103267_single_cell_isolated_profiles.tsv.gz
```
This structure anomaly is because we were in transition between the old and new structures.
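To make the overlap between the two prefixes concrete, here is a minimal sketch that parses `aws s3 ls` output and reports file names that appear under both `backend` and `profiles` with the same size. The function names `parse_s3_ls` and `duplicated_files` are hypothetical, the sample lines are a subset of the listings above, and matching sizes only suggest (they do not prove) identical content. It also assumes file names contain no spaces.

```python
from typing import Dict, List


def parse_s3_ls(listing: str) -> Dict[str, int]:
    """Map file name -> size in bytes from `aws s3 ls` output."""
    sizes: Dict[str, int] = {}
    for line in listing.splitlines():
        parts = line.split()
        # Object lines look like: DATE TIME SIZE NAME.
        # Anything else (e.g. "PRE subdir/" lines) is skipped.
        if len(parts) == 4 and parts[2].isdigit():
            sizes[parts[3]] = int(parts[2])
    return sizes


def duplicated_files(backend: Dict[str, int], profiles: Dict[str, int]) -> List[str]:
    """Names present under both prefixes with identical sizes."""
    return sorted(name for name, size in profiles.items() if backend.get(name) == size)


# Subset of the listings quoted in this thread.
backend_ls = """\
2022-11-18 14:41:54 31142861 BR00103267.csv
2022-11-18 14:42:41 4024618 BR00103267_colony_normalized_variable_selected.csv
2022-11-18 14:42:42 4464820 BR00103267_isolated_normalized_variable_selected.csv
2022-11-18 14:42:43 4480421 BR00103267_normalized_variable_selected.csv
"""
profiles_ls = """\
2022-11-30 17:45:29 4024618 BR00103267_colony_normalized_variable_selected.csv
2022-11-30 17:45:29 4464820 BR00103267_isolated_normalized_variable_selected.csv
2022-11-30 17:45:29 4480421 BR00103267_normalized_variable_selected.csv
2022-11-30 17:45:29 588702622 BR00103267_single_cell_colony_profiles.tsv.gz
"""

dups = duplicated_files(parse_s3_ls(backend_ls), parse_s3_ls(profiles_ls))
for name in dups:
    print(name)
```

On this sample, the three `*_normalized_variable_selected.csv` files are reported as present in both places, while the single-cell `.tsv.gz` file exists only under `profiles`.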
Here's what I'd recommend doing:

- `backend` should have only `BR00098071.csv` and `BR00098071.sqlite`, as with the new structure
- The remaining files in `backend` should be moved to `profiles`
- `profiles` does not currently have the proper batch-plate structure; everything is under batch, so that needs to be fixed

Here's what I did (below). If all this looks good, you can delete `profiles/` and `backend/` and rename `profiles_tmp` to `profiles` and `backend_tmp` to `backend`.
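The recommended layout can be sketched as a small planning function. This is not the set of commands that was actually run (those are not reproduced in this thread); it is an illustration under stated assumptions. The name `planned_key` is hypothetical, and the plate ID is assumed to be the leading `BR` plus digits of the file name, as in the listings quoted earlier.

```python
import re

WORKSPACE = "cpg0022-cmqtl/broad/workspace"
# Assumption: plate IDs look like the ones in the listings, e.g. BR00098071.
PLATE_RE = re.compile(r"^(BR\d+)")


def planned_key(batch: str, filename: str) -> str:
    """Return the destination key for a file under the batch/plate layout.

    Only <plate>.csv and <plate>.sqlite stay in backend; every other
    file goes to profiles. Destinations use the *_tmp prefixes so the
    originals remain untouched until the result has been reviewed.
    """
    match = PLATE_RE.match(filename)
    if match is None:
        raise ValueError(f"cannot infer plate from file name: {filename!r}")
    plate = match.group(1)
    is_backend = filename in (f"{plate}.csv", f"{plate}.sqlite")
    top = "backend_tmp" if is_backend else "profiles_tmp"
    return f"{WORKSPACE}/{top}/{batch}/{plate}/{filename}"


for name in ("BR00098071.sqlite", "BR00098071_normalized.gct"):
    print(planned_key("PILOT_1", name))
```

Writing into `backend_tmp` and `profiles_tmp` first keeps the step reversible: nothing is deleted or renamed until someone has checked the new tree.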
Thanks so much for your help Shantanu! Looks good to me. I'll perform the delete/rename and then we can consider this dataset fully moved!
Super! Thanks for wrapping it up!
Data can be public in RODA: Immediately
profile repo: https://github.com/broadinstitute/cmQTL
cpg0022-cmqtl