broadinstitute / cell-health

Predicting Cell Health with Morphological Profiles
MIT License

LoadData CSV File Missing #162

Open jenna-tomkinson opened 1 year ago

jenna-tomkinson commented 1 year ago

Hello!

I will be rerunning the Cell Health dataset, which was (I assume) originally processed with CellProfiler 3.0, using the latest version, CellProfiler 4.2.4, for a project to assess any differences.

While trying to use the LoadData module, I noticed it was unable to select metadata for grouping. I tried both .csv files located in the 0.download_data/IDR folder, but neither of them worked.

I figured out that this was because the files do not contain metadata columns (e.g. Metadata_Plate).
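
One quick way to confirm this for a given file is to list the header columns that start with `Metadata_`. The following is a minimal sketch, not part of the original workflow: the function name `list_metadata_columns` is hypothetical, and it assumes a plain comma-separated header with no quoted commas.

```shell
# List the header columns of a CSV whose names start with "Metadata_".
# Assumption: the first line is a simple comma-separated header
# (no quoted fields containing commas).
# Returns non-zero (grep's exit status) when no such column exists.
list_metadata_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep '^Metadata_'
}
```

Calling `list_metadata_columns some_file.csv || echo "no Metadata_ columns"` (with a hypothetical file name) would print either the matching column names or the fallback message.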

Do you happen to have the exact .csv file that you used to run Cell Health with CellProfiler 3.0? If so, that would greatly help me reproduce the results with the newer version.

Thank you!

shntnu commented 1 year ago

I found this on my laptop!

Archive.zip

But they are also in our S3 bucket; see details below.


It's worth @gwaybio downloading the files listed below (using his AWS account) and checking whether they are the same as the archive. The S3 version is more reliable.
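
Once both copies are available locally, the comparison could be scripted along these lines. This is only a sketch under assumptions: the function name `compare_load_data` is hypothetical, and it assumes the unzipped archive and the S3 download share the same relative layout (for example `<plate>/load_data_with_illum.csv`) with no whitespace in the paths. The actual layout of Archive.zip may differ.

```shell
# Compare every load_data_with_illum.csv under two directories.
# Assumptions: both directories use the same relative layout and the
# paths contain no whitespace.
# Prints one line per file; returns non-zero if any file differs or is
# missing from the second directory.
compare_load_data() {
  archive_dir=$1
  s3_dir=$2
  status=0
  for f in $(cd "$archive_dir" && find . -name load_data_with_illum.csv | sort); do
    if cmp -s "$archive_dir/$f" "$s3_dir/$f"; then
      echo "same: $f"
    else
      echo "DIFFERENT or missing: $f"
      status=1
    fi
  done
  return $status
}
```

It would be called as `compare_load_data <unzipped_archive_dir> <s3_download_dir>`, where both directory names are placeholders.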

aws s3 ls --profile imaging-amazon --recursive s3://imaging-platform/projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/ |grep load_data_with_illum.csv
2020-03-05 10:02:10    6421131 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014610/load_data_with_illum.csv
2020-03-05 10:02:05    6422953 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014611/load_data_with_illum.csv
2020-03-05 10:02:11    6422935 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014612/load_data_with_illum.csv
2020-03-05 10:02:12    6422966 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014613/load_data_with_illum.csv
2020-03-05 10:02:13    6422958 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014614/load_data_with_illum.csv
2020-03-05 10:02:09    6422918 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014615/load_data_with_illum.csv
2020-03-05 10:02:07    6422919 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014616/load_data_with_illum.csv
2020-03-05 10:02:08    6422929 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014617/load_data_with_illum.csv
2020-03-05 10:02:06    6423015 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014618/load_data_with_illum.csv

Request a restore of these files with the commands below, then wait ~12h for them to be restored:

aws s3 ls --profile imaging-amazon --recursive s3://imaging-platform/projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/ |grep load_data_with_illum.csv|tr -s ' '|cut -d" " -f4 > /tmp/load_data_files.txt
parallel -a /tmp/load_data_files.txt aws s3api --profile imaging-amazon restore-object --bucket imaging-platform --key {} --restore-request GlacierJobParameters={"Tier"="Standard"}
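
To see whether the restore has finished, the `Restore` field returned by `aws s3api head-object` can be inspected for each key. The sketch below is an illustration rather than part of the original instructions: the function names `restore_is_complete` and `check_restores` are hypothetical, and it reuses the profile, bucket, and key list file from the commands above.

```shell
# Return 0 if a Restore value from `aws s3api head-object` shows that the
# restore has finished (ongoing-request="false"), non-zero otherwise.
restore_is_complete() {
  case "$1" in
    *'ongoing-request="false"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the restore state of every key listed in a file (one key per line).
# Assumes the same AWS profile and bucket as the commands above.
check_restores() {
  while IFS= read -r key; do
    restore=$(aws s3api head-object --profile imaging-amazon \
      --bucket imaging-platform --key "$key" \
      --query Restore --output text)
    if restore_is_complete "$restore"; then
      echo "restored: $key"
    else
      echo "pending:  $key"
    fi
  done < "$1"
}
```

Running `check_restores /tmp/load_data_files.txt` would then report each file as restored or pending; objects that report pending are not yet downloadable.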