broadinstitute / cmQTL

High-dimensional phenotyping to define the genetic basis of cellular morphology
BSD 3-Clause "New" or "Revised" License

Download sample images #35

Closed shntnu closed 3 years ago

shntnu commented 4 years ago

@jatinarora-upmc @sasgari The notebook 1.profile-cell-lines/7.select_images_to_print.Rmd shows how to download sample images. Have a look and LMK if you have any questions.

shntnu commented 4 years ago

I have uploaded sample images here (13 GB)

But you may want to resample to get more / different images.

shntnu commented 4 years ago

@jatinarora-upmc The file data/gwas_images.csv generated by 7.select_images_to_print.md has the information you need about cell lines

In #36 I updated the code so that all this information is readily available in two CSV files. For example, in this row from sample_images.csv:

| Metadata_Plate | Metadata_Well | Metadata_Channel | filename | URL |
| --- | --- | --- | --- | --- |
| cmqtlpl1.5-31-2019-mt | A12 | URL_OrigDNA | r01c12f05p01-ch5sk1fk1fl1.tiff | https://s3.amazonaws.com/imaging-platform/projects/2018_06_05_cmQTL/2019_06_10_Batch3/images/cmqtlpl1.5-31-2019-mt__2019-06-10T16_42_36-Measurement2/Images/r01c12f05p01-ch5sk1fk1fl1.tiff |

we see that the file cmqtlpl1.5-31-2019-mt/r01c12f05p01-ch5sk1fk1fl1.tiff comes from plate cmqtlpl1.5-31-2019-mt and well A12.

We can join with sample_images_metadata.csv to figure out the metadata corresponding to plate cmqtlpl1.5-31-2019-mt and well A12:

| Metadata_Plate | Metadata_Well | Metadata_Row | Metadata_FieldID | Metadata_Assay_Plate_Barcode | Metadata_Plate_Map_Name | Metadata_well_position | Metadata_plating_density | Metadata_line_ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cmqtlpl1.5-31-2019-mt | A12 | 1 | 5 | cmqtlpl1.5-31-2019-mt | cmQTL_plate1_5.31.2019 | A12 | 10000 | 34 |

specifically, that the cell line id is 34.
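The join described above can be sketched in shell as follows. This is only an illustration under some assumptions: both files are plain comma-separated with no quoted commas, the column names match the headers shown above, and the temporary directory and toy file contents (just the example rows from this comment) are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy copies of the two CSVs holding only the example rows quoted above.
# The working directory is an illustrative assumption, not the repo layout.
workdir=$(mktemp -d)

cat > "$workdir/sample_images.csv" <<'EOF'
Metadata_Plate,Metadata_Well,Metadata_Channel,filename,URL
cmqtlpl1.5-31-2019-mt,A12,URL_OrigDNA,r01c12f05p01-ch5sk1fk1fl1.tiff,https://s3.amazonaws.com/imaging-platform/projects/2018_06_05_cmQTL/2019_06_10_Batch3/images/cmqtlpl1.5-31-2019-mt__2019-06-10T16_42_36-Measurement2/Images/r01c12f05p01-ch5sk1fk1fl1.tiff
EOF

cat > "$workdir/sample_images_metadata.csv" <<'EOF'
Metadata_Plate,Metadata_Well,Metadata_Row,Metadata_FieldID,Metadata_Assay_Plate_Barcode,Metadata_Plate_Map_Name,Metadata_well_position,Metadata_plating_density,Metadata_line_ID
cmqtlpl1.5-31-2019-mt,A12,1,5,cmqtlpl1.5-31-2019-mt,cmQTL_plate1_5.31.2019,A12,10000,34
EOF

# Pass 1 (metadata file): remember the line ID for each (plate, well) pair.
# Pass 2 (images file): print plate, well, filename and the matching line ID.
# Columns are located by header name rather than by position.
joined=$(awk -F',' '
  FNR == 1 {                      # header row of either file
    split("", col)                # portable way to clear the array
    for (i = 1; i <= NF; i++) col[$i] = i
    next
  }
  NR == FNR {                     # still reading the metadata file
    plate = $col["Metadata_Plate"]; well = $col["Metadata_Well"]
    line_id[plate, well] = $col["Metadata_line_ID"]
    next
  }
  {
    plate = $col["Metadata_Plate"]; well = $col["Metadata_Well"]
    if ((plate, well) in line_id)
      print plate "," well "," $col["filename"] "," line_id[plate, well]
  }
' "$workdir/sample_images_metadata.csv" "$workdir/sample_images.csv")

echo "$joined"
```

Pointing the two paths at the real files in 1.profile-cell-lines/data/ should give one output row per image, with the cell line ID in the last column.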

jatinarora-upmc commented 4 years ago

I guess there is an issue here. Many image identifiers map to more than one cell line. For example, the image r14c11f05p01-ch5sk1fk1fl1.tiff maps to two cell lines on two plates (98 on BR00106709, and 236 on BR00107338). Could you check, or am I looking at this the wrong way?

shntnu commented 4 years ago

The file names are identical across all plates.

Use Metadata_Plate as well as filename to identify an image uniquely.
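A quick way to see this on a CSV laid out like sample_images.csv is sketched below. The two plates and the filename are the ones from the question above; the well, channel and URLs are placeholders made up for the demo, and the column positions (1 = Metadata_Plate, 4 = filename) are assumed from the example row earlier in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Toy rows: plates and filename come from this thread;
# the well, channel and URLs are placeholders, not real values.
cat > "$workdir/sample_images.csv" <<'EOF'
Metadata_Plate,Metadata_Well,Metadata_Channel,filename,URL
BR00106709,N11,URL_OrigDNA,r14c11f05p01-ch5sk1fk1fl1.tiff,https://example.invalid/BR00106709/r14c11f05p01-ch5sk1fk1fl1.tiff
BR00107338,N11,URL_OrigDNA,r14c11f05p01-ch5sk1fk1fl1.tiff,https://example.invalid/BR00107338/r14c11f05p01-ch5sk1fk1fl1.tiff
EOF

# filename alone (column 4): count values that occur more than once.
dup_by_filename=$(tail -n +2 "$workdir/sample_images.csv" \
  | cut -d',' -f4 | sort | uniq -d | wc -l | tr -d ' ')

# plate + filename together (columns 1 and 4): count repeated pairs.
dup_by_plate_and_filename=$(tail -n +2 "$workdir/sample_images.csv" \
  | cut -d',' -f1,4 | sort | uniq -d | wc -l | tr -d ' ')

echo "repeated filenames: $dup_by_filename"
echo "repeated (plate, filename) pairs: $dup_by_plate_and_filename"
```

On these toy rows the filename repeats while the (plate, filename) pair does not, which is why the pair is the safer key when mapping images to cell lines.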

Does that help?


shntnu commented 4 years ago

@jatinarora-upmc I'm following up on your Slack message here. Can you review this thread and LMK if you are able to figure out how to get example images for a cell line ID?

jatinarora-upmc commented 4 years ago

@shntnu thanks much for reminding me of this thread. Does the file sample_images_metadata.csv cover all images across all plates, i.e. the images at the link you posted in your second message here (13 GB)?

shntnu commented 4 years ago

The notebook referred to in https://github.com/broadinstitute/cmQTL/issues/35#issue-591953618 was used to produce sample_images_metadata.csv and sample_images.csv. It samples one well per cell line and then a single, fixed field-of-view a.k.a. site from each well (it always picks Metadata_Site = 5)

jatinarora-upmc commented 4 years ago

Thanks @shntnu . Would it be possible to generate another set of images with another Metadata_Site, let's say 3? I guess the images are stored at your side, so you might have to re-run the script?

shntnu commented 4 years ago

Now available via #45

shntnu commented 4 years ago

@jatinarora-upmc this notebook has details on how to download; I am copying it below. The sample_images.csv file referred to below is produced by that notebook.

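# Directory where the images will be saved
IMAGE_DIR=/tmp/cmqtl

mkdir -p "$IMAGE_DIR"

# Collect the unique plate names (column 1), skipping the header row
cut -d"," -f1 data/sample_images.csv | grep -v Metadata_Plate | sort -u > /tmp/plates.txt

# Create one subdirectory per plate
parallel -a /tmp/plates.txt --no-run-if-empty mkdir -p "$IMAGE_DIR"/{}

# Download every image: {1} = Metadata_Plate, {4} = filename, {5} = URL
parallel \
  --header ".*\n" \
  -C "," \
  -a data/sample_images.csv \
  --eta \
  --joblog "${IMAGE_DIR}/download.log" \
  wget -q -O "${IMAGE_DIR}/{1}/{4}" {5}
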
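If GNU parallel is not installed, a plain sequential loop over the same CSV is one possible fallback. This is a sketch, not the notebook's method: the `DRY_RUN` switch and the `download_all` function name are hypothetical, and it assumes the five-column layout used above with no quoted commas. By default it only prints the wget commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE_DIR=${IMAGE_DIR:-/tmp/cmqtl}
CSV=${CSV:-data/sample_images.csv}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 prints commands, 0 downloads

download_all() {
  # Skip the header, then read the same columns the parallel call uses:
  # column 1 = Metadata_Plate, column 4 = filename, column 5 = URL.
  tail -n +2 "$CSV" | while IFS=',' read -r plate _well _channel filename url || [ -n "$plate" ]; do
    [ -n "$plate" ] || continue          # ignore empty lines
    if [ "$DRY_RUN" = "1" ]; then
      echo "wget -q -O $IMAGE_DIR/$plate/$filename $url"
    else
      mkdir -p "$IMAGE_DIR/$plate"
      # Report a failed download and keep going with the remaining rows.
      wget -q -O "$IMAGE_DIR/$plate/$filename" "$url" || echo "failed: $url" >&2
    fi
  done
}

# Only run when the CSV is actually present (e.g. from 1.profile-cell-lines/).
if [ -f "$CSV" ]; then
  download_all
fi
```

Run it from the 1.profile-cell-lines/ folder, check the printed commands, then rerun with `DRY_RUN=0` to download. It will be much slower than the parallel version for the full set.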
jatinarora-upmc commented 4 years ago

@shntnu I saw this code previously, but I could not figure out where the new sample_images.csv file is. #45 redirects me to #35, and I got lost going in circles. Sorry to bug you again. I am not used to GitHub at all, so I think a direct link to sample_images.csv would be very helpful.

shntnu commented 4 years ago

Sure thing. This is the file https://github.com/broadinstitute/cmQTL/blob/bcef95625d964d10ad8d81e31b453ab21f09f969/1.profile-cell-lines/data/sample_images.csv

In that snippet, I used a relative path to it (data/sample_images.csv) because the notebook is in the folder 1.profile-cell-lines/.

jatinarora-upmc commented 4 years ago

all set now, thanks so much @shntnu

jatinarora-upmc commented 4 years ago

@shntnu I am going through the comments I got in today's meeting. May I ask for the images for plate 7 also?

shntnu commented 4 years ago

I have updated sample_images.csv to include the new version of plate 7 https://github.com/broadinstitute/cmQTL/blob/master/1.profile-cell-lines/data/sample_images.csv

@jatinarora-upmc see https://github.com/broadinstitute/cmQTL/issues/35#issuecomment-658963662 for what to do next (everything is the same as before, just that I have now replaced with the new plate 7 images)

jatinarora-upmc commented 3 years ago

@shntnu it seems there are fewer lines in this updated sample_images.csv, and there is no plate cmQTLplate7-7-22-20 anywhere in the Metadata_Plate column. Could you please check?

shntnu commented 3 years ago

@jatinarora-upmc Now fixed in #60, which updated https://github.com/broadinstitute/cmQTL/blob/master/1.profile-cell-lines/data/sample_images.csv