broadinstitute / cmQTL

High-dimensional phenotyping to define the genetic basis of cellular morphology
BSD 3-Clause "New" or "Revised" License

Nov 2020 Discussions (associations with PRLR) #64

Closed jatinarora-upmc closed 2 years ago

jatinarora-upmc commented 3 years ago

Rare variation in the PRLR gene is associated with multiple traits, but only in isolate cells. One such trait is Cells_RadialDistribution_MeanFrac_ER_4of4. image

Other associated traits are: image

About PRLR:

So, can it be that differential regulation of AGXT2 might be linked to the localisation of mitochondria? The variants taken for rare variant burden test (high or moderate impact, protein coding variants) in PRLR gene did not overlap with AGXT2 promoters/enhancers. But they could very possibly be in LD with low impact variants. I will check this. Please let me know if you have any thoughts meanwhile. @shntnu @AnneCarpenter @bethac07 @raldanehme

shntnu commented 3 years ago

Tagging @raldanehme separately because she may not have gotten the notification (GitHub does not notify if you mention someone in an edit to a comment, like you did above @jatinarora-upmc )

AnneCarpenter commented 3 years ago

The fact that the feature list includes both Cells_RadialDistribution_MeanFrac_ER_4of4 plus the same feature with Mito in place of ER tells me this is more about cell shape than intensity patterns of those particular channels. The mean intensity of DNA stain at edge of the Cell compartment implies to me that the cells are rounded up. It would help to see images to confirm this.

Do you have any mechanism for normalizing features to try to remove the impact of cell count? @jatinarora-upmc

jatinarora-upmc commented 3 years ago

@shntnu @AnneCarpenter @raldanehme @bethac07 Dear all, in the final test, rare variant burden in PRLR is associated with 3 traits (column feat in screenshot) in isolate cells. image

Btw, these 3 associations with PRLR are just nominally significant (p ~ 10^-4) in non-isolate cells (having any neighbor).

AnneCarpenter commented 3 years ago

I suspect Cytoplasm_Texture_InfoMeas1_DNA_20_00 is related to the same issues of cell shape as Cells_Intensity_MaxIntensityEdge_DNA, given that it's again about DNA in the cytoplasm - @bethac07 should ponder it too and confirm (I don't recall what InfoMeas1 is, nor the impact of those scale numbers 20_00). @jatinarora-upmc you can solidify these hypotheses by plotting on a single cell level the relationship/correlation between those features and each other, and those features and cell area.
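The single-cell check suggested above (correlating the features with each other and with cell area) could be sketched roughly as follows. This is only an illustration: the column name Cells_AreaShape_Area is an assumed name for the cell-area feature, the helper names are hypothetical, and the rows are synthetic placeholders, not cmQTL measurements.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined correlation
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(cells, columns):
    """cells: list of dicts, one per cell. Returns {(col_a, col_b): r}."""
    return {
        (a, b): pearson([c[a] for c in cells], [c[b] for c in cells])
        for a, b in combinations(columns, 2)
    }


FEATURES = [
    "Cytoplasm_Texture_InfoMeas1_DNA_20_00",
    "Cells_Intensity_MaxIntensityEdge_DNA",
    "Cells_AreaShape_Area",  # assumed name for the cell-area column
]

# Synthetic placeholder rows, NOT values from the cmQTL data.
toy_cells = [
    {FEATURES[0]: -0.10, FEATURES[1]: 0.20, FEATURES[2]: 900.0},
    {FEATURES[0]: -0.15, FEATURES[1]: 0.35, FEATURES[2]: 700.0},
    {FEATURES[0]: -0.22, FEATURES[1]: 0.50, FEATURES[2]: 520.0},
    {FEATURES[0]: -0.12, FEATURES[1]: 0.25, FEATURES[2]: 860.0},
]

for pair, r in pairwise_correlations(toy_cells, FEATURES).items():
    print(pair, round(r, 3))
```

A strong correlation of both DNA features with area would be consistent with the cell-shape explanation; a weak one would point elsewhere.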

The third one might be interestingly about mitochondrial distribution but my suspicion again is that it may be directly related to the same issues of cell shape as the other two. If a cell is rounded up, the mito stain will be preferentially in the inner ring around the nucleus (ring 1 out of 4 concentric rings).

Another collaborator made a schematic about this (not to be published, but just to give you the concept): Screen Shot 2021-03-08 at 8 22 47 AM

BUT I don't know what RadialCV means in the context of this CV so hopefully Beth can illuminate (or you can check CellProfiler's manual). It may be that in the innermost ring around the nucleus, the mitochondria have a higher CV, which would mean more bright and dim (aka contrasty) staining as opposed to smooth uniform staining.

bethac07 commented 3 years ago

@jatinarora-upmc's intuition about Radial CV is correct; essentially, within each "ring", is the staining uniformly distributed or unevenly distributed. I think you have it backwards, though: if it has a lower coefficient of variation, wouldn't that mean the variation is lower, and therefore the staining is more evenly distributed in the variant?

Per the CellProfiler documentation, InfoMeas1 is defined like this (mathematical definition here).

InfoMeas1: A measure of the total amount of information contained within a region of pixels derived from the recurring spatial relationship between specific intensity values.

I would not have thought, though, that either the Cytoplasm_Texture measure or the Cell_Intensity measure had anything to do with cell size or shape, except perhaps incidentally; both are measuring DNA in places where no DNA should be (within the cytoplasm, or at the outside edge of the cell, respectively), so honestly I'd say they are measures of either segmentation quality (i.e. segmentation is worse in those variants, perhaps due to size or shape) or crowding. As @AnneCarpenter said, it would really be nice to see the images.

AnneCarpenter commented 3 years ago

Oh, Beth - there's no written trail but we discussed in our meetings - the link to cell size/shape is that we hypothesize the cells are packed in closely so that one's DNA overlaps another cell's cytoplasm.

Regarding the drawing, wouldn't both drawings have identical CV because it's more about the CV of pixel intensities within the ring, not paying attention to spatial arrangements? So to get a different CV we'd need the distribution of pixel intensities to be either more uniform or less uniform?

bethac07 commented 3 years ago

> the link to cell size/shape is that we hypothesize the cells are packed in closely so that one's DNA overlaps another cell's cytoplasm

Sure, but this association is in theory in cells that have no neighbors, which is why my $ would be on segmentation errors.

The CV metric in question is "divide each ring into 8 'wedges' and look at coefficient of variation". I'd have to (and can, if need be) dig into the source code to say more, but I think it essentially does end up breaking down into "how evenly around the ring is the staining distributed".
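To make the "8 wedges" description concrete, here is a minimal sketch of a wedge-based CV for a single ring. It is not CellProfiler's implementation (the source code was not consulted here); the function names, the use of per-wedge mean intensity, and the toy rings are all assumptions for illustration. The two toy rings contain exactly the same set of pixel values, so a pixel-level CV would be identical, yet the wedge-level CV differs when the bright pixels are clustered on one side.

```python
import math
from statistics import mean, pstdev


def wedge_cv(pixels, center=(0.0, 0.0), n_wedges=8):
    """CV of per-wedge mean intensity for one ring.

    pixels: iterable of (x, y, intensity) tuples belonging to the ring.
    """
    cx, cy = center
    sums = [0.0] * n_wedges
    counts = [0] * n_wedges
    for x, y, intensity in pixels:
        # Angle around the center, mapped to [0, 2*pi)
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        wedge = min(int(angle / (2 * math.pi) * n_wedges), n_wedges - 1)
        sums[wedge] += intensity
        counts[wedge] += 1
    wedge_means = [s / c for s, c in zip(sums, counts) if c]
    if not wedge_means:
        return float("nan")
    mu = mean(wedge_means)
    return pstdev(wedge_means) / mu if mu else float("nan")


def toy_ring(values_per_wedge, radius=10.0):
    """Hypothetical helper: place each wedge's values at angles inside it."""
    n = len(values_per_wedge)
    pixels = []
    for k, values in enumerate(values_per_wedge):
        for j, value in enumerate(values):
            angle = (k + (j + 1) / (len(values) + 1)) * 2 * math.pi / n
            pixels.append((radius * math.cos(angle), radius * math.sin(angle), value))
    return pixels


# Same pixel values (eight 10s and eight 0s), different spatial arrangement.
even = toy_ring([[10, 0]] * 8)
clustered = toy_ring([[10, 10]] * 4 + [[0, 0]] * 4)
print(wedge_cv(even), wedge_cv(clustered))
```

Under these assumptions the evenly mixed ring gives a wedge CV of 0 and the clustered ring a wedge CV of 1, which matches the reading "how evenly around the ring is the staining distributed".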

AnneCarpenter commented 3 years ago

Oh, GREAT catch on this being supposedly isolated cells. Fully agree, then.

And yes, I wasn't aware of the wedges so Jatin's original interpretation makes sense, that they would be clustered together spatially. Thanks for answering both definitively!

jatinarora-upmc commented 3 years ago

Thanks very much, everyone, for the detailed comments. Here I attach the RadialCV_mito association and images of 3 cell lines without and with rare variant burden in PRLR. Isolate cells are much fewer than non-isolate cells, so you might have to look carefully to spot them. It seems their segmentation would be fine. prlr.pdf

AnneCarpenter commented 3 years ago

I wish we could display individual isolated cells from the two classes instead of a field of view when so few cells are in that class. @shntnu is it too painful to pick random cells from the isolate class to make a montage of ~50 of each from across samples and fields?

Also, Jatin, have you looked at per cell histograms for these metrics to see whether in PRLR mutants the whole population shifts a bit higher or lower vs a few cells becoming outliers which causes the mean to shift? (Unless you're using the median, but it might still be nice to see the single cell data for the 3 features.)
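One quick way to tell a whole-population shift from a few outliers, before plotting full histograms, is to compare how the mean and the median move. The sketch below is an assumption-laden illustration (hypothetical function name, synthetic values), not the analysis used in the project.

```python
from statistics import mean, median


def shift_summary(control, variant):
    """Compare per-cell values of one feature between two groups."""
    return {
        "mean_shift": mean(variant) - mean(control),
        "median_shift": median(variant) - median(control),
    }


# Synthetic per-cell values, for illustration only.
control = [1, 2, 3, 4, 5]
few_outliers = [1, 2, 3, 4, 50]   # one extreme cell drags the mean
whole_shift = [2, 3, 4, 5, 6]     # every cell moves by the same amount

print(shift_summary(control, few_outliers))
print(shift_summary(control, whole_shift))
```

If the mean moves but the median barely does, a handful of outlier cells is the more likely explanation; if both move together, the whole population has shifted.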

shntnu commented 3 years ago

> is it too painful to pick random cells from the isolate class to make a montage of ~50 of each from across samples and fields?

It's likely painful :D But I've pinged profilers.

bethac07 commented 3 years ago

I don't see any segmentation outlines on that image, unless I'm missing something? So I'm not sure on what basis we could say segmentation is or isn't fine.

AnneCarpenter commented 3 years ago

Once again awesome catch - to elaborate, Jatin: for example, if a single isolated cell is mistakenly segmented into two by splitting the nucleus down the middle, it would cause there to be DNA stain very close to the edge of each of the two half-cells and could explain the behavior we are seeing; it could explain all 3 features in fact.

Although this is a technical artifact, this is not to say that there is no actual phenotype here; it could be something like a lumpy nucleus that gets split into two. But if the number of isolated cells being analyzed is small, it could just be a technical artifact and not scientifically interesting. So Jatin, we would want to check the absolute numbers of isolated cells for these samples.

AnneCarpenter commented 3 years ago

I imagine Jatin you need help to see the segmentation. You need the raw images (as their separate channels, not overlaid) + the original CellProfiler pipeline. Then it's easy to run (using the version of CP that was used originally). Let us know if you need help on any of those steps (Beth I cannot recall which person on your team to point to, if any ?)

Unless we saved the segmentation outlines, which I doubt.

bethac07 commented 3 years ago

We often DO save the segmentation outlines, I can check if I know which batch(es) those images come from.

-- Beth Cimini, PhD CZI Imaging Scientist/Senior Computational Biologist Imaging Platform, Broad Institute 415 Main St Room 5011 Cambridge, MA 02142 Current office number- (617) 714-8189 Pronouns - She/her/hers I will sometimes send or respond to emails outside of my local office hours, but I never expect responses outside of your local office hours.

jatinarora-upmc commented 3 years ago

Hi everyone, thanks again for the replies. Here is the comparison of the number of isolate cells (n on the y axis) in cell lines with (1 on the x axis) and without (0 on the x axis) rare variant burden in the PRLR gene. This is a significant difference, potentially because of unbalanced group sizes, but you can see that cell lines with rare variant burden do not really have that different a number of isolate cells. image

I do have channel images, and will check for segmentation. But I would need help in this, because your eyes are trained to look at them :)

AnneCarpenter commented 3 years ago

Great, the cell count is low but not egregiously low to the point where it's likely to be an artifactual result. That's a relief.

Super, if you cannot find segmentation it sounds like Beth can help if you tell her the details.

jatinarora-upmc commented 3 years ago

@bethac07 hi Beth, could you please help check the segmentation outlines? The images of 3 cell lines (shown above) having rare variant burden in PRLR are:

In the format of Plate:Image:

BR00107338:r11c01f05p01 with_variant_BR00107338_r11c01f05p01 5chanels
BR00107339:r07c15f05p01 image
BR00106708:r06c04f03p01 image

The plates and images of all 7 cell lines (including the 3 shown above) with rare variant burden are listed here. I guess f03 and f05 in the image names are the 3rd and 5th fields of view.

Cell Line ID Plate:Image
214 BR00107339:r07c15f03p01
214 BR00107339:r07c15f05p01
181 BR00106708:r06c04f03p01
181 BR00106708:r06c04f05p01
32 cmqtlpl1.5-31-2019-mt:r06c05f03p01
32 cmqtlpl1.5-31-2019-mt:r06c05f05p01
238 BR00107338:r11c01f03p01
238 BR00107338:r11c01f05p01
29 cmqtlpl1.5-31-2019-mt:r03c23f03p01
29 cmqtlpl1.5-31-2019-mt:r03c23f05p01
153 BR00106708:r16c19f03p01
153 BR00106708:r16c19f05p01
260 BR00107338:r06c23f03p01
260 BR00107338:r06c23f05p01

Please let me know if I can help with anything else.

bethac07 commented 3 years ago

Those images are not sufficient; you would need the segmentation outlines, aka the outlines of what is called the nucleus and what is called the cell body. Since those cells are relatively rare, you'd likely want to look at all 4 (or 5? ) wells * 9 images per well for the segmentation to see if there are any trends (as well as probably a similar number of "control" images to see if the level of mistake is comparable between this and average or not).

jatinarora-upmc commented 3 years ago

Thanks Beth. Alright. There would be 8 wells per cell line * 8 images per well. I have no idea how and where to look for segmentation outlines. Do you have them already calculated? Do the plate (batch) information and cell line IDs would be sufficient information to pull out segmentation outlines.

bethac07 commented 3 years ago

We sometimes save them after generation and sometimes do not; I would need to know which batch(es) to check (but likely if it's on for one it's on for all).

If they are not already calculated, we'd need to re-run CellProfiler to re-generate them.

jatinarora-upmc commented 3 years ago

Sure, here is the plate (batch) and image information for the case and control cell lines.

Cases
Cell Line ID Plate:Image
214 BR00107339:r07c15f03p01
214 BR00107339:r07c15f05p01
181 BR00106708:r06c04f03p01
181 BR00106708:r06c04f05p01
32 cmqtlpl1.5-31-2019-mt:r06c05f03p01
32 cmqtlpl1.5-31-2019-mt:r06c05f05p01
238 BR00107338:r11c01f03p01
238 BR00107338:r11c01f05p01
29 cmqtlpl1.5-31-2019-mt:r03c23f03p01
29 cmqtlpl1.5-31-2019-mt:r03c23f05p01
153 BR00106708:r16c19f03p01
153 BR00106708:r16c19f05p01
260 BR00107338:r06c23f03p01
260 BR00107338:r06c23f05p01

Controls
Cell Line ID Plate:Image
112 BR00107338:r04c16f03p01
112 BR00107338:r04c16f05p01
136 BR00106709:r03c08f03p01
136 BR00106709:r03c08f05p01
30 cmqtlpl1.5-31-2019-mt:r02c04f03p01
30 cmqtlpl1.5-31-2019-mt:r02c04f05p01
255 BR00107338:r16c15f03p01
255 BR00107338:r16c15f05p01
195 BR00107339:r01c05f03p01
195 BR00107339:r01c05f05p01
12 cmqtlpl1.5-31-2019-mt:r02c24f03p01
12 cmqtlpl1.5-31-2019-mt:r02c24f05p01
158 BR00106708:r16c10f03p01
158 BR00106708:r16c10f05p01
215 BR00107339:r03c11f03p01
215 BR00107339:r03c11f05p01
233 BR00107338:r01c01f03p01
233 BR00107338:r01c01f05p01
277 cmQTLplate7-7-22-20:r07c19f03p01
277 cmQTLplate7-7-22-20:r07c19f05p01
206 BR00107339:r09c10f03p01
206 BR00107339:r09c10f05p01
248 BR00107338:r02c19f03p01
248 BR00107338:r02c19f05p01
113 BR00106709:r10c24f03p01
113 BR00106709:r10c24f05p01
155 BR00106708:r13c19f03p01
155 BR00106708:r13c19f05p01
125 BR00106709:r08c17f03p01
125 BR00106709:r08c17f05p01
227 BR00107339:r12c05f03p01
227 BR00107339:r12c05f05p01
222 BR00107339:r14c03f03p01
222 BR00107339:r14c03f05p01
213 BR00107339:r06c02f03p01
213 BR00107339:r06c02f05p01
265 BR00107338:r16c17f03p01
265 BR00107338:r16c17f05p01

bethac07 commented 3 years ago

Plate information is not batch information, do you have easy access to the batch numbers? Otherwise I need to go hunting.

bethac07 commented 3 years ago

(Also, for your cases there is only one well (with two images) listed for each line; you said there were 8 wells per line?)

bethac07 commented 3 years ago

I went ahead and spot checked; in at least a couple of batches the outline images are there, so they are likely there for all of them. You can find them (assuming you have access, since you have some original images; @shntnu is this in fact the case?) at

s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis/BATCH/PLATE/analysis/PLATE-WELL-SITE/outlines/WELL_sSITE--nuclei_outlines.png

and

s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis/BATCH/PLATE/analysis/PLATE-WELL-SITE/outlines/WELL_sSITE--cell_outlines.png

You'd want to ideally pull all images (raw AND outlines) from all wells of all the lines in question. When you have that pulled together, I can walk you through the next step. Hope that helps!
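The path template above can be filled in programmatically once the batch, plate, well and site are known. The sketch below only does string substitution on that template; the function name is hypothetical, and the exact formats of the BATCH, WELL and SITE values are not stated in this thread, so whatever strings are passed in are used as-is.

```python
BASE = "s3://imaging-platform/projects/2018_06_05_cmQTL/workspace/analysis"


def outline_paths(batch, plate, well, site):
    """Fill the BATCH/PLATE/WELL/SITE placeholders of the outline template."""
    folder = f"{BASE}/{batch}/{plate}/analysis/{plate}-{well}-{site}/outlines"
    stem = f"{well}_s{site}"
    return {
        "nuclei": f"{folder}/{stem}--nuclei_outlines.png",
        "cells": f"{folder}/{stem}--cell_outlines.png",
    }


# Passing the placeholder names themselves reproduces the template verbatim.
for kind, path in outline_paths("BATCH", "PLATE", "WELL", "SITE").items():
    print(kind, path)
```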

jatinarora-upmc commented 3 years ago

Actually, I have only two random images per cell line, which were kindly provided by Shantanu. @shntnu could you please help in pulling out all images from all wells for the cell lines in the table above?

shntnu commented 3 years ago

@jatinarora-upmc I've updated the files that we used in https://github.com/broadinstitute/cmQTL/issues/35, and it now includes the outline files as well.

Recap:

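# Directory where the downloaded images and outlines will be stored
IMAGE_DIR=/tmp/cmqtl

mkdir -p "$IMAGE_DIR"

# Collect the unique plate names (column 1), skipping the CSV header row
cut -d"," -f1 data/sample_images.csv | grep -v Metadata_Plate | sort -u > /tmp/plates.txt

# Create one subdirectory per plate
parallel -a /tmp/plates.txt --no-run-if-empty mkdir -p "$IMAGE_DIR"/{}

# Download every file listed in the CSV in parallel: {1} is the plate column,
# {4} the output file name and {5} the URL, as implied by the wget arguments
parallel \
 --header ".*\n" \
 -C "," \
 -a data/sample_images.csv \
 --eta \
 --joblog "${IMAGE_DIR}/download.log" \
 wget -q -O "${IMAGE_DIR}"/{1}/{4} {5}
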
jatinarora-upmc commented 3 years ago

Hi @shntnu , thanks much for the files. Are these images and outlines randomly sampled or they are for the cell lines I listed in the table above? I am asking because I see that images for one plate (cmqtlpl261-2019-mt) is missing.

shntnu commented 3 years ago

> they are for the cell lines I listed in the table above?

Yes

> I am asking because I see that images for one plate (cmqtlpl261-2019-mt) is missing.

cmqtlpl261-2019-mt is not listed above

jatinarora-upmc commented 3 years ago

@bethac07 hi Beth. Shantanu has provided me the images and outlines of all cell lines in question. Could you please help me with the next steps to make sure that the PRLR associations are not technical artifacts?

bethac07 commented 3 years ago

Sure, you'll want to do something so you can look at the outlines at the same time as the images (ideally, literally on top of the images); this could be an ImageJ script, a CellProfiler pipeline, something in your favorite scripting language, etc.

I'm happy to put together a quick CellProfiler pipeline for you to do that if that's helpful, just send me a zipped thing with all of the images (raw + outlines) from one field of view and LMK what version of CellProfiler pipeline you have.

jatinarora-upmc commented 3 years ago

@bethac07 Hi Beth, it would be really helpful to have a script, as I am almost entirely unfamiliar with ImageJ and CellProfiler pipelines. Thanks very much. Here is the link to the images from all 5 channels for two fields of view (f03 and f05) and all cell outline images. Note that this is for only 1 cell line with rare variants in the PRLR gene. https://drive.google.com/file/d/1DAvhlAOOnaavqdRZlSnj08UY1eWuQPo7/view?usp=sharing Please let me know if I am missing anything.

bethac07 commented 3 years ago

I don't have any prewritten scripts to do that; if you want to do it in a script, I would suggest you do it by modifying whatever code you made to create the views above.

If you're willing to go to CellProfiler.org though and just download the program, I can send you a pipeline so that in theory you just drag the pipeline to where it says "drag and drop pipeline", drag and drop your images to where it says "drag and drop images", set the folder for output to go to, and then click "analyze".

bethac07 commented 3 years ago

(if you want me to do that, let me know if you plan to include only the cell outlines or also the nuclear outlines; my suggestion would be to do both, but only cell outlines were included in the folder you sent)

jatinarora-upmc commented 3 years ago

@bethac07 Hi Beth, yes, it would also be great to have a pipeline that I can import, so I can try to explore CellProfiler myself. I plan to use only cell outlines for now, as those are the only ones I have.

jatinarora-upmc commented 3 years ago

@shntnu hello Shantanu, i noticed that png images for nuc profiles were empty (0kb). Could you please check?

bethac07 commented 3 years ago

Pipeline is here. It is set to match outline images to raw images by well and site, not plate because I don't know how you're designating plate on your system; you may have to add metadata extraction for "Plate" as well. (You will need to do this if any individual well position (ie A01) is used more than once- you know you will have to, because the system will yell at you saying that some things in NamesAndTypes can't be matched; in that case, if you let me know how files are organized on your system I can quickly adjust the pipeline). cmqtl_outline_overlay.cppipe.zip

jatinarora-upmc commented 3 years ago

@bethac07 Hi Beth, thanks for the reply.

bethac07 commented 3 years ago

This pipeline is the most recent version of CellProfiler (4.1.3), since that's what you'll be downloading if you don't currently have it on your computer. It doesn't matter if the versions aren't the same since it's literally just adding the existing outlines to the existing images, no calculations are being done.

I need to know how the folders are arranged on your system to capture plate in the pipeline.

jatinarora-upmc commented 3 years ago

@bethac07 sure, the plates are arranged as individual folders (screenshot) image

bethac07 commented 3 years ago

Are the images then in those top-level plate folders? I.e., is it BR00106708/r00c00etc, or is it BR00106708/somesubfolder1/somesubfolder2/r00c00etc?

jatinarora-upmc commented 3 years ago

the images are under these top-level plate folders. An example of image would be BR00106708/r01c13f01p01-ch1sk1fk1fl1.tiff
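Since the pipeline matches outlines to raw images by plate, well and site, it can help to see how those pieces sit in the file names. The sketch below parses both the file-path form shown above and the Plate:Image form used in the earlier tables. It is an illustration only: the regular expression and function names are hypothetical, "f" is taken to be the field of view as guessed earlier in the thread, and the mapping of row 1 to the letter "A" is an assumed plate convention, not something confirmed here.

```python
import re

IMAGE_RE = re.compile(
    r"^(?P<plate>[^/:]+)[/:]"                        # plate folder or Plate: prefix
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p\d+"  # row, column, field
    r"(?:-ch(?P<channel>\d+))?"                      # channel, only in file names
)


def parse_image_id(text):
    """Split 'PLATE/rXXcYYfZZpNN-chK...' or 'PLATE:rXXcYYfZZpNN' into parts."""
    m = IMAGE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised image id: {text!r}")
    return {
        "plate": m["plate"],
        "row": int(m["row"]),
        "col": int(m["col"]),
        "field": int(m["field"]),
        "channel": int(m["channel"]) if m["channel"] else None,
    }


def well_name(row, col):
    """Assumed convention: row 1 -> 'A', so r01c13 -> 'A13'."""
    return f"{chr(ord('A') + row - 1)}{col:02d}"


info = parse_image_id("BR00106708/r01c13f01p01-ch1sk1fk1fl1.tiff")
print(info, well_name(info["row"], info["col"]))
```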

bethac07 commented 3 years ago

cmqtl_outline_overlay.cppipe.zip

jatinarora-upmc commented 3 years ago

Thanks much @bethac07 for the pipeline. So basically i need to do following steps:

  1. put images (both outlines and all fields of view) under the plate-level directories (as shown above) for selected cell lines in question
  2. browse for images and select folder one level above plate-level directories in CP
  3. change/Set default output folder
  4. click on Analyze images in CP

is this correct?

bethac07 commented 3 years ago

If you drag and drop in the whole folder containing your plate-level directories, it will grab any and all image files there, so I would only drag and drop in the plate level directories if any other subfolders are present that you DON'T want analyzed. You also need to load (via dragging and dropping or File -> Import) the pipeline file I sent at any point between steps 1 and 4. Otherwise, yes, correct

jatinarora-upmc commented 3 years ago

Hi @bethac07, all worked well, and i have overlaid images now. Now, the idea is to inspect the overlaid images visually and look if there are any segmentation problems, is it right?

bethac07 commented 3 years ago

Yes, that's exactly it- specifically, a pattern of segmentation problems that exist in the variant-containing-cells but NOT the matched controls. There will always be SOME segmentation issues, we're looking for a consistent change.
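One simple way to keep track of "a consistent change" while inspecting overlays is to record, per image, whether a segmentation problem was seen, and then compare the fraction of flagged images between cases and controls. The annotation scheme, function name and values below are hypothetical and synthetic; this is a bookkeeping sketch, not a statistical test.

```python
from collections import defaultdict


def problem_rates(annotations):
    """annotations: iterable of (group, has_problem) pairs from manual review.

    Returns the fraction of flagged images per group.
    """
    totals = defaultdict(int)
    problems = defaultdict(int)
    for group, has_problem in annotations:
        totals[group] += 1
        problems[group] += bool(has_problem)
    return {group: problems[group] / totals[group] for group in totals}


# Synthetic annotations, for illustration only.
toy_annotations = [
    ("case", True), ("case", False), ("case", True), ("case", False),
    ("control", False), ("control", True), ("control", False), ("control", False),
]

print(problem_rates(toy_annotations))
```

Similar rates in both groups would argue against a segmentation artifact specific to the variant-containing lines; a clearly higher rate in cases would support it.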

jatinarora-upmc commented 3 years ago

Great, segmentation seems fine across the many images I have checked so far, but I am still checking for patterns across all images. Thanks so much for your kind help, Beth.