broadinstitute / cmQTL

High-dimensional phenotyping to define the genetic basis of cellular morphology
BSD 3-Clause "New" or "Revised" License

July 20 2020 Discussions (cell count confounders, cell health predictions) #47

Closed shntnu closed 2 years ago

shntnu commented 4 years ago

Let's use this thread to discuss any questions from today @jatinarora-upmc.

shntnu commented 4 years ago

I am copying @gwaygenomics's question here

From Gregory Way to Everyone: (11:55 AM) 
One thing regarding interpretation of morphology features: is it a bad idea to use the morphology feature selection approach to find candidate genes but then run followup tests on these 6 or so specific genes but use all morphology features


shntnu commented 4 years ago

@jatinarora-upmc Recap of Zernike: See https://github.com/broadinstitute/cmQTL/issues/32#issuecomment-648410790

gwaybio commented 4 years ago

I also had another question about nearest gene to GWAS signals. Do we see any of these pop up? How about GWAS gene neighborhoods?

jatinarora-upmc commented 4 years ago

Quick notes from today's meeting on rare variant burden test on morphology features:

  1. Check SLFN12 (significantly associated with Cytoplasm_Areashape_Zernike_3_1 in any cells) in isolated cells as well.
  2. Test for interaction between variant burden in a gene and iPSC source tissue or donor ancestry.
  3. Cross-check associations against the images and the live cell counter.
  4. Include doubling time as a covariate in the association analysis.
  5. Optionally include the total number of cells in the well as a proxy for cell cycle.
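
As a rough illustration of items 4 and 5, a covariate-adjusted burden test could look like the sketch below. This is only a toy under stated assumptions: the data are made up, the function names (`ols_fit`, `burden_effect`) are hypothetical, and the actual analysis would likely use a proper statistics package with standard errors and p-values rather than a hand-rolled least-squares solver.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def burden_effect(feature, burden, covariates) -> float:
    """Slope of `feature` on `burden`, adjusted for per-well covariates."""
    # Design matrix columns: intercept, burden, then the covariates.
    X = [[1.0, float(b), *map(float, c)] for b, c in zip(burden, covariates)]
    return ols_fit(X, feature)[1]


# Toy, made-up data: six wells; covariates are (cell count, doubling time).
burden = [0, 1, 0, 2, 1, 0]
covariates = [(100, 20), (120, 22), (90, 19), (150, 25), (110, 21), (95, 23)]
# Feature simulated without noise as 1 + 2*burden + 0.5*count - 0.3*doubling_time.
feature = [1 + 2 * b + 0.5 * c - 0.3 * d for b, (c, d) in zip(burden, covariates)]

effect = burden_effect(feature, burden, covariates)
print(f"adjusted burden effect: {effect:.4f}")
```

Because the toy feature is simulated with a burden coefficient of 2 and no noise, the adjusted estimate should recover that value up to floating-point error. Dropping a covariate from the design matrix shows how much of the apparent burden effect it was absorbing.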
jatinarora-upmc commented 4 years ago

> I also had another question about nearest gene to GWAS signals. Do we see any of these pop up? How about GWAS gene neighborhoods?

@gwaygenomics thanks for bringing this up. This is on the to-do list for once we are done with the common and rare variant associations. I was also wondering whether cell health can be incorporated as a covariate.

jatinarora-upmc commented 4 years ago

> I am copying @gwaygenomics's question here
>
> From Gregory Way to Everyone: (11:55 AM)
> One thing regarding interpretation of morphology features: is it a bad idea to use the morphology feature selection approach to find candidate genes but then run followup tests on these 6 or so specific genes but use all morphology features

Not a bad idea. As we saw, while a feature has one or two associated genes, a single gene might impact many features. I think we could do this to check whether highly correlated features are affected by the same genes, as a sanity check at the end.

jatinarora-upmc commented 4 years ago

@bethac07 @shntnu Hi Beth, Shantanu, could you help me get the live cell counter information per well?

shntnu commented 4 years ago

Did you mean just cell count (vs. fraction of live cells)? For the former, see *_count.csv in https://github.com/broadinstitute/cmQTL/tree/master/1.profile-cell-lines/profiles. For the latter, we'd need to use models from https://github.com/broadinstitute/cell-health, but that will take some effort. If the latter, can you remind me of the context?

jatinarora-upmc commented 4 years ago

@shntnu Actually, I meant the latter, the fraction of live cells. The idea was to know how many good cells we have in a condition like the one in this image. During the last presentation, I also wanted to ask your opinion on including cell health as a covariate in my model.

image
shntnu commented 4 years ago

@jatinarora-upmc Indeed fraction of live cells could be estimated using the Cell Health models like this.

@gwaygenomics What do you feel about Jatin using these models directly? There's no way to evaluate (in this dataset) but we'll know if it's totally off (e.g. if we get crazy numbers). The results could well be totally off the charts because the models were trained on a very different cell line. But certainly worth testing it out IMO (assuming it will take Jatin no more than 2 days to apply and test)

gwaybio commented 4 years ago

> @gwaygenomics What do you feel about Jatin using these models directly? There's no way to evaluate (in this dataset) but we'll know if it's totally off (e.g. if we get crazy numbers). The results could well be totally off the charts because the models were trained on a very different cell line. But certainly worth testing it out IMO (assuming it will take Jatin no more than 2 days to apply and test)

Sounds cool! @jatinarora-upmc and I chatted separately on slack (sorry for not posting my thoughts earlier) but I will summarize below:

I won't be able to get to this for a couple of days though, so let's brainstorm whether I can do anything else in this time period (but please be gentle and wary of feature creep!)

shntnu commented 4 years ago

Fantastic!

The only other request is: also test a couple of well-performing models that can be easily validated by using CellProfiler features. From the list below, I'd go with cc_all_n_objects and cc_all_nucleus_area_mean (feature mapping is here). Does that sound reasonable @gwaygenomics ?

image
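
One simple way to run the validation suggested above is to correlate each model's per-well prediction with the matching CellProfiler measurement. The sketch below is a minimal illustration with made-up numbers; the variable names are hypothetical and it does not reflect how the Cell Health models are actually applied.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Made-up per-well values: a model's predicted object count (arbitrary units)
# and the cell count measured directly by CellProfiler for the same wells.
predicted_n_objects = [0.2, 0.5, 0.1, 0.9, 0.6]
measured_cell_count = [210, 480, 150, 900, 610]

r = pearson(predicted_n_objects, measured_cell_count)
print(f"Pearson r between prediction and measured count: {r:.3f}")
```

A correlation near zero or a negative one for a readout like this would be an early sign that the transferred model is not behaving sensibly on the new cell line.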

gwaybio commented 4 years ago

that's perfect - will do!

gwaybio commented 4 years ago

I started this analysis today and ran into a roadblock. It turns out there are 506 features measured in the Cell Health project that are not measured in the cmQTL project, and many of them have nonzero coefficients in the three models we proposed using. The cmQTL data I am using (Jatin sent over a .tab file on Dropbox) has 3,582 features. The missing features are all texture and correlation features.

Unless we can resolve this feature difference, the Cell Health models cannot easily be applied to the cmQTL data, and we should abandon this analysis.
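
For reference, the check described here can be sketched as below: given a model's coefficients and the set of features available in the target data, list the features that are missing and carry a nonzero weight. The feature names and coefficient values are invented for illustration, and `missing_nonzero_features` is a hypothetical helper, not code from either repository.

```python
from typing import Dict, Iterable


def missing_nonzero_features(
    coefficients: Dict[str, float], available: Iterable[str]
) -> Dict[str, float]:
    """Return model features with nonzero weight that are absent from the data."""
    available_set = set(available)
    return {
        name: weight
        for name, weight in coefficients.items()
        if weight != 0.0 and name not in available_set
    }


# Invented example: one model's coefficients and the target dataset's columns.
model_coefficients = {
    "Cells_AreaShape_Area": 0.4,
    "Cells_Texture_Example_A": -0.2,      # missing, nonzero: blocks the model
    "Nuclei_Correlation_Example_B": 0.0,  # missing, zero weight: harmless
    "Nuclei_AreaShape_Area": 0.1,
}
target_features = ["Cells_AreaShape_Area", "Nuclei_AreaShape_Area"]

blocking = missing_nonzero_features(model_coefficients, target_features)
print(f"{len(blocking)} missing feature(s) with nonzero weight: {sorted(blocking)}")
```

Missing features with a zero coefficient can be ignored, so only the entries returned here need to be resolved before the model can be applied.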

gwaybio commented 4 years ago

I added my progress in #51 - if we can resolve this, then outputting predictions can happen very quickly

bethac07 commented 4 years ago

Many of those features may actually still be measured, just under different names, since IIRC Cell Health used CellProfiler 2 and cmQTL definitely uses CellProfiler 3. Is there a list of the unique features from each set somewhere? We may be able to do a fair amount of cross-referencing.
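
A first pass at that cross-referencing could be as simple as the sketch below: list the features unique to each set, then propose candidate matches by string similarity using the standard library's `difflib`. The feature names are invented, the similarity cutoff is an arbitrary assumption, and any proposed pairing would still need manual review against the CellProfiler documentation.

```python
import difflib
from typing import Dict, Iterable, List, Tuple


def unique_features(a: Iterable[str], b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (features only in a, features only in b), each sorted."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


def candidate_renames(
    only_a: List[str], only_b: List[str], cutoff: float = 0.8
) -> Dict[str, List[str]]:
    """For each feature unique to a, suggest up to three similar names from b."""
    return {
        name: difflib.get_close_matches(name, only_b, n=3, cutoff=cutoff)
        for name in only_a
    }


# Invented feature names standing in for the two projects' column lists.
cell_health = ["Cells_AreaShape_Area", "Cells_Texture_Contrast_DNA_5"]
cmqtl = ["Cells_AreaShape_Area", "Cells_Texture_Contrast_DNA_5_00", "Nuclei_Intensity_Mean_DNA"]

only_cell_health, only_cmqtl = unique_features(cell_health, cmqtl)
suggestions = candidate_renames(only_cell_health, only_cmqtl)
for name, matches in suggestions.items():
    print(name, "->", matches or "no close match")
```

String similarity only narrows down the candidates. Two features with near-identical names can still be computed differently across CellProfiler versions, so each suggested match should be confirmed before renaming.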

shntnu commented 4 years ago

Let's split off the cell health-related discussion to this thread https://github.com/broadinstitute/cmQTL/issues/53

jatinarora-upmc commented 4 years ago

@gwaygenomics @bethac07 @shntnu just following up on cell health readouts, was it feasible to align the features?