@jatinarora-upmc Recap of Zernike: See https://github.com/broadinstitute/cmQTL/issues/32#issuecomment-648410790
I also had another question about the nearest genes to GWAS signals. Do we see any of these pop up? How about GWAS gene neighborhoods?
Quick notes from today's meeting on the rare-variant burden test on morphology features:
@gwaygenomics Thanks for bringing this up. This is actually on the to-do list once we are done with the common- and rare-variant associations. I was also wondering whether cell health could be incorporated as a covariate.
I am copying @gwaygenomics's question here
From Gregory Way to Everyone (11:55 AM):
> One thing regarding interpretation of morphology features: is it a bad idea to use the morphology feature selection approach to find candidate genes, but then run follow-up tests on these six or so specific genes using all morphology features?
Not a bad idea: as we saw, while a feature has only one or two associated genes, a single gene might affect many features. I think we could do this as a final sanity check, to see whether highly correlated features are affected by the same genes.
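A minimal sketch of that sanity check, under stated assumptions: each candidate gene gets a per-sample score (a hypothetical genotype or burden score), a plain Pearson correlation stands in for whatever association test is actually used, and for every pair of highly correlated features we check that each gene's effects carry the sign implied by the feature correlation. The function names and the 0.9 threshold are illustrative, not part of the cmQTL pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_feature_associations(genotypes, features):
    """Association of every candidate gene with every morphology feature.

    genotypes: {gene: per-sample score}; features: {feature: per-sample value}.
    Pearson r is only a stand-in for the real association test.
    """
    return {
        (gene, feat): pearson(g_vals, f_vals)
        for gene, g_vals in genotypes.items()
        for feat, f_vals in features.items()
    }


def same_gene_check(features, assoc, genes, corr_threshold=0.9):
    """For each highly correlated feature pair, report whether every candidate
    gene's effects on the two features have the sign implied by their correlation."""
    names = sorted(features)
    report = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r_ab = pearson(features[a], features[b])
            if abs(r_ab) < corr_threshold:
                continue  # only highly correlated pairs are checked
            # Positively correlated features should move in the same direction
            # for a given gene; negatively correlated ones in opposite directions.
            consistent = all(
                (assoc[(g, a)] * assoc[(g, b)] > 0) == (r_ab > 0) for g in genes
            )
            report.append((a, b, r_ab, consistent))
    return report
```

A pair flagged as inconsistent would be worth a closer look before interpreting either feature's gene association on its own.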
@bethac07 @shntnu Hi Beth, Shantanu, could you help me get live-cell count information per well?
Did you mean just the cell count (vs. the fraction of live cells)? For the former, see *_count.csv in https://github.com/broadinstitute/cmQTL/tree/master/1.profile-cell-lines/profiles. For the latter, we'd need to use models from https://github.com/broadinstitute/cell-health, but that will take some effort. If the latter, can you remind me of the context?
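If the count files have one row per imaged site, per-well totals are just a group-and-sum. A minimal sketch, assuming hypothetical column names `Metadata_Well` and `Count_Cells` (check the real *_count.csv header; the files may already be per well, in which case no summing is needed):

```python
import csv
import io
from collections import defaultdict


def cells_per_well(csv_text, well_col="Metadata_Well", count_col="Count_Cells"):
    """Sum per-site cell counts into per-well totals.

    well_col and count_col are hypothetical column names, not confirmed
    from the cmQTL count files.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[well_col]] += int(row[count_col])
    return dict(totals)


example = "Metadata_Well,Count_Cells\nA01,100\nA01,120\nB02,90\n"
print(cells_per_well(example))  # {'A01': 220, 'B02': 90}
```

For a real file, pass `open(path).read()` instead of the inline example string.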
@shntnu Actually, I meant the latter: the fraction of live cells. The idea was to know how many good cells we have in conditions like the one in this image. During the last presentation, I also wanted to ask your opinion on including cell health as a covariate in my model.
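Including cell health as a covariate amounts to adding a per-sample term to the association model. A minimal sketch, assuming a simple linear model `feature ~ genotype + fraction_live` fitted by ordinary least squares; `fit_with_cell_health` and its arguments are hypothetical names, and a real analysis would use an established statistics package and the study's actual model.

```python
def ols(X, y):
    """Least-squares coefficients by solving the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def fit_with_cell_health(feature, genotype, frac_live):
    """Fit feature = b0 + b1 * genotype + b2 * frac_live (hypothetical model)."""
    X = [[1.0, g, h] for g, h in zip(genotype, frac_live)]
    intercept, beta_genotype, beta_health = ols(X, feature)
    return {"intercept": intercept, "genotype": beta_genotype, "cell_health": beta_health}
```

The genotype coefficient is then the association adjusted for the fraction of live cells, which could itself come from Cell Health predictions.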
@jatinarora-upmc Indeed, the fraction of live cells could be estimated using the Cell Health models, like this.
@gwaygenomics What do you think about Jatin using these models directly? There's no way to evaluate them in this dataset, but we'll know if they're totally off (e.g., if we get crazy numbers). The results could well be off the charts, because the models were trained on a very different cell line. But it's certainly worth testing, IMO (assuming it will take Jatin no more than two days to apply and test).
Sounds cool! @jatinarora-upmc and I chatted separately on Slack (sorry for not posting my thoughts earlier); I'll summarize below:
% dead only
model to this matrix and output predictions

I won't be able to get to this for a couple of days, though, so let's brainstorm whether there's anything else I can do in the meantime (but please be gentle and wary of feature creep!)
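Applying a trained model to a profile matrix and outputting predictions can be sketched as follows, assuming each Cell Health model is linear (an intercept plus one coefficient per CellProfiler feature) and that the cmQTL profiles are normalized the same way as the training data. `predict_linear` is a hypothetical helper, not code from the cell-health repository; it refuses to predict when a feature with a nonzero coefficient is absent, rather than silently treating it as zero.

```python
def predict_linear(profiles, coefficients, intercept=0.0):
    """Predict a readout for each profile from a linear model.

    profiles: list of {feature_name: value}; coefficients: {feature_name: weight}.
    Features with zero weight do not affect predictions, so they may be absent.
    """
    needed = {f: w for f, w in coefficients.items() if w != 0.0}
    predictions = []
    for idx, profile in enumerate(profiles):
        missing = [f for f in needed if f not in profile]
        if missing:
            raise KeyError(
                f"profile {idx} lacks {len(missing)} model feature(s), e.g. {missing[0]}"
            )
        predictions.append(intercept + sum(w * profile[f] for f, w in needed.items()))
    return predictions
```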
Fantastic!
The only other request: also test a couple of well-performing models that can be easily validated using CellProfiler features. From the list below, I'd go with cc_all_n_objects and cc_all_nucleus_area_mean (feature mapping is here). Does that sound reasonable, @gwaygenomics?
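Because cc_all_n_objects and cc_all_nucleus_area_mean presumably map to a directly measured CellProfiler readout (an object count and a mean nuclear area), a first validation can simply correlate predictions with the measured values, paired by well. A minimal sketch; the well-keyed dictionaries and the function name are hypothetical.

```python
from math import sqrt


def validate_against_cellprofiler(predicted, measured):
    """Pearson r between model predictions and a measured CellProfiler readout.

    Both arguments map well -> value; only wells present in both are used.
    Returns (r, number_of_shared_wells).
    """
    wells = sorted(set(predicted) & set(measured))
    if len(wells) < 3:
        raise ValueError("need at least three shared wells")
    x = [predicted[w] for w in wells]
    y = [measured[w] for w in wells]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), len(wells)
```

A high correlation would not prove the model transfers to these cell lines, but a low one would be a clear warning sign.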
That's perfect, will do!
I started this analysis today and ran into a roadblock. It turns out there are 506 features measured in the Cell Health project that are not measured in the cmQTL project, and many of them have nonzero coefficients in the three models we proposed using. The cmQTL data I am using (a .tab file Jatin sent over on Dropbox) has 3,582 features. The missing features are all texture and correlation features.
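The feature gap can be summarized with set operations: which model features are absent from the cmQTL profiles, which of those actually carry nonzero weight, and which measurement types they belong to. A minimal sketch, assuming the usual CellProfiler `Compartment_Measurement_...` naming so that the second underscore-separated token is the measurement type; the function name is hypothetical.

```python
from collections import Counter


def feature_gap(model_features, available_features, coefficients):
    """Summarize model features that are absent from the available profiles."""
    available = set(available_features)
    missing = sorted(f for f in model_features if f not in available)
    # Only missing features with a nonzero coefficient block the predictions
    needed = [f for f in missing if coefficients.get(f, 0.0) != 0.0]
    # e.g. "Nuclei_Texture_..." -> "Texture", "Cells_Correlation_..." -> "Correlation"
    by_type = Counter(f.split("_")[1] for f in missing if "_" in f)
    return {"missing": len(missing), "missing_nonzero": needed, "by_type": dict(by_type)}
```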
Unless we can resolve this feature difference, the Cell Health models cannot easily be applied to the cmQTL data, and we should abandon this analysis.
I added my progress in #51; if we can resolve this, outputting predictions can happen very quickly.
Many of those features may still actually be measured*, just under different names, since IIRC Cell Health used CellProfiler 2 and cmQTL definitely uses CellProfiler 3. Is there a list of the unique features from each set somewhere? We may be able to do a fair amount of cross-referencing.
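Once both feature lists are available, a first pass at cross-referencing could propose candidate renamings by string similarity with the standard library's `difflib.get_close_matches`. This is a minimal sketch: the feature names below are made up for illustration and say nothing about how CellProfiler actually renamed features between versions, and every proposed pairing would still need manual review.

```python
import difflib


def suggest_matches(missing, available, n=3, cutoff=0.8):
    """For each missing feature name, list the most similar available names.

    String similarity only proposes candidates; it cannot confirm that two
    names measure the same thing.
    """
    return {
        name: difflib.get_close_matches(name, available, n=n, cutoff=cutoff)
        for name in missing
    }


# Made-up names, for illustration only
print(suggest_matches(
    ["Nuclei_Texture_Contrast_DNA_3_0"],
    ["Nuclei_Texture_Contrast_DNA_3_00_256", "Cells_AreaShape_Area"],
))
```

Raising `cutoff` trades fewer false candidates for more unmatched features.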
Let's split off the Cell Health-related discussion into this thread: https://github.com/broadinstitute/cmQTL/issues/53
@gwaygenomics @bethac07 @shntnu Just following up on the Cell Health readouts: was it feasible to align the features?
Let's use this thread to discuss any questions from today @jatinarora-upmc.