broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

CNV alterations in expected normal cells #531

Open r-osorio opened 1 year ago

r-osorio commented 1 year ago

Dear CNV Team,

Thank you for putting together such a comprehensive tool! I am running into a confusing issue. I am processing a sample to look for clusters with CNV alterations, and every cluster in my "observations" set appears to have CNV changes. I figured this might not be a problem, but I wanted to confirm it wasn't an error by showing that a negative result was possible. So I isolated a CD14+ macrophage population from my sample (in theory these cells should have no CNV alterations) and analyzed them in a new infercnv::run(). As controls, I used a normal macrophage population from a different sample. In the final CNV output, I still see significant CNV changes in this macrophage population, which doesn't make sense to me.

I am guessing this is more likely due to something like a batch effect, since I am comparing macrophage populations from different individuals, but I am still wondering what the best path forward would be. Can you really only use reference cells from the same sample? If so, how do you select a reference population that you are confident is non-tumor while still getting a substantial number of cells? Is there an ideal minimum number of reference cells? I don't want to use the mean of my sample as the reference.

Thank you for any help you can provide!

GeorgescuC commented 1 year ago

Hi @r-osorio ,

Are you looking at the residual expression results or HMM results?

If you look at the log or at dim(final_object@expr.data), how many genes are kept in the analysis?
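The number of retained genes is worth checking because the set of genes that survives filtering depends on which cell types are in the run. Below is a minimal toy sketch, assuming a simplified rule: keep a gene if its mean count across the cells is at least a threshold. The rule is only loosely modeled on the gene filtering controlled by the cutoff argument of infercnv::run() (the exact rule, including which cells the mean is taken over, is in the package documentation). The function name genes_kept and all counts are hypothetical.

```python
import numpy as np


def genes_kept(counts: np.ndarray, min_mean: float) -> np.ndarray:
    """Boolean mask over genes (rows) whose mean count across cells (columns)
    is at least min_mean. Hypothetical simplified filter, not infercnv's code."""
    return counts.mean(axis=1) >= min_mean


# Made-up counts: 6 genes x 3 cells per population.
# Genes 0-2 are "macrophage" genes, genes 3-4 are "T cell" genes, gene 5 is shared.
macrophage = np.array([
    [2, 3, 1],
    [1, 0, 2],
    [4, 5, 3],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
])
tcell = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [3, 2, 4],
    [1, 2, 1],
    [1, 1, 1],
])

only_macro = genes_kept(macrophage, min_mean=0.5)
mixed = genes_kept(np.hstack([macrophage, tcell]), min_mean=0.5)
print("genes kept, macrophages only:", int(only_macro.sum()))  # 4
print("genes kept, mixed cell types:", int(mixed.sum()))        # 6
```

With a single cell type in the run, genes that the cell type does not express drop out, so fewer genes remain along each chromosome for the smoothing step.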

How widespread is the signal you see, and what does it look like? There is a known, common case in human data where signal shows up on chromosome 6 in immune-cell references because of the MHC genes. In that case, the references display either gain or loss signal over that region, while the observations all display loss signal.
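One way to check whether a reference signal matches this MHC case is to list which genes in the run fall inside the MHC region and see whether the gain/loss calls in the references are confined to them. A minimal sketch, assuming infercnv's four-column gene order file (gene, chromosome, start, stop, tab-separated, no header) and rough MHC bounds of about 28.5 to 33.5 Mb on chromosome 6. The bounds are an approximation that depends on the genome build, the function name mhc_genes is hypothetical, and the example coordinates are illustrative.

```python
# Approximate MHC window on human chromosome 6, in base pairs.
# ASSUMPTION: rough GRCh38-style bounds; adjust for your genome build.
MHC_CHROM = "6"
MHC_START = 28_500_000
MHC_END = 33_500_000


def mhc_genes(gene_order_lines):
    """Return the genes whose span overlaps the MHC window.

    Each line is assumed to use infercnv's gene order layout:
    gene<TAB>chromosome<TAB>start<TAB>stop, with no header line.
    """
    hits = []
    for line in gene_order_lines:
        if not line.strip():
            continue  # skip blank lines
        gene, chrom, start, stop = line.rstrip("\n").split("\t")[:4]
        # Accept both "chr6" and "6" chromosome naming styles.
        same_chrom = chrom.removeprefix("chr") == MHC_CHROM
        overlaps = int(start) <= MHC_END and int(stop) >= MHC_START
        if same_chrom and overlaps:
            hits.append(gene)
    return hits


# Illustrative gene order rows; coordinates are approximate.
example_lines = [
    "GAPDH\tchr12\t6534512\t6538374",
    "HLA-A\tchr6\t29941260\t29945884",
    "HLA-DRA\tchr6\t32439878\t32445046",
    "VEGFA\tchr6\t43770209\t43786487",
]
print(mhc_genes(example_lines))  # ['HLA-A', 'HLA-DRA']
```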

About references: it is possible to use cells from different samples in the same infercnv run (the example provided with the package has multiple samples/patients), as long as they were sequenced with the same method. Overall, having the same cell type matters more than having the same patient, because the cell type determines which genes are expressed, and that affects both the gene filtering and the subsequent smoothing. For selecting a reference population, the ideal option is to handle it through the wet-lab experimental design. If that is not possible, you can use computational approaches to classify your cells, for example using marker genes for the specific type of cancer you are studying. It is probably also worth looking through published papers to see how other researchers handled that step given the specifics of their experiments.
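When references have to be picked computationally, the resulting labels go into infercnv's annotations file (one line per cell: cell name, a tab, then the group label, no header), and the reference groups are named through the ref_group_names argument of CreateInfercnvObject() in R. The sketch below assumes a simple rule: a cell whose mean expression of a few normal-cell marker genes reaches a threshold is treated as a putative normal reference. The function names, markers, threshold, and data are all hypothetical; in practice the markers would come from the literature for the cancer type being studied. Keeping the sample ID in each label lets references from several samples (sequenced with the same method) enter one run as separate groups.

```python
import numpy as np


def label_cells(cell_names, sample_ids, marker_expr, threshold):
    """Assign each cell an infercnv-style group label.

    marker_expr is a (cells x markers) array of normalized expression for
    HYPOTHETICAL normal-cell marker genes. A cell whose mean marker expression
    is >= threshold is called a putative normal reference; all others become
    observations. The sample ID is appended so each sample's references form
    their own group.
    """
    scores = np.asarray(marker_expr, dtype=float).mean(axis=1)
    pairs = []
    for cell, sample, score in zip(cell_names, sample_ids, scores):
        kind = "putative_normal" if score >= threshold else "observation"
        pairs.append((cell, f"{kind}_{sample}"))
    return pairs


def annotation_lines(pairs):
    """Tab-separated cell/label lines with no header, the layout of
    infercnv's annotations file."""
    return [f"{cell}\t{label}" for cell, label in pairs]


# Made-up data: 4 cells from two samples, 2 marker genes each.
cells = ["P1_c1", "P1_c2", "P2_c1", "P2_c2"]
samples = ["P1", "P1", "P2", "P2"]
markers = [[2.0, 3.0], [0.0, 0.5], [1.5, 2.5], [0.2, 0.0]]

pairs = label_cells(cells, samples, markers, threshold=1.0)
ref_groups = sorted({label for _, label in pairs if label.startswith("putative_normal")})
print("\n".join(annotation_lines(pairs)))
print(ref_groups)  # ['putative_normal_P1', 'putative_normal_P2']
```

The printed annotation lines could be written to the annotations file, and the ref_groups names are what would be passed as ref_group_names in R. A marker threshold like this is only a first pass; checking the putative references in the infercnv output remains necessary.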

Regards, Christophe.