broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

How to distinguish between normal and malignant epithelial cells? #594

Open YuliaInn opened 1 year ago

YuliaInn commented 1 year ago

I am trying to identify malignant cells among all epithelial cells in a tumor tissue. I found a paper that does this with infercnv and describes the approach as follows: "Cells defined as endothelia, fibroblast, and macrophage were used as reference to identify somatic copy number variations (CNV) with the R package infercnv (v0.8.2). We scored each cell for the extent of CNV signal, defined as the mean of squares of CNV values across the genome. Putative malignant cells were then defined as those with CNV signal above 0.05 and CNV correlation above 0.5." While trying to reproduce this, I had the following questions:

  1. Are the CNV signal values the expression values stored in infercnv_obj@expr.data?
  2. How do I compute the "mean of squares of CNV values across the genome"? I assume it is something like apply(infercnv_obj@expr.data, 2, function(x) mean(x^2)).
  3. What is the "CNV correlation"?

bvaldebenitom commented 1 year ago

Hi there,

I think you might find issues #439 and #338 useful.

Also, in this post you might find some helpful considerations when applying the mean of squares:

The thing to keep in mind when doing this is that if there are double deletions or double duplications, a single dupl/del will have a score around 0.5/-0.5, which once squared is only 0.25, vs a squared score of 1 for a double dupl/del.

Best, Braulio.
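
The weighting effect described in the quoted post can be checked with a little arithmetic. The sketch below is only an illustration: it uses hypothetical per-gene scores centered at 0 (neutral = 0), a made-up event covering 10% of genes, and plain Python rather than the infercnv object itself.

```python
# Hypothetical per-gene CNV scores for one cell, centered at 0 (neutral = 0).
# The event size (10 of 100 genes) is an assumption chosen for illustration.
single_gain = [0.5] * 10 + [0.0] * 90   # single duplication, score ~0.5
double_gain = [1.0] * 10 + [0.0] * 90   # double duplication, score ~1.0


def mean_of_squares(values):
    """Mean of the squared values across all genes."""
    return sum(v * v for v in values) / len(values)


single_signal = mean_of_squares(single_gain)
double_signal = mean_of_squares(double_gain)

# Squaring makes the double event count four times as much as the single one.
print(single_signal, double_signal, double_signal / single_signal)
```

Because each score is squared before averaging, a cell carrying only single-copy events accumulates signal much more slowly than a cell with double-copy events over the same region.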

YuliaInn commented 1 year ago

Thank you @bvaldebenitom for directing me to similar issues. #439 looks almost exactly like my question. However, I couldn't find exactly what I was looking for.

  1. They said that the CNV signal can be calculated from the residual expression values in infercnv_obj@expr.data, the majority of which are ~1. So the mean of squares for each cell will also be close to 1.

[Image: hist_cnv_sign]

In referenced papers, CNV signal values are way lower than 1. How were they calculated?

  2. There is no answer to the question "what is CNV correlation?". If it is described as "the correlation between the CNA profile of each cell and the average CNA profile of all cells from the corresponding tumor", which output should I use to calculate it? What is a CNA profile? Is the "average CNA profile of all cells from the corresponding tumor" just a single average value for all "observed" cells?

thank you, Yulia
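
One possible reading of the quoted definition is that a "CNA profile" is a cell's vector of per-gene values, and the "average CNA profile" is the per-gene mean of those vectors over all cells from the same tumor, so it is a vector rather than a single number. The sketch below follows that reading on hypothetical toy data. Using Pearson correlation is an assumption, since the quoted text does not name the coefficient, and the cell names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, reported as 0 here
    return cov / (sx * sy)


# Hypothetical toy data: one list of per-gene values per cell of a single tumor.
cells = {
    "cell_a": [1.0, 1.2, 1.2, 0.8, 1.0],
    "cell_b": [1.0, 1.3, 1.1, 0.7, 1.0],
    "cell_c": [1.0, 1.0, 1.0, 1.0, 1.0],  # no CNV-like pattern
}
n_genes = 5

# Average profile: per-gene mean across all cells of the tumor (a vector).
tumor_mean = [
    sum(profile[g] for profile in cells.values()) / len(cells)
    for g in range(n_genes)
]

# CNV correlation: each cell's profile against the tumor's average profile.
cnv_cor = {name: pearson(profile, tumor_mean) for name, profile in cells.items()}
print(cnv_cor)
```

Under this reading, cells that share the tumor's gain and loss pattern correlate strongly with the average profile, while a cell with a flat profile does not.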

xuezhang335 commented 1 year ago

@YuliaInn Hello, have you solved this? I have the same problem. The CNV signal, defined as the mean of the squared values of all genes, is around 1 for every cell, which is much larger than the cutoff proposed in the literature: 'Putative malignant cells were then defined as those with CNV signal above 0.05 and CNV correlation above 0.5'. Should the background value of 1 be subtracted? Also, what is the formula for calculating the CNV correlation (COR)?
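
A minimal sketch of what subtracting the background would do, assuming the values are centered at 1 so that 1 means "no change". Whether this reproduces the signal values reported in the paper is not confirmed anywhere in this thread. The profile, the function names, and the example correlation values are hypothetical; only the 0.05 and 0.5 cutoffs come from the quoted text.

```python
SIGNAL_CUTOFF = 0.05  # from the quoted paper
COR_CUTOFF = 0.5      # from the quoted paper


def cnv_signal(profile, baseline=1.0):
    """Mean of squared deviations from the baseline across all genes."""
    return sum((v - baseline) ** 2 for v in profile) / len(profile)


def is_putative_malignant(signal, correlation):
    """Apply both cutoffs from the quoted definition."""
    return signal > SIGNAL_CUTOFF and correlation > COR_CUTOFF


# Hypothetical cell: 6 neutral genes (1.0) and 4 genes with a gain (1.5).
profile = [1.0] * 6 + [1.5] * 4

uncentered = cnv_signal(profile, baseline=0.0)  # dominated by the baseline of 1
centered = cnv_signal(profile)                  # only deviations contribute

print(uncentered, centered)
print(is_putative_malignant(centered, correlation=0.8))  # both cutoffs met
print(is_putative_malignant(centered, correlation=0.3))  # correlation too low
```

Without centering, the mean of squares mostly reflects the baseline itself, which would explain values near 1 for every cell. After centering, only departures from the neutral state contribute, so the result lands on a scale where a cutoff such as 0.05 is meaningful.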