Determining "CNA Signal" and "CNA Correlation" from inferCNV output files

pshukla99 commented 2 years ago

Hi all,

I'm trying to follow the analysis in this paper (https://www.sciencedirect.com/science/article/pii/S0092867419306877?via%3Dihub#figs1) to quantitatively classify the cells in my sample as malignant vs non-malignant. I'm interested in computing "CNA Signal" which is defined as "mean of the squares of CNA values across the genome" and the "CNA Correlation" which is defined as "the correlation between the CNA profile of each cell and the average CNA profile of all cells from the corresponding tumor."

Which output files should I be looking at to find these CNA values? Also, are there any other ways to assign labels of malignant vs non-malignant to cells in a sample that are more quantitative than visual inspection of the final inferCNV heatmap? Thanks in advance.

[Screenshot attached: Screen Shot 2022-07-21 at 7 49 34 PM]

sunshine1126 commented 2 years ago

I'm also interested in this. I hope there's a more quantitative way to identify malignant or non-malignant cells in a sample based on CNV.

GeorgescuC commented 2 years ago

Hi @pshukla99, Hi @sunshine1126,

If you want to do the analysis in R, you can use the infercnv object you have at the end of the analysis (it can be loaded again with infercnv_obj = readRDS("run.final.infercnv_obj")). The residual expression values found in the infercnv_obj@expr.data slot can be used to calculate the mean of squares across the genome. Alternatively, you can read the text file matrices that are output with each plot.
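A minimal sketch of that calculation in R, for anyone landing here. The function name cna_signal and the toy matrix are made up for illustration. It also assumes the matrix is genes x cells and that the values are centered near 1, so 1 is subtracted before squaring; check the center of your own matrix (for example with median()) before relying on that.

```r
# Toy genes-x-cells matrix standing in for infercnv_obj@expr.data.
# With real data you would start from something like:
#   infercnv_obj <- readRDS("run.final.infercnv_obj")
#   expr <- infercnv_obj@expr.data
expr <- matrix(
  c(1.0, 1.1, 0.9, 1.0,   # cellA: two genes deviate from the baseline
    1.0, 1.0, 1.0, 1.0),  # cellB: flat profile
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("cellA", "cellB"))
)

# Hypothetical helper: mean of squared deviations from `center`, per cell.
# `center = 1` is an assumption about how the values are scaled.
cna_signal <- function(expr, center = 1) {
  colMeans((expr - center)^2)
}

signal <- cna_signal(expr)
print(signal)
```

The result is one value per cell (per column), which can then be compared between reference and observation cells.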

For identification of confident CNAs, you can run the HMM of infercnv that will define specific boundaries for CNAs and the specific fold change. A Bayesian network is also used for filtering based on posterior probabilities. You can find more details about the HMM on the wiki.

Regards, Christophe.

sunshine1126 commented 2 years ago

@GeorgescuC Thanks for your reply, and I will try it.

gloriafight commented 1 year ago

> @GeorgescuC Thanks for your reply, and I will try it.

Did you manage to solve this? When I calculate the mean of squares for each cell based on infercnv.observations.txt, the result is as follows. However, the CNV score is very low.

Lualululu commented 1 year ago

> > @GeorgescuC Thanks for your reply, and I will try it.
>
> Did you manage to solve this? When I calculate the mean of squares for each cell based on infercnv.observations.txt, the result is as follows. However, the CNV score is very low.

I got the same results as you. If I calculate the mean of the squares of CNA values across the genome as the CNV signal, the results are 0.00X-0.00X. If I calculate the standard deviation of CNA values across the genome instead, the results are 0.0X-0.0X. The standard deviation gives values closer to those in the paper. But whichever method is used, tumor cells and non-malignant cells still separate fairly well, because tumor cells have both a higher CNV signal and a higher CNV correlation.
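The difference in scale between the two calculations is expected: when the CNA values of a cell average close to zero, the standard deviation is roughly the square root of the mean of squares. A small sketch with simulated values (the numbers are made up and are not inferCNV output):

```r
set.seed(1)

# Simulated CNA values for one cell, already centered at 0 (an assumption;
# with inferCNV output you may need to subtract the baseline first).
cna <- rnorm(5000, mean = 0, sd = 0.05)

mean_sq <- mean(cna^2)  # "CNA signal" as mean of squares
std_dev <- sd(cna)      # alternative: standard deviation

# sqrt(mean_sq) and std_dev should be nearly identical here.
print(c(mean_sq = mean_sq, sqrt_mean_sq = sqrt(mean_sq), sd = std_dev))
```

So the two measures rank cells similarly; they mainly differ in units, which matters when comparing against a threshold taken from a paper.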

The paper didn't specify how the thresholds for CNV signal and CNV correlation were chosen. That is what I am curious about: whether thresholds can be picked that separate most tumor cells from non-malignant cells. For your data, 0.003 and 0.45 look like feasible thresholds for CNV signal and CNV correlation, respectively.

I'm not sure whether my idea is reasonable. Looking forward to your reply!
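As a rough sketch of how two such thresholds could be applied in R: the function classify_cells and the "unresolved" category for cells that pass only one cutoff are my own assumptions, not something taken from the paper or from inferCNV. The default cutoffs are just the values suggested above for this particular dataset.

```r
# Hypothetical helper: label cells from per-cell signal and correlation scores.
# Cutoffs are dataset-specific and should be chosen by inspecting your own data.
classify_cells <- function(signal, correlation,
                           signal_cut = 0.003, cor_cut = 0.45) {
  stopifnot(length(signal) == length(correlation))
  label <- rep("unresolved", length(signal))  # passes only one cutoff
  label[signal >= signal_cut & correlation >= cor_cut] <- "malignant"
  label[signal <  signal_cut & correlation <  cor_cut] <- "non-malignant"
  setNames(label, names(signal))
}

# Toy scores for three cells (made-up numbers).
labels <- classify_cells(
  signal      = c(a = 0.010, b = 0.001, c = 0.004),
  correlation = c(a = 0.80,  b = 0.10,  c = 0.20)
)
print(labels)
```

Plotting signal against correlation for reference and observation cells first is probably the safest way to see whether any pair of cutoffs separates them.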

pikapika505 commented 1 year ago

Hi @pshukla99, Hi @sunshine1126,

Did you find out how to compute CNA signal and CNA correlation? I am trying to reproduce another cancer data analysis that also uses CNA signal and CNA correlation. Based on @GeorgescuC's answer, I guess I can compute CNA signal as the mean of squares of expr.data for each cell across the genes. But what about CNA correlation?

@gloriafight @Lualululu how did you calculate CNA correlation?

Thank you, Yulia
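Going only by the definition quoted at the top of this thread (the correlation between each cell's CNA profile and the average CNA profile of all cells from the same tumor), one possible sketch in R is below. The function cna_correlation, the tumor_id vector, the use of Pearson correlation, and the centering at 1 are all assumptions for illustration; the paper may have used a different variant.

```r
# Hypothetical helper: per-cell correlation with the mean profile of its tumor.
# expr: genes-x-cells matrix; tumor_id: one tumor label per column of expr.
cna_correlation <- function(expr, tumor_id, center = 1) {
  stopifnot(length(tumor_id) == ncol(expr))
  cna <- expr - center
  out <- setNames(numeric(ncol(cna)), colnames(cna))
  for (t in unique(tumor_id)) {
    idx <- which(tumor_id == t)
    sub <- cna[, idx, drop = FALSE]
    avg <- rowMeans(sub)  # average profile (includes the cell itself)
    # Pearson correlation; a cell with a perfectly flat profile would give NA.
    out[idx] <- apply(sub, 2, function(x) cor(x, avg))
  }
  out
}

# Toy data: cell1 and cell2 share a CNA pattern, cell3 is mostly noise.
a <- c(1, 1, 1, -1, -1, -1)
b <- c(1, -1, 1, -1, 1, -1)
expr <- cbind(cell1 = 1 + 0.2 * a, cell2 = 1 + 0.1 * a, cell3 = 1 + 0.01 * b)
tumor_id <- c("T1", "T1", "T1")

corr <- cna_correlation(expr, tumor_id)
print(round(corr, 3))
```

In this toy example the two cells that share the pattern correlate strongly with the tumor average, while the noisy cell does not. With real data, tumor_id would come from your sample annotation.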

broadinstitute / infercnv

Determining "CNA Signal" and "CNA Correlation" from inferCNV output files #439