broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Should some genes in the count matrix be filtered? #451

Open lizhan96 opened 1 year ago

lizhan96 commented 1 year ago

Dear developer:

I ran inferCNV on a liver cancer single-cell RNA-seq dataset. Hepatocytes from the tumor sample were selected as the observations, and hepatocytes from the non-cancerous liver tissue sample as the reference.

In total, 14833 hepatocytes from the tumor, 2000 hepatocytes from non-cancerous liver tissue, and 19030 genes were included.

No error messages occurred.

However, I can't detect any CNV amplification or loss. Should genes with zero expression in almost all cells be filtered out?

Only 8442 genes are in the result table, and none of them shows any CNV amplification or loss:

# Keep a numeric copy for the comparisons; the labelled copy becomes character once a label is assigned
cnv_score_mat <- as.matrix(cnv_table)
cnv_score_table <- cnv_score_mat
cnv_score_table[cnv_score_mat > 0 & cnv_score_mat < 0.3] <- "A" # complete loss. 2pts
cnv_score_table[cnv_score_mat >= 0.3 & cnv_score_mat < 0.7] <- "B" # loss of one copy. 1pts
cnv_score_table[cnv_score_mat >= 0.7 & cnv_score_mat < 1.3] <- "C" # neutral. 0pts
cnv_score_table[cnv_score_mat >= 1.3 & cnv_score_mat <= 1.5] <- "D" # addition of one copy. 1pts
cnv_score_table[cnv_score_mat > 1.5 & cnv_score_mat <= 2] <- "E" # addition of two copies. 2pts
cnv_score_table[cnv_score_mat > 2] <- "F" # addition of more than two copies. 2pts

table(cnv_score_table[,1])

C 8442

I would appreciate any advice. Thank you!

GeorgescuC commented 1 year ago

Hi @lizhan96,

infercnv filters out genes that are not expressed, or whose average expression is below a given threshold, because including them dilutes the signal too much. Some genes are not expressed in certain cell types, and some are expressed at levels too low for consistent detection. The threshold is set through the "cutoff" option, and having around 8442 genes left after filtering is common.
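To make the filtering idea concrete, here is a minimal base R sketch on a made-up count matrix. It is not infercnv's internal code: the gene and cell names, the values, the threshold of 1, and the choice to keep genes at or above the threshold are all assumptions for illustration.

```r
# Toy count matrix: genes in rows, cells in columns (all values are made up).
counts <- matrix(
  c(0, 0, 1, 0,    # geneA: almost never detected
    5, 3, 4, 6,    # geneB: well expressed
    0, 1, 0, 0,    # geneC: almost never detected
    2, 2, 1, 3),   # geneD: moderately expressed
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("ref1", "ref2", "obs1", "obs2"))
)

ref_cells <- c("ref1", "ref2")  # hypothetical reference cell names
cutoff <- 1                     # hypothetical threshold on the mean count

# Average count per gene across the reference cells only
mean_ref <- rowMeans(counts[, ref_cells, drop = FALSE])

# Keep genes whose reference average reaches the threshold (assumed rule)
keep <- mean_ref >= cutoff
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

With the real package the threshold is the "cutoff" argument of infercnv::run; the package documentation suggests 1 for Smart-seq2 data and 0.1 for 10x Genomics data, so a large drop in gene count after filtering is expected.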

How are you defining the thresholds you use for cnv_score_mat? The first thing to do is to inspect the residual expression plot, infercnv.png, and check whether the references look clean of signal and whether the observations seem to have any. If you ran the HMM, you can then compare the plot with the HMM results to make sure the clustering was run with the correct options for your data, and use those results if that is the case. Alternatively, if you do not want to use the HMM, you can define residual expression thresholds, as you seem to have done, based on the quantiles of residual values for the references.
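A quantile-based threshold could be sketched roughly as follows. Everything here is simulated: the matrices, the 0.15 shift, and the 1%/99% quantiles are assumptions for illustration, not values recommended by infercnv. The point is that if residual values sit in a narrow band around 1, fixed cut-offs such as 0.7 and 1.3 could label every gene as neutral, whereas cut-offs taken from the reference distribution adapt to the actual spread.

```r
set.seed(1)

# Simulated residual expression (genes x cells), centred on 1; values are made up.
ref_mat <- matrix(rnorm(200 * 20, mean = 1, sd = 0.03), nrow = 200)
obs_mat <- matrix(rnorm(200 * 30, mean = 1, sd = 0.03), nrow = 200)
obs_mat[1:40, ] <- obs_mat[1:40, ] + 0.15  # simulated gain in the first 40 genes

# Thresholds taken from the reference distribution
# (the 1% and 99% quantiles are an assumed choice).
thr <- quantile(ref_mat, probs = c(0.01, 0.99))

# Label each observation value relative to the reference-derived thresholds
calls <- matrix("neutral", nrow = nrow(obs_mat), ncol = ncol(obs_mat))
calls[obs_mat < thr[1]] <- "loss"
calls[obs_mat > thr[2]] <- "gain"

table(calls)
```

On real data, ref_mat and obs_mat would be the reference and observation columns of the residual expression matrix, and the quantiles would need tuning against what is visible in infercnv.png.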

Regards, Christophe.