Hey all!
Super cool package :). I have a question about how genes with low expression are removed when running inferCNV. This is controlled by the `cutoff` argument, which is documented as keeping genes that are expressed above the specified threshold in the reference cells. I am using `cutoff = 0.1`.
I am currently working with two datasets: my reference dataset and my query dataset. The reference dataset is considerably sparser than the query. When I run my reference dataset by itself (`ref_group_names = NULL` in `CreateInfercnvObject()`), only around 1000 genes make it past the cutoff:
`STEP 02: Removing lowly expressed genes
INFO [2023-11-29 16:27:16] ::above_min_mean_expr_cutoff:Start
INFO [2023-11-29 16:27:16] Removing 15837 genes from matrix as below mean expr threshold: 0.1
INFO [2023-11-29 16:27:16] validating infercnv_obj
INFO [2023-11-29 16:27:16] There are 1100 genes and 202 cells remaining in the expr matrix.
INFO [2023-11-29 16:27:16] no genes removed due to min cells/gene filter
INFO [2023-11-29 16:27:16] `
However, when I use these cells as the reference to infer CNVs in my query data, more than 10000 genes satisfy the cutoff condition. Since `cutoff` should remove genes based on the reference cells, I would expect only around 1000 genes to make it through, but this is not the case:
`STEP 02: Removing lowly expressed genes
INFO [2023-11-29 17:38:09] ::above_min_mean_expr_cutoff:Start
INFO [2023-11-29 17:38:10] Removing 1521 genes from matrix as below mean expr threshold: 0.1
INFO [2023-11-29 17:38:10] validating infercnv_obj
INFO [2023-11-29 17:38:10] There are 11069 genes and 2326 cells remaining in the expr matrix.
INFO [2023-11-29 17:38:13] no genes removed due to min cells/gene filter
INFO [2023-11-29 17:38:42] `
Out of curiosity, I plotted the average count of each gene within a) my reference dataset, b) my query dataset, and c) the reference + query datasets combined, and checked how many genes had an average expression greater than 0.1:
- Reference dataset: 1154 genes above 0.1
- Query dataset: 8713 genes above 0.1
- Reference + query datasets: 8713 genes above 0.1
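
A rough sketch of this kind of check in base R is below. The matrix `counts`, the vector `is_ref`, and the toy values are hypothetical placeholders, not my actual data. It only illustrates a per-gene mean compared against the threshold, and is not necessarily how inferCNV applies `cutoff` internally.

```r
# Toy genes x cells count matrix (hypothetical values, for illustration only).
counts <- matrix(
  c(0, 0, 1, 3, 2, 4,   # geneA
    0, 0, 0, 1, 0, 1,   # geneB
    1, 0, 0, 0, 0, 0,   # geneC
    0, 0, 0, 0, 0, 0),  # geneD
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:6))
)

# Hypothetical labels: the first two cells are reference, the rest are query.
is_ref <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

cutoff <- 0.1

# Number of genes whose mean count exceeds the cutoff in each cell subset.
n_above <- c(
  reference = sum(rowMeans(counts[, is_ref, drop = FALSE]) > cutoff),
  query     = sum(rowMeans(counts[, !is_ref, drop = FALSE]) > cutoff),
  combined  = sum(rowMeans(counts) > cutoff)
)
print(n_above)
```

With the real data, `counts` would be the raw count matrix passed to `CreateInfercnvObject()` and `is_ref` would mark the cells in the reference group.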
I am a bit confused by these discrepancies. I might be missing something about how inferCNV curates the gene list, so I was wondering if you could point me in a direction that might explain these mismatches.
Thank you!! :)