broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq
Other
557 stars 164 forks source link

what to use for ref_group_name? #598

Open mbihie opened 1 year ago

mbihie commented 1 year ago

I am using a Seurat object to create an infercnv object and have filled out every argument except for ref_group_name. The object uses 10x raw count data and it is a human breast cancer tissue sample. The cells of interest are epithelial cells, which have been identified as a cluster in the object. Would the classifications of the reference (normal) cells be stored somewhere in the Seurat object? I'm not sure what the exact spelling used by the object is.

I am trying to replicate the infercnv workflow from this article (x). I cannot find the reference normal cell names used in the Seurat object titled "TNBCSub.rds" (x).

I attempted to use this c("epithelial", "epithelial-cycling") as the ref_group_name, but I got the error message below. Does anyone know where I went wrong? Any assistance would be appreciated.

subsetting the TNBC object to just the epithelial cells of interest

#read object in
tnbc    <- readRDS("~/BRCA/BRCA-data/SeuratObject_TNBCSub.rds")

#updating the object if it won't open
tnbc = UpdateSeuratObject(object = tnbc) #use this if you get Error in eval(call("@", object, slot)) : no slot of name "images" for this object of class "Seurat"

#rename object
tnbc.epi <- tnbc 

#rename clusters
tnbc.epi <- RenameIdents(tnbc.epi,
                         '0' = "epithelial",
                         '2' = "epithelial-cycling")

#subset to only epithelial cells
tnbc.epi <- subset(x = tnbc.epi, idents = c("epithelial", "epithelial-cycling"))
#set output path
output_dir = "output_dir2"

Create the infercnv object

# ?CreateInfercnvObject
infercnv_obj = CreateInfercnvObject(raw_counts_matrix=GetAssayData(tnbc.epi, "data"),
                                    annotations_file=as.matrix(tnbc.epi@active.ident),
                                    delim="\t",
                                    gene_order_file="~/BRCA/test-data/gencode_v19_gene_pos.txt",
                                    ref_group_names=c("epithelial-cycling"))

run

infercnv_obj_run = infercnv::run(infercnv_obj,
                                 cutoff=1, 
                                 out_dir=output_dir, 
                                 cluster_by_groups=T,
                                 HMM=T,
                                 per_chr_hmm_subclusters=F,
                                 denoise=T,
                                 up_to_step = 15
)

output from creating the object

INFO [2023-09-06 10:51:10] Parsing gene order file: ~/BRCA/test-data/gencode_v19_gene_pos.txt
INFO [2023-09-06 10:51:10] ::order_reduce:Start.
INFO [2023-09-06 10:51:10] .order_reduce(): expr and order match.
INFO [2023-09-06 10:51:11] ::process_data:order_reduce:Reduction from positional data, new dimensions (r,c) = 1000,12262 Total=1401783.23872392 Min=-2.86241296846613 Max=7.11563065412695.
INFO [2023-09-06 10:51:11] num genes removed taking into account provided gene ordering list: 68 = 6.8% removed.
INFO [2023-09-06 10:51:11] -filtering out cells < 100 or > Inf, removing 43.8754 % of cells
WARN [2023-09-06 10:51:11] Please use "options(scipen = 100)" before running infercnv if you are using the analysis_mode="subclusters" option or you may encounter an error while the hclust is being generated.
INFO [2023-09-06 10:51:11] validating infercnv_obj

output from run

INFO [2023-09-06 10:51:16] ::process_data:Start
INFO [2023-09-06 10:51:16] Checking for saved results.
INFO [2023-09-06 10:51:16] Trying to reload from step 15
INFO [2023-09-06 10:51:16] Using backup from step 15
INFO [2023-09-06 10:51:16] 

    STEP 1: incoming data

INFO [2023-09-06 10:51:16] 

    STEP 02: Removing lowly expressed genes

INFO [2023-09-06 10:51:16] 

    STEP 03: normalization by sequencing depth

INFO [2023-09-06 10:51:16] 

    STEP 04: log transformation of data

INFO [2023-09-06 10:51:16] 

    STEP 08: removing average of reference data (before smoothing)

INFO [2023-09-06 10:51:16] 

    STEP 09: apply max centered expression threshold: 3

INFO [2023-09-06 10:51:16] 

    STEP 10: Smoothing data per cell by chromosome

INFO [2023-09-06 10:51:16] 

    STEP 11: re-centering data across chromosome after smoothing

INFO [2023-09-06 10:51:16] 

    STEP 12: removing average of reference data (after smoothing)

INFO [2023-09-06 10:51:16] 

    STEP 14: invert log2(FC) to FC

INFO [2023-09-06 10:51:16] 

    STEP 15: computing tumor subclusters via leiden

INFO [2023-09-06 10:51:16] Reached up_to_step
GeorgescuC commented 10 months ago

Hi @mbihie ,

Looking at the code of how you create the infercnv object, any of the identities present in annotations_file=as.matrix(tnbc.epi@active.ident) is a valid option for reference groups. It can be a single of them or multiple.

Since you also run

tnbc.epi <- RenameIdents(tnbc.epi,
                         '0' = "epithelial",
                         '2' = "epithelial-cycling")

#subset to only epithelial cells
tnbc.epi <- subset(x = tnbc.epi, idents = c("epithelial", "epithelial-cycling"))

It seems you only have 2 groups as part of your annotations, so you are currently setting the "epithelial-cycling" as references, and the "epithelial" as observations.

Further, looking at the log you posted, I don't actually see any error. The log you posted however is from rerunning infercnv in an existing run directory, so the backup reloading already finds all the results up to the step you specified it to run (step 15, the subclustering) and so stops after that. If you thought you had an issue because because you don't have the final figure, you should just need to remove the up_to_step option so it keeps going after the subclustering. If you had an issue in step 15, based on the backup being saved in your rerun attempt, it means the error happened during the plotting which is after saving the backup. In that case, a likely source of issue is having a leiden_resolution that is too high with the default and thus too many singleton subclusters. If you want to rerun step 15 alone to post the log of that step actually being run, you can delete the file named "15_*.infercnv_obj" and rerun infercnv as you did until now.

Regards, Christophe.