A question about reference cells

broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

A question about reference cells #390

Closed inbarsh2 closed 2 years ago

inbarsh2 commented 2 years ago

Hi, I have several samples with a relatively small number of reference cells. When I run inferCNV, the results are very noisy and seem to show CNVs in both the tumor cells and the reference cells. I tried adjusting the cutoff, used the diagnostic tools (the results of the diagnostics are as expected according to the GitHub documentation), increased the number of reference clusters, and set a threshold for min_max_counts_per_cell (between 500 and 10,000). Do you have any idea how I can try to solve this? I attach an example (nb166_infer_out). Thank you in advance.

GeorgescuC commented 2 years ago

Hi @inbarsh2 ,

How many genes are kept after the filtering? It seems as though the cells you use as reference are rather heterogeneous within one of your defined annotations. As long as the reference cells are not "clean" of signal, the results can't be used. Could you double-check that the correct cells are in each annotation group by looking at infercnv_obj@reference_grouped_cell_indices? If they are, you might also want to try letting infercnv recluster them by setting num_ref_groups to 3 to begin with; then, if there are patterns suggesting that more clusters are present, increase that value.

Regards, Christophe.

inbarsh2 commented 2 years ago

Thank you! I don't seem to have this output file (infercnv_obj@reference_grouped_cell_indices). However, I did try increasing the number of reference clusters (to 6 and even 20) and it doesn't seem to solve the problem. The annotations are based on clustering following dimensionality reduction (using Seurat) and cell type markers. Is there any other way to check that the reference is accurate? I attach another example (nb114_infer_out_1k_10k_20clusters).

GeorgescuC commented 2 years ago

Hi @inbarsh2 ,

infercnv_obj@reference_grouped_cell_indices is a slot in the R object that you create with infercnv and then provide to the run() method, so you should simply need to type that in your R session (you may need to change the infercnv_obj part if you named your object differently). Is the matrix you provide as input to infercnv still a raw counts matrix? Also, what is the reason for setting the max reads per cell to 10,000? The figure you sent with the 20 ref groups is not very clean, and 20 groups is probably too much subdivision, but at least the references no longer seem to contain any of the stronger signals seen in the observations.

Regards, Christophe.
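To make the question about the 10,000 upper bound concrete, here is a minimal Python sketch of what a per-cell total-count filter does conceptually. It is not infercnv code: the function name, the toy counts, and the inclusive bounds are all assumptions made for illustration.

```python
# Conceptual sketch only; not infercnv's implementation.
# filter_cells_by_total_counts and toy_counts are hypothetical names/data.

def filter_cells_by_total_counts(counts_by_cell, min_counts, max_counts):
    """Keep cells whose summed raw counts lie within [min_counts, max_counts].

    Inclusive bounds are an assumption made for this illustration.
    """
    kept = {}
    for cell, gene_counts in counts_by_cell.items():
        total = sum(gene_counts)
        if min_counts <= total <= max_counts:
            kept[cell] = gene_counts
    return kept


# Toy raw counts per cell (three genes each), invented for the example.
toy_counts = {
    "cell_a": [120, 80, 150],       # total 350: below the lower bound
    "cell_b": [900, 1500, 1100],    # total 3500: kept
    "cell_c": [6000, 4000, 2500],   # total 12500: above the upper bound
}

kept = filter_cells_by_total_counts(toy_counts, 500, 10_000)
print(sorted(kept))
```

With few reference cells to begin with, an upper bound like this can drop deeply sequenced cells entirely, so it is worth counting how many reference cells survive the filter.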

mea2712 commented 2 years ago

@GeorgescuC I'm piggybacking here with more of a conceptual question: I have several tumor samples, each with few normal cells. I have been running inferCNV within samples (one run per sample) because I worry about batch effects if I pool all samples together. However, the downside of my approach is that my inferences are sometimes based on fewer than 5 normal reference cells. How would you go about this? Do you recommend running it with all samples pooled together? How can I assess whether any CNV is being called due to batch effects? Have you addressed this issue at any point? Thanks!

GeorgescuC commented 2 years ago

Hi @mea2712 ,

One thing you can test first, before pooling all samples together, is to pool only the normal cells and run infercnv with all of them set as observations (so references set to "c()"). In doing so, the average of all cells will be used as the base level, and you can check whether any unexpected signal appears in the normal cells.

Regards, Christophe.
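The idea behind that check can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions, not what infercnv runs internally (infercnv does considerably more than mean-centering): the function names, the toy log-expression values, and the sample labels are all hypothetical. Each gene is centered on its mean across all pooled normal cells, and the residuals are then averaged per sample; a sample whose normal cells sit consistently above or below zero for some genes would hint at a batch effect rather than a real CNV.

```python
# Conceptual sketch only; not infercnv's implementation.
# All names and values below are hypothetical.
from statistics import mean


def center_on_pooled_mean(expr_by_cell):
    """Subtract each gene's mean over all cells from every cell's value."""
    n_genes = len(next(iter(expr_by_cell.values())))
    gene_means = [
        mean(values[g] for values in expr_by_cell.values())
        for g in range(n_genes)
    ]
    return {
        cell: [values[g] - gene_means[g] for g in range(n_genes)]
        for cell, values in expr_by_cell.items()
    }


def mean_residual_per_sample(residuals, sample_of_cell):
    """Average the centered values per gene within each sample."""
    rows_by_sample = {}
    for cell, values in residuals.items():
        rows_by_sample.setdefault(sample_of_cell[cell], []).append(values)
    return {
        sample: [mean(column) for column in zip(*rows)]
        for sample, rows in rows_by_sample.items()
    }


# Toy log-expression for normal cells from two samples (three genes each).
toy_expr = {
    "s1_n1": [1.0, 2.0, 3.0],
    "s1_n2": [1.2, 2.0, 3.0],
    "s2_n1": [1.0, 2.0, 4.0],
    "s2_n2": [1.2, 2.0, 4.0],
}
toy_samples = {"s1_n1": "s1", "s1_n2": "s1", "s2_n1": "s2", "s2_n2": "s2"}

residuals = center_on_pooled_mean(toy_expr)
per_sample = mean_residual_per_sample(residuals, toy_samples)
for sample, values in sorted(per_sample.items()):
    print(sample, [round(v, 3) for v in values])
```

In this toy data the third gene separates cleanly by sample even though every cell is normal, which is the kind of pattern that would argue against pooling the samples without further correction.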