Closed (mzhoufulai closed this issue 4 years ago)
Hi,
Cell Ranger will attempt to identify which species each cell is derived from, but the approach does not work well when the ratio is far from 50:50.
My recommendation would be to use the filtered matrices directly to make your own determination. Instructions on loading the filtered barcode matrix in either R or Python are available here:
Once loaded, genes from either genome are prefixed with the genome name (e.g. mm10 or hg19) and you can sum the count of genes from each genome for each barcode to determine the species based on the relative count of mouse genes to human genes.
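As a minimal sketch of that per-barcode summing, assuming the matrix has already been loaded and flattened into (feature name, barcode, count) triplets: the function names, the example barcodes, and the 0.9 cutoff are all hypothetical choices for illustration, not values from Cell Ranger.

```python
from collections import defaultdict


def genome_counts_per_barcode(triplets, prefixes=("hg19", "mm10")):
    """Sum counts per barcode for each genome prefix.

    triplets: iterable of (feature_name, barcode, count), where the
    feature name starts with the genome name (assumed layout).
    """
    totals = defaultdict(lambda: dict.fromkeys(prefixes, 0))
    for feature, barcode, count in triplets:
        for prefix in prefixes:
            if feature.startswith(prefix):
                totals[barcode][prefix] += count
                break
    return dict(totals)


def assign_species(counts, min_fraction=0.9):
    """Label a barcode by its dominant genome, else 'ambiguous'.

    min_fraction is an illustrative cutoff, not a recommended value.
    """
    total = sum(counts.values())
    if total == 0:
        return "ambiguous"
    genome, top = max(counts.items(), key=lambda kv: kv[1])
    return genome if top / total >= min_fraction else "ambiguous"


# Made-up counts for three barcodes, purely for illustration.
example = [
    ("hg19_GAPDH", "AAAC-1", 1200),
    ("mm10_Gapdh", "AAAC-1", 8),
    ("mm10_Actb", "TTTG-1", 950),
    ("hg19_ACTB", "TTTG-1", 30),
    ("hg19_ACTB", "CCCA-1", 400),
    ("mm10_Actb", "CCCA-1", 350),
]
totals = genome_counts_per_barcode(example)
labels = {bc: assign_species(c) for bc, c in totals.items()}
```

Plotting the human total against the mouse total for every barcode is usually a better way to pick a cutoff for a given dataset than relying on a fixed fraction.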
support@10xgenomics.com can likely provide additional follow up.
Warm wishes, Nigel
I have the same issue. You said we can determine the species based on the relative count of mouse genes to human genes. Is there any threshold or something for that? thank you
@evolvedmicrobe this is an older closed issue but we are having a similar issue. Can you please elaborate on your solution? For example, you mentioned,
Cell Ranger will attempt to identify which species each cell is derived from, but the approach does not work well when the ratio is far from 50:50.
In this case, do we use the prebuilt human+mouse index? I think on the download page it is called "GRCh38_and_mm10-2020-A_build", file refdata-gex-GRCh38-and-mm10-2020-A.tar.gz
thanks.
@ahdee if you can post this image from the websummary (only for your sample), I might be able to advise on a path forward for you.
@evolvedmicrobe thanks. OK, here is one of my samples. In the summary, only 0.5% mapped to the mm10 genome while 97% mapped to GRCh38; however, I do see that there are a few mm10 genes with really high logFC. What do you advise?
@ahdee it appears you don't have any mouse cells in that sample. The human UMI counts per barcode are typically >1K, while you don't observe any barcodes with mouse counts >60; I suspect those are mapping artifacts and not real mouse DNA. Are you sure you have mouse cells in this sample?
@evolvedmicrobe thanks. However, I'm still a bit confused. Please see the attached image. I took another sample and aligned it to just GRCh38; I also aligned it to GRCh38_mm10; so first question is.
Hi @ahdee, yes, my advice would be to go with just the GRCh38 genome, as you do not appear to have any mouse cells in this data.
Cell calling is done on a per-genome basis, and calls everything within an order of magnitude of the top of the rank plot (e.g. if your 1% percentile of barcodes is 10K, everything >=1K will be called as a cell in the first step of cell calling). Because you have no meaningful mouse cells, this means that barcodes with even <5 UMIs are counted as cell-associated barcodes, and these are nonsense calls that artificially increase your number of cells. These cells are then called multiplets because they often have much higher human UMI counts than mouse counts (and I'd basically ignore the multiplet calls for a dataset like this).
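To make the order-of-magnitude rule concrete, here is a simplified sketch of it. It is not Cell Ranger's actual implementation; the function name, the `top_fraction` parameter, and all of the counts are invented for illustration.

```python
def order_of_magnitude_calls(umi_counts, top_fraction=0.01):
    """Simplified per-genome cell calling.

    Take the UMI count at the top `top_fraction` of barcodes (ranked by
    UMIs) as the reference, then call every barcode whose count is within
    10x of that reference. Returns (threshold, sorted called barcodes).
    """
    ranked = sorted(umi_counts.values(), reverse=True)
    idx = max(int(len(ranked) * top_fraction) - 1, 0)
    threshold = ranked[idx] / 10
    called = sorted(bc for bc, n in umi_counts.items() if n >= threshold)
    return threshold, called


# Made-up data: 10 real human cells plus 90 background barcodes.
human = {f"bc{i}": 10_000 - 500 * i for i in range(10)}
human.update({f"bg{i}": 50 for i in range(90)})

# The same barcodes carry only a handful of stray mouse UMIs.
mouse = {f"bc{i}": 40 - 4 * i for i in range(10)}
mouse.update({f"bg{i}": i % 6 for i in range(90)})

human_threshold, human_called = order_of_magnitude_calls(human)
mouse_threshold, mouse_called = order_of_magnitude_calls(mouse)
```

With these numbers the human threshold is 1,000 UMIs and only the 10 real cells pass, while the mouse threshold drops to 4 UMIs and 40 barcodes are called, most of them background barcodes with 4 or 5 UMIs.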
@evolvedmicrobe thanks for such a great explanation. I guess this is why the y-axis for mouse was so low, 0-30 instead of 0-15K. This makes sense to me. Thanks.
Hi, I am new to single-cell RNA-seq. I have a library with a mixture of human and mouse cells, in which mouse cells make up only 2.5%. I used cellranger count to get the matrix, but the output is a mix of human and mouse. How can I filter out the mouse cells and get a matrix of only human cells? Thanks!
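One possible approach, following the earlier suggestion to compare per-barcode counts from each genome, is sketched below. It assumes the matrix entries are available as (feature name, barcode, count) triplets with genome-prefixed feature names; the function name, the prefixes, and the 0.9 cutoff are hypothetical and should be adjusted to the data.

```python
from collections import defaultdict


def human_only_entries(triplets, human_prefix="GRCh38", mouse_prefix="mm10",
                       min_human_fraction=0.9):
    """Keep human-gene entries for barcodes that look human.

    A barcode is kept when at least `min_human_fraction` of its
    human + mouse counts come from human-prefixed features.
    The cutoff is an illustrative assumption, not a recommended value.
    """
    entries = list(triplets)
    human = defaultdict(int)
    mouse = defaultdict(int)
    for feature, barcode, count in entries:
        if feature.startswith(human_prefix):
            human[barcode] += count
        elif feature.startswith(mouse_prefix):
            mouse[barcode] += count

    keep = set()
    for barcode in set(human) | set(mouse):
        total = human[barcode] + mouse[barcode]
        if total > 0 and human[barcode] / total >= min_human_fraction:
            keep.add(barcode)

    # Drop mouse barcodes and, for the kept barcodes, drop mouse genes.
    return [
        (feature, barcode, count)
        for feature, barcode, count in entries
        if barcode in keep and feature.startswith(human_prefix)
    ]


# Made-up entries: one human-like and one mouse-like barcode.
entries = [
    ("GRCh38_GAPDH", "AAAC-1", 1200),
    ("mm10_Gapdh", "AAAC-1", 8),
    ("mm10_Actb", "TTTG-1", 950),
    ("GRCh38_ACTB", "TTTG-1", 30),
]
filtered = human_only_entries(entries)
```

Barcodes with substantial counts from both genomes fall below the cutoff and are dropped as well, which also removes likely human-mouse multiplets.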