Hoohm / CITE-seq-Count

A tool that allows to get UMI counts from a single cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

CITE-seq-Count for multimodal data #166

Open YingzhengXu opened 2 years ago

YingzhengXu commented 2 years ago

Hi,

Thanks for developing this great tool. I just have 2 questions regarding CITE-seq-Count.

Question 1: I have a multimodal scRNA-seq dataset in which both CITE-seq and Hashtag antibodies were pooled together in one R1/R2 pair. I was wondering whether CITE-seq-Count can handle such multimodal data.

Specifically, should I run CITE-seq-Count twice to count CITE-seq and Hashtag antibodies separately, as follows:

CITE-seq-Count -R1 R1 -R2 R2 -t Hashtag_antibodies -wl whitelist -cbf 1 -cbl 16 -umif 17 -umil 28 -cells cell_number --start-trim 10 -o output_folder

CITE-seq-Count -R1 R1 -R2 R2 -t CITEseq_antibodies -wl whitelist -cbf 1 -cbl 16 -umif 17 -umil 28 -cells cell_number --start-trim 10 -o output_folder

Or just put all antibodies in one CSV and run it once, like the following:

CITE-seq-Count -R1 R1 -R2 R2 -t Hashtag_plus_CITEseq_antibodies -wl whitelist -cbf 1 -cbl 16 -umif 17 -umil 28 -cells cell_number --start-trim 10 -o output_folder

Question 2: This is whole-tissue sequencing, so I expect a lot of unmapped reads in the CITE-seq output because the CITE-seq antibodies are specific. Should I remove the unmapped row from the matrix before normalization?

Thanks for your time

bassanio commented 1 year ago

@YingzhengXu : Were you able to resolve your question? If so, could you please comment on how you handled it? I have the same question.

Thanks in advance

YingzhengXu commented 1 year ago

Hi,

I was able to solve it. I ran CITE and Hashtag Abs separately. But it shouldn’t matter if you decide to run it all together. Just give it a try to see which method is better for you. Let me know if you have any other questions I can help with.
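For the run-it-all-together option, the two tag lists could be merged along these lines. This is only a sketch under assumptions: the tag sequences and names below are made up, the file names are hypothetical, and the headerless "sequence,name" layout should be checked against the documentation for your installed CITE-seq-Count version.

```shell
#!/bin/sh
set -eu

# Scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)

# Hypothetical tag files, one "sequence,name" pair per line (made-up sequences).
printf 'ACCCACCAGTAAGAC,Hashtag_1\nGGTCGAGAGCATTCA,Hashtag_2\n' > "$workdir/hashtags.csv"
printf 'CTCATTGTAACTCCT,CD3\nTGTTCCCGCTCAACT,CD4\n' > "$workdir/adt.csv"

# Merge both lists into the single file that would be passed to -t.
cat "$workdir/hashtags.csv" "$workdir/adt.csv" > "$workdir/combined_tags.csv"

# A sequence listed twice would make tag assignment ambiguous, so stop if any exist.
dups=$(cut -d, -f1 "$workdir/combined_tags.csv" | sort | uniq -d)
if [ -n "$dups" ]; then
    echo "Duplicate tag sequences: $dups" >&2
    exit 1
fi

cat "$workdir/combined_tags.csv"
```

The merged file would then take the place of Hashtag_plus_CITEseq_antibodies in the single-run command from the original question. After a combined run, the hashtag and antibody rows still need to be split apart if they are to go into separate assays.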

bassanio commented 1 year ago

@YingzhengXu : Thanks, understood. Final question: did you run CITE-seq-Count on the raw run or on the 10x cellranger outs?

YingzhengXu commented 1 year ago

I assume that by “raw run” you meant the fastq files. Yes, you use fastq files as input. CITE-seq-Count is basically an alignment/mapping tool, so you should run it on fastq files.

bassanio commented 1 year ago

@YingzhengXu : I am a little confused now. Does that mean that once I use CITE-seq-Count I don't use 10x cellranger?

YingzhengXu commented 1 year ago

You use both pipelines, for different purposes. cellranger provides the fastq files (output of mkfastq) and the RNA expression matrix (output of count), whereas CITE-seq-Count is specifically for generating the matrix for CITE-seq antibodies or hashtags. In a Seurat object, you save these matrices (RNA, CITE-seq, hashtag) in different assay slots.

If you have run mkfastq, use the output fastq files as input for CITE-seq-Count to generate the additional matrix. Then use the cellranger count output as the main RNA expression matrix for the RNA assay slot in Seurat.

nyi1991 commented 1 year ago

Hi @YingzhengXu, I am new to CITE-seq data analysis and am using Seurat as well. I was wondering how you were able to integrate the CITE-seq matrix with the main RNA expression matrix. When I try, I get an "All cells in the object being added must match the cells in this object" error. Is it because the barcodes in the cellranger RNA matrix have "-1" at the end and the barcodes from CITE-seq-Count don't? Thanks for your help!

YingzhengXu commented 1 year ago

Hi, you were right about the "-1" suffix. I assume you have removed the "-1" but did not then remove the cells that don't match. A simple solution would be: find the overlapping barcodes between the CITE-seq and RNA matrices, remove the barcodes that don't match, then integrate.

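Here's a simple template for it:

library(Seurat)
# Barcodes present in both matrices (assumes the "-1" suffix has already been made consistent)
common_barcodes <- intersect(colnames(citeseq_matrix), colnames(RNA_expression_matrix))
# Keep only the shared cells, in the same order, in both matrices
citeseq_matrix <- citeseq_matrix[, common_barcodes]
RNA_expression_matrix <- RNA_expression_matrix[, common_barcodes]
# RNA counts go into the default assay; antibody counts go into a separate "CITE" assay
seurat_obj <- CreateSeuratObject(counts = RNA_expression_matrix)
seurat_obj[['CITE']] <- CreateAssayObject(counts = citeseq_matrix)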

If only a few cells overlap, try running CITE-seq-Count with a whitelist (generated from cellranger's barcodes.tsv) provided to the -wl argument. This forces CITE-seq-Count to look for the barcodes that are already in the whitelist. Let me know if I can help further.
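The whitelist step could be sketched in shell roughly as follows. The demo file here stands in for cellranger's filtered barcodes.tsv.gz, and the barcodes in it are made up. Stripping the "-1" suffix assumes the whitelist should hold the bare 16 bp sequences read from R1 (matching -cbf 1 -cbl 16), which is worth confirming for your CITE-seq-Count version.

```shell
#!/bin/sh
set -eu

# Scratch directory; in practice the input would be cellranger's
# outs/filtered_feature_bc_matrix/barcodes.tsv.gz.
workdir=$(mktemp -d)

# Demo stand-in for barcodes.tsv.gz (made-up barcodes with the "-1" suffix).
printf 'AAACCTGAGAAACCAT-1\nAAACCTGAGAAACCGC-1\nAAACCTGAGAAACCTA-1\n' \
    | gzip > "$workdir/barcodes.tsv.gz"

# Drop the trailing "-<number>" suffix so each line is the bare cell barcode.
gzip -dc "$workdir/barcodes.tsv.gz" \
    | sed 's/-[0-9][0-9]*$//' > "$workdir/whitelist.csv"

cat "$workdir/whitelist.csv"
```

The resulting file is what would be passed to -wl. Because the whitelist no longer carries the suffix, "-1" has to be appended back to the CITE-seq-Count barcodes (or stripped from the RNA matrix) before intersecting the column names in Seurat.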