Hoohm / CITE-seq-Count

A tool that allows to get UMI counts from a single cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

running with totalseqB #158

Open naila53 opened 3 years ago

naila53 commented 3 years ago

Hi,

Thanks for developing this tool! I'm trying to run CITE-seq-Count on the public 10x Genomics 5k PBMC dataset as a test. In principle it should work, as I have specified a trim of 10. I also used the script you provided in another issue to convert the barcodes and get a compatible whitelist. I do recover all the cells in the whitelist. However, the tag counts are low when I compare the Cell Ranger output with the CITE-seq-Count output for the protein data!

According to the reference CSV for the barcodes, each read should contain 10 N bases, then the protein barcode sequence, and then another 9 N bases of arbitrary sequence. Can you please advise on how to run CITE-seq-Count properly with this dataset?
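
To make the read layout above concrete, here is a minimal Python sketch of pulling the tag out of an R2 sequence after a start trim of 10 and matching it against a tag list with up to 2 mismatches. It is not CITE-seq-Count's own code: the function names, the toy tag sequences, the 15-base tag length, and the use of Hamming distance are all assumptions made for illustration.

```python
def extract_tag(read2: str, start_trim: int = 10, tag_length: int = 15) -> str:
    """Return the candidate antibody barcode from an R2 sequence.

    Assumed layout: `start_trim` arbitrary bases, then the tag, then more
    arbitrary bases that are ignored.
    """
    return read2[start_trim:start_trim + tag_length]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def match_tag(candidate: str, tags: dict, max_errors: int = 2):
    """Return the name of the closest tag within max_errors, or None."""
    best_name, best_dist = None, max_errors + 1
    for name, seq in tags.items():
        if len(seq) != len(candidate):
            continue  # skip tags of a different length in this simple sketch
        dist = hamming(candidate, seq)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Toy data, made up for the example (not real TotalSeq-B barcodes).
toy_tags = {"tag_A": "AAAAACCCCCGGGGG", "tag_B": "TTTTTGGGGGCCCCC"}
toy_read = "ACGTACGTAC" + "AAAAACCCCCGGGGT" + "TTTTTTTTT"  # 10 + 15 + 9 bases

candidate = extract_tag(toy_read)
print(candidate, "->", match_tag(candidate, toy_tags))
```

If the trim or tag length were off by even one base, most candidates would fall outside the error budget, so this is a quick way to sanity-check those two values on a handful of reads.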

Dataset FASTQs here: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3_nextgem

Protein barcodes reference: https://cf.10xgenomics.com/samples/cell-exp/3.0.2/5k_pbmc_protein_v3_nextgem/5k_pbmc_protein_v3_nextgem_feature_ref.csv

My run info:

    CITE-seq-Count Version: 1.4.3
    Reads processed: 41007618
    Percentage mapped: 95
    Percentage unmapped: 5
    Uncorrected cells: 1
    Correction:
        Cell barcodes collapsing threshold: 1
        Cell barcodes corrected: 0
        UMI collapsing threshold: 2
        UMIs corrected: 665766
    Run parameters:
        Read1_paths: /data/raw/5kPBMC/fastq/big_R1.fastq.qz
        Read2_paths: /data/raw/5kPBMC/fastq/big_R2.fastq.qz
        Cell barcode:
            First position: 1
            Last position: 16
        UMI barcode:
            First position: 17
            Last position: 28
        Expected cells: 6794880
        Tags max errors: 2
        Start trim: 10

I combined the R1 and R2 FASTQ files into one file each. I specified a high number of expected cells to get the empty droplets for normalization. Also, the whitelist is derived from Cell Ranger's raw_bc_feature matrix before filtering, to make sure I get all the raw output.

For exactly the same cells, I compared the maximum count per tag, and as you can see, the counts are very low in the CITE-seq-Count UMI output!
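
A comparison like the one described above could be sketched as follows. This is only an illustration on made-up numbers: the nested-dict layout (cell barcode to tag to UMI count), the helper names, and the toy counts are assumptions, and loading the actual matrices is left out. Cell Ranger barcodes carry a "-1" style suffix, so it is stripped before comparing.

```python
def strip_suffix(barcode: str) -> str:
    """Drop a Cell Ranger style GEM-well suffix such as '-1'."""
    return barcode.split("-")[0]


def max_count_per_tag(counts: dict, cells: set) -> dict:
    """For each tag, return the largest count seen among the given cells.

    `counts` maps cell barcode -> {tag name -> UMI count}.
    """
    maxima = {}
    for barcode, tag_counts in counts.items():
        if strip_suffix(barcode) not in cells:
            continue
        for tag, n in tag_counts.items():
            maxima[tag] = max(maxima.get(tag, 0), n)
    return maxima


# Toy counts, invented for the example.
cellranger_counts = {
    "AAACCC-1": {"CD3": 120, "CD19": 4},
    "GGGTTT-1": {"CD3": 15, "CD19": 300},
}
citeseq_counts = {
    "AAACCC": {"CD3": 9, "CD19": 1},
    "GGGTTT": {"CD3": 2, "CD19": 20},
}

# Restrict both tools to the cells they have in common.
shared = {strip_suffix(b) for b in cellranger_counts} & set(citeseq_counts)
cr_max = max_count_per_tag(cellranger_counts, shared)
cs_max = max_count_per_tag(citeseq_counts, shared)
for tag in sorted(cr_max):
    print(tag, cr_max[tag], cs_max.get(tag, 0))
```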

[Two screenshots attached: "Screen Shot 2021-07-06 at 2 26 31 PM" and "Screen Shot 2021-07-06 at 2 26 55 PM"]

Hoohm commented 3 years ago

Hello @naila53, this comes from the fact that the cells have two barcodes: one for RNA and one for protein data.

Version 1.5.0 will deal with this on the fly for users but as of today, I would recommend running the antibody data without a whitelist but with a -n_cells argument.

Then use the translation to map the barcodes properly.
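
As a rough illustration of that translation step, the Python sketch below remaps protein-side cell barcodes to their RNA-side counterparts using a two-column table. The helper names and toy barcodes are hypothetical, and the column order (RNA barcode first, protein barcode second) is an assumption that should be checked against the translation file actually used.

```python
def load_translation(lines) -> dict:
    """Build a protein-barcode -> RNA-barcode lookup from two-column lines.

    Assumes each line holds the RNA barcode first and the protein barcode
    second, separated by whitespace.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rna_bc, protein_bc = line.split()[:2]
        mapping[protein_bc] = rna_bc
    return mapping


def translate_counts(counts: dict, protein_to_rna: dict):
    """Re-key a {barcode -> tag counts} dict onto RNA barcodes.

    Returns the translated dict and a list of barcodes that had no entry
    in the translation table.
    """
    translated, unmatched = {}, []
    for barcode, tag_counts in counts.items():
        rna_bc = protein_to_rna.get(barcode)
        if rna_bc is None:
            unmatched.append(barcode)
            continue
        translated[rna_bc] = tag_counts
    return translated, unmatched


# Toy translation table and counts, invented for the example.
toy_table = ["AAACCCAAGA\tAAACCCATCA", "GGGTTTAAGC\tGGGTTTATCC"]
toy_counts = {
    "AAACCCATCA": {"CD3": 120},
    "GGGTTTATCC": {"CD19": 300},
    "TTTTTTTTTT": {"CD3": 1},
}

lookup = load_translation(toy_table)
translated, unmatched = translate_counts(toy_counts, lookup)
print(translated)
print("no translation for:", unmatched)
```

Keeping the unmatched barcodes in a separate list makes it easy to see how many cells would be lost if the table does not fit the chemistry.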

hisplan commented 3 years ago

Correct me if I'm wrong, but as far as I know, one of the outputs from CITE-seq-Count, barcodes.tsv.gz, also needs to be translated properly at the end, especially if you want to look at GEX and HTO together, which looks like what you're doing...

Hoohm commented 2 years ago

Yes, it's been a thorn in my side for a while now. I've worked on 1.5.0 today: https://github.com/Hoohm/CITE-seq-Count/tree/feature/cells_argument

I'm nearly done with automated translation on the fly by just selecting the chemistry!

dmiyagi commented 2 years ago

Hi @Hoohm, if I am using TotalSeqB with just normal 10x V3, is -n_cells still recommended? Is 1.5.0 ready? Or does what you are saying apply only to multiomic? Do you happen to know if the nuclear pore TotalSeqB antibodies can be used with 10x multiomic (RNA/ATAC)? Thank you!