Hoohm / CITE-seq-Count

A tool for getting UMI counts from a single-cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

Understanding [expected cells] parameter and [whitelist] parameter #155

Open jhjlee opened 3 years ago

jhjlee commented 3 years ago

Hello,

Thanks for making this tool. I have a quick question about the parameters involving cells. Users can input the number of cells expected in the run and/or a list of cell barcodes that CITE-seq-Count should look for.

(1) For the number of cells, should this be restricted to the number of filtered cells (from cellranger) obtained in the partner scRNA-seq? If I have 5,000 cells in the scRNA-seq data, should I enter 5,000, or would it be better to enter a larger number, say 10,000? Are cells (and their associated hashes) potentially lost if I enter a smaller number?

(2) For the whitelist, I use the filtered cell barcodes from the partner scRNA-seq data. While I understand that CITE-seq-Count will correct other barcodes based on this list, am I potentially incurring false positives if I force the analysis to output results from this list?

(3) Putting them together, would the optimal approach be to specify a larger number of cells (than the filtered cells from the partner scRNA-seq data) AND include a whitelist made up of the cell barcodes from scRNA-seq?

Thank you!

sunta3iouxos commented 3 years ago

Hi there, have you found the answers to these? I am wondering about the same thing. Regards

Hoohm commented 3 years ago

Hello @jhjlee, here are a few answers to your questions:

1) I would recommend using about 20% more cells than expected, so for 5k I would use about 6k or 7k. This lets you catch a few more reads, although I don't expect it to change your results much. You should always try running with both values and compare the results.

2) Using the whitelist is the safest way to ensure that you only grab the cells you expect. It will only capture and report the cells that are in your list, or the ones that have been corrected to a barcode in your input list.

3) For data analysis and downstream processing, I would recommend using the whitelist you received. For troubleshooting, I would recommend using expected cells to get everything out of your data and then comparing the result to the whitelist.
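The arithmetic in point 1 and the comparison suggested in point 3 could be sketched roughly as below. This is only an illustration, not part of CITE-seq-Count: the function names `padded_expected_cells` and `compare_barcodes` are hypothetical, the barcodes are made up, and stripping a trailing `-1` style suffix assumes the whitelist comes from cellranger's filtered barcodes while the other list does not carry that suffix.

```python
def padded_expected_cells(n_filtered, padding_percent=20):
    """Pad the filtered-cell count by a percentage (20% by default), rounding up.

    Integer arithmetic is used so the result does not depend on float rounding.
    """
    extra = (n_filtered * padding_percent + 99) // 100
    return n_filtered + extra


def strip_suffix(barcode):
    """Drop a trailing '-1' style suffix if present (assumed cellranger format)."""
    return barcode.split("-")[0]


def compare_barcodes(whitelist, expected_cells_run):
    """Compare the whitelist with barcodes reported by an expected-cells run."""
    wl = {strip_suffix(b) for b in whitelist}
    run = {strip_suffix(b) for b in expected_cells_run}
    return {
        "shared": wl & run,          # cells found by both approaches
        "whitelist_only": wl - run,  # whitelisted cells missing from the run
        "run_only": run - wl,        # extra barcodes not in the scRNA-seq data
    }


# Made-up barcodes, for illustration only.
whitelist = ["AAACCTGAGA-1", "AAACCTGCAT-1", "AAACGGGTCA-1"]
expected_cells_run = ["AAACCTGAGA", "AAACGGGTCA", "TTTGTCATCC"]

print(padded_expected_cells(5000))
result = compare_barcodes(whitelist, expected_cells_run)
for name, barcodes in result.items():
    print(name, sorted(barcodes))
```

A large `run_only` set would suggest that the expected-cells run is picking up barcodes absent from the scRNA-seq data, which is the kind of discrepancy worth inspecting when troubleshooting.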

I hope this helps. Let me know if you have other questions.