chris-mcginnis-ucsf / MULTI-seq

R implementation of MULTI-seq sample classification workflow

cannot find barcode threshold #15

Closed MichaelPeibo closed 4 years ago

MichaelPeibo commented 4 years ago

Hi @chris-mcginnis-ucsf, I ran into a warning saying 'cannot find threshold' for certain barcodes. Any suggestions on what causes it and how to fix it?

My dataset is a bit different from the MULTI-seq sample data: I have only five barcodes, and their expression pattern (abundance) does not look like that of the MULTI-seq sample barcodes. Ours looks much more like gene expression. Any suggestions on how to determine valid barcodes in this case?

chris-mcginnis-ucsf commented 4 years ago

Hi @MichaelPeibo,

Can you show me the barcode space for your data? Can you also share histograms of log-normalized counts for each barcode?

I find that removing uninformative cells can help in barcode identification. For example, you can remove cells with fewer than X total barcode UMIs. You could also compute each cell's signal-to-noise (e.g., ratio of the top two most abundant barcodes for each cell) and remove cells with SNR values <1.1.
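The two filters described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not code from the MULTI-seq R package: the function name `filter_cells`, its parameters, and the cells-by-barcodes matrix layout are assumptions made for the example. The minimum UMI cutoff (the "X" above) is left for the caller to choose, and handling a zero second-best barcode as infinite SNR is also an assumption.

```python
import numpy as np

def filter_cells(counts, min_total_umis, min_snr=1.1):
    """Flag informative cells in a (cells x barcodes) UMI count matrix.

    Hypothetical helper: a cell is kept when its total barcode UMI count
    is at least `min_total_umis` AND the ratio of its two most abundant
    barcodes (its signal-to-noise ratio) is at least `min_snr`.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)

    # Sort each row ascending; the last two columns are the top two barcodes.
    sorted_counts = np.sort(counts, axis=1)
    top1 = sorted_counts[:, -1]
    top2 = sorted_counts[:, -2]

    # SNR = top barcode / second barcode. Assumption: if the second barcode
    # is zero, the SNR is infinite when the top barcode is nonzero, else 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        snr = np.where(top2 > 0, top1 / top2,
                       np.where(top1 > 0, np.inf, 0.0))

    keep = (total >= min_total_umis) & (snr >= min_snr)
    return keep, snr
```

For example, `keep, snr = filter_cells(counts, min_total_umis=10)` returns a boolean mask that can be used to subset the matrix (`counts[keep]`) before attempting classification again. The cutoff of 10 here is only a placeholder.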

Chris

MichaelPeibo commented 4 years ago

Hi @chris-mcginnis-ucsf, here is a density plot for one of the barcodes; the others look quite similar. [image: density plot] Each barcode has many low counts across all cells.

I have figured out one way to determine which cells are validly barcoded, though I do not know whether it is reasonable. It mainly depends on a manually chosen threshold on each barcode's empirical cumulative distribution.

We are preparing a manuscript for this project, so I am sending you the details via email. It would be a great help if you could give some critical advice on it. 😀
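One possible reading of an ECDF-based manual threshold is sketched below in Python with NumPy. The thread does not give the details of the approach, so this is an assumption rather than the method actually used: the function names and the idea of picking the threshold as the count at which a barcode's ECDF first reaches a hand-chosen quantile are all hypothetical.

```python
import numpy as np

def ecdf_threshold(values, quantile):
    """Return the count at which the ECDF of `values` first reaches `quantile`.

    Hypothetical helper: `values` holds one barcode's counts across cells,
    and `quantile` (between 0 and 1) is chosen manually, e.g. by inspecting
    the ECDF plot for that barcode.
    """
    values = np.sort(np.asarray(values, dtype=float))
    n = values.size
    # ECDF evaluated at each sorted value: fraction of cells <= that value.
    ecdf = np.arange(1, n + 1) / n
    idx = min(int(np.searchsorted(ecdf, quantile, side="left")), n - 1)
    return values[idx]

def call_positive(values, quantile):
    """Mark cells whose count for this barcode lies above the ECDF threshold."""
    threshold = ecdf_threshold(values, quantile)
    return np.asarray(values, dtype=float) > threshold, threshold
```

Applied per barcode, `call_positive(counts[:, j], quantile)` gives a boolean vector of candidate positive cells for barcode `j`. Because the quantile is picked by eye, the resulting calls are sensitive to that choice and would need to be checked against the count distributions.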