Closed: ysbioinfo closed this 6 years ago
I am also not clear about the meaning of two parameters: UMI-edit-distance and min-count-per-umi. If I set UMI-edit-distance to 1, does it mean that, for a given gene, two UMIs that differ by only one base will be counted as one transcript? If I set min-count-per-umi to 2, does it mean that, for a given gene, a UMI supported by only one read will not be counted as a transcript? Am I right, and could you give me some advice on how to set these parameters? By the way, what is the meaning of batch in samples.csv, and how does it affect the output of dropSeqPipe? Thanks a lot!
Hello @snoopy-448, sorry for the late response, I've been really busy.
There are a lot of questions here. I will update the wiki with the answers you're looking for. For a quick and dirty response:
The SMART adapter is the SMARTseq primer. You can find it in the documentation here. TrimStartingSequence will try to trim it from the second read (mRNA read).
The trimmomatic adapters adapters-file setting is a more general way to trim sequences that come from your wet-lab protocol out of Read 2 (mRNA). Basically, you can put in that file every sequence that might end up wrongly integrated in your reads: primers linked to cell capture, amplification and sequencing adapters, etc.
UMI-edit-distance: This is the distance allowed between two UMIs that belong to the same gene within the same cell. Basically, it is how many mismatches you allow between two UMIs from the same gene/cell before they are treated as distinct. The higher you go, the fewer UMI counts you will have, because more UMIs get merged. The right value depends a bit on your sequencing quality and the length of the UMI.
min-count-per-umi: This specifies how many reads must support a UMI for it to count toward a gene. I would advise leaving it at 1, since UMIs already summarise the PCR amplification step and the resulting counts are very low.
Both of your assumptions are correct.
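To make the interaction of the two parameters concrete, here is a minimal Python sketch. It is not the code dropSeqPipe or Drop-seq tools actually run. It assumes a simple greedy collapse based on mismatches (Hamming distance) and assumes the read threshold is applied after collapsing. The function and argument names are made up for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def count_transcripts(umi_reads, edit_distance=1, min_reads_per_umi=1):
    """Count transcripts for ONE gene in ONE cell (illustrative only).

    umi_reads: list of UMI strings, one entry per read.
    edit_distance: stands in for UMI-edit-distance (assumed semantics).
    min_reads_per_umi: stands in for min-count-per-umi (assumed semantics).
    """
    counts = Counter(umi_reads)
    kept = {}  # representative UMI -> total supporting reads
    # Visit the most abundant UMIs first so rarer near-duplicates fold into them.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in kept:
            if hamming(umi, rep) <= edit_distance:
                kept[rep] += n  # merge into an existing representative
                break
        else:
            kept[umi] = n  # no close neighbour: this is a new transcript
    # Apply the read-support threshold after collapsing (an assumption).
    return sum(1 for n in kept.values() if n >= min_reads_per_umi)


reads = ["AAAAAAAA"] * 3 + ["AAAAAAAT"] + ["CCCCCCCC"]
print(count_transcripts(reads, edit_distance=0, min_reads_per_umi=1))
print(count_transcripts(reads, edit_distance=1, min_reads_per_umi=1))
print(count_transcripts(reads, edit_distance=1, min_reads_per_umi=2))
```

With an edit distance of 1, the single-mismatch UMI is merged into its more abundant neighbour, and raising the read threshold to 2 additionally drops the UMI seen only once.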
The batch column in samples.csv is there so that samples are separated by batch on the BC_drop and yield plots.
This helps if you are processing different batches for the same project.
Maybe you sequence part of your data today, but you want more cells, so you sequence again two weeks later. You could label the two runs as different batches and see directly on those plots whether anything differed between them.
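As a rough illustration of the two-runs scenario, the sketch below groups samples by their batch label. The file content, sample names, and every column name other than batch are assumptions made for the example, so check your own samples.csv for the real layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samples.csv content; only the "batch" column is taken from
# the discussion above, the other columns and all values are assumed.
SAMPLES_CSV = """samples,expected_cells,read_length,batch
sampleA,500,100,run1
sampleB,500,100,run1
sampleC,800,100,run2
"""


def samples_by_batch(text):
    """Return a dict mapping each batch label to its list of sample names."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["batch"]].append(row["samples"])
    return dict(groups)


print(samples_by_batch(SAMPLES_CSV))
```

Samples that share a batch label would be shown together on the plots, which is what makes differences between the two sequencing runs easy to spot.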
Let me know if that clears things up.
Got it! thanks!
Closing this, feel free to reopen if needed
Hi, thanks for developing such a convenient and user-friendly tool! I am analyzing some Drop-seq data now and want to use dropSeqPipe to do prephase analysis. I'm not so familiar with Drop-seq and have some questions about this pipeline:
Yang
(Edit, 2h after I asked this question: I read more about the Drop-seq details and now understand that the two adapters in my first question are the same, and that after tagmentation there will be only one 5' SMART adapter in a read.)