Confusions about the 5' smart adapter and Trimmomatic adapter files

ysbioinfo commented 6 years ago

Hi, thanks for developing such a convenient and user-friendly tool! I am analyzing some Drop-seq data and want to use dropSeqPipe for the preprocessing steps. I'm not very familiar with Drop-seq and have some questions about this pipeline:

What does "5' SMART adapter" mean? I read the original Drop-seq paper by Macosko et al. and noted that there are two "adapters" after the formation of STAMPs. One is the adapter initially attached to the bead, called the "primer handle" in the paper. The other is an adapter ending in GGG, which is added during reverse transcription. Which of these is the 5' SMART adapter? Or are these two "adapters" the same, so that both can be removed by TrimStartingSequence?
What is the Trimmomatic adapter file for? Is it meant for trimming the adapters added during Illumina sequencing? Do I need to include the 5' SMART primer sequence in the Trimmomatic adapter file?

I'd really appreciate it if you could help me figure out these definitions. Thanks!

Yang

(Edit, 2 h after I asked this question: I read more about the Drop-seq details and now understand that the two adapters in my first question are the same, and that after tagmentation there will be only one 5' SMART adapter in a read.)

ysbioinfo commented 6 years ago

I am also not clear about the meaning of two parameters: UMI-edit-distance and min-count-per-umi. If I set UMI-edit-distance to 1, does it mean that, for a given gene, two UMIs that differ by only one base will be counted as one transcript? If I set min-count-per-umi to 2, does it mean that, for a given gene, a unique UMI with only one supporting read will not be counted as a transcript? Am I right, and could you give me some advice on how to set these parameters? By the way, what is the meaning of batch in samples.csv? How does it affect the output of dropSeqPipe? Thanks a lot!

Hoohm commented 6 years ago

Hello @snoopy-448, sorry for the late response; I've been really busy.

There are a lot of questions here, so I will update the wiki to add the answers you're looking for. For a quick and dirty response:

The SMART adapter is the SMARTseq primer. You can find it in the documentation here. TrimStartingSequence will try to trim it from the second read (the mRNA read).
The trimmomatic adapters-file setting is a more general way to trim sequences that come from your wet-lab protocol out of Read 2 (mRNA). Basically, you can put in there every sequence that might have been wrongly integrated into your reads: the primers used for cell capture and amplification, sequencing adapters, etc.
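
To illustrate the first point, here is a minimal Python sketch of trimming a leading adapter remnant from an mRNA read. It is not the TrimStartingSequence implementation. It assumes that the read may start with only the tail end of the adapter, so it tries every adapter suffix, longest first. The function name, the `min_match` and `max_mismatches` parameters, and the adapter sequence are all hypothetical placeholders.

```python
def trim_starting_sequence(read, adapter, min_match=5, max_mismatches=0):
    """Remove a full or partial copy of `adapter` from the start of `read`.

    Every suffix of the adapter that is at least `min_match` bases long is
    compared with the start of the read, longest suffix first. The first
    suffix that matches within `max_mismatches` is cut off.
    """
    for length in range(len(adapter), min_match - 1, -1):
        if len(read) < length:
            continue  # read is too short to contain this suffix
        suffix = adapter[-length:]
        prefix = read[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return read[length:]
    return read  # no adapter remnant found, read is left untouched


# Placeholder adapter, NOT the real SMART primer sequence.
ADAPTER = "ACGTACGTGGG"

full = trim_starting_sequence(ADAPTER + "AAAA", ADAPTER)        # whole adapter present
partial = trim_starting_sequence("GTGGG" + "TTTTCCCC", ADAPTER)  # last 5 adapter bases only
clean = trim_starting_sequence("TTTTTTTT", ADAPTER)              # no adapter at all
```

The Trimmomatic adapters file plays a broader role: instead of one known sequence at the start of the read, it lists any number of contaminant sequences to look for.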

UMI-edit-distance: This is the distance allowed between two UMIs within the same cell and gene for them to be merged. Basically, how many mismatches you allow between two UMIs from the same gene/cell. The higher you go, the fewer UMI counts you will have. A good value depends somewhat on your sequencing quality and the length of the UMI.
min-count-per-umi: This specifies how many UMIs you need to call a gene. I would advise leaving it at 1, since UMIs already summarise the PCR amplification step and the resulting numbers are very low.

Both of your assumptions are correct.
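
As a rough illustration of how the two parameters interact, here is a minimal Python sketch for a single cell/gene pair. It is not the algorithm used by the pipeline. It assumes that mismatches are counted as Hamming distance between equal-length UMIs, that UMIs with too few reads are dropped before merging, and that merging is greedy from the most-read UMI downwards. The function names and the example UMIs are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def count_transcripts(umi_reads, edit_distance=1, min_count_per_umi=1):
    """Count transcripts for one cell/gene from a {UMI: read count} mapping."""
    # Drop UMIs that are supported by too few reads.
    kept = {u: n for u, n in umi_reads.items() if n >= min_count_per_umi}

    # Visit UMIs from most to least reads; a UMI close enough to one that
    # was already accepted is merged into it instead of being counted again.
    accepted = []
    for umi in sorted(kept, key=lambda u: (-kept[u], u)):
        if not any(hamming(umi, p) <= edit_distance for p in accepted):
            accepted.append(umi)
    return len(accepted)


# Hypothetical read counts per UMI for one gene in one cell.
umis = {"AAAAAAAA": 10, "AAAAAAAT": 1, "CCCCCCCC": 1}

no_merge = count_transcripts(umis, edit_distance=0)          # every UMI counted
merged = count_transcripts(umis, edit_distance=1)            # AAAAAAAT merges into AAAAAAAA
filtered = count_transcripts(umis, min_count_per_umi=2)      # single-read UMIs dropped
```

In this toy example, raising either parameter lowers the transcript count, which is why both are best kept low unless sequencing quality suggests otherwise.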

The batch column in samples.csv is there so that samples can be separated by batch on the BC_drop and yield plots.

This helps if you are running different batches of processing for the same project.

Maybe you sequence part of your data today, but you want to have more cells, so you sequence again two weeks later. You could use this difference as a batch and see if something happened differently between the two runs directly on those plots.

Let me know if that clears things up.

ysbioinfo commented 6 years ago

Got it! Thanks!

Hoohm commented 6 years ago

Closing this. Feel free to reopen if needed.

Hoohm / dropSeqPipe

Confusions about the 5' smart adapter and Trimmomatic adapter files #33