Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

questions on wiki plots #26

Closed olechnwin closed 6 years ago

olechnwin commented 6 years ago

Hi, I have several questions about the wiki plots:

  1. On the cell barcode and UMI quality trim plots, what does "tagged x reads" mean? What is x? At first I thought x was the total number of reads in the sample, but my sample has a different number. It would be nice to show a percentage as well.
  2. In the polyA trimming plot, is the x axis the length of the reads after the polyA is trimmed, or the length of the polyA itself? How should we use this plot? What is the distribution supposed to tell us?
  3. Likewise, what are we supposed to see in the distribution for the SMART adapter?
  4. In the barnyard plot, what is the definition of "No Call"?

Thank you again for your help. I have to say this pipeline and the wiki make it much easier to run the programs.

Hoohm commented 6 years ago

@olechnwin

1) X is the number of bases in the UMI/cell barcode that are under the quality threshold. The plot BC_drop.pdf gives you the percentage of reads trimmed based on those thresholds.

2) X is the length of the polyA that has been trimmed, and y is how many times that length has been trimmed. This is probably not the most useful plot, but if you see really high numbers of a certain length being trimmed, it could point to a problem stemming from the wetlab protocol.

3) Same as before: instead of a more or less random distribution, you might find that a large portion of your reads contain the whole adapter.

4) This is based on expected_cells. Cells after this number are defined as No call. Although there is no plan to improve it right now, it will definitely be changed in later versions.

I hope those answers are helpful, don't hesitate to ask more :)
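To make answer 1 concrete, here is a minimal sketch of how the x axis of such a plot could be computed: for each read, count the barcode bases whose quality falls under the threshold, then tally how many reads have 1, 2, 3, ... such bases. This is not the pipeline's actual code. The function names, the threshold of 10, the Phred+33 encoding and the example quality strings are all assumptions made for illustration.

```python
from collections import Counter


def count_low_quality_bases(quality: str, threshold: int = 10, offset: int = 33) -> int:
    """Count bases whose Phred score is under `threshold`.

    Assumes Phred+33 encoded quality characters (an assumption, not taken
    from the pipeline).
    """
    return sum(1 for ch in quality if ord(ch) - offset < threshold)


def tagged_histogram(qualities, threshold: int = 10) -> dict:
    """Map x (number of low-quality bases) to the number of reads with that x.

    Reads with no low-quality base are left out, since they are not tagged.
    """
    counts = Counter(count_low_quality_bases(q, threshold) for q in qualities)
    counts.pop(0, None)
    return dict(sorted(counts.items()))


# Made-up quality strings for four barcodes: 'I' is Phred 40, '#' is Phred 2.
example_qualities = [
    "IIIIIIIIIIII",  # 0 low-quality bases
    "II#IIIIIIIII",  # 1 low-quality base
    "##IIIIIIII#I",  # 3 low-quality bases
    "I#IIIIIIIIII",  # 1 low-quality base
]
histogram = tagged_histogram(example_qualities)
print(histogram)
```

Each key of the resulting dictionary corresponds to one bar position on the x axis, and each value to the height of that bar.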

olechnwin commented 6 years ago

Thanks for your clarification. Very helpful. I have some more clarification questions:

  1. That means the 99,802 (in the wiki) is the sum of the area under the red bars? i.e. the total number of bases that are dropped?

  2. Our total number of reads uniquely mapped by STAR is ~7.9M. As you can see in the attached file, more than 100k polyA stretches of length 71 were trimmed. Does this indicate a problem with the wetlab protocol?

  3. For the SMART plot, does the attached plot look fine? It doesn't look like a random distribution to me.

Thank you again for your clarifications.

polya_trimmed.pdf start_trim.pdf

Hoohm commented 6 years ago

1) Yes, that is close to reality. There is one difference between tagged and dropped. For some reads, both the UMI and the cell barcode are tagged. Those are of course dropped. The 99,802 is the sum of the reads where the cell barcode is tagged and the reads where the UMI AND the cell barcode are tagged. I should change the legend to make it consistent. The individual plots are less interesting than BC_drop, where you compare all samples and also see the BC + UMI reads.

2) I don't think so, 1% of your reads is nothing. Don't worry about it.

3) Same as 2.

If those numbers grow to 10 or 20%, I would start to worry.
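The rule of thumb above is simple arithmetic, sketched here with the figures quoted in the question (about 100k trimmed polyA stretches of length 71, about 7.9M uniquely mapped reads). The function names and the 10% cut-off are illustrative assumptions based on the "10 or 20%" remark, not something the pipeline reports. Uniquely mapped reads are also only a stand-in for the total number of reads that went through trimming, so the result is a rough estimate.

```python
def trimmed_percentage(trimmed_reads: int, total_reads: int) -> float:
    """Return the share of reads affected by a given trim, in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * trimmed_reads / total_reads


def needs_attention(percentage: float, worry_threshold: float = 10.0) -> bool:
    """Flag a sample once the trimmed share reaches the (assumed) threshold."""
    return percentage >= worry_threshold


# Approximate figures from the question above.
pct = trimmed_percentage(100_000, 7_900_000)
print(f"{pct:.2f}% of reads, worth a closer look: {needs_attention(pct)}")
```

With these inputs the share comes out a little over 1%, well under the assumed threshold.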