benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
469 stars 142 forks source link

PlotQualityProfile and Trimming #489

Closed saifsikdar closed 6 years ago

saifsikdar commented 6 years ago

I conducted a 16S metagenomic sequencing. Now, I am using R (new R user) with DADA2 package to analyze the sequencing data. But, I failed to edit the “PlotQualityScore” function as I want. I hope you could be of some help-

I would like put a title (GF cohoused with TAC SPF) on the graph instead of my raw sample name (S97m2_......). I tried the following code:

The code I Run: plotQualityProfile(fnFs[50]) + labs(title ="GF cohoused with TAC SPF", outer = F) ggsave("GF cohoused with TAC SPF.jpg") dev.off()

But I failed to put title name removing/replacing the raw sample name (attached). Could you please suggest me how I can do this? Can I also move the red line of the graph up or down?

For another issue- According to the authors- "we choose to truncate the forward reads at position 245, and the reverse reads at position 160. We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors". They used the following code as an example of doing this-

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(240,160), maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE) Now my questions are- I don't see which section of the code trimmed the FIRST 10 nucleotide using this code: 1. Are theymissing the "trimLeft" function? 2. In truncLen function of the code, shouldn't it be 245 instead of 240 like this truncLen=c(245,160)? And 3. can anyone please explain truncQ function? I look forward to hearing from you. Thank you very much

benjjneb commented 6 years ago

But I failed to put title name removing/replacing the raw sample name (attached). Could you please suggest me how I can do this? Can I also move the red line of the graph up or down?

These plots are created by ggplot2 and can be modified as per the mechanisms of that package. You've stumbled on a pretty un-obvious case though, the filename is the title of that facet rather than the overall title, so to get rid of it, you have to turn off the facet titles. You can do that as so:

plotQualityProfile(...) + theme(strip.background = element_blank(), strip.text.x = element_blank())

We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors

We don't recommend doing that anymore, and that text needs to be updated. Where did you read that?

Edit:

  1. can anyone please explain truncQ function?

It truncates sequences at first instance of truncQ. Those sequences are then tossed if the resulting sequences i less than truncLen.

bmillerlab commented 5 years ago

Hello, The text still says to truncate the first 10 bases -

"We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors."

It can be found on URL: https://bioconductor.org/help/course-materials/2017/BioC2017/Day1/Workshops/Microbiome/MicrobiomeWorkflowII.html#methods

I got stuck on this too so I was looking to find out the correct commands, since I could tell that wasn't included. Should I be using a different updated URL to try and learn the steps suggested in the workshop?

benjjneb commented 5 years ago

"We also choose to trim the first 10 nucleotides of each read based on empirical observations across many Illumina datasets that these base positions are particularly likely to contain pathological errors."

Thanks for this report. To reiterate, we DO NOT recommend blanket trimming of the first 10 nucleotides any more. You should either trim off a number of nucleotides corresponding to the length of your primers if the primers are sequenced, or set trimLeft=0 (the default).

Other than that, everything else in that bioconductor course materials can be used as is.