benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

majority of reads failed to merge #609

Closed tedsun closed 5 years ago

tedsun commented 5 years ago

Hi Ben,

I am working through the dada2 pipeline for the first time, and I found that only a few sequences are merged by the mergePairs function, which I ran with the default arguments.

Here is the output of head(track). As you can see, the fraction of merged sequences is very low.

head(track)
                  input filtered denoisedF denoisedR merged nonchim
3311-bb1-MSITS    79273    43572     43182     43033    176     170
3311-bb10-MSITS  199682   108101    107363    107517   7741    7654
3311-bb100-MSITS  57949    31530     31088     30947    188     188
3311-bb101-MSITS  80427    37820     37627     37663    522     522
3311-bb102-MSITS 118947    63583     63198     62970    583     583
3311-bb103-MSITS  61666    32068     31690     31367    249     249
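To put a number on "very low", the merged fraction can be computed directly from the values posted in the table. The snippet below is a small standalone Python sketch, not part of the dada2 workflow; the choice of `filtered` as the denominator is an assumption made here for illustration.

```python
# Values copied from the head(track) output in the post: (filtered, merged).
track = {
    "3311-bb1-MSITS": (43572, 176),
    "3311-bb10-MSITS": (108101, 7741),
    "3311-bb100-MSITS": (31530, 188),
    "3311-bb101-MSITS": (37820, 522),
    "3311-bb102-MSITS": (63583, 583),
    "3311-bb103-MSITS": (32068, 249),
}

def merged_percent(filtered, merged):
    """Percentage of filtered read pairs that survived merging."""
    return 100.0 * merged / filtered

# Assumption: 'filtered' is used as the denominator; denoisedF/denoisedR
# would give nearly the same picture for these samples.
pct = {sample: merged_percent(f, m) for sample, (f, m) in track.items()}

for sample, value in pct.items():
    print(f"{sample}: {value:.2f}% of filtered pairs merged")
```

In every one of these six samples, fewer than one in ten filtered read pairs is merged.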

Background: I amplified the ITS region using the ITS1F/ITS4 primers for around 200 soil samples from a tropical forest. All of the sequences were generated on a MiniSeq platform. I am processing the fastq files on my Linux system using R 3.5.1 and the latest version of dada2 (1.10.0). I followed the whole DADA2 ITS Pipeline Workflow (1.8), except that I changed the primers to my ITS1F/ITS4 and set truncLen = c(280,220) in the filterAndTrim function after checking the quality of cutFs/cutRs.

Another issue is that the number of times the primers appear in the forward and reverse reads is quite low. Do I still need to remove the primers? Or has something gone wrong with the orientation of the primers in my sequences? Here are the primers and the primer counts in three of my samples:

print(FWD)
[1] "CTTGGTCATTTAGAGGAAGTAA"
print(REV)
[1] "TCCTCCGCTTATTGATATGC"

sample 3311-bb1-MSITS
                 Forward Complement Reverse RevComp
FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0      11
REV.ForwardReads       0          0       0       8
REV.ReverseReads       0          0       0       0

sample 3311-bb10-MSITS
                 Forward Complement Reverse RevComp
FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0       5
REV.ForwardReads       0          0       0       4
REV.ReverseReads       0          0       0       0

sample 3311-bb10-MSITS
                 Forward Complement Reverse RevComp
FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0      36
REV.ForwardReads       0          0       0      45
REV.ReverseReads       0          0       0       0

I have also attached some of the plots produced for the quality check and the error-rate check.

Thanks! plotErrors_errF.pdf plotErrors_errR.pdf plotQualityProfile_cutFs.pdf plotQualityProfile_cutRs.pdf

alexiscarter commented 5 years ago

Hi, For the primers, as Ben says: always remove them. For the read truncation, be careful not to truncate too short. What is the expected amplicon length with the primers you are using? Ciao Alexis
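The primer counts posted earlier tally each primer in four orientations. As a rough illustration of what those four columns mean, here is a standalone Python sketch that builds the orientations of the two primers from the post and counts exact matches in a list of reads. It does not use dada2 or Biostrings, it ignores degenerate IUPAC bases, and the toy read is invented for the example. A hit only in the RevComp column of the opposite read, as in the posted tables, is the pattern one would expect when a read runs through a short amplicon into the far primer, though other explanations are possible.

```python
# Standalone sketch: exact matching only, no IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def all_orientations(primer):
    """Return the four orientations of a primer sequence."""
    complement = primer.translate(COMPLEMENT)
    return {
        "Forward": primer,
        "Complement": complement,
        "Reverse": primer[::-1],
        "RevComp": complement[::-1],
    }

def count_hits(primer, reads):
    """For each orientation, count the reads that contain it."""
    return {
        name: sum(seq in read for read in reads)
        for name, seq in all_orientations(primer).items()
    }

FWD = "CTTGGTCATTTAGAGGAAGTAA"  # forward primer as printed in the post
REV = "TCCTCCGCTTATTGATATGC"    # reverse primer as printed in the post

# Hypothetical forward read: a short insert followed by read-through
# into the reverse-complemented REV primer.
toy_forward_reads = ["ACGTACGTAC" + all_orientations(REV)["RevComp"] + "GGTT"]
hits = count_hits(REV, toy_forward_reads)
print(hits)
```

With real data the same counting would be run on the actual reads, before and after primer removal, to confirm that the hits drop to zero.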

tedsun commented 5 years ago

Thanks Alexis!

Both the forward and reverse reads from the MiSeq run are 300 nt long. If I truncate them to c(280,120), the majority of reads are lost in the merging step, and I get around 3000 sequences after chimera removal.

If I do not truncate the forward and reverse reads, leaving both of them at 300 nt, I get around 7000 sequences after chimera removal.

However, I found that many of the sequences (whether truncated or not) share exactly the same taxonomic name after taxonomy assignment.

I am new to all of this analysis, and even to soil-borne fungal diversity. My original idea was to look for a potential relationship between soil-borne fungal diversity and aboveground tree diversity in a seasonal tropical forest at a fine scale (from centimeters to hundreds of meters).

Cheers, Ted

benjjneb commented 5 years ago

In general you should not truncate the reads in ITS analysis, because there is usually no effective single truncation length due to the biological length variation in the ITS region.
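The reason a single truncation length works poorly can be sketched with simple arithmetic. In filterAndTrim, reads shorter than truncLen are discarded, and mergePairs requires a minimum overlap (12 by default). The standalone Python sketch below applies those two rules to hypothetical amplicon lengths. It assumes 300 nt reads as stated in the thread, treats the amplicon length as excluding primers, assumes read-through has already been trimmed, and ignores quality-based truncation, so it is an approximation rather than a reproduction of what dada2 does.

```python
MIN_OVERLAP = 12  # default minOverlap of dada2's mergePairs

def merge_outcome(amplicon_len, read_len=300, trunc_len=(0, 0),
                  min_overlap=MIN_OVERLAP):
    """Classify one hypothetical amplicon (length excludes primers).

    trunc_len=(0, 0) means no truncation, as in filterAndTrim's default.
    """
    lengths = []
    for trunc in trunc_len:
        # Once read-through is trimmed, a read cannot be longer than the amplicon.
        length = min(read_len, amplicon_len)
        if trunc > 0:
            if length < trunc:
                return "discarded by filter (read shorter than truncLen)"
            length = trunc
        lengths.append(length)
    overlap = lengths[0] + lengths[1] - amplicon_len
    if overlap >= min_overlap:
        return f"merges (overlap {overlap})"
    return f"fails to merge (overlap {overlap})"

# Hypothetical amplicon lengths spanning a plausible range of ITS variation.
for amplicon in (250, 450, 600):
    for trunc in ((0, 0), (280, 220), (280, 120)):
        print(amplicon, trunc, merge_outcome(amplicon, trunc_len=trunc))
```

Under these assumptions, a short amplicon is thrown away by any truncation length longer than itself, while a long amplicon loses its overlap as soon as the reads are shortened, so no single truncLen suits both ends of the length distribution.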

> However, I found that many of the sequences (whether truncated or not) share exactly the same taxonomic name after taxonomy assignment.

That's fine and to be expected. There are often multiple distinguishable variants within a (e.g.) genus.

tedsun commented 5 years ago

Thanks Ben!

Does this mean that I can move on to the next step and use the RSVs (without truncating) for community-level analysis? Can I also identify fungal functional groups with FUNGuild based on my taxonomy assignment results? Do I need to convert the taxa file to another format?

Best, Ted

alexiscarter commented 5 years ago

Hi, Yes, you can use the ASVs without truncating. Reads will be trimmed anyway in the filterAndTrim function. FUNGuild is great, and you can use it after the dada2 pipeline; for more information, refer to their page. Ciao Alexis

tedsun commented 5 years ago

Thanks Alexis,

Should I keep two ASVs that share exactly the same name (at species level) as separate ASVs in the community analysis, or merge them into one ASV?

Best, Ted

alexiscarter commented 5 years ago

Hi, This is a tricky question. In one case you would merge two ASVs that have the same taxonomic assignment, but in another case you would not merge ASVs simply because the taxonomic information is missing from your taxonomic database. I guess it depends on the type of analysis you are doing. Good luck. Alexis
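The asymmetry described here can be made concrete with a toy example. The standalone Python sketch below sums read counts for ASVs that share a species-level name and leaves unassigned ASVs as they are. The ASV identifiers, counts, and species labels are all hypothetical placeholders, and this is only one possible merging rule, not a recommendation from the thread.

```python
from collections import defaultdict

def merge_by_species(counts, species):
    """Sum counts of ASVs that share a species name.

    ASVs without a species assignment keep their own identifier,
    so they are never merged with each other.
    """
    merged = defaultdict(int)
    for asv, n in counts.items():
        name = species.get(asv)
        key = name if name else asv
        merged[key] += n
    return dict(merged)

# Hypothetical counts for one sample and hypothetical assignments.
counts = {"ASV1": 120, "ASV2": 30, "ASV3": 45, "ASV4": 10}
species = {
    "ASV1": "Genus_a species_x",
    "ASV2": "Genus_a species_x",  # same name as ASV1, so the two are merged
    "ASV3": None,                 # no species-level assignment
    "ASV4": None,                 # no species-level assignment
}

print(merge_by_species(counts, species))
```

ASV1 and ASV2 collapse into one unit while ASV3 and ASV4 stay separate even if they are just as closely related, which is why the choice depends on the downstream analysis.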