Closed adityabandla closed 5 years ago
Is there a way to extract these sequences and write them to a fasta file in a sample-specific manner, so that they can be used for annotating eukaryotic sequences?
No automatic way right now. You can retrieve the raw F/R sequences for each row in the merger data.frame though. The $forward
and $reverse
columns have the indexes to the corresponding sequences in the corresponding dada-class object. So the code below will retrieve the unmerged F/R sequences for each row in the merger data.frame:
# Single sample
merger <- mergePairs(ddF, drpF, ddR, drpR, returnRejects=TRUE)
sqF <- ddF$sequence[merger$forward]
sqR <- ddR$sequence[merger$reverse]
ps: This is another use case for adding a new hybrid merge/concatenate option to mergePairs
. See also #565
Thanks Ben for the pointers! By raw sequences, you mean sequences which have not been denoised?
Nope sorry bad terminology on my part. The denoised F/R sequences are what is being extracted there (prior to merging).
I am using the 515F 926R primer which targets all three domains of life. A known challenge with this primer set with regards to amplicon sequencing is that the eukaryotic sequences are ~150bp longer than the prokaryotic ones. So, it's not surprising that these sequences fail to merge.
I understand that unmerged sequences can be retained using the returnRejects=TRUE option in the mergePairs command. Is there a way to extract these sequences and write them to a fasta file in a sample-specific manner, so that they can be used for annotating eukaryotic sequences?