Allow extended prior sequences (e.g. full-length 16S)

benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution

http://benjjneb.github.io/dada2/

GNU Lesser General Public License v3.0

Allow extended prior sequences (e.g. full-length 16S) #496

Open AlexandreThibodeauUdM opened 6 years ago

AlexandreThibodeauUdM commented 6 years ago

I am extremely interested in using priors.

What is the format of the priors file? Is it FASTA in amplicon style (for example, the V4 region only), or FASTA with full-length 16S sequences?

benjjneb commented 6 years ago

Priors are provided as a character vector of sequences. They can come from a file, but they should be read into a character vector before being passed to the dada function.
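The "read into a character vector" step can be sketched as follows. This is a minimal Python illustration, not dada2 code (dada2 is an R package), and the function name `parse_fasta` is hypothetical. It turns FASTA records into a flat list of sequence strings, which is the Python analogue of the character vector that `dada` expects.

```python
from typing import Iterable, List


def parse_fasta(lines: Iterable[str]) -> List[str]:
    """Collect FASTA records into a list of uppercase sequence strings.

    Header lines (starting with '>') are dropped and multi-line records are
    joined, so the result is a flat list of sequences in file order.
    Records with a header but no sequence lines are skipped.
    """
    sequences: List[str] = []
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header closes the previous record, if there was one.
            if current:
                sequences.append("".join(current).upper())
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current).upper())  # flush the last record
    return sequences
```

Usage, with a hypothetical file name: `with open("priors.fasta") as handle: priors = parse_fasta(handle)`. Per the comment above, each record should cover the same amplicon region as the reads.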

Right now, the priors have to be "amplicon style", i.e. the same sequence fragment that appears in the reads. (In the future, I'd like to relax that requirement to allow exact matching against longer priors, such as full-length 16S sequences.)
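To make the "amplicon style" requirement concrete, here is a minimal Python sketch of how a full-length 16S sequence relates to a read-length one. It is an illustration only and is not how dada2 works or would work. It shows two things: cutting the amplicon out of a full-length sequence between the PCR primer sites (exact matching, with IUPAC degenerate bases allowed), and checking whether an ASV occurs as an exact substring of a longer prior, which is roughly what "exact matching to longer priors" would mean. The function names, the choice to exclude the primers from the trimmed region, and the first-hit search strategy are all assumptions.

```python
import re
from typing import Iterable, Optional

# IUPAC nucleotide codes as regex character classes (degenerate primer bases).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code, used to reverse-complement the reverse primer.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str) -> str:
    # Each primer base becomes a literal or a character class; no mismatches allowed.
    return "".join(IUPAC[base] for base in primer.upper())


def trim_to_amplicon(full_length: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the forward primer and the reverse primer site.

    The reverse primer is matched as its reverse complement on the same strand.
    Primers are excluded from the result (assumes reads were primer-trimmed).
    Returns None if either primer site is not found.
    """
    seq = full_length.upper()
    fwd_hit = re.search(primer_pattern(fwd), seq)
    if fwd_hit is None:
        return None
    start = fwd_hit.end()
    # Look for the reverse primer site only downstream of the forward primer.
    rev_hit = re.search(primer_pattern(reverse_complement(rev)), seq[start:])
    if rev_hit is None:
        return None
    return seq[start:start + rev_hit.start()]


def matches_long_prior(asv: str, long_priors: Iterable[str]) -> bool:
    # True if the ASV occurs verbatim somewhere inside any longer prior sequence.
    asv = asv.upper()
    return any(asv in prior.upper() for prior in long_priors)
```

The first function is the workaround available under the current requirement (convert full-length references to amplicon style before use); the second shows the kind of substring check that would make the conversion unnecessary. A real pipeline would also need to handle primers that are absent, mismatched, or on the opposite strand, which this sketch does not attempt.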

marcomeola commented 5 years ago

Any idea which release of dada2 will support exact matching to longer priors, such as full-length 16S sequences?

benjjneb commented 5 years ago

> Any idea which release of dada2 will support exact matching to longer priors, such as full-length 16S sequences?

No, but the first step is to have an open enhancement request for this. I'll convert this thread into one.