bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

ENH: extract sequence segments with vsearch's `--qsegout` option #138

Closed mikerobeson closed 2 years ago

mikerobeson commented 2 years ago

Resolves #136.

Adds the action `extract-seq-segments`. It takes a query set of "full-length" sequences and searches them against a reference set of target sequence segments (e.g. amplicon sequences). For any query sequence with a matching segment, the matching region is extracted and written to file. Query sequences that do not match are written to a separate file.
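As a rough illustration of the underlying search, the sketch below assembles a vsearch command line that writes matching query segments with `--qsegout` and unmatched queries with `--notmatched`. It is not the code from this PR: the function names, file names, and the default identity threshold are hypothetical, chosen only for the example.

```python
import subprocess
from typing import List


def build_qsegout_cmd(query_fasta: str, segments_fasta: str,
                      matched_out: str, unmatched_out: str,
                      perc_identity: float = 0.7,
                      threads: int = 1) -> List[str]:
    """Assemble a vsearch call that extracts matching query segments.

    All names and defaults here are illustrative assumptions, not the
    parameters of the RESCRIPt action itself.
    """
    return [
        "vsearch",
        "--usearch_global", query_fasta,  # "full-length" query sequences
        "--db", segments_fasta,           # reference segments (e.g. amplicons)
        "--id", str(perc_identity),       # minimum identity to count as a match
        "--strand", "plus",               # search the forward strand only
        "--threads", str(threads),
        "--qsegout", matched_out,         # aligned region of each matching query
        "--notmatched", unmatched_out,    # queries without any match
    ]


def extract_segments(*args, **kwargs) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(build_qsegout_cmd(*args, **kwargs), check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before vsearch is invoked.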

This approach will allow us to extract target / amplicon regions from sequences in online repositories where one or more of the PCR primers are absent or were removed before the reference sequence was deposited. In those cases, using primers to search for and extract the target region will fail.

Warning: this PR requires vsearch 2.21 or later to run.
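One way to guard against an older install is to compare the version reported by vsearch against the 2.21 minimum. The sketch below is only an assumption about how such a check could look: `meets_minimum` is a hypothetical helper, and it assumes the version banner contains a token of the form `v<major>.<minor>`.

```python
import re
from typing import Tuple

MIN_VSEARCH: Tuple[int, int] = (2, 21)  # minimum version stated in this PR


def meets_minimum(banner: str,
                  minimum: Tuple[int, int] = MIN_VSEARCH) -> bool:
    """Return True if the version found in `banner` is >= `minimum`.

    Assumes the banner contains a token such as 'v2.21.1'; raises
    ValueError when no such token is present.
    """
    match = re.search(r"v(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version found in: {banner!r}")
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum  # tuple comparison: major first, then minor
```

The banner string would typically be captured from the output of `vsearch --version` before calling the helper.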

To do:
- [ ] Upload a tutorial to the QIIME 2 forum outlining a typical use case.

mikerobeson commented 2 years ago

It appears that vsearch 2.21 is in the QIIME 2 dev branch now.

thermokarst commented 2 years ago

xref: https://github.com/bokulich-lab/RESCRIPt/pull/139