bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

ENH: add vsearch's `-qsegout` option to RESCRIPt #136

Closed by mikerobeson 1 year ago

mikerobeson commented 2 years ago

We've been discussing internally, for quite a while now, how to extract target / amplicon regions from online sequence repositories when one or more of the PCR primers are absent, or were removed before the reference sequence was deposited. In those cases, using primers to search for and extract the target region will fail. Better yet, we'd like to be able to extract target regions from complete genomes. Anyway, I just realized I never made an issue for this, so here it is. :-)

In the past, this was elegantly solved by the -qsegout (and perhaps -tsegout?) option of usearch. That is, it can use actual amplicon data, or a pre-generated list of sequences, to search a larger reference database (e.g. GenBank, BOLD, etc.) and extract the target amplicons from it. This is the exact approach I used to make reference databases for my pig diet paper, and I have been wanting to implement it ever since we started development of RESCRIPt. Now that this has been implemented in vsearch, I think we can start planning to include this functionality.

Although I think we may be blocked for now: this requires a vsearch version that includes the -qsegout option, and, as far as I know, vsearch is currently pinned within QIIME 2 due to another plugin dependency. Hopefully, that'll be resolved soon.
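For anyone following along, here is a rough sketch of what such a call might look like once a suitable vsearch is available. It only assembles the command line and does not run anything. The helper name `build_segment_extraction_cmd`, the file names, and the 0.8 identity threshold are all hypothetical. It also assumes the longer reference sequences are passed as the query and the amplicons as the database, so that the aligned query segments written by `--qsegout` are the extracted regions; if the roles were swapped, `--tsegout` would presumably be the relevant output instead.

```python
from typing import List


def build_segment_extraction_cmd(
    query_fasta: str,
    db_fasta: str,
    segments_out: str,
    identity: float = 0.8,  # hypothetical threshold, tune for your marker gene
    threads: int = 1,
) -> List[str]:
    """Assemble (but do not run) a vsearch global-search command that
    writes the aligned query segments to a FASTA file."""
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    return [
        "vsearch",
        "--usearch_global", query_fasta,  # assumed: long reference sequences
        "--db", db_fasta,                 # assumed: amplicons / target regions
        "--id", str(identity),
        "--strand", "both",
        "--threads", str(threads),
        "--qsegout", segments_out,        # aligned part of each query
    ]


cmd = build_segment_extraction_cmd(
    "reference_seqs.fasta", "amplicons.fasta", "extracted_regions.fasta"
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` in an environment where the installed vsearch supports these options.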

mikerobeson commented 2 years ago

Just a short little update... I installed the latest version of vsearch (2.21.1) into my QIIME 2 test environment. I was able to run core-diversity-metrics and deblur denoise-16S w/o issue. Thus, on the surface, it appears that UniFrac and deblur may no longer be the pinning culprits? I can try running the other deblur actions, etc. Are there other plugins / actions I should look into? Let me know.
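A small guard along these lines could help when testing different environments: it parses the text reported by the installed vsearch and checks it against a minimum version. The function names and the sample strings are hypothetical, and the default minimum of 2.21.0 is an assumption about when the segment output options first appeared; it is worth confirming against the vsearch release notes.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_vsearch_version(text: str) -> Optional[Version]:
    """Pull a (major, minor, patch) tuple out of vsearch's version text."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_segout_support(text: str, minimum: Version = (2, 21, 0)) -> bool:
    """Return True if the parsed version is at least `minimum`.

    The default minimum is an assumption, not a confirmed fact.
    """
    version = parse_vsearch_version(text)
    return version is not None and version >= minimum


# Illustrative version strings, not captured from a real run.
print(has_segout_support("vsearch v2.21.1_linux_x86_64"))
print(has_segout_support("vsearch v2.7.0_linux_x86_64"))
```

Tuple comparison keeps the check correct for cases like 2.7.0 versus 2.21.0, where comparing the raw strings would give the wrong answer.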

thermokarst commented 2 years ago

Hey @mikerobeson - I don't think deblur specifically pinned vsearch; rather, it just didn't work with newer versions of vsearch. A new release of deblur just landed, and we're working on pulling it into the core distro. Once that's set, I hope we'll be able to unpin vsearch.

mikerobeson commented 2 years ago

Thanks for the update @thermokarst!

If anyone is interested, I am working on the relevant code for this here. I still need to work on testing, test code, etc. When I get further along, I'll submit it as a draft PR.