ENH: add vsearch's --qsegout option to extract reads

mikerobeson commented 2 years ago

In my initial #138 PR, I had mistakingly branched from #133 somehow. 🤦 So, I reset and tried again.

Anyway, this addresses #136

-Mike

thermokarst commented 2 years ago

hmm, im not sure why conda is picking up such an old vsearch on linux.

mikerobeson commented 2 years ago

Yeah, I'm looking at that now ... 🤔

mikerobeson commented 2 years ago

Yeah, not sure why the Linux version keeps installing vsearch 2.7 instead of 2.21.

thermokarst commented 2 years ago

@mikerobeson - you should coordinate that version pin with @ebolyen - he might want to add that to our shared conda build config.

mikerobeson commented 2 years ago

Thanks @thermokarst! I was just testing if I can force the pin here for a sanity check. But yes, @ebolyen if we make sure that the latest version of vsearch (2.21.1) is pulled in.

mikerobeson commented 1 year ago

A random thought: what should be the expected behaviour if one of my query sequences contains more than one of the references?... In my test it seemed to extract the sequence that happened to be the first in the query (which I personally sort-of expected).

Hi @misialq , great question! Yes. It'll return the first instance it encounters. We could potentially pull more by setting --strand 'both', but the problem is that multiple sequences will be written out with the same exact sequence ID, which will then cause the code to fail as FeatureData[Sequence] expects unique sequence IDs. This is why I am using the --strand 'plus'. Probably something we can improve upon in a later update?

I figure with the plethora of amplicon and genomic sequences available for most common amplicon targets, this might be not be a major issue as we'll capture most things. 🤷

If you're okay with this, go ahead and merge. :-)

misialq commented 1 year ago

Great, thanks for the explanation @mikerobeson! I agree that this does not seem like a major issue and, as you said, we can always follow up later, if necessary. Will merge! ✅

bokulich-lab / RESCRIPt

ENH: add vsearch's --qsegout option to extract reads #140