mikerobeson closed this 1 year ago.
Hmm, I'm not sure why conda is picking up such an old vsearch on Linux.
Yeah, I'm looking at that now ... 🤔
Yeah, not sure why the Linux version keeps installing vsearch 2.7 instead of 2.21.
@mikerobeson - you should coordinate that version pin with @ebolyen - he might want to add that to our shared conda build config.
Thanks @thermokarst! I was just testing whether I could force the pin here as a sanity check. But yes, @ebolyen, let's make sure that the latest version of vsearch (2.21.1) is pulled in.
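As a quick sanity check for the pin discussed above, the installed vsearch version can be compared against 2.21.1. This is a minimal sketch, not part of the shared conda build config. It assumes that vsearch --version prints a banner of the form "vsearch v2.21.1_linux_x86_64, ...", and the helper names parse_vsearch_version and meets_pin are hypothetical.

```python
import re
import shutil
import subprocess

# Minimum vsearch version discussed in this thread.
MIN_VSEARCH = (2, 21, 1)


def parse_vsearch_version(text):
    """Extract (major, minor, patch) from a vsearch version banner.

    Assumes a banner like 'vsearch v2.21.1_linux_x86_64, ...';
    returns None if no version number is found.
    """
    match = re.search(r"vsearch v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_pin(text, minimum=MIN_VSEARCH):
    """Return True if the parsed version is at least `minimum`."""
    version = parse_vsearch_version(text)
    # Integer-tuple comparison: (2, 7, 0) < (2, 21, 1).
    return version is not None and version >= minimum


if __name__ == "__main__":
    exe = shutil.which("vsearch")
    if exe is None:
        print("vsearch not found on PATH")
    else:
        # The banner may go to stderr, so capture and combine both streams.
        result = subprocess.run([exe, "--version"], capture_output=True, text=True)
        banner = result.stdout + result.stderr
        print(parse_vsearch_version(banner), meets_pin(banner))
```

Versions are compared as integer tuples on purpose: compared as plain strings, "2.7" would sort after "2.21", which is exactly the wrong answer for this check.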
A random thought: what should the expected behaviour be if one of my query sequences contains more than one of the references? In my test, it seemed to extract the sequence that happened to come first in the query (which I sort of expected).
Hi @misialq, great question! Yes, it'll return the first instance it encounters. We could potentially pull more by setting --strand 'both', but the problem is that multiple sequences would then be written out with the exact same sequence ID, which would cause the code to fail, because FeatureData[Sequence] expects unique sequence IDs. This is why I am using --strand 'plus'. Probably something we can improve upon in a later update?
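To illustrate the unique-ID constraint described above, here is a minimal sketch of how duplicate IDs in a FASTA output could be detected before import, keeping only the first record per ID (mirroring the first-hit behaviour described in this thread). This is not the plugin's actual code; the function names parse_fasta and keep_first_hit are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each header line.

```python
def parse_fasta(text):
    """Yield (seq_id, sequence) pairs from FASTA-formatted text.

    The ID is taken as the first whitespace-delimited token after '>'.
    """
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def keep_first_hit(records):
    """Keep the first record seen for each ID and list any IDs that repeat."""
    kept, duplicates = {}, []
    for seq_id, seq in records:
        if seq_id in kept:
            duplicates.append(seq_id)
        else:
            kept[seq_id] = seq
    return kept, duplicates


if __name__ == "__main__":
    # Two segments extracted from query q1 share one ID.
    fasta = ">q1\nACGT\n>q2\nGGCC\n>q1\nTTAA\n"
    kept, duplicates = keep_first_hit(parse_fasta(fasta))
    print(kept, duplicates)
```

Keeping the first record silently discards later hits; an alternative would be to raise an error listing the duplicated IDs, which surfaces the issue instead of hiding it.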
I figure that, with the plethora of amplicon and genomic sequences available for most common amplicon targets, this might not be a major issue, as we'll capture most things. 🤷
If you're okay with this, go ahead and merge. :-)
Great, thanks for the explanation @mikerobeson! I agree that this does not seem like a major issue and, as you said, we can always follow up later, if necessary. Will merge! ✅
In my initial PR (#138), I had mistakenly branched from #133 somehow. 🤦 So, I reset and tried again.
Anyway, this addresses #136.
-Mike