Closed jwdebelius closed 2 years ago
Hi @jwdebelius thanks for this! 🎉
Some initial ideas re: the point you raise about aligned seqs
I agree with you: reverse orientation of aligned sequences is a prominent use case, and we should support. But we do not want to introduce a bug re: VSEARCH compatability.
This might be a case for using TypeMap
— i.e., so that FeatureData[AlignedSequence]
is only accepted as input when reference_sequences=None
. If that works, we will need to make the usage clear in the help documentation.
We could also introduce a separate orientation
parameter (with options same, complement, reverse, or reverse complement), which might be a bit easier to use with TypeMap
, though only if we think these options would be useful.
Another idea that might be cleaner and more functional. Accept aligned seqs as input with an unaligned ref.
No matter which route we take, could you please add unit test(s) for orienting aligned seqs with the various options? I think that the current setup actually will not work with aligned sequences, since the view type is DNAFASTAFormat
... I believe a DNAIterator
is needed instead.
@mikerobeson any thoughts?
For whatever reason I just noticed this. I'll look into it now.
I do not currently have much to @nbokulich's comments. Although @jwdebelius and I were discussing the old filter_alignment.py script from QIIME 1. In particular, the ability to remove poorly aligned sequences using the --remove_outliers
and associated flags. Perhaps we can add a separate action in either the q2-alignment
plugin, or rescript simply called detect-alignment-outliers
?
This would set us up for a couple of downstream use cases: 1) detect the poorly aligned sequences (write the sequences and/or their ids to file), then investigate / filter downstream, using one of the QIIME 2 standard filtering commands 2) use output from 1, to specify which sequences in the FASTA file should be reverse-complemented, leave the others as is.
that sounds very useful! also a very different use case, i.e., for filtering alignments on quality (which sounds like a method for q2-alignment). Sounds like the q1 method also masked the alignments? So a very different beast (and one that is partially done now in q2-alignment?). Either way, filtering alignments sounds like very different functionality from read orientation.
So I propose that
@jwdebelius how would you like to proceed? See my note above about your current PR most likely not working for aligned seqs... I propose two options:
I am happy with either path 😁
Hi @nbokulich,
Sorry, things have been umm... busy the last 3 months. I'm a fan of the second option. I'll clean up the PR, and hopefully you can check it and it can be merged.
hi @jwdebelius, thanks for updating this PR! Please let me know when this is ready to review
Thanks @nbokulich, I think it should be good to go!
I hope this addresses your concerns @nbokulich! There are a couple of things I wasn't sure how to address directly
I think this should address your documentation issues, @nbokulich.
As discussed in #119.
Probably worth mentioning that I see some utility in keeping reverse complemented for aligned sequences, which is whats currently implemented, but I haven't explored the implications for orienting to a reference where it probably does break vserach, and might be a motivation for a separate method.