bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License
84 stars 26 forks source link

MAINT: update `orient-seqs` to use vsearch `orient` #149

Closed mirand863 closed 1 year ago

mirand863 commented 1 year ago

This PR replaces vsearch --usearch_global with vsearch --orient, which is more efficient (closes #146). Additionally, it removes the deprecation warning inserted in version 2023.2 (closes #151).

Some parameters were removed since they are not supported with --orient:

However, optional parametes available with --orient were added:

mirand863 commented 1 year ago

@nbokulich, I opened an issue on vsearch and was informed that the reason these parameters are missing is because usearch also does not have them. However, I believe some parameters are irrelevant when using orient (--maxaccepts, --strand and --userfields).

The other parameters can be computed if an alignment is provided. Do you think it would be worth adding alignment information or would it be fine to remove --id, --query_cov and --leftjust? I also looked into the tabbed output provided by orient, but it does not have enough information to be able to replace these parameters.

The only tests failing are the exact match (only matches with identity 1.0 allowed) and left justify (no gaps in the beginning of the alignment)

mirand863 commented 1 year ago

thanks @mirand863 ! Looks good to me, I have only one comment in-line. @misialq will also review/test locally.

I think that we should drop these parameters if they are not used by vsearch orient. I agree that they are probably not needed by users (although there could be some use for id and query_cov in theory). @mikerobeson do you agree with this?

However, if we make breaking changes we should go through a short deprecation cycle, so I am tagging this as do-not-merge for now. Once approved, we can merge in a future release of rescript.

A deprecation cycle is a great idea.

I would also like to suggest using semver for versioning instead of the release date. With semver it becomes easier to warn users that a new version is incompatible with the previous one by making a major update. Moreover, tagging a new version can be automated with github actions, where the bumping can be done according to pull requests' titles, e.g., by adding the tags #major or #minor to the title. Lastly, using semver would make it easier to upload the package to the python package index and then to bioconda.

mirand863 commented 1 year ago

@nbokulich,

Added a new PR #150 to warn users of a future deprecation of the parameters discussed here.

mirand863 commented 1 year ago

Hey @mirand863, I gave it a go and all works as expected, thanks! I have two more cosmetic comments though:

* Could you adjust the action description in the `plugin_setup.py`?  I think the last section becomes irrelevant after dropping the parameters.

* I would suggest just removing the tests that are not being used. They are addressing functionality that was removed and most likely will not be brought back so no need to keep them.

@misialq, Thank you for the great suggestions! I removed the tests and edited the description in plugin_setup.py to remove old parameters and add the new ones.