biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0

Would like support for vsearch (open source) #1962

Open alk224 opened 9 years ago

alk224 commented 9 years ago

usearch is frequently used for chimera checking, but because of its RAM limitation, only small data sets can be assessed; otherwise the data has to be broken up into subsearches (my current approach). vsearch uses the same published algorithm but without that limitation, as it is open source. I have compared the chimera filtering for a couple of small data sets and get identical results with either the qiime wrapper (identify_chimeric_seqs.py) or vsearch run with defaults. Check it out at the link below. Super useful.

https://github.com/torognes/vsearch
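The "broken up into subsearches" workaround mentioned above can be sketched roughly as follows. This is only an illustration, not the exact approach used in this thread: the function name split_fasta, its arguments, and the chunk file naming are all made up here. It should suit reference-based checking, where each query is evaluated on its own; de novo checking depends on the whole data set, so splitting would likely change those results.

```shell
#!/bin/sh
# Sketch: split a FASTA file into chunks of at most N records each.
# split_fasta, its arguments, and the output naming are illustrative only.
split_fasta() {
    fasta=$1       # input FASTA file
    per_chunk=$2   # maximum number of records per chunk
    prefix=$3      # chunks are written to PREFIX.001.fasta, PREFIX.002.fasta, ...
    awk -v n="$per_chunk" -v prefix="$prefix" '
        /^>/ {
            # Start a new chunk every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s.%03d.fasta", prefix, ++chunk)
            }
            count++
        }
        # Write header and sequence lines to the current chunk.
        out != "" { print > out }
    ' "$fasta"
}

# Usage (file names are placeholders):
#   split_fasta seqs.fna 100000 seqs.chunk
```

Each chunk can then be passed to the chimera checker separately and the results concatenated. With vsearch this step should not be needed at all.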

gregcaporaso commented 9 years ago

Thanks @alk224, good to know about your comparisons. We're also interested in vsearch - it's something that we could end up supporting in QIIME 2.

gregcaporaso commented 9 years ago

Just a crazy thought since they say it's intended to be a drop-in replacement for usearch 7. If the command line interfaces for usearch 6 and 7 are the same (which I realize they probably aren't, but I can dream...), vsearch should just work with QIIME if you created a symbolic link to vsearch named usearch6 (i.e., ln -s /usr/local/bin/vsearch /usr/local/bin/usearch6).

Note this is completely untested, just mentioning this in case someone wants to do an experiment.

alk224 commented 9 years ago

The symlink idea crossed my mind, but in looking at the log from identify_chimeric_seqs.py, I wasn't sure if all the options would translate. Doesn't mean it won't work, but I also think if anyone tries this they need the symlink to be called usearch61, and probably also means removing the usearch61 binary from its existing location to avoid conflict (something else I wasn't ready to do).
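One way to try the symlink experiment without removing the real usearch61 binary is to put the link in a private directory and place that directory first in PATH, since earlier PATH entries take precedence. A minimal sketch, assuming QIIME looks up usearch61 by name on PATH; the helper name link_as and the directory qiime-shims are hypothetical, and the whole thing is as untested against QIIME as the original suggestion.

```shell
#!/bin/sh
# Sketch: expose an existing binary under another name through a symlink in
# a private directory, so nothing under /usr/local/bin has to be touched.
# link_as and the directory layout are illustrative, not part of QIIME.
link_as() {
    target=$1      # path of the real program, e.g. "$(command -v vsearch)"
    alias_name=$2  # name the caller expects, e.g. usearch61
    shim_dir=$3    # private directory that will hold the link
    mkdir -p "$shim_dir" || return 1
    ln -sf "$target" "$shim_dir/$alias_name"
}

# Usage (only for the current shell session):
#   link_as "$(command -v vsearch)" usearch61 "$HOME/qiime-shims"
#   PATH="$HOME/qiime-shims:$PATH"; export PATH
```

Removing the directory from PATH restores the original usearch61, which makes the experiment easy to undo.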

Anyway, I just ran vsearch with defaults and the command was pretty simple.

colinbrislawn commented 9 years ago

@gregcaporaso

vsearch should just work with QIIME if you created a symbolic link

That worked for all my automated usearch scripts. It should work for qiime.

I think your idea of incorporating vsearch into QIIME 2 is a better idea than 'dropping it in'. As an example, identify_chimeric_seqs.py outputs a list of chimeric sequence IDs, which are then filtered from the input fasta file. Currently, both usearch and vsearch can directly output the chimera-checked fasta file. QIIME 2 can take advantage of these more elegant workflows.
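To make the two-step workflow concrete, here is a rough stand-in for the "filter the listed IDs out of the fasta file" step. It is not QIIME's own implementation; filter_fasta_by_ids is a made-up name, and it assumes one ID per line in the list and FASTA headers whose first whitespace-delimited token is the ID.

```shell
#!/bin/sh
# Sketch: drop every FASTA record whose ID appears in a list of IDs.
# filter_fasta_by_ids is an illustrative helper, not a QIIME script.
filter_fasta_by_ids() {
    ids=$1     # text file with one sequence ID per line (e.g. the chimeras)
    fasta=$2   # FASTA file to filter
    awk '
        # First file: remember every ID to discard.
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        # Header line: the ID is the first token without the leading ">".
        /^>/ { keep = !(substr($1, 2) in drop) }
        # Print header and sequence lines of the records that are kept.
        keep
    ' "$ids" "$fasta"
}

# Usage (file names are placeholders):
#   filter_fasta_by_ids chimeras.txt seqs.fna > seqs.nochimeras.fna
```

Tools that write the non-chimeric sequences directly make this second pass unnecessary.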

Also, vsearch can be used to assign taxonomy, perhaps to replace the uclust taxonomy assigner. torognes/vsearch#73

colinbrislawn commented 9 years ago

@alk224

I wasn't sure if all the options would translate.

Better than translate! vsearch implements most of usearch 7, which is really elegant. Here is ref-based chimera checking.


vsearch --uchime_ref centroids.mc2.fasta \
    --db rdp_gold.fa \
    --nonchimeras centroids.nochimeras.mc2.fasta
# 1 min, RAM = almost none

I'm really excited by the software @torognes and @frederic-mahe are developing.

frederic-mahe commented 9 years ago

Thanks for your support @colinbrislawn! We would be very happy to see vsearch embedded in QIIME 2.0.

gregcaporaso commented 9 years ago

@frederic-mahe, the GPL makes it hard for us to "embed" it, as QIIME 2 will be licensed under BSD. Any chance that you guys would consider a BSD-compatible license for vsearch? Or some type of dual licensing approach (which I'm not very familiar with)?

gregcaporaso commented 9 years ago

We are very interested in vsearch though, thanks for your work on it!

torognes commented 9 years ago

Exactly which variant of the BSD license will be used for QIIME 2?

gregcaporaso commented 9 years ago

We'll most likely use the 3-clause, exactly the same as we do for scikit-bio (that license is here). Here's a good discussion that summarizes some of the ideas that led us to switch from GPL.

frederic-mahe commented 9 years ago

Hi, the document @gregcaporaso is pointing to is a bit misleading. It promotes the false idea that the GPL is viral (which sounds frightening), while pointing to an article in LinuxInsider saying exactly the opposite:

"Contrary to what some operatives might want us to think, the GPL and other open-source licenses do not have some special magic feature that allows them to infect other software. They have license limitations that apply to derivative works, just like any other copyright license."

The argument that private companies avoid working with code under GPL derives from that assumption, and is fortunately not true: for example, some large private companies (e.g., Intel, Google) are among the major contributors to the Linux kernel (GPL v2).

That being said, what we can do is to use a dual licensing as suggested before.

colinbrislawn commented 9 years ago

This is a complicated issue with a long history.

[GPL software] has license limitations that apply to derivative works, just like any other copyright license.

Specifically, the (A)GPL places limits on how you can 'convey' software: 'convey' meaning "any kind of propagation that enables other parties to make or receive copies" link

USEARCH has a strong, closed source license, which strictly prevents the QIIME devs from bundling it with qiime. "Licensee will not allow copies of the Software to be made or used by others." link

Although the AGPL is open source, it also places strong limits on distribution: "If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all." This is the GNU liberty-or-death clause.

So, can we bundle ('convey') Swarm as a part of QIIME while maintaining both their licences?

If not, maybe we could consider a qiime-specific dual license, the same way UCLUST does: http://www.drive5.com/uclust/downloads1_2_22q.html

colinbrislawn commented 9 years ago

Hey, @gregcaporaso, VSEARCH is now licensed under AGPL and 3-clause BSD license, solving this issue. https://github.com/torognes/vsearch/issues/105

Swarm remains under the GPLv3.

gregcaporaso commented 9 years ago

So awesome, thanks for the pointer!

outpaddling commented 7 years ago

Sorry to wake a sleeping dog, but I've discovered a strong interest in using vsearch with qiime as well. I manage research computing resources for a university and the usearch license prevents me from doing a fully-functional, global qiime installation for our researchers.

It's great to hear that you're considering support for vsearch in qiime 2.

In the meantime, might it be possible to create wrapper scripts called "usearch" and "usearch61" that accept all the usearch arguments used by qiime, massage them if necessary, and pass them on to vsearch?

Thanks,

Jason

colinbrislawn commented 7 years ago

Hello Jason,

Some folks on the qiime forum have had luck making an alias called usearch61 that points to the vsearch program. Once the qiime script calls usearch61, the alias should redirect it and run vsearch instead.

Let us know how this workaround works for you!

PS I'm looking forward to qiime2 + vsearch.

outpaddling commented 7 years ago

That seems to work for some usearch61 commands, but not others (see output below). If I link vsearch to "usearch", it reports that --global is not a recognized flag. all_tests.py reports errors unless it finds both usearch and usearch61 in PATH. Are they both actually necessary, or is the all_tests.py misleading me here?
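When experimenting with links like this, it helps to see exactly which program each expected name resolves to before running the test suite. A small sketch using only POSIX `command -v`; the function name check_tools is made up for this example.

```shell
#!/bin/sh
# Sketch: report where each expected tool name resolves on PATH.
# check_tools is an illustrative helper, not part of QIIME.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s -> %s\n' "$tool" "$path"
        else
            printf '%s -> not found\n' "$tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one name could not be resolved.
    return "$missing"
}

# Usage:
#   check_tools usearch usearch61 vsearch
```

This only shows what is found on PATH; it says nothing about whether the program behind a name accepts the flags QIIME passes to it.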

I'm wondering if it's feasible in theory for wrapper scripts to use vsearch to simulate the behavior of usearch 5.2.236 and usearch 6.1, by massaging the arguments as I suggested earlier (e.g. translating or discarding the --global flag). I would assume that vsearch has all the features of these older usearch versions, but I'm not a bioinformatician, so this would be hard for me to determine. If I at least know that this is feasible, I can certainly help with the scripting.

I'm working on the pkgsrc packages needed to provide a complete, ready-to-run qiime installation via "pkg_add py27-qiime" on CentOS, OS X, NetBSD, and just about any other POSIX platform. It's not far from completion and usearch appears to be the only piece I can't install automatically.

Thanks.

test_identify_chimeras_usearch61_intersection (main.Usearch61Tests)
test_identify_chimeras_usearch61_union (main.Usearch61Tests)
test_usearch61_chimera_check (main.Usearch61Tests)
Overall usearch61 functionality test with default params ... ok
test_usearch61_chimera_check_no_denovo (main.Usearch61Tests)
Overall usearch61 functionality test, no denovo detection ... ok
test_usearch61_chimera_check_no_ref (main.Usearch61Tests)
Overall usearch61 functionality test, no ref detection ... ok
test_usearch61_chimera_check_split_on_id (main.Usearch61Tests)
Overall usearch61 functionality test, splits on SampleID ... ok
test_usearch61_length_sorting (main.Usearch610DeNovoOtuPickerTests)
test_usearch61_params (main.Usearch610DeNovoOtuPickerTests)
usearch61 handles changes to other parameters ... ERROR
test_usearch61_sizeorder (main.Usearch610DeNovoOtuPickerTests)
test_call_open_reference_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does open reference OTU picking successfully ... ok
test_call_open_reference_with_match_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does open reference OTU picking successfully ... ok
test_closed_reference_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does closed reference OTU picking successfully ... FAIL
test_closed_reference_with_match_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does closed reference OTU picking successfully ... FAIL
test_usearch61_length_sorting (main.Usearch61ReferenceOtuPickerTests)
test_usearch61_params (main.Usearch61ReferenceOtuPickerTests)
usearch61 handles changes to other parameters ... ERROR
test_usearch61_sizeorder (main.Usearch61ReferenceOtuPickerTests)

File "/usr/local/lib/python2.7/site-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

ERROR: test_usearch61_params (main.Usearch610DeNovoOtuPickerTests)
usearch61 handles changes to other parameters
File "/usr/home/bacon/qiime-tests/test_pick_otus.py", line 1417, in test_usearch61_params
File "/usr/local/lib/python2.7/site-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

ERROR: test_usearch61_params (main.Usearch61ReferenceOtuPickerTests)
usearch61 handles changes to other parameters
File "/usr/home/bacon/qiime-tests/test_pick_otus.py", line 1771, in test_usearch61_params
File "/usr/local/lib/python2.7/site-packages/bfillings/usearch.py", line 1844, in usearch61_ref_cluster
raise ApplicationError('Error running usearch61. Possible causes are '
ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

FAIL: test_usearch61_sizeorder (main.Usearch610DeNovoOtuPickerTests)
File "/usr/home/bacon/qiime-tests/test_pick_otus.py", line 1462, in test_usearch61_sizeorder

FAIL: test_closed_reference_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does closed reference OTU picking successfully
File "/usr/home/bacon/qiime-tests/test_pick_otus.py", line 1860, in test_closed_reference_usearch61

FAIL: test_closed_reference_with_match_usearch61 (main.Usearch61ReferenceOtuPickerTests)
usearch61 does closed reference OTU picking successfully
File "/usr/home/bacon/qiime-tests/test_pick_otus.py", line 1885, in test_closed_reference_with_match_usearch61

colinbrislawn commented 7 years ago

Neither usearch nor usearch61 is needed to run qiime. An older version of the program, called uclust, is bundled with qiime and will run all default scripts. I would consider usearch an optional dependency.

Changing the flags could work, but not perfectly. While qiime likes usearch 5 and usearch 6, vsearch is designed to implement the function calls of usearch 7, which are a little different.

Given that this is an optional dependency, can you tell me more about your goals for including it or about the team that asked for it?

outpaddling commented 7 years ago

Thanks for the feedback. Our researchers have not yet started testing my qiime build, but they will soon. I'm not sure what features they will need, and they probably aren't either at this point.

The goal is not just to meet the needs of a few local researchers, though. In creating a package for global use, the aim is to maximize available functionality so that it meets the needs of as many people as possible. This also means fewer support calls down the road for me when one of our researchers decides they want to try another feature.

Right now I'm just going through the all_tests.py output and trying to fix as many issues as possible, at least the low-hanging fruit. Most have been easy: add an R-cran dependency, set BLASTMAT, RDP_JAR_PATH, etc. I realize that some errors, like "cannot find qsub" will have to be ignored, since an HPC scheduler is too heavy a dependency for a package like this.

I was aware that vsearch currently aims to implement the usearch 7 interface. I'm just wondering if it would be easy enough to translate usearch 5 and 6 commands into usearch 7 / vsearch commands, at least for the subset of functions used by qiime. If so, I could easily incorporate wrappers into my package so that qiime's usearch functionality would work out-of-the box.

colinbrislawn commented 7 years ago

The goal is not just to meet the needs of a few local researchers, though. In creating a package for global use, the aim is to maximize available functionality so that it meets the needs of as many people as possible. This also means fewer support calls down the road for me when one of our researchers decides they want to try another feature.

I understand the need for a single, stable platform for many users. I wonder if building a module or package for your system is the best way to do that.

Have you considered having researchers install their own copy of qiime in a conda environment? This installation does not need admin rights and works independently of other software environments. This independent system works really well for me and is the recommended way to install qiime.

This might be a generational paradigm of dev ops; central server vs distributed systems. SVN vs git. Of course I like the distributed systems, so I want to mention it here, but a centralized solution may be the right fit for you.

I'm just wondering if it would be easy enough to translate usearch 5 and 6 commands into usearch 7 / vsearch commands.

It's possible, but I worry it may not be clear to researchers. The qiime scripts and log files will show usearch commands. How will your scripts make this change clear?

outpaddling commented 7 years ago

Having researchers manage their own computers and/or install their own software is something we're trying to eliminate in all but a few cases.

That said, pkgsrc does not require admin access either. A pkgsrc tree can be bootstrapped in one's home directory in about 10 minutes, so it works for both centralized and distributed models. One can also install multiple trees on the same machine to preserve older tool chains for long-term studies, while installing newer ones as needed. It currently has about 17,000 packages and is growing fast, so most of the dependencies I need for a new package are usually already there. http://pkgsrc.se/statistics.php

Interesting point about log files that I hadn't thought of. I suppose the wrapper scripts could announce themselves as "usearch emulators" in the log each time they're run. I'm an IT guy, not a qiime user, so I'd have to play with it to see exactly what your concern is and how best to remedy it.
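The "announce themselves" idea could look something like the sketch below: a wrapper prints one line to stderr naming the program that actually runs, then hands over to it. The function name announce_emulation and the message wording are invented for illustration, and whether QIIME's own log files would capture this stderr line is not something this thread establishes.

```shell
#!/bin/sh
# Sketch: make a wrapper say which program really runs behind a usearch name.
# announce_emulation and the message format are illustrative only.
announce_emulation() {
    real_tool=$1   # program that is actually executed, e.g. vsearch
    shift          # the remaining arguments are what will be passed to it
    printf '[usearch emulator] running %s with arguments: %s\n' \
        "$real_tool" "$*" >&2
}

# Usage inside a wrapper script installed under the name usearch61:
#   announce_emulation vsearch "$@"
#   exec vsearch "$@"
```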

torognes commented 7 years ago

@outpaddling, if there are small changes I can do to vsearch to increase compatibility with older usearch versions and help you adapt it for use with qiime, please tell me.

outpaddling commented 7 years ago

Fabulous, thanks! I will work with our bioinformatician on this and let you know if we need any assistance.

outpaddling commented 7 years ago

For starters, here are the usearch commands I'm seeing in the qiime all_tests.py output:

This one is generated by rtax:

usearch --quiet --global --iddef 2 --query 1 --db /tmp/RtaxTaxonAssignerTests_lCaJLA.fasta --uc /tmp/MQuAfRVQD5/b --id 0.98 --maxaccepts 1000 --maxrejects 128 --nowordcountreject

Not sure where this one is coming from:

usearch --maxaccepts 1 --id 0.75 --queryalnfract 0.35 --blast6out "/tmp/qiime_parallel_tests_sAv8WQ/out.bl6" --uc "/tmp/qiime_parallel_tests_sAv8WQ/out.uc" --maxrejects 8 --query "/tmp/qiime_parallel_tests_sAv8WQ/qiime_inseqsXOmy27.fasta" --evalue 1e-10 --db "/tmp/qiime_parallel_tests_sAv8WQ/qiime_refseqshq_t1g.fasta" --targetalnfract 0.0 > "/tmp/tmpottxm3xvVoIB7BPqGilN.txt" 2> "/tmp/tmpxhkikCPckWS4qEp3rCus.txt"

There are a number of unsupported command-line flags, such as --global, --nowordcountreject, and --queryalnfract.

Can these be easily supported by vsearch?

Thanks,

Jason

torognes commented 7 years ago

The --query, --evalue and --targetalnfract options are also not recognized by vsearch. Do you know which version of usearch this is supposed to be? usearch 5.2.236 or usearch 6.1 or usearch 7?

I have created an issue in vsearch where we can continue the discussion regarding changes to vsearch: torognes/vsearch#229
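For anyone who wants to experiment with the wrapper idea in the meantime, the argument "massaging" could start from a dry-run translator like the sketch below, which prints the rewritten argument list one item per line instead of running anything. The function name is made up. It drops the six flags reported in this thread as unrecognized, and it rests on one loud assumption that nobody here has confirmed: that a usearch 5 `--global --query FILE` search corresponds to vsearch's `--usearch_global FILE`. Dropping `--evalue`, `--queryalnfract` and `--targetalnfract` also discards their filtering, so results may differ from usearch 5.2.236.

```shell
#!/bin/sh
# Sketch: rewrite a usearch 5 style argument list for vsearch (dry run only).
# translate_usearch5_args is illustrative; the mapping below is an untested
# assumption, not a verified equivalence between the two programs.
translate_usearch5_args() {
    while [ $# -gt 0 ]; do
        case $1 in
            --global|--nowordcountreject)
                # Boolean flags reported as unrecognized: drop them.
                shift
                ;;
            --queryalnfract|--targetalnfract|--evalue)
                # Valued flags reported as unrecognized: drop flag and value.
                shift
                if [ $# -gt 0 ]; then shift; fi
                ;;
            --query)
                # ASSUMPTION: the query file becomes the --usearch_global input.
                shift
                if [ $# -gt 0 ]; then
                    printf '%s\n' --usearch_global "$1"
                    shift
                fi
                ;;
            *)
                # Everything else is passed through unchanged.
                printf '%s\n' "$1"
                shift
                ;;
        esac
    done
}

# Usage (prints the translated arguments for inspection):
#   translate_usearch5_args --quiet --global --query in.fasta --db ref.fasta --id 0.98
```

Printing rather than executing keeps the sketch safe to try; comparing its output against real usearch runs would be the next step before trusting any such wrapper.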

outpaddling commented 7 years ago

These "usearch" commands would be usearch 5.2.236. I have also installed usearch 6.1 with the executable name "usearch61", per qiime instructions.

Note that this is not a high priority need for us. Just a "nice to have if it's easy enough" feature. Does anyone have a notion of how long people will still be running qiime 1.x? If you think your time is better spent integrating vsearch directly into qiime 2, you'll have my blessing.

gregcaporaso commented 7 years ago

@outpaddling, it is probably worth focusing on adding relevant support in QIIME 2, rather than QIIME 1. We're ramping up the available functionality in QIIME 2, and plan to stop supporting QIIME 1 in January 2018 in favor of supporting QIIME 2 only.

outpaddling commented 7 years ago

Does this mean that QIIME 2 will support vsearch as an alternative to usearch? If so, you can count on us for testing and feedback. Thanks.

gregcaporaso commented 7 years ago

@outpaddling, QIIME 2 is plugin-based, so it can support anything that someone writes a plugin for. Unlike QIIME 1, we don't need to integrate or approve these types of methods for them to be integrated into a QIIME deployment since anyone can write and distribute plugins.

That said, it's very likely that we'll have vsearch-based clustering in a plugin before too long. We will have two plugins that will use vsearch in the next release of QIIME 2 - q2-deblur uses it as part of its denoising workflow, and q2-feature-classifier uses it for doing taxonomy assignment. I started experimenting with a vsearch-based clustering plugin, but it hasn't been my top priority, so it's not ready for release yet.

If you have developers on your team who would be interested in writing a vsearch QIIME 2 plugin, that would move things along more quickly. They are pretty straightforward to write, and you would be more than welcome to pick up where I left off on my vsearch-based clustering plugin. Feel free to get in touch by email if you're interested.

outpaddling commented 7 years ago

I'm discussing this with two colleagues in bioinformatics who are highly qualified to help with coding and testing. It will just be a question of whether QIIME 2 becomes a priority for them in the near future. In any case, I can develop packages for your existing plugins to make them more visible to those using our package managers. Thanks for the detailed info!

gregcaporaso commented 7 years ago

Thanks @outpaddling, let me know if your group has any questions. We use Slack (https://slack.qiime2.org) for developer communication, so feel free to join if you want to get involved in development or have questions about doing so.