nbokulich opened this issue 11 years ago
@davidsoergel, could you check into this when you're working on the other rtax-related issues next week?
The 4 GB limit is intrinsic to the 32-bit version of usearch. The main thing driving memory usage is the size of the reference database, not of the query sets. The only alternatives I know of are:
a) pay big bucks for the 64-bit version;
b) somehow limit the size of the reference database (e.g., by using a 97%-clustered reference set instead of 99%, etc.);
c) split the reference database into shards, classify against each one individually, and collate the results;
d) use a different version of usearch that may require less memory. For instance, with default parameters, usearch 4.x requires less memory than usearch 5.x.
e) it may be possible to tune usearch 5 with command-line options to use less memory, but I haven't explored that in any detail. Tweaking the usearch command line may require hacking the RTAX scripts, though, after which Qiime developers can't reasonably support your setup.
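Option (c) can be sketched in Python. This is a minimal illustration under stated assumptions, not RTAX's behavior: it assumes each shard's search results have already been parsed into hypothetical `(query_id, target_id, identity)` tuples, and it collates by keeping, for each query, every hit tied for the best identity across all shards. RTAX's own consensus step over multiple accepted hits is not reproduced, and because `--maxaccepts` would apply per shard, a sharded run can return more candidate hits per query than a single search would.

```python
"""Sketch of option (c): shard a reference FASTA, then collate per-shard hits.

Assumptions (not from RTAX or usearch): hits are (query_id, target_id, identity)
tuples already parsed from each shard's output; collation keeps, per query,
every hit tied for the best identity across all shards.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]          # (header, sequence)
Hit = Tuple[str, str, float]      # (query_id, target_id, percent identity)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def shard_records(records: Iterable[Record], n_shards: int) -> List[List[Record]]:
    """Distribute records round-robin so shards end up roughly equal in size."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    shards: List[List[Record]] = [[] for _ in range(n_shards)]
    for i, record in enumerate(records):
        shards[i % n_shards].append(record)
    return shards


def collate(per_shard_hits: Iterable[Iterable[Hit]]) -> Dict[str, List[Hit]]:
    """Merge hits from all shards, keeping only the best-identity hits per query."""
    best: Dict[str, List[Hit]] = defaultdict(list)
    for hits in per_shard_hits:
        for hit in hits:
            query, _, identity = hit
            current = best[query]
            if not current or identity > current[0][2]:
                best[query] = [hit]          # strictly better: replace
            elif identity == current[0][2]:
                current.append(hit)          # tie: keep both
    return dict(best)
```

Each shard would then be written to its own FASTA file and searched separately. Round-robin assignment is used only to keep shard sizes balanced, since the reference database is what drives memory use.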
Re (a): out of curiosity, how much?
OK, so the message I'm getting is that we really need to find a free alternative to usearch, for a range of reasons including this one? Chris and Mihai, any comments on current progress on that front? Has anyone re-evaluated cd-hit or bowtie as alternatives recently?
(In reply to davidsoergel, Aug 6, 2013, 5:54 PM)
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/949#issuecomment-22220721.
Let's talk sometime next week if that works for you.
Mihai
(In reply to Rob Knight)
The DNACLUST pull request has been in submission limbo for the past few months: https://github.com/qiime/qiime/pull/706 @gregcaporaso
We should discuss what more is needed from DNACLUST to serve as a free replacement for USEARCH and what more is needed for submission.
Sorry about that, I lost track of this pull request but will review the code this week.
Greg
(In reply to cmhill, Tue, Aug 6, 2013, 5:31 PM)
Re 64-bit usearch paid licenses: I can't find the price list online anymore, but a copy I stashed a couple of years ago indicates it's over $4k per CPU per year (!).
It remains an open bug that the RTAX wrapper apparently does not terminate when the underlying process runs out of memory.
Also: while switching to a different search program like DNACLUST may be a good idea for Qiime in general, I'm afraid RTAX is pretty deeply tied to usearch. It should certainly be possible to rework it to use a different search engine; in fact, depending on the available options, that could well improve performance or make the code cleaner. I won't be able to do this myself, but I'm happy to consult with anyone who wants to take it on. I should note that there's no special reason RTAX needs to be in Perl; in the Qiime context, it might make sense to simply rewrite it in Python.
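If RTAX were reworked (or rewritten in Python) to support other search engines, one way to isolate the usearch dependency is a small backend interface. The sketch below is hypothetical: `SearchBackend`, `UsearchBackend`, and `search_with` are illustrative names, not part of RTAX or Qiime. The usearch flags are copied from the RTAX invocation in the log posted in this issue; a DNACLUST or other backend would implement the same `command` method with its own flags, which are deliberately not shown because they would need to be checked against that tool's documentation.

```python
"""Hypothetical backend interface for decoupling RTAX-style classification
from usearch. None of these names exist in RTAX or Qiime."""
import subprocess
from dataclasses import dataclass
from typing import List, Protocol


class SearchBackend(Protocol):
    """Anything that can build the argv for one global-alignment search."""

    def command(self, query_fasta: str, db_fasta: str, out_path: str, min_id: float) -> List[str]:
        ...


@dataclass
class UsearchBackend:
    """Reproduces the usearch flags seen in the RTAX log in this issue."""

    binary: str = "usearch"
    max_accepts: int = 1000
    max_rejects: int = 128

    def command(self, query_fasta: str, db_fasta: str, out_path: str, min_id: float) -> List[str]:
        return [
            self.binary, "--quiet", "--global", "--iddef", "2",
            "--query", query_fasta,
            "--db", db_fasta,
            "--uc", out_path,  # usearch writes hits in its .uc tabular format
            "--id", str(min_id),
            "--maxaccepts", str(self.max_accepts),
            "--maxrejects", str(self.max_rejects),
            "--nowordcountreject",
        ]


def search_with(backend: SearchBackend, query_fasta: str, db_fasta: str,
                out_path: str, min_id: float) -> None:
    """Run one search through whichever backend is configured.

    Raises subprocess.CalledProcessError if the tool exits non-zero.
    """
    subprocess.run(backend.command(query_fasta, db_fasta, out_path, min_id), check=True)
```

A full backend would also need a parser for its own hit format, so that the rest of the classifier only ever sees a common hit representation.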
To reduce memory usage in the meantime, I just found this suggestion from @jrvalverde:
However, you may find that running RTax against the latest Greengenes requires more memory than the 32-bit version can handle (I did have that problem). If that is the case, you may want to try using VAMPS databases instead for the classification.
Yes, that would be great, and I am in town. Ulla, could you coordinate?
Rob
(In reply to mpop, Aug 6, 2013, 6:25 PM)
I think Greg just noticed, so I'm glad this is maybe prompting some action. Thanks and apologies!
(In reply to cmhill, Aug 6, 2013, 6:32 PM)
Thanks Greg!
(In reply to Greg Caporaso, Aug 6, 2013, 6:54 PM)
Ouch.
(In reply to davidsoergel, Aug 6, 2013, 7:14 PM)
Thanks for your willingness to consult. A rewrite using free tools would obviously promote adoption: might you be able to spend at least a little time on that (with help)?
(In reply to davidsoergel, Aug 6, 2013, 7:24 PM)
Yes, I can spend some time discussing with interested parties how RTAX works, for the sake of easing any modifications or a rewrite. I'm sorry I won't have time to write any code, though, especially if you want to go the Python route.
Moving to the 2.0 milestone. Probably useful to keep open so that RTAX can be updated to better handle out-of-memory failures.
RTAX currently appears to have a memory cap of ~4 GB, and when that cap is hit, the Qiime wrapper runs indefinitely without outputting an error message.
For example, the following command runs indefinitely without any error output:
However, running the equivalent directly in rtax:
outputs the following:
```
/share/apps/qiime-1.6.0/bin/usearch --quiet --global --iddef 2 --query 2 --db /home/nbokulic/ref_seq_dbs/silva_18S_104/rep_set/silva_104_rep_set.fasta --uc /tmp/78459.1.all.q/7jr8NYguNC/a --id 0.99 --maxaccepts 1000 --maxrejects 128 --nowordcountreject
Out of memory mymalloc(140204), curr 4.15e+09 bytes
/share/apps/qiime-1.6.0/bin/usearch --quiet --global --iddef 2 --query 2 --db /home/nbokulic/ref_seq_dbs/silva_18S_104/rep_set/silva_104_rep_set.fasta --uc /tmp/78459.1.all.q/7jr8NYguNC/a --id 0.99 --maxaccepts 1000 --maxrejects 128 --nowordcountreject
---Fatal error--- Out of memory, mymalloc(140204), curr 4.15e+09 bytes
```
Allotting more memory to the job on my system produces the same error message.
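For the hang reported here, a wrapper could check the search tool's stderr and exit status rather than waiting indefinitely. The following is a minimal sketch assuming the error text matches the output quoted above; `run_search`, `SearchOutOfMemory`, and the 24-hour timeout are hypothetical choices, not part of the Qiime wrapper, and the thread does not establish why the wrapper hangs. The sketch also compares the logged allocation with the 2^32-byte (4 GiB) address space of a 32-bit process: 4.15e+09 bytes is about 3.86 GiB, roughly 97% of that ceiling. That is consistent with the earlier point that the limit is intrinsic to 32-bit usearch, and would explain why allotting more memory to the job does not help.

```python
"""Sketch of a wrapper that fails fast when usearch reports running out of
memory. Assumption: the error text matches the RTAX output quoted in this
issue ("Out of memory[,] mymalloc(N), curr <bytes> bytes")."""
import re
import subprocess
from typing import List, Optional

# Matches both variants in the log (with and without a comma after "memory").
OOM_PATTERN = re.compile(r"Out of memory,? mymalloc\(\d+\), curr ([0-9.eE+]+) bytes")
ADDRESS_SPACE_32BIT = 2 ** 32  # bytes addressable by a 32-bit process


class SearchOutOfMemory(RuntimeError):
    """Raised when the search tool reports an out-of-memory failure."""


def parse_oom_bytes(stderr: str) -> Optional[float]:
    """Return the allocation total reported in an OOM message, or None."""
    match = OOM_PATTERN.search(stderr)
    return float(match.group(1)) if match else None


def run_search(argv: List[str], timeout_s: float = 24 * 3600) -> str:
    """Run a search command; raise instead of hanging or failing silently."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising on timeout.
        raise RuntimeError(f"{argv[0]} exceeded {timeout_s} s") from exc
    used = parse_oom_bytes(proc.stderr)
    if used is not None:
        frac = used / ADDRESS_SPACE_32BIT
        raise SearchOutOfMemory(
            f"{argv[0]} ran out of memory at {used / 2**30:.2f} GiB "
            f"({frac:.0%} of the 4 GiB 32-bit address space)"
        )
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout
```

The OOM check runs before the exit-status check so that the more specific error wins even if the tool exits non-zero.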