bernt-matthias closed this 1 year ago
Ping @abretaud @jj-umn and @bgruening: what do you think about splitting the tool?
@bernt-matthias do you have performance problems?
I guess two tools make technical sense, but is this really a problem for our users? How much UX do we lose if we split it up? For hard-core people, we can now run this tool twice, correct? I guess this is enough to address both use cases.
One of my users has a FASTA with 12,000,000 sequences. The search phase runs for several days (with good CPU usage on 10 cores), and the annotation phase runs for more than a week while using only a few percent of the 10 cores (it currently hits my max run time). It might be that the new `--scratch_dir` and `--temp_dir` parameters help. I'm happy to test that first.
As described here: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.8#Setting_up_large_annotation_jobs
Wondering if I should split this into two tools (maybe also keeping the 'monolithic' one). The advantage would be that admins can set different destinations for the CPU-intensive search stage and the I/O-intensive annotation stage.
Also adds the `cache` mode. And makes the tests more specific by using regexes instead of `sim_size`.
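To make the proposed split concrete, here is a minimal sketch of how the two stages could be invoked as separate jobs. It only builds the command lines and does not run anything. The flag names (`--no_annot`, `-m no_search`, `--annotate_hits_table`) and the `.emapper.seed_orthologs` suffix are my reading of the eggNOG-mapper wiki page linked above and should be checked against `emapper.py --help`. The helper names, file names and directories are hypothetical, and this is not how the Galaxy wrapper is actually implemented.

```python
import shlex


def build_search_cmd(fasta, prefix, cpus=10):
    """CPU-bound stage: run the DIAMOND search only and skip annotation.

    Flag names are assumed from the eggNOG-mapper wiki, not verified here.
    """
    return [
        "emapper.py",
        "-m", "diamond",
        "--no_annot",            # stop after the seed ortholog search
        "--cpu", str(cpus),
        "-i", fasta,
        "-o", prefix,
    ]


def build_annotation_cmd(prefix, out_prefix, scratch_dir=None, temp_dir=None):
    """I/O-bound stage: annotate an existing hits table without searching."""
    cmd = [
        "emapper.py",
        "-m", "no_search",
        # Output of the search stage (assumed naming: <prefix>.emapper.seed_orthologs)
        "--annotate_hits_table", f"{prefix}.emapper.seed_orthologs",
        "-o", out_prefix,
    ]
    # Optional directories mentioned in this thread; only added when given.
    if scratch_dir:
        cmd += ["--scratch_dir", scratch_dir]
    if temp_dir:
        cmd += ["--temp_dir", temp_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and paths, for illustration only.
    print(shlex.join(build_search_cmd("input.fasta", "job")))
    print(shlex.join(build_annotation_cmd("job", "job_annot",
                                          scratch_dir="/scratch",
                                          temp_dir="/tmp")))
```

With two separate tools, each command would map to its own job, so an admin could send the first to a many-core destination and the second to a node with fast local storage.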