Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

time not found #57

Closed 000generic closed 4 years ago

000generic commented 4 years ago

I'm trying to assemble a ~2.2 Gb genome with 50-60x ONT coverage using NextDenovo on a machine with 48 threads and 1 TB of RAM. My configuration settings are:

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 40
input_type = raw
input_fofn = input.fofn
workdir = 1_rundir

[correct_option]
read_cutoff = 0.5k
seed_cutoff = 7974
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 10 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 20

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1
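As a rough sanity check on the settings above, the worst-case thread demand can be compared with the 48 threads the machine offers. This is only a sketch. It assumes that each of the `parallel_jobs` tasks may run with the `-t` thread count given in `minimap2_options_raw`, which is an assumption about how NextDenovo schedules work and is not confirmed in this thread. The helper names `threads_from_options` and `worst_case_threads` are hypothetical.

```python
import shlex


def threads_from_options(options, flag="-t", default=1):
    """Extract the integer value of a thread flag such as '-t 8' or '-t8'."""
    tokens = shlex.split(options)
    for i, token in enumerate(tokens):
        # Form "-t 8": the value is the next token.
        if token == flag and i + 1 < len(tokens):
            return int(tokens[i + 1])
        # Form "-t8": the value is glued to the flag.
        if token.startswith(flag) and token[len(flag):].isdigit():
            return int(token[len(flag):])
    return default


def worst_case_threads(parallel_jobs, options):
    """Assumed worst case: every parallel job uses its full thread count."""
    return parallel_jobs * threads_from_options(options)


if __name__ == "__main__":
    # Values copied from the configuration above; 48 is the stated thread capacity.
    demand = worst_case_threads(40, "-x ava-ont -t 8")
    print(demand, "threads requested vs 48 available")  # 320 threads requested vs 48 available
```

If the assumption holds, lowering `parallel_jobs` or the per-task `-t` value so that their product stays near the available thread count would be the usual adjustment.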

When I run it, I get:

(base) ::nextdenovo: bash 02-nextdenovo
[INFO] 2020-03-29 16:02:27,359 start...
[INFO] 2020-03-29 16:02:27,359 logfile: pid148033.log.info
[WARNING] 2020-03-29 16:02:27,359 Re-write workdir
[INFO] 2020-03-29 16:02:27,359 options:
[INFO] 2020-03-29 16:02:27,359 {'sort_threads': 10, 'nodelist': '', 'rewrite': 1, 'blocksize': '2g', 'job_prefix': 'nextDenovo', 'job_type': 'local', 'minimap2_options_raw': '-x ava-ont -t 8', 'cns_threads': 20, 'sort_mem': '20g', 'seed_cutoff': '7974', 'input_fofn': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/input.fofn', 'read_cutoff': '0.5k', 'input_type': 'raw', 'sort_options': '-m 20g -t 10 -k 40', 'parallel_jobs': '40', 'cluster_options': '', 'sge_queue': '', 'ctg_graphdir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/03.ctg_graph', 'pa_correction': '20', 'workdir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir', 'random_round': '20', 'minimap2_threads': (8, 8), 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17', 'cns_aligndir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/02.cns_align', 'seed_cutfiles': '20', 'raw_aligndir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align', 'task': 'all', 'deltmp': 1, 'rerun': 3, 'correction_options': '-p 20 -max_lq_length 10000', 'nextgraph_options': '-a 1'}
[INFO] 2020-03-29 16:02:27,360 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir
[INFO] 2020-03-29 16:02:27,360 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align
[INFO] 2020-03-29 16:02:27,360 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/02.cns_align
[INFO] 2020-03-29 16:02:27,360 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/03.ctg_graph
[INFO] 2020-03-29 16:02:27,360 analysis tasks done
[INFO] 2020-03-29 16:02:27,360 total jobs: 1
[INFO] 2020-03-29 16:02:27,361 Throw jobID:[148034] jobCmd:[/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh] in the local_cycle.
[ERROR] 2020-03-29 16:02:27,884 db_split failed: please check the following logs:
[ERROR] 2020-03-29 16:02:27,884 /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh.e

When I check that file, I get:

(base) ::nextdenovo: less /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh.e
hostname

Any ideas why this error would be happening? Thank you!

moold commented 4 years ago

It seems your system does not include the time command. If your system has /usr/bin/time, you can use alias time='/usr/bin/time'; otherwise, you can delete the time command in each work shell script.
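Before adding an alias, it can help to confirm whether an external time executable is actually on the search path. The sketch below uses Python's standard `shutil.which`; the function name `find_time_binary` is hypothetical. Note that bash also has a built-in `time` keyword, which this check does not detect, and that bash does not expand aliases in non-interactive scripts by default, so an alias defined at the prompt may not reach the generated work scripts.

```python
import shutil


def find_time_binary(search_path=None):
    """Return the path of an external 'time' executable, or None if none is found.

    search_path is an optional PATH-style string; by default the current
    PATH environment variable is used.
    """
    return shutil.which("time", path=search_path)


if __name__ == "__main__":
    found = find_time_binary()
    if found is None:
        print("No external 'time' executable found on PATH")
    else:
        print(f"External 'time' executable: {found}")
```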

000generic commented 4 years ago

I added the alias:

alias time='/usr/bin/time'

Running time now produces:

time
Usage: /usr/bin/time [-apvV] [-f format] [-o file] [--append] [--verbose] [--portability] [--format=format] [--output=file] [--version] [--quiet] [--help] command [arg...]

Previously this had produced:

time
real 0m0.000s
user 0m0.000s
sys 0m0.000s

But when I now run NextDenovo, I get the same sort of error:

[INFO] 2020-04-03 13:57:06,433 start...
[INFO] 2020-04-03 13:57:06,433 logfile: pid101715.log.info
[WARNING] 2020-04-03 13:57:06,433 Re-write workdir
[INFO] 2020-04-03 13:57:06,433 options:
[INFO] 2020-04-03 13:57:06,433 {'sort_threads': 10, 'nodelist': '', 'rewrite': 1, 'blocksize': '2g', 'job_prefix': 'nextDenovo', 'job_type': 'local', 'minimap2_options_raw': '-x ava-ont -t 8', 'cns_threads': 20, 'sort_mem': '20g', 'seed_cutoff': '7974', 'input_fofn': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/../reads/ont-paradoxus.fasta', 'read_cutoff': '0.5k', 'input_type': 'raw', 'sort_options': '-m 20g -t 10 -k 40', 'parallel_jobs': '60', 'cluster_options': '', 'sge_queue': '', 'ctg_graphdir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/03.ctg_graph', 'pa_correction': '20', 'workdir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir', 'random_round': '20', 'minimap2_threads': (8, 8), 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17', 'cns_aligndir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/02.cns_align', 'seed_cutfiles': '20', 'raw_aligndir': '/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align', 'task': 'all', 'deltmp': 1, 'rerun': 3, 'correction_options': '-p 20 -max_lq_length 10000', 'nextgraph_options': '-a 1'}
[INFO] 2020-04-03 13:57:06,433 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir
[INFO] 2020-04-03 13:57:06,433 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align
[INFO] 2020-04-03 13:57:06,434 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/02.cns_align
[INFO] 2020-04-03 13:57:06,434 mkdir: /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/03.ctg_graph
[INFO] 2020-04-03 13:57:06,434 analysis tasks done
[INFO] 2020-04-03 13:57:06,434 total jobs: 1
[INFO] 2020-04-03 13:57:06,435 Throw jobID:[101716] jobCmd:[/data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh] in the local_cycle.
[ERROR] 2020-04-03 13:57:06,963 db_split failed: please check the following logs:
[ERROR] 2020-04-03 13:57:06,964 /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh.e

When I check the indicated log file, I get:

less /data/eedsinger/projects/genomes/idiosepius-paradoxus/assembly/nextdenovo/1_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh.e
hostname

So it seems like a time value is now showing up, but it's a strange one, and the expected file (?) still does not exist.

If necessary, how would I delete the time command in each work shell?

moold commented 4 years ago

The time command works correctly, but your input file is incorrect. Please refer here to prepare your input file (input.fofn). BTW, you can delete the time command at line 138 in the file lib/task_control.py.
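The options logged for the second run show input_fofn pointing directly at a FASTA file (.../reads/ont-paradoxus.fasta) rather than at a list of files. A minimal sketch of a pre-flight check follows, assuming that input.fofn is a plain-text "file of file names" with one read file path per line. The function name `check_fofn` and the messages are hypothetical and not part of NextDenovo.

```python
from pathlib import Path


def check_fofn(fofn_path):
    """Return a list of problems found in a file-of-filenames; empty if none."""
    fofn = Path(fofn_path)
    if not fofn.is_file():
        return [f"{fofn}: file not found"]

    problems = []
    entries = 0
    # Read lazily so that a large sequence file passed by mistake is not loaded whole.
    with fofn.open(errors="replace") as handle:
        for raw_line in handle:
            entry = raw_line.strip()
            if not entry:
                continue  # skip blank lines
            entries += 1
            # FASTA headers start with '>' and FASTQ headers with '@'.
            if entry.startswith((">", "@")):
                problems.append(f"{fofn}: looks like sequence data, not a list of paths")
                break
            # Relative paths are resolved against the current working directory.
            if not Path(entry).is_file():
                problems.append(f"{entry}: listed read file not found")
    if entries == 0:
        problems.append(f"{fofn}: no entries")
    return problems


if __name__ == "__main__":
    for problem in check_fofn("input.fofn"):
        print(problem)
```

Under the same assumption, a valid input.fofn for this case would contain a single line with the path to the reads file.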