Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

self.drmaa.exit() stuck #36

Closed: lskfs closed this issue 4 years ago

lskfs commented 4 years ago

The jobs finished normally, but the program got stuck at self.drmaa.exit() in the script task_control.py.

How can I solve this?

moold commented 4 years ago

Try running it again. If the error still happens, please provide more details and logs.

lskfs commented 4 years ago

I have tried several times, and it happened every time. Here is the situation:

I ran NextDenovo2.1-beta.0 on my data in SGE mode. It ran normally up to the first self.drmaa.exit() call: the file "01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh.done" was generated, but "01.raw_align/01.db_split.sh.done" was not, and the program kept sleeping.
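
A minimal sketch for checking those marker files from Python. It assumes the layout seen in this run: one db_split* directory per subtask under 01.db_split.sh.work, each writing nextDenovo.sh.done ("nextDenovo" being the job_prefix), plus a stage-level 01.db_split.sh.done that is expected once all subtasks finish. The helper name split_status is hypothetical and not part of NextDenovo.

```python
from pathlib import Path


def split_status(raw_align_dir):
    """Return (subtasks with a .done marker, whether the stage marker exists)
    for the db_split stage under 01.raw_align.

    Assumed layout (taken from this run, not from NextDenovo docs):
      01.db_split.sh.work/db_split*/nextDenovo.sh.done  per-subtask marker
      01.db_split.sh.done                               stage-level marker
    """
    base = Path(raw_align_dir)
    work = base / "01.db_split.sh.work"
    # Collect the names of subtask directories that wrote their .done file.
    finished = sorted(
        marker.parent.name
        for marker in work.glob("db_split*/nextDenovo.sh.done")
    )
    stage_done = (base / "01.db_split.sh.done").exists()
    return finished, stage_done
```

Pointed at /path/to/software/NextDenovo/test_sge/01_rundir/01.raw_align, the state described here would show up as (['db_split0'], False): the subtask finished, but the stage was never marked done.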

I edited the script "task_control.py" as follows (adding several log.info calls) to track where it gets stuck:

...
        else:
            while (1):
                if self._check_running:
                    log.info('Where am I 2')  # debug marker: still polling running jobs
                    time.sleep(self.interval)
                else:
                    break
        log.info('Where am I 6')  # debug marker: polling loop finished
        time.sleep(5)
        log.info('Where am I 7')  # debug marker: before deleting the job template
        self.drmaa.deleteJobTemplate(jt)
        log.info('Where am I 8')  # debug marker: job template deleted, before exit()
        self.drmaa.exit()
        log.info('Where am I 3' + str(Run.RUNNINGTASK['sge']))  # debug marker: DRMAA session exited
        log.info('Where am I 4' + str(self.tasks))  # debug marker
        log.info('Where am I 5' + str(self.unfinished_tasks))  # debug marker
...
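
As an alternative to hand-placed "Where am I" markers, Python's standard faulthandler module can dump the stack of every thread once a call has been running longer than a timeout, which points directly at the blocked line. A minimal sketch; run_with_watchdog and slow_step are hypothetical names, and slow_step only simulates a blocking call such as self.drmaa.exit().

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout, dump_file):
    """Call func(); if it is still running after `timeout` seconds,
    write the stack of every thread to dump_file. func keeps running."""
    faulthandler.dump_traceback_later(timeout, repeat=False, file=dump_file)
    try:
        return func()
    finally:
        # Cancel the pending dump if func() returned before the timeout.
        faulthandler.cancel_dump_traceback_later()


def slow_step():
    # Hypothetical stand-in for a blocking call such as self.drmaa.exit().
    time.sleep(1.0)
    return "done"


if __name__ == "__main__":
    with tempfile.TemporaryFile(mode="w+") as f:
        run_with_watchdog(slow_step, 0.2, f)
        f.seek(0)
        print(f.read())  # the dump points at the time.sleep line in slow_step
```

Alternatively, if faulthandler.register(signal.SIGUSR1) is called at startup, sending kill -USR1 <pid> to a hung run dumps the stacks on demand (not available on Windows).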

The log file shows the following (real paths were replaced with /path/to):

[INFO] 2019-11-29 04:22:46,973 start...
[INFO] 2019-11-29 04:22:46,973 logfile: pid30103.log.info
[WARNING] 2019-11-29 04:22:46,974 It seems that you are using the default value of "seed_cutoff", it is recommended to use "bin/seq_stat" to calculate this value, because this value will be greatly affected by reads length and sequencing depth, and an inappropriate value can significantly reduce assembly quality.
[WARNING] 2019-11-29 04:22:46,974 Re-write workdir
[INFO] 2019-11-29 04:22:46,974 options:
[INFO] 2019-11-29 04:22:46,974 {'sort_threads': 2, 'nodelist': '', 'rewrite': 1, 'blocksize': '1g', 'job_prefix': 'nextDenovo', 'job_type': 'sge', 'minimap2_options_raw': '-x ava-ont -t 8', 'cns_threads': 15, 'sort_mem': '1g', 'seed_cutoff': '29999', 'input_fofn': '/path/to/software/NextDenovo/test_sge/./input.fofn', 'read_cutoff': '1k', 'input_type': 'raw', 'sort_options': '-m 1g -t 2 -k 50', 'parallel_jobs': '2', 'cluster_options': '-cwd -q st.q -P P17Z19700N0470 -l num_proc={cpu} -l vf={vf}', 'sge_queue': ['st.q'], 'ctg_graphdir': '/path/to/software/NextDenovo/test_sge/./01_rundir/03.ctg_graph', 'pa_correction': '2', 'workdir': '/path/to/software/NextDenovo/test_sge/./01_rundir', 'random_round': '10', 'minimap2_threads': (8, 8), 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17', 'cns_aligndir': '/path/to/software/NextDenovo/test_sge/./01_rundir/02.cns_align', 'seed_cutfiles': '2', 'raw_aligndir': '/path/to/software/NextDenovo/test_sge/./01_rundir/01.raw_align', 'task': 'all', 'deltmp': 1, 'rerun': 3, 'correction_options': '-p 15 -max_lq_length 10000', 'nextgraph_options': '-a 1'}
[INFO] 2019-11-29 04:22:46,976 mkdir: /path/to/software/NextDenovo/test_sge/./01_rundir
[INFO] 2019-11-29 04:22:46,978 mkdir: /path/to/software/NextDenovo/test_sge/./01_rundir/01.raw_align
[INFO] 2019-11-29 04:22:46,981 mkdir: /path/to/software/NextDenovo/test_sge/./01_rundir/02.cns_align
[INFO] 2019-11-29 04:22:46,983 mkdir: /path/to/software/NextDenovo/test_sge/./01_rundir/03.ctg_graph
[INFO] 2019-11-29 04:22:46,987 analysis tasks done
[INFO] 2019-11-29 04:22:46,994 total jobs: 1
[INFO] 2019-11-29 04:22:47,043 Throw jobID:[9428864] jobCmd:[/path/to/software/NextDenovo/test_sge/01_rundir/01.raw_align/01.db_split.sh.work/db_split0/nextDenovo.sh] in the sge_cycle.
[INFO] 2019-11-29 04:22:47,044 Where am I 2
[INFO] 2019-11-29 04:23:17,077 Where am I 6
[INFO] 2019-11-29 04:23:22,083 Where am I 7
[INFO] 2019-11-29 04:23:22,083 Where am I 8

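The program then hangs. "Where am I 8" is the last marker printed and "Where am I 3" never appears, so it is stuck inside self.drmaa.exit().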
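
To locate the last checkpoint reached without reading a long log by eye, a small sketch can split a flattened log like this one into entries and return the last "Where am I" marker. It assumes the "[LEVEL] YYYY-MM-DD HH:MM:SS,mmm message" layout seen in this log; parse_log and last_marker are hypothetical helper names.

```python
import re

# Assumed entry header: "[LEVEL] 2019-11-29 04:22:47,044 " (as in this log).
ENTRY = re.compile(
    r"\[(INFO|WARNING|ERROR)\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
)


def parse_log(text):
    """Split log text into (level, timestamp, message) tuples, even when
    all entries were flattened onto one line."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next entry header (or the end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.end():end].strip()))
    return entries


def last_marker(entries, prefix="Where am I"):
    """Return (timestamp, message) of the last debug marker, or None."""
    for _level, timestamp, message in reversed(entries):
        if message.startswith(prefix):
            return timestamp, message
    return None
```

For this log, last_marker returns ('2019-11-29 04:23:22,083', 'Where am I 8'): the run got past deleteJobTemplate but never logged the marker placed after exit().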

moold commented 4 years ago

If you are sure the subtasks have finished, you can try deleting the line self.drmaa.exit(), or you can submit the subtasks to the cluster manually.
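
A middle ground between keeping and deleting the call is to run exit() in a background daemon thread and stop waiting after a timeout. This is a sketch, not a NextDenovo feature: call_with_timeout is a hypothetical helper. If exit() never returns, the DRMAA session may still count as active, so a later attempt to start a new session in the same process could fail.

```python
import threading


def call_with_timeout(func, timeout):
    """Run func() in a daemon thread and wait at most `timeout` seconds.

    Returns True if func() returned in time, False if it is still running.
    An exception raised by func() is re-raised in the caller.
    """
    errors = []

    def target():
        try:
            func()
        except BaseException as exc:  # hand the error back to the caller
            errors.append(exc)

    # daemon=True: a call that never returns will not keep Python alive.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False
    if errors:
        raise errors[0]
    return True
```

In task_control.py the bare call could then become something like: if not call_with_timeout(self.drmaa.exit, 60): log.warning('drmaa exit() timed out'). The 60-second limit is an arbitrary example value.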