EichlerLab / smrtsv2

Structural variant caller
MIT License

Exception when executing assemble_group rule on cluster #7

Closed ggstatgen closed 5 years ago

ggstatgen commented 5 years ago

Hi,

I managed to get the pipeline running on a cluster infrastructure that is not DRMAA-compatible. It reaches the 'assembly' stage, but then a proportion of the cluster jobs fail with the following error dump:

[Wed Feb 13 14:07:39 2019]
Error in rule asm_assemble_group:
    jobid: 0
    output: assemble/group/gr-chr14-76000000-1000000/contig.bam, assemble/group/gr-chr14-76000000-1000000/contig.bam.bai
    log: assemble/group/gr-chr14-76000000-1000000/contig_group.log

RuleException:
RuntimeError in line 162 of <CODE>/smrtsv2/rules/assemble.snakefile:
Failed to assemble group gr-chr14-76000000-1000000: See log assemble/group/gr-chr14-76000000-1000000/contig_group.log
  File "<CODE>/smrtsv2/rules/assemble.snakefile", line 162, in __rule_asm_assemble_group
  File "<CODE>/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run

Is this something you've run into before? The relevant code block in assemble.snakefile is:

                return_code = smrtsvrunner.run_snake_target(
                    'rules/assemble_group.snakefile', None, PROCESS_ENV, SMRTSV_DIR, command,
                    stdout=log_file, stderr=subprocess.STDOUT, cwd=assemble_temp,
                    resources=['threads={:d}'.format(params.threads)]
                )

            if return_code != 0:
                raise RuntimeError('Failed to assemble group {}: See log {}'.format(wildcards.group_id, log.contig_group))

Thanks!

paudano commented 5 years ago

This is just telling you that assembly failed in one region, chr14:76000000-77000000.

The log file will give you more information: assemble/group/gr-chr14-76000000-1000000/contig_group.log
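As a rough illustration of how the group ID maps to the region and log path mentioned above, here is a minimal Python sketch. It assumes the group ID follows the pattern `gr-<chrom>-<start>-<size>`, which is inferred from this one example rather than taken from the SMRT-SV code; `parse_group_id` and `group_log_path` are hypothetical helper names, not part of the pipeline.

```python
def parse_group_id(group_id):
    """Split an assumed 'gr-<chrom>-<start>-<size>' ID into (chrom, start, end)."""
    prefix, rest = group_id.split("-", 1)
    if prefix != "gr":
        raise ValueError(f"unexpected group ID: {group_id}")
    # rsplit keeps any hyphens inside the chromosome name intact.
    chrom, start, size = rest.rsplit("-", 2)
    start, size = int(start), int(size)
    return chrom, start, start + size


def group_log_path(group_id):
    """Build the log path in the form shown in the error message."""
    return f"assemble/group/{group_id}/contig_group.log"


chrom, start, end = parse_group_id("gr-chr14-76000000-1000000")
print(f"{chrom}:{start}-{end}")  # chr14:76000000-77000000
print(group_log_path("gr-chr14-76000000-1000000"))
```

This can be handy for turning a list of failed group IDs into regions and log paths to inspect.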

I have found that some assemblies will not complete in the default time given (especially those over centromeres), and so it may not have finished the assembly windows in that region (chr14:76000000-77000000) before the job ran out of time. That's the most common reason for failure I have seen.

What do you see in the log file?

ggstatgen commented 5 years ago

Hi, thanks. I checked the directory, and it seems snakemake detected the missing .bam file and re-attempted the assembly. It has now completed successfully, and I see 4 files in the directory (two logs, one .bam, and its .bai).

If I look under gr-chr14-76000000-1000000/log I find 115 .log files, each of them seemingly a canu dump for a local assembly job. So, as you suggest, it sounds like a complex region with a lot of assembly jobs. Thank you for the advice!
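For anyone wading through a similar pile of per-assembly logs, a minimal Python sketch for finding the most recently written ones and printing their last lines might look like this. The directory path is an assumption based on the paths in this thread, and `newest_logs` and `tail` are hypothetical helper names; nothing here depends on the content of the canu logs.

```python
from pathlib import Path


def newest_logs(log_dir, count=3):
    """Return up to `count` .log files in log_dir, oldest to newest by mtime."""
    logs = sorted(Path(log_dir).glob("*.log"), key=lambda p: p.stat().st_mtime)
    return logs[-count:]


def tail(path, n=5):
    """Return the last n lines of a text file."""
    with open(path, errors="replace") as handle:
        return handle.readlines()[-n:]


if __name__ == "__main__":
    # Assumed location of the per-assembly logs for this group.
    log_dir = Path("assemble/group/gr-chr14-76000000-1000000/log")
    if log_dir.is_dir():
        print(f"{len(list(log_dir.glob('*.log')))} log files in {log_dir}")
        for log in newest_logs(log_dir):
            print(f"--- {log.name}")
            print("".join(tail(log)), end="")
```

The logs written last are usually the ones worth checking first when a job has run out of time.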