a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.
https://a-ludi.github.io/dentist/
MIT License

Error in rule `collect` #17

Closed (muffato closed this issue 3 years ago)

muffato commented 3 years ago

Hi,

Following #16, I tried the pipeline with SKIP_LACHEK=1 and now I have this error:

Error in rule collect:
    jobid: 5
    output: workdir/pile-ups.db
    log: logs/collect.log (check log file(s) for error message)
    shell:
        dentist collect --config=dentist.json  --threads=4 --auxiliary-threads=2 --mask=dentist-self-H,tan-H,dentist-reads-H workdir/scaffolds_FINAL.dam workdir/non-hifi.1kb.db workdir/scaffolds_FINAL.non-hifi.1kb.las workdir/pile-ups.db 2> logs/collect.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out

Error executing rule collect on cluster (jobid: 5, external: 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out, jobscript: /lustre/scratch116/tol/teams/team308/users/mm49/tmp/non-hifi-reads2/.snakemake/tmp.vs0le48c/snakejob.collect.5.sh). For error details see the cluster log and the log files of the involved rule(s).

logs/collect.log is empty. 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out seems to contain the output of a Snakemake pipeline that has this error:

Error in rule propagate_mask_back_to_reference_block:
    jobid: 946
    output: workdir/.scaffolds_FINAL.dentist-self-H-257B.anno, workdir/.scaffolds_FINAL.dentist-self-H-257B.data
    log: logs/propagate-mask-back-to-reference-block.dentist-self.257.log (check log file(s) for error message)
    shell:
        dentist propagate-mask --config=dentist.json  -m dentist-self-257B workdir/non-hifi.1kb.db workdir/scaffolds_FINAL.dam workdir/non-hifi.1kb.257.scaffolds_FINAL.las dentist-self-H-257B 2> logs/propagate-mask-back-to-reference-block.dentist-self.257.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

And logs/propagate-mask-back-to-reference-block.dentist-self.257.log has this error:

core.exception.AssertError@/etc/../usr/include/dmd/phobos/std/range/primitives.d(2434): Attempting to fetch the front of an empty array of FlatLocalAlignment
----------------
??:? _d_assert_msg [0x55a59fc89657]
??:? dentist.common.alignments.base.FlatLocalAlignment[] std.algorithm.mutation.copy!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, dentist.common.alignments.base.FlatLocalAlignment[]).copy(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, dentist.common.alignments.base.FlatLocalAlignment[]) [0x55a59f7ae2e8]
??:? dentist.common.alignments.base.FlatLocalAlignment[] dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().bufferChunks!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl).bufferChunks(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, ulong) [0x55a59f8e93a0]
??:? dentist.common.alignments.base.FlatLocalAlignment[] dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl).__lambda2(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl) [0x55a59f8e932d]
??:? @property dentist.common.alignments.base.FlatLocalAlignment[] std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult.front() [0x55a59f8e94c8]
??:? @property dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[] std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult.front() [0x55a59f8e97be]
??:? dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][] std.algorithm.mutation.copy!(std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult, dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][]).copy(std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult, dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][]) [0x55a59f7ae725]
??:? void dentist.commands.propagateMask.MaskPropagator.run() [0x55a59f8e78ec]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(3).runCommand(in immutable(char)[][]) [0x55a59f816cbf]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55a59f7e2e98]
??:? _Dmain [0x55a59f673704]

Full log: propagate-mask-back-to-reference-block.dentist-self.257.log

a-ludi commented 3 years ago

The provided information is not quite conclusive. Here are my thoughts so far:

  1. Snakemake believes your job ID to be 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out. Depending on your cluster setup, this may or may not be a source of error. I recommend fixing this first. You probably need to adjust your Snakemake profile. I am happy to help with that.
  2. It is very strange that collect fails because of an error in propagate_mask_back_to_reference_block: collect depends on the results of the latter, so the latter should have finished successfully by the time collect is executed. This looks like a Snakemake error and may be related to my first point.
  3. The error in logs/propagate-mask-back-to-reference-block.dentist-self.257.log is definitely from DENTIST but I cannot really tell the exact source. Would it be possible to share the data? I would be happy to fix the error if you could upload the result of
    tar -czf dentist-issue-17.tar.gz \
        dentist.json \
        workdir/non-hifi.1kb.db \
        workdir/.non-hifi.1kb.bps \
        workdir/.non-hifi.1kb.idx \
        workdir/.non-hifi.1kb.dentist-self-257B.anno \
        workdir/.non-hifi.1kb.dentist-self-257B.data \
        workdir/scaffolds_FINAL.dam \
        workdir/.scaffolds_FINAL.bps \
        workdir/.scaffolds_FINAL.idx \
        workdir/.scaffolds_FINAL.hdr \
        workdir/non-hifi.1kb.257.scaffolds_FINAL.las

    here or, if the file is too big, at the MPI-CBG cloud.
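Before creating an archive like the one requested above, it can help to confirm that every listed file actually exists, since some of them are hidden files that are easy to overlook. The following is only a sketch: `missing_files` is a hypothetical helper, not part of DENTIST, and the file list is copied from the `tar` command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DENTIST): print every argument that does
# not exist and return non-zero if at least one is missing.
missing_files() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            printf 'missing: %s\n' "$f"
            rc=1
        fi
    done
    return "$rc"
}

# File list copied from the tar command in the comment above.
set -- \
    dentist.json \
    workdir/non-hifi.1kb.db \
    workdir/.non-hifi.1kb.bps \
    workdir/.non-hifi.1kb.idx \
    workdir/.non-hifi.1kb.dentist-self-257B.anno \
    workdir/.non-hifi.1kb.dentist-self-257B.data \
    workdir/scaffolds_FINAL.dam \
    workdir/.scaffolds_FINAL.bps \
    workdir/.scaffolds_FINAL.idx \
    workdir/.scaffolds_FINAL.hdr \
    workdir/non-hifi.1kb.257.scaffolds_FINAL.las

# Only create the archive when nothing is missing.
if missing_files "$@"; then
    tar -czf dentist-issue-17.tar.gz "$@"
else
    echo "not creating the archive: some inputs are missing" >&2
fi
```

Run from the directory that contains `dentist.json` and `workdir/`, this either creates the archive or lists the missing inputs.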

muffato commented 3 years ago

  1. Snakemake is running over LSF, using Snakemake-Profiles/lsf@c88fcb9fc60ce74596fbf8f516fef31574eef5de. It's not the latest version, and I'll give it a try with master now. However, the format of the cluster job_id may look weird but it's not a parsing error: it's explicitly built that way, cf https://github.com/Snakemake-Profiles/lsf/blob/master/%7B%7Bcookiecutter.profile_name%7D%7D/lsf_submit.py#L208-L211
  3. I think I've uploaded the file (6.9 GB) but I'm not sure (the script I used didn't fail, but did not produce any output). Did you get it ? If not, I'll upload it somewhere else

a-ludi commented 3 years ago

> 1. Snakemake is running over LSF, using [Snakemake-Profiles/lsf@c88fcb9](https://github.com/Snakemake-Profiles/lsf/commit/c88fcb9fc60ce74596fbf8f516fef31574eef5de). It's not the latest version, and I'll give it a try with master now. However, the format of the cluster job_id may look weird but it's not a parsing error: it's explicitly built that way, cf https://github.com/Snakemake-Profiles/lsf/blob/master/%7B%7Bcookiecutter.profile_name%7D%7D/lsf_submit.py#L208-L211

Good. Then that is probably not an issue.
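
For readers puzzled by the same message: the external job ID shown in the error appears to be the numeric LSF job ID followed by a space and the path of the cluster log. The sketch below splits such a string with plain shell parameter expansion. It assumes a single space separates the two parts, as in the message above, and the variable names are made up for illustration.

```shell
#!/bin/sh
# External job ID exactly as reported by Snakemake in this issue.
external='370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out'

# Assumption: the first space separates the numeric job ID from the log path.
job_id=${external%% *}   # everything before the first space
job_log=${external#* }   # everything after the first space

echo "LSF job ID:  $job_id"
echo "cluster log: $job_log"
```

The second part is the file to inspect when the rule's own log, here logs/collect.log, is empty.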

> 3. I think I've uploaded the file (6.9 GB) but I'm not sure (the script I used didn't fail, but did not produce any output). Did you get it ? If not, I'll upload it somewhere else

Got it. I will write as soon as I have further news.

muffato commented 3 years ago

FYI: I've got exactly the same error at the same stage using the master branch of the LSF Snakemake profile.

a-ludi commented 3 years ago

Sorry for the long wait, I've had a lot on my plate. I gave it a try today and could reproduce the error. I am going to track down the source of the error and include a bugfix in the next release, which is planned for this week.

muffato commented 3 years ago

I love the optimism :) Thank you very much for looking into it

a-ludi commented 3 years ago

Update: found and fixed. Now, I just need to finish the release. :D

a-ludi commented 3 years ago

@muffato The new release v2.0.0 is out. Could you check whether that helps? It's best if you start a new run from scratch and make the config adjustments you need, because the release contains breaking changes. Of course, you should start with the example -- just to make sure. :laughing:

muffato commented 3 years ago

Great, thank you so much ! We're on it, we'll let you know

muffato commented 3 years ago

Hi @a-ludi . The new version worked 👏🏼 . Thank you very much !