ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs
Other
518 stars 111 forks source link

cactus-graphmap-join error #917

Closed evcurran closed 1 year ago

evcurran commented 1 year ago

Hi,

I see there was a similar issue to this just posted ( #915 ) but this is giving different error messages.

I ran the following command:

cactus-graphmap-join ./jobstore --vg ./chrom_alignments/*.vg --hal ./chrom_alignments/*.hal --outDir ./arenosa_pg  --outName arenosa_pg --reference Aare --vcf --giraffe clip filter --indexCores 11

And it failed with a series of errors, which look like the following (this is the first occurrence of it):

[2023-02-03T09:19:30+0000] [MainThread] [I] [toil.leader] Issued job 'make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v1 with job batch system ID: 41 and disk: 24.1 Gi, memory: 2.0 Gi, cores: 11, accelerators: [], preemptable: False
/gpfs01/software/easybuild-uon/software/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
[2023-02-03T09:19:31+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/1976fa4c81fe579488b56609900bcc52/7a27/worker_log.txt
[2023-02-03T09:19:32+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:32.276950: Running the command: "grep -v ^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa"
[2023-02-03T09:19:34+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:34.242659: Successfully ran: "grep -v ^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa" in 1.9619 seconds
[2023-02-03T09:19:34+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:34.938464: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_2.vg.gfa"
[2023-02-03T09:19:39+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:39.615261: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_2.vg.gfa" in 4.6737 seconds
[2023-02-03T09:19:40+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:40.341354: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_3.vg.gfa"
[2023-02-03T09:19:46+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:46.013819: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_3.vg.gfa" in 5.6704 seconds
[2023-02-03T09:19:46+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:46.730680: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_4.vg.gfa"
[2023-02-03T09:19:51+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:51.917532: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_4.vg.gfa" in 5.1845 seconds
[2023-02-03T09:19:52+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:52.597900: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_5.vg.gfa"
[2023-02-03T09:19:57+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:57.324341: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_5.vg.gfa" in 4.7236 seconds
[2023-02-03T09:19:58+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:58.079283: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_6.vg.gfa"
[2023-02-03T09:20:03+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:03.975083: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_6.vg.gfa" in 5.8938 seconds
[2023-02-03T09:20:04+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:04.804054: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_7.vg.gfa"
[2023-02-03T09:20:10+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:10.872173: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_7.vg.gfa" in 6.0658 seconds
[2023-02-03T09:20:11+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:11.615454: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_8.vg.gfa"
[2023-02-03T09:20:16+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:16.622882: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_8.vg.gfa" in 5.0052 seconds
[2023-02-03T09:20:16+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:16.883402: Running the command: "vg gbwt -G /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gfa --gbz-format -g /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gbz"
[2023-02-03T09:22:10+0000] [Thread-1 (daddy)] [E] [toil.batchSystems.singleMachine] Got exit code 1 (indicating failure) from job _toil_worker make_vg_indexes file:/gpfs01/home/mbzec/arenosa/ON_data/graph_building/minigraph_cactus/build/jobstore kind-make_vg_indexes/instance-uwv6nfn7.
[2023-02-03T09:22:10+0000] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v1
Exit reason: None
[2023-02-03T09:22:10+0000] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v2
[2023-02-03T09:22:10+0000] [MainThread] [W] [toil.leader] Log from job "kind-make_vg_indexes/instance-uwv6nfn7" follows:
=========>
    [2023-02-03T09:19:31+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
    [2023-02-03T09:19:31+0000] [MainThread] [I] [toil] Running Toil version 5.8.0-79792b70098c4c18d1d2c2832b72085893f878d1 on host mmem008.int.augusta.nottingham.ac.uk.
    [2023-02-03T09:19:31+0000] [MainThread] [I] [toil.worker] Working on job 'make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v1
    [2023-02-03T09:19:31+0000] [MainThread] [I] [toil.worker] Loaded body Job('make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v1) from description 'make_vg_indexes' kind-make_vg_indexes/instance-uwv6nfn7 v1
    [2023-02-03T09:19:32+0000] [MainThread] [I] [cactus.shared.common] Running the command ['grep', '-v', '^W     _MINIGRAPH_', '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa']
    [2023-02-03T09:19:32+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:32.276950: Running the command: "grep -v ^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa"
    [2023-02-03T09:19:34+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:34.242659: Successfully ran: "grep -v ^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa" in 1.9619 seconds
    [2023-02-03T09:19:34+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:34.938464: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_2.vg.gfa"
    [2023-02-03T09:19:39+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:39.615261: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_2.vg.gfa" in 4.6737 seconds
    [2023-02-03T09:19:40+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:40.341354: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_3.vg.gfa"
    [2023-02-03T09:19:46+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:46.013819: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_3.vg.gfa" in 5.6704 seconds
    [2023-02-03T09:19:46+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:46.730680: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_4.vg.gfa"
    [2023-02-03T09:19:51+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:51.917532: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_4.vg.gfa" in 5.1845 seconds
    [2023-02-03T09:19:52+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:52.597900: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_5.vg.gfa"
    [2023-02-03T09:19:57+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:57.324341: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_5.vg.gfa" in 4.7236 seconds
    [2023-02-03T09:19:58+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:19:58.079283: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_6.vg.gfa"
    [2023-02-03T09:20:03+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:03.975083: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_6.vg.gfa" in 5.8938 seconds
    [2023-02-03T09:20:04+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:04.804054: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_7.vg.gfa"
    [2023-02-03T09:20:10+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:10.872173: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_7.vg.gfa" in 6.0658 seconds
    [2023-02-03T09:20:11+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:11.615454: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_8.vg.gfa"
    [2023-02-03T09:20:16+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:16.622882: Successfully ran: "grep -v ^H\|^W     _MINIGRAPH_ /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_8.vg.gfa" in 5.0052 seconds
    [2023-02-03T09:20:16+0000] [MainThread] [I] [toil-rt] 2023-02-03 09:20:16.883402: Running the command: "vg gbwt -G /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gfa --gbz-format -g /tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gbz"
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-wx9kwdvu/file-55f042b6e7b34b8ea0bfb54d0b90073d/scaffold_1.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_1.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-210585q0/file-7f05f4e0992b4201a148dbcdbd0323d4/scaffold_2.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_2.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-qi0db160/file-51fb43faddd544dea57d696c8893ace4/scaffold_3.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_3.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-12wd_gek/file-71ce61860a9d47d5a63cacd393f26712/scaffold_4.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_4.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-f8lf_fjo/file-98398e00a9e14cfe853d55e7f054620c/scaffold_5.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_5.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-nnmdfpb5/file-307c99f04c0e42f3a1b1e8c972a77a18/scaffold_6.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_6.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-5yrhp5_g/file-7ba457cf0f324ea0a55cf7998dce67f1/scaffold_7.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_7.vg.gfa'
    [2023-02-03T09:22:09+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-igeemixg/file-ff6344878a9740bd96fc94163a1c4d34/scaffold_8.vg.gfa' to path '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/scaffold_8.vg.gfa'
    Traceback (most recent call last):
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/toil/worker.py", line 403, in workerScript
        job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/toil/job.py", line 2727, in _runner
        returnValues = self._run(jobGraph=None, fileStore=fileStore)
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/toil/job.py", line 2644, in _run
        return self.run(fileStore)
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/toil/job.py", line 2875, in run
        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 533, in make_vg_indexes
        cactus_call(parameters=['vg', 'gbwt', '-G', merge_gfa_path, '--gbz-format', '-g', gbz_path])
      File "/gpfs01/home/mbzec/.local/lib/python3.10/site-packages/cactus/shared/common.py", line 824, in cactus_call
        raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
    RuntimeError: Command ['vg', 'gbwt', '-G', '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gfa', '--gbz-format', '-g', '/tmp/1976fa4c81fe579488b56609900bcc52/7a27/9b42/tmpv3ogjxjh/clip.merged.gbz'] exited 134: stdout=None, stderr=terminate called after throwing an instance of 'std::runtime_error'
      what():  MetadataBuilder: Invalid haplotype field contig_12781_1_1
    ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
    Stack trace path: /tmp/vg_crash_FFSyPp/stacktrace.txt
    Please include the stack trace file in your bug report!

    [2023-02-03T09:22:10+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host mmem008.int.augusta.nottingham.ac.uk

Thank you for any help with this!

glennhickey commented 1 year ago

Something like this might occur if you somehow switched Cactus and/or vg versions while running different commands of the pipeline. (this notion of metadata which it's complaining about has been evolving in both cactus and vg in recent versions)

If that's not the case, it's a bug. Cactus uses some naming conventions to keep track of metadata in the contigs, and it could be that one of your input contigs is named in such a way that it's getting confused.

Would you be able to share any of your data so I could reproduce? Ideally it'd be the smallest .vg in chrom_alignments/ (provided that's enough to trigger the error).

evcurran commented 1 year ago

I can't see how different versions of cactus/vg were used during the different stages, so perhaps it's a bug. Here is the smallest vg, let me know if you need anything else!

https://drive.google.com/file/d/1pvPWzBOuECm_Duci2ealEzL_kf9-Xu-k/view?usp=sharing

glennhickey commented 1 year ago

I can't open this file.

ls -l scaffold_5.vg
-rw-r--r-- 1 hickey cgl 502530048 Feb  3 07:31 scaffold_5.vg

md5sum scaffold_5.vg
f812752bb4bd9f3c8e8df19bf75cb241  scaffold_5.vg
vg stats -F scaffold_5.vg 
terminate called after throwing an instance of 'std::bad_alloc'
evcurran commented 1 year ago

Apologies, it seems the file didn't upload properly to google drive, md5sum indicates the upload should have worked this time:

https://drive.google.com/file/d/1K3SVFvkuz29JpJanrymHJYIcshGO47Ar/view?usp=sharing

glennhickey commented 1 year ago

Could you please share scaffold_5.hal ?

evcurran commented 1 year ago

Of course:

https://drive.google.com/file/d/1lQw3B1ueq8fwKIczMx8J9A9ZUmhHpYpy/view?usp=sharing

glennhickey commented 1 year ago

OK, the issue is that in your input fasta files, the contigs have names like

ING_3178#1#contig_8405_1_1_1

Where cactus would only expect contig_8405_1_1_1.

It may have worked with older version of vg, but with the newer vg in recent cactus releases, the # character is causing problems.

Unfortunately, the only work-around is to clean up the names, then start again. You can follow the procedure described here.

cactus-prepare ./seqfile --outDir pp --seqFileOnly
cactus-preprocess ./seqfile pp/seqfile --pangenome

Then work with pp/seqfile for the rest of the pipeline.

I'll try to make this more prominent in the documentation, as well as adding an error earlier on if "#" characters are found asap. Thanks for pointing this out.

evcurran commented 1 year ago

OK, that's great, thank you for your help with this, greatly appreciated!