ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs
Other
511 stars 112 forks source link

Job failed at clip_vg when building a pangenome #986

Closed miketuoba closed 1 year ago

miketuoba commented 1 year ago

Hi there,

I am running into an issue when using cactus-pangenome, where the job failed at clip_vg. Similar errors also occurred before when running graphmap-join, where jobs either failed at clip_vg or make_vg_indexes.

The command used for running this job is:

cactus-pangenome ~/scratch/v2.5.0/jobstore ~/scratch/v2.5.0/can_genome/can_genomes_chrom.txt --outDir ~/scratch/can_genome/v2.5.0/can_genome_chrom_output --outName can_genome_chrom --reference GCA_900626175.2 --vcf --giraffe --gfa --gbz

The error message is as follows: [2023-04-06T18:29:35-0700] [MainThread] [I] [toil-rt] 2023-04-06 18:29:35.562242: Successfully ran: "awk {print $0 " UZAU01000917.1"} /tmp/d495be3b7ba25a5c87e16527eb012684/26b5/74d3/tmpqqjfcvf_/UZAU01000917.1.vg.clip.bed" in 0.005 seconds [2023-04-06T18:29:35-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:35: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-6b5xe192/cleanup/file-2fe07999d1814b37bc2c888ef388caa5/stream used 128.68% disk (20.0 KiB [20480B] used, 15.5 KiB [15916B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-5sb44sue/cleanup/file-f67ce3cfb49e479486d27baddd74e0f0/stream used 180.03% disk (20.0 KiB [20480B] used, 11.1 KiB [11376B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-yt3z6h4h/cleanup/file-dbe26dd001e04d2fa585fee1a3eaacd9/stream used 131.82% disk (36.0 KiB [36864B] used, 27.3 KiB [27966B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clipvg/instance-6bhvmwk/cleanup/file-0dbe53719b67488c9fd2c120bd55c1b3/stream used 150.23% disk (28.0 KiB [28672B] used, 18.6 KiB [19086B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-aska8_s0/cleanup/file-4c61cb2f1e194616863ff17403e931e8/stream used 124.54% disk (28.0 KiB [28672B] used, 22.5 KiB [23022B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-1bbeazs9/cleanup/file-5d859748dc644f07b1a8fe59fd3da43c/stream used 180.09% disk (20.0 KiB [20480B] used, 11.1 KiB [11372B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-ayjyq915/cleanup/file-d8015161bb864e50b84570df092a6898/stream used 119.34% disk (60.0 KiB [61440B] used, 50.3 KiB [51484B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-imo36zk0/cleanup/file-8a46e814516a43f897b236afbe81b053/stream used 150.23% disk (28.0 KiB [28672B] used, 18.6 KiB [19086B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-klfwiok6/cleanup/file-f56c8ccc8c01469093c7a997b7bfeb57/stream used 107.69% disk (44.0 KiB [45056B] used, 40.9 KiB [41838B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-55ioj3wv/cleanup/file-8495ca465ed4454085a579e9ce6456ac/stream used 128.66% disk (20.0 KiB [20480B] used, 15.5 KiB [15918B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-90jqp7mg/cleanup/file-ae6804c89b36410bb78c2ceb88172b04/stream used 124.55% disk (28.0 KiB [28672B] used, 22.5 KiB [23020B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clipvg/instance-dxfobwj/cleanup/file-29006cb345a5449e822a1c30464871d5/stream used 152.95% disk (20.0 KiB [20480B] used, 13.1 KiB [13390B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-g1t5wush/cleanup/file-469b81bf483e404aafc0c1b1728be2c8/stream used 150.23% disk (28.0 KiB [28672B] used, 18.6 KiB [19086B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-j06dytmr/cleanup/file-94aef3aa9c384c879dbe5fdc967b951c/stream used 119.34% disk (60.0 KiB [61440B] used, 50.3 KiB [51484B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-0h9odvyi/cleanup/file-ea6df09eab8c4fbbb68309303fc8fe49/stream used 107.70% disk (44.0 KiB [45056B] used, 40.9 KiB [41834B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-9t1rqtlx/cleanup/file-53f6ebe56bbc4dd688430abfd86ebc92/stream used 109.59% disk (68.0 KiB [69632B] used, 62.0 KiB [63536B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-0wyl9_v6/cleanup/file-7e5ad0377d4a49469edb7a0150d1fa3e/stream used 128.66% disk (20.0 KiB [20480B] used, 15.5 KiB [15918B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-nfktj7je/cleanup/file-bab73b1412a64cac97fa02f3f72a03c7/stream used 109.60% disk (68.0 KiB [69632B] used, 62.0 KiB [63532B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-n6g9alj2/cleanup/file-ac34955459b14bd0b1f9f6fd8f2bb038/stream used 128.64% disk (20.0 KiB [20480B] used, 15.5 KiB [15920B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-zrjpx5qi/cleanup/file-7f0a5f0500ad43a2846008de0feb9012/stream used 180.09% disk (20.0 KiB [20480B] used, 11.1 KiB [11372B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-abkb8rm0/cleanup/file-5c01717b82874c018b9a91bdb7cdddd7/stream used 131.83% disk (36.0 KiB [36864B] used, 27.3 KiB [27964B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-w70hm14w/cleanup/file-b4d70ec5a31a40f285e36c19bed129a1/stream used 152.95% disk (20.0 KiB [20480B] used, 13.1 KiB [13390B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-157tyopk/cleanup/file-830a5acccd7f4133b57e649ca6b2a281/stream used 124.55% disk (28.0 KiB [28672B] used, 22.5 KiB [23020B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-7pu576w5/cleanup/file-167377361984475db5b335d20e001846/stream used 150.23% disk (28.0 KiB [28672B] used, 18.6 KiB [19086B] requested). [2023-04-06T18:29:36-0700] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-06-2023 18:29:36: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job files/for-job/kind-clip_vg/instance-94nuv5_b/cleanup/file-4a4e5fc4472d4f3bbe7217f2e3845bee/stream used 119.34% disk (60.0 KiB [61440B] used, 50.3 KiB [51484B] requested). [2023-04-06T18:29:36-0700] [Thread-1 ] [E] [toil.batchSystems.singleMachine] Got exit code 1 (indicating failure) from job _toil_worker clip_vg file:/home/tuobawen/scratch/v2.5.0/jobstore kind-clip_vg/instance-3a7f2kn6. [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v1 Exit reason: None [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v2 [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.leader] Log from job "kind-clip_vg/instance-3a7f2kn6" follows: =========> [2023-04-06T18:29:33-0700] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG--- [2023-04-06T18:29:33-0700] [MainThread] [I] [toil] Running Toil version 5.9.2-54bfe0b146b76ecc6221de384c255e1be89547c6 on host cdr517.int.cedar.computecanada.ca. [2023-04-06T18:29:33-0700] [MainThread] [I] [toil.worker] Working on job 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v1 [2023-04-06T18:29:35-0700] [MainThread] [I] [toil.worker] Loaded body Job('clip_vg' kind-clip_vg/instance-3a7f2kn6 v1) from description 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v1 [2023-04-06T18:29:35-0700] [MainThread] [I] [cactus.shared.common] Running the command ['bash', '-c', 'set -eo pipefail && clip-vg /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg -f -e GCA_900626175.2 -d MINIGRAPH -u 10000 -a MINIGRAPH -o /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg.clip.bed | vg clip -d 1 - -P GCA_900626175.2 | vg clip -s - -P GCA_900626175.2'] [2023-04-06T18:29:35-0700] [MainThread] [I] [toil-rt] 2023-04-06 18:29:35.363597: Running the command: "bash -c set -eo pipefail && clip-vg /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg -f -e GCA_900626175.2 -d MINIGRAPH -u 10000 -a MINIGRAPH -o /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg.clip.bed | vg clip -d 1 - -P GCA_900626175.2 | vg clip -s - -P GCA_900626175.2" [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files: [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-join_vg/instance-ob3y9fzb/file-0275ef334652471aa1ba83248c745458/LR213627.1.vg' to path '/tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg' Traceback (most recent call last): File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 403, in workerScript job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer) File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2743, in _runner returnValues = self._run(jobGraph=None, fileStore=fileStore) File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2660, in _run return self.run(fileStore) File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2888, in run rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs) File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/cactus/refmap/cactus_graphmap_join.py", line 429, in clip_vg cactus_call(parameters=cmd, outfile=clipped_path) File "/home/tuobawen/cactus-bin-v2.5.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 839, in cactus_call raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out)) RuntimeError: Command ['bash', '-c', 'set -eo pipefail && clip-vg /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg -f -e GCA_900626175.2 -d MINIGRAPH -u 10000 -a MINIGRAPH -o /tmp/d495be3b7ba25a5c87e16527eb012684/2ab2/cc5f/tmpzpvyz_5g/LR213627.1.vg.clip.bed | vg clip -d 1 - -P GCA_900626175.2 | vg clip -s - -P GCA_900626175.2'] exited 1: stdout=None, stderr=terminate called after throwing an instance of 'std::length_error' what(): vector::reserve ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug. Stack trace path: /tmp/vg_crash_1QMdmn/stacktrace.txt Please include the stack trace file in your bug report! error[VPKG::load_one]: Correct input type not found in standard input while loading handlegraph::MutablePathMutableHandleGraph

[2023-04-06T18:29:36-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host cdr517.int.cedar.computecanada.ca

<========= [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v2 with ID kind-clip_vg/instance-3a7f2kn6 to 1 [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.job] We have increased the default memory of the failed job 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v2 to 2147483648 bytes [2023-04-06T18:29:36-0700] [MainThread] [W] [toil.job] We have increased the disk of the failed job 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v2 to the default of 2147483648 bytes [2023-04-06T18:29:36-0700] [MainThread] [I] [toil.leader] Issued job 'clip_vg' kind-clip_vg/instance-3a7f2kn6 v3 with job batch system ID: 2346 and disk: 2.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False [2023-04-06T18:29:36-0700] [MainThread] [I] [toil-rt] 2023-04-06 18:29:36.631040: Successfully ran: "bash -c set -eo pipefail && clip-v

If you need more info from me, do feel free to ask.

Thanks!

glennhickey commented 1 year ago

That's a strange one. Are you able to share any of this data?

If so, you can run your command again, but add --restart --disableCaching --cleanWorkDir never. It should quickly crash the same way, but leaving LR213627.1.vg on disk this time. If you can then share that file with me I can debug it.

Otherwise, I'm not too sure what to suggest. I can only speculate that this is the line in question: https://github.com/vgteam/vg/blob/master/src/clip.cpp#L471 which would mean you somehow have a bazillion edges in your graph and this is tripping it up.

miketuoba commented 1 year ago

Hi Glenn,

Thanks so much for your kind reply. I got a rerun and saved out the LR213627.1.vg file attached here for your reference (file is zipped so as to fulfill the upload requirement).

Thanks again! LR213627.1.zip

glennhickey commented 1 year ago

This file is completely corrupt

vg validate LR213627.1.vg 
terminate called after throwing an instance of 'std::runtime_error'
  what():  Compiled with page size 256 but loading vector with page size 140736216222472

So the error must be happening upstream. Would you be able to share the corresponding file from the "chrom-alignments" subdirectory in your output directory? I'll try to reproduce it from there...

miketuoba commented 1 year ago

This file is completely corrupt

vg validate LR213627.1.vg 
terminate called after throwing an instance of 'std::runtime_error'
  what():  Compiled with page size 256 but loading vector with page size 140736216222472

So the error must be happening upstream. Would you be able to share the corresponding file from the "chrom-alignments" subdirectory in your output directory? I'll try to reproduce it from there...

For this run using cactus-pangenome, I don't have an output folder generated. However, I do have one similar file under "chrom-alignments" subdirectory from one of my previous runs using v2.4.3. I can totally share that with you. However, the file is too large (382 MB) to be uploaded here. Is there any other way I can share the file with you? Thanks so much!

glennhickey commented 1 year ago

Oh, I just noticed something. This could very well be the same issue as https://github.com/ComparativeGenomicsToolkit/cactus/issues/993 where "."'s in the reference name cause problems. I just fixed this in https://github.com/ComparativeGenomicsToolkit/cactus/pull/994 but until the next release, you should follow the advice here

miketuoba commented 1 year ago

Hi Glenn,

Thanks so much for letting me know! I just saw you had a new release just now, and I'll definitely give it a shot now using the new version.

Thanks again so much!