ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

cactus-graphmap-join error when indexing with tabix #915

Open brettChapman opened 1 year ago

brettChapman commented 1 year ago

Hi

I'm getting an error about tbi indexing during a cactus-graphmap-join run.

I run the following command:

cactus-graphmap-join /cactus/jobStore --indexCores 30 --vg ${original_folder}/*H/*.vg --outDir ${original_folder}/barley-pg --outName barley-pg --reference Morex_v3 --wlineSep "." --clipLength 10000 --clipNonMinigraph --vcf --giraffe --gfaffix --disableCaching --workDir=/cactus/workDir --clean always --cleanWorkDir always --defaultDisk 1000G --maxDisk 1000G --maxCores 32 --maxMemory 126G --defaultMemory 126G

I get this error during the run. It doesn't kill off the whole job; only one of the Toil workers fails:

[2023-01-30T17:26:21+0000] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'make_vcf' kind-make_vcf/instance-buqavpq9 v1
Exit reason: None
[2023-01-30T17:26:22+0000] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'make_vcf' kind-make_vcf/instance-buqavpq9 v2
[2023-01-30T17:26:22+0000] [MainThread] [W] [toil.leader] Log from job "kind-make_vcf/instance-buqavpq9" follows:
=========>
        [2023-01-27T23:18:07+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
        [2023-01-27T23:18:07+0000] [MainThread] [I] [toil] Running Toil version 5.6.0-c34146a6437e4407a61e946e968bcce67a0ebbca on host node-5.
        [2023-01-27T23:18:07+0000] [MainThread] [I] [toil.worker] Working on job 'make_vcf' kind-make_vcf/instance-buqavpq9 v1
        [2023-01-27T23:18:09+0000] [MainThread] [I] [toil.worker] Loaded body Job('make_vcf' kind-make_vcf/instance-buqavpq9 v1) from description 'make_vcf' kind-make_vcf/instance-buqavpq9 v1
        [2023-01-27T23:28:32+0000] [MainThread] [I] [cactus.shared.common] Running the command ['bgzip', '-fd', '/cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/barley-pg.trans.gz'>
        [2023-01-27T23:28:59+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-01-27T23:28:59+0000] [MainThread] [I] [cactus.shared.common] Running the command ['bash', '-c', 'set -eo pipefail && vg deconstruct /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/327>
        [2023-01-30T17:22:27+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-01-30T17:22:29+0000] [MainThread] [I] [cactus.shared.common] Running the command ['tabix', '-p', 'vcf', '/cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz>
        [2023-01-30T17:24:41+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
        [2023-01-30T17:24:41+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_indexes/instance-t69qts7c/file-48fe7a45fa9e4f31b753a0fec97ebab0/merged.xg' t>
        [2023-01-30T17:24:41+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_indexes/instance-t69qts7c/file-7dd04e875c8a476b8fcf680223ddc192/merged.gbwt'>
        [2023-01-30T17:24:41+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_indexes/instance-t69qts7c/file-5cbe0779c0704df8bf82a57a1e817673/merged.snarl>
        [2023-01-30T17:24:41+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_indexes/instance-t69qts7c/file-4b9a0cf4d16d49f2ba56e26ca8558289/merged.trans>
        Traceback (most recent call last):
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/worker.py", line 405, in workerScript
            job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2399, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2317, in _run
            return self.run(fileStore)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2540, in run
            rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 557, in make_vcf
            cactus_call(parameters=['tabix', '-p', 'vcf', vcf_path])
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/shared/common.py", line 866, in cactus_call
            raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
        RuntimeError: Command /usr/bin/time -v tabix -p vcf /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz exited 1: stdout=None, stderr=[E::hts_idx_check_range] Reg>
        tbx_index_build failed: /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz
        Command exited with non-zero status 1
                Command being timed: "tabix -p vcf /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz"
                User time (seconds): 85.55
                System time (seconds): 4.11
                Percent of CPU this job got: 68%
                Elapsed (wall clock) time (h:mm:ss or m:ss): 2:11.29
                Average shared text size (kbytes): 0
                Average unshared data size (kbytes): 0
                Average stack size (kbytes): 0
                Average total size (kbytes): 0
                Maximum resident set size (kbytes): 62100
                Average resident set size (kbytes): 0
                Major (requiring I/O) page faults: 3
                Minor (reclaiming a frame) page faults: 15190
                Voluntary context switches: 2908
                Involuntary context switches: 391
                Swaps: 0
                File system inputs: 4338128
                File system outputs: 0
                Socket messages sent: 0
                Socket messages received: 0
                Signals delivered: 0
                Page size (bytes): 4096
                Exit status: 1

        [2023-01-30T17:24:44+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host node-5
<=========
cwltool is not installed.
[2023-01-30T17:26:24+0000] [MainThread] [I] [toil.worker] Redirecting logging to /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/6e02/worker_log.txt
[2023-01-30T17:26:25+0000] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'make_vcf' kind-make_vcf/instance-buqavpq9 v2 with ID kind-make_vcf/instance-buqavpq9 >
[2023-01-30T17:26:25+0000] [MainThread] [W] [toil.job] We have increased the disk of the failed job 'make_vcf' kind-make_vcf/instance-buqavpq9 v2 to the default of 1000000000000 bytes
[2023-01-30T17:26:26+0000] [MainThread] [I] [toil.leader] Issued job 'make_vcf' kind-make_vcf/instance-buqavpq9 v3 with job batch system ID: 21 and cores: 30, disk: 931.3 Gi, and memory: 117.3 Gi
[2023-01-30T17:55:08+0000] [MainThread] [I] [toil.leader] 1 jobs are running, 1 jobs are issued and waiting to run

Thanks for any help you can provide.

glennhickey commented 1 year ago

This is a new one for me. The relevant part of the log is

[E::hts_idx_check_range] Reg> tbx_index_build failed: /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz

The only explanation I can think of is that the input VCF is somehow corrupt. Are you sure there were no errors upstream of this?

It's probably worth running again with --restart (after making sure you have enough disk space).

I'd recommend using --disableCaching --cleanWorkDir never (instead of --clean always --cleanWorkDir always), which would keep merged.vcf.gz around for you to look at for clues.
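Concretely, something like this (a sketch only; the "..." stands in for the rest of your original options):

cactus-graphmap-join /cactus/jobStore --restart --disableCaching --cleanWorkDir never --workDir=/cactus/workDir ...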

I also strongly suggest upgrading to the latest Cactus release ASAP. I don't recall any changes specifically targeting this issue (which I haven't seen before), but the release fixes many other bugs and uses a much newer version of vg, which may help here. The big catch is that you can't run cactus-graphmap-join from versions >= 2.3.0 on output from cactus-align from older versions: you'd need to rerun cactus-align (or cactus-align-batch). Also, the interface to cactus-graphmap-join has been simplified, so you'd have to check the docs before using it...

jdamas13 commented 1 year ago

I am getting a similar error, but in my case I believe it is because I'm working with very long chromosomes: a tbi index stores positions in 29 bits, so it can't index coordinates beyond ~536 Mb. I usually have to use a csi index instead of tbi.

RuntimeError: Command /usr/bin/time -v tabix -p vcf /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz exited 1: stdout=None, stderr=[E::hts_idx_check_range] Region 536885449..536885452 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
        tbx_index_build failed: /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz
        Command exited with non-zero status 1
                Command being timed: "tabix -p vcf /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz"
glennhickey commented 1 year ago

Goodness, will try to add support soon. In the meantime, I think you'll have to make the VCF yourself if you want one:
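Something along these lines should work (a sketch only: it assumes a vg recent enough to deconstruct the joined graph directly, and the file names, thread count, and deconstruct options are illustrative; -C tells tabix to build a csi index instead of tbi):

vg deconstruct -P Morex_v3 -a -t 16 barley-pg.gbz > merged.vcf
bgzip merged.vcf
tabix -C -p vcf merged.vcf.gz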

brettChapman commented 1 year ago

VCF indexing would also be an issue for me, because my plant chromosome lengths are too long and I need to use CSI indexing as well. I'll most likely get the same error with cactus-graphmap-join, so I'll omit the --vcf flag until there's an update.

glennhickey commented 1 year ago

Don't have tons of time now, but I am preparing a new release and have just made tabix failures a warning (instead of a fatal error) via https://github.com/ComparativeGenomicsToolkit/cactus/pull/979/commits/093b9badb3298361c2f89b518672fe0f0a877f21. So this issue will at least be less annoying as of this release...

jdamas13 commented 1 year ago

Thanks, Glenn! I'll create the VCF as you suggested in the meantime.

brettChapman commented 1 year ago

Hi @glennhickey, I've run Minigraph-Cactus again, and then cactus-graphmap-join with the --vcf option left off.

I'm now working with a heavily reduced pangenome, including only a selection of 5 genomes (from the original 20), running each chromosome separately and then merging them using the join function:

srun -n 1 singularity exec --cleanenv \
                        --no-home \
                        --overlay ${JOBSTORE_IMAGE} \
                        --bind ${CACTUS_SCRATCH}/tmp:/tmp \
                        ${CACTUS_IMAGE} cactus-graphmap-join /cactus/jobStore --indexCores 16 --vg ${original_folder}/*H/*.vg --outDir ${original_folder}/barley-pg --outName barley-pg --reference Morex_v3 --clip 10000 --giraffe full clip filter --chrom-vg --gbz --gfa --disableCaching --workDir=/cactus/workDir --clean always --cleanWorkDir always --defaultDisk 1000G --maxDisk 1000G --maxCores 32 --maxMemory 126G --defaultMemory 126G

I'm using a build from Docker Hub dated February 3rd.

I eventually get this error. It looks like it's struggling to build the indexes using vg.

[2023-05-27T22:14:13+0000] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/cactus/jobStore)
Traceback (most recent call last):
  File "/home/cactus/cactus_env/bin/cactus-graphmap-join", line 8, in <module>
    sys.exit(main())
  File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 194, in main
    graphmap_join(options)
  File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 226, in graphmap_join
    wf_output = toil.start(Job.wrapJobFn(graphmap_join_workflow, options, config, vg_ids, hal_ids))
  File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1017, in start
    return self._runMainLoop(rootJobDescription)
  File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1461, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/leader.py", line 330, in run
    raise FailedJobsException(self.jobStore, failed_jobs, exit_code=self.recommended_fail_exit_code)
toil.leader.FailedJobsException: The job store '/cactus/jobStore' contains 4 failed jobs: 'Job' kind-graphmap_join_workflow/instance-5ejmt78h v3, 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v6, 'Job' k>
Log from job "'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v6" follows:
=========>
[2023-05-20T19:23:07+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
        [2023-05-20T19:23:07+0000] [MainThread] [I] [toil] Running Toil version 5.8.0-79792b70098c4c18d1d2c2832b72085893f878d1 on host node-11.
        [2023-05-20T19:23:07+0000] [MainThread] [I] [toil.worker] Working on job 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v4
        [2023-05-20T19:23:08+0000] [MainThread] [I] [toil.worker] Loaded body Job('make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v4) from description 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327>
        [2023-05-20T19:28:09+0000] [MainThread] [I] [cactus.shared.common] Running the command ['grep', '-v', '^W     _MINIGRAPH_', '/cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barley_pan>
        [2023-05-20T19:28:09+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:28:09.659310: Running the command: "grep -v ^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barl>
        [2023-05-20T19:28:27+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T19:28:27+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:28:27.372914: Successfully ran: "grep -v '^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barle>
        [2023-05-20T19:36:30+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:36:30.674846: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T19:37:06+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T19:37:06+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:37:06.149221: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T19:45:04+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:45:04.839426: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T19:45:45+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T19:45:45+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:45:45.249368: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T19:49:08+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:49:08.095246: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T19:49:35+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T19:49:35+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:49:35.951295: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T19:56:46+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:56:46.044243: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T19:57:19+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T19:57:19+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:57:19.308827: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T20:04:16+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:04:16.517717: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T20:04:46+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T20:04:46+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:04:46.968221: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T20:12:25+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:25.119664: Running the command: "grep -v ^H\|^W     _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
        [2023-05-20T20:12:54+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method.  Please use "toil.lib.conversions.bytes2human()" instead."
        [2023-05-20T20:12:54+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:54.410958: Successfully ran: "grep -v '^H\|^W     _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
        [2023-05-20T20:12:54+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:54.971630: Running the command: "vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gb>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-6vu84z02/file-e8bed1e0ee5f4a009b96c4632f1f35a8/barley_pangenome_chr1H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-47g8e_uq/file-645a4dcb2d084bf0b8df58b2ac15f1c9/barley_pangenome_chr2H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-okavp214/file-ee2c6634bf0d4bbdb697f8b6052aa902/barley_pangenome_chr3H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-kir7bvk5/file-de6f2c7eb0254ad48e3db026f2135bad/barley_pangenome_chr4H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-vgpxtvj0/file-125868b731484765b9d55d0660a281f6/barley_pangenome_chr5H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-i8nqljgx/file-860a3b21c9494ffdae5c5538ea7fc508/barley_pangenome_chr6H.v>
        [2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-vzsfyaj6/file-c56a321036874ba1bfc2d82c8e773828/barley_pangenome_chr7H.v>
        Traceback (most recent call last):
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/worker.py", line 403, in workerScript
            job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2727, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2644, in _run
            return self.run(fileStore)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2875, in run
            rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 533, in make_vg_indexes
            cactus_call(parameters=['vg', 'gbwt', '-G', merge_gfa_path, '--gbz-format', '-g', gbz_path])
          File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/shared/common.py", line 839, in cactus_call
            raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
        RuntimeError: Command /usr/bin/time -v vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gbz-format -g /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e>
          what():  MetadataBuilder: Duplicate path for  sample _MINIGRAPH_ contig s1 phase 0 count 0
        ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
        Stack trace path: /tmp/vg_crash_Gr5aXN/stacktrace.txt
        Please include the stack trace file in your bug report!
        Command exited with non-zero status 134
Command being timed: "vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gbz-format -g /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmp>
                User time (seconds): 596.09
                System time (seconds): 42.62
                Percent of CPU this job got: 58%
                Elapsed (wall clock) time (h:mm:ss or m:ss): 18:16.69
                Average shared text size (kbytes): 0
                Average unshared data size (kbytes): 0
                Average stack size (kbytes): 0
                Average total size (kbytes): 0
                Maximum resident set size (kbytes): 96789080
                Average resident set size (kbytes): 0
                Major (requiring I/O) page faults: 68586
                Minor (reclaiming a frame) page faults: 25226196
                Voluntary context switches: 89310
                Involuntary context switches: 1773
                Swaps: 0
                File system inputs: 44736512
                File system outputs: 8
                Socket messages sent: 0
                Socket messages received: 0
                Signals delivered: 0
                Page size (bytes): 4096
                Exit status: 134

        [2023-05-20T20:31:14+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host node-11
<=========
srun: error: node-11: task 0: Exited with exit code 1

Thanks

glennhickey commented 1 year ago

I think the issue is in ${original_folder}/*H/*.vg. The wildcard must be picking up multiple versions of the same chromosome, which is triggering the duplicate path error you are seeing.

I'd strongly recommend switching to the latest release and considering the new interface which will make issues like this much more avoidable (albeit at the cost of some modularity).

brettChapman commented 1 year ago

Hi @glennhickey, OK, thanks. Could I simply rerun the join step with the new release, or do I need to rerun everything, including generating each chromosome graph?

glennhickey commented 1 year ago

Just to be clear: I think the error in question is caused by the wildcards in your command, and not the release version.

But yes, I think you should be able to upgrade to the latest release and rerun join. Compatibility for doing this broke in v2.3.0 but I think both your versions are newer.

Still, there have been a number of fixes, so if it's not too inconvenient, you will probably get at least a slightly better result by rerunning from scratch.

brettChapman commented 1 year ago

Hi @glennhickey, if the wildcard is the main problem, how should I pass the *.vg graphs to the --vg parameter?

In my new run I am putting all the vg graphs into a single folder and simply passing /vg_graphs/*.vg to the --vg parameter. Would that work?

glennhickey commented 1 year ago

It's not the wildcard in and of itself that's the problem; it's your use of two of them in ${original_folder}/*H/*.vg, pulling all the vg files from potentially several directories, that worries me. You are getting an error message that would be caused by, say, passing two chr1.vg files to cactus-graphmap-join. You can verify this by looking at the log of your failing run: at the beginning it logs every input file.
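As a quick illustrative check, you can also expand the glob yourself and look for duplicated file names:

ls ${original_folder}/*H/*.vg
ls ${original_folder}/*H/*.vg | xargs -n1 basename | sort | uniq -d   # prints any basename that occurs more than once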

brettChapman commented 1 year ago

Hi @glennhickey, I had a look at the log file, and the *.vg files appear OK and are not repeated, so that was unlikely to be the problem.

I noticed that the new behavior of Minigraph-Cactus is to order the alignment based on mash distance. Is it possible to disable this behavior and align based on the order of the input guide tree? We have predetermined the best order and would like to use it.

Thanks.

glennhickey commented 1 year ago

I see. Well, if you are able to share data that reproduces that error (preferably with the latest release), I'd be very curious to take a look.

You can disable mash and use the input order by setting minigraphSortInput="0" in cactus_progressive_config.xml.

You can either do that directly in your cactus installation, or make a copy, edit it, then pass it with --configFile to cactus-pangenome / cactus-minigraph.
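For example (a sketch; the installed location of the config file may differ on your setup, and the sed assumes the attribute currently reads minigraphSortInput="1"):

CFG=$(python3 -c "import cactus, os; print(os.path.join(os.path.dirname(cactus.__file__), 'cactus_progressive_config.xml'))")
cp "$CFG" my_config.xml
sed -i 's/minigraphSortInput="1"/minigraphSortInput="0"/' my_config.xml
cactus-pangenome ./jobstore ./seqfile --configFile my_config.xml ...   # plus your usual arguments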

brettChapman commented 1 year ago

Thanks. I'll rerun again without mash distance, and using the latest build. If I have a similar error again I'll share the data with you.

brettChapman commented 1 year ago

Hi @glennhickey

I've managed to get the graph built with the join method, using my own alignment order.

I've found that, because I'm manually joining the chromosome graphs through this method, I ended up with chr1H as the only reference path and all other paths (including the other Morex chromosomes) as haplotypes (I checked with vg paths --metadata). I'm now rerunning with --reference Morex_v3_chr1H Morex_v3_chr2H.....etc. Should this now use all chromosomes from Morex as the reference?

I've been trying to run vg rna to generate a splice graph, and I think it's been crashing (https://github.com/vgteam/vg/issues/3997) because only chromosome 1H of Morex is a reference path.

Thanks.

glennhickey commented 1 year ago

--reference should be based on the sample name in your original seqfile. It will be used to check the prefixes of path names in the various vg files (which should look like sample#haplotype#contig, etc.). If you find yourself specifying it per-chromosome, something must be wrong.
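To check which paths the graph actually contains and how they're tagged, something like this should work (file name illustrative; -M/--metadata prints a table with each path's name and sense):

vg paths -x barley-pg.gbz -M | head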

brettChapman commented 1 year ago

The seqfile for chromosome 1H looks like this:

Morex_v3_chr1H  /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Morex_v3_chr1H.fasta
FT11_chr1H      /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/FT11_chr1H.fasta
RGT_Planet_chr1H        /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/RGT_Planet_chr1H.fasta
HOR_13942_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/HOR_13942_chr1H.fasta
Akashinriki_chr1H       /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Akashinriki_chr1H.fasta

It's the same for each subsequent chromosome. I then run join on the resulting vg graphs and specify --reference Morex_v3_chr1H Morex_v3_chr2H Morex_v3_chr3H Morex_v3_chr4H .....

Is that right?

Should my seqfile look different? I noticed in the updated documentation that the .0 at the end of each path to specify the haplotype is no longer required.

brettChapman commented 1 year ago

For each of the chromosome graphs, before getting to the join stage, I do specify the Morex reference, e.g. --reference Morex_v3_chr1H.

brettChapman commented 1 year ago

Prior to this I was running the join stage with --reference Morex_v3, but it appears that only gave me Morex_v3_chr1H as the single reference, and all other paths were haplotypes.

brettChapman commented 1 year ago

I've decided to try rerunning each chromosome with a seqfile like the one below, prior to running join with --reference Morex_v3. Hopefully it's correct:

Morex_v3  /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Morex_v3_chr1H.fasta
FT11      /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/FT11_chr1H.fasta
RGT_Planet        /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/RGT_Planet_chr1H.fasta
HOR_13942 /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/HOR_13942_chr1H.fasta
Akashinriki       /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Akashinriki_chr1H.fasta

This way each sample name is the same across the VG graphs, instead of each chromosome being its own sample.

glennhickey commented 1 year ago

In cactus-graphmap-join, as far as I can remember anyway, --reference is treated as a prefix. So in your case --reference Morex should be enough to treat anything beginning with Morex as the reference.

I'm not quite sure why you seem to have manually split your chromosomes, but I don't see why this should cause issues with join...

brettChapman commented 1 year ago

Thanks, I'll try using --reference with a prefix, and see how it goes.

I manually split my chromosomes to reduce memory overhead, as I run each chromosome on a different node of my cluster. Scaling up to many more genomes with all chromosomes included would push the memory limits of a single node on my cluster.

glennhickey commented 1 year ago

Got it. As of this week, Cactus supports Slurm, so if your cluster uses Slurm, then hopefully going forward you won't need to do that manually.
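For example (illustrative only; the jobstore and seqfile names are placeholders, and --batchSystem is the standard Toil option):

cactus-pangenome ./jobstore ./seqfile --outDir ./out --outName barley-pg --reference Morex_v3 --batchSystem slurm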

amizeranschi commented 1 year ago

Hi @glennhickey

Any chance of supporting HTCondor as well, in addition to Slurm?

amizeranschi commented 1 year ago

Understandable, @glennhickey, and thanks for the feedback.

At this point, HTCondor is beginning to feel like a crutch and it looks like we should be looking towards Slurm. I mean, everything out there seems to be supporting Slurm nowadays.

glennhickey commented 1 year ago

Yeah, sorry, for us, it's really hard to support a cluster environment that we don't have access to.

We just got a Slurm cluster, so we're focusing on that. This is in part selfish: we're moving most of our production from the cloud to our Slurm cluster; but also by necessity: we can only meaningfully debug problems we can reproduce.