ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Error in cactus-graphmap-join : ['vg clip -sS - -P CamPar;] exited 141: stderr=vg: invalid option -- 'S' #1457

Closed MatteoSebastianelli closed 3 months ago

MatteoSebastianelli commented 3 months ago

Hello,

I am running the HPRC pangenome pipeline step by step for the first time and have completed the first four steps successfully. However, the last step fails with a "vg: invalid option -- 'S'" error. The error seems to occur while clipping the pangenome, but I am not sure how to troubleshoot it. I'd appreciate any help, and apologies if this topic has been covered elsewhere!

Best regards, Matteo

Below I paste some additional details.

Here is the command I used:

cactus-graphmap-join $PACBIO/pangenome/job_store --vg $PACBIO/pangenome/chrom-alignments/.vg --hal $PACBIO/pangenome/chrom-alignments/.hal \
    --vcf --odgi --chrom-og --viz --chrom-vg --gbz --giraffe --outDir $PACBIO/pangenome/finch_pg_final --outName finch_pangenome --reference CamPar

And the tail of the job.err file:

<========= Log from job "'clip_vg' kind-clip_vg/instance-ndbv27xy v6" follows: =========>
[2024-08-02T21:44:15+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-08-02T21:44:15+0200] [MainThread] [I] [toil] Running Toil version 6.1.0-3f9cba3766e52866ea80d0934498f8c8f3129c3f on host r1234.uppmax.uu.se.
[2024-08-02T21:44:15+0200] [MainThread] [I] [toil.worker] Working on job 'clip_vg' kind-clip_vg/instance-ndbv27xy v4
[2024-08-02T21:44:16+0200] [MainThread] [I] [toil.worker] Loaded body Job('clip_vg' kind-clip_vg/instance-ndbv27xy v4) from description 'clip_vg' kind-clip_vg/instance-ndbv27xy v4
[2024-08-02T21:44:16+0200] [MainThread] [I] [cactus.shared.common] Running the command ['bash', '-c', 'set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar']
[2024-08-02T21:44:16+0200] [MainThread] [I] [toil-rt] 2024-08-02 21:44:16.664370: Running the command: "bash -c set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar"
[2024-08-02T21:45:29+0200] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2024-08-02T21:45:29+0200] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-join_vg/instance-e0kfa3xp/file-9adf92b714d543f681e576de587b8c1b/21.vg' to path '/scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg'
Traceback (most recent call last):
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 409, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2845, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2761, in _run
    return self.run(fileStore)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2990, in run
    rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/cactus/refmap/cactus_graphmap_join.py", line 680, in clip_vg
    cactus_call(parameters=cmd, outfile=clipped_path, job_memory=job.memory)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 910, in cactus_call
    raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command ['bash', '-c', 'set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/8720/7b67/tmppshwa2h5/21.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar'] exited 141: stderr=vg: invalid option -- 'S'
usage: vg [options]
Chop out variation within path intervals of a vg graph

input options: 
    -b, --bed FILE            BED regions corresponding to path intervals of the graph to target
    -r, --snarls FILE         Snarls from vg snarls (recomputed if not given unless -d and -P used).
depth clipping options: 
    -d, --depth N             Clip out nodes and edges with path depth below N
stub clipping options:
    -s, --stubs               Clip out all stubs (nodes with degree-0 sides that aren't on reference)
snarl complexity clipping options: [default mode]
    -n, --max-nodes N         Only clip out snarls with > N nodes
    -e, --max-edges N         Only clip out snarls with > N edges
    -N  --max-nodes-shallow N Only clip out snarls with > N nodes not including nested snarls
    -E  --max-edges-shallow N Only clip out snarls with > N edges not including nested snarls
    -a, --max-avg-degree N    Only clip out snarls with average degree > N
    -l, --max-reflen-prop F   Ignore snarls whose reference traversal spans more than F (0<=F<=1) of the whole reference path
    -L, --max-reflen N        Ignore snarls whose reference traversal spans more than N bp
big deletion edge clipping options:
    -D, --max-deletion-edge N Clip out all edges whose endpoints have distance > N on a reference path
    -c, --context N           Search up to at most N steps from reference paths for candidate deletion edges [1]
general options: 
    -P, --path-prefix STRING  Do not clip out alleles on paths beginning with given prefix (such references must be specified either with -P or -b). Multiple allowed
    -m, --min-fragment-len N  Don't write novel path fragment if it is less than N bp long
    -B, --output-bed          Write BED-style file of affected intervals instead of clipped graph. 
                              Columns 4-9 are: snarl node-count edge-count shallow-node-count shallow-edge-count avg-degree
    -t, --threads N           number of threads to use [default: all available]
    -v, --verbose             Print some logging messages

[2024-08-02T21:45:29+0200] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host r1234.uppmax.uu.se

<========= Log from job "'clip_vg' kind-clip_vg/instance-9nnrm1hr v6" follows: =========>
[2024-08-02T21:36:45+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-08-02T21:36:45+0200] [MainThread] [I] [toil] Running Toil version 6.1.0-3f9cba3766e52866ea80d0934498f8c8f3129c3f on host r1234.uppmax.uu.se.
[2024-08-02T21:36:45+0200] [MainThread] [I] [toil.worker] Working on job 'clip_vg' kind-clip_vg/instance-9nnrm1hr v4
[2024-08-02T21:36:46+0200] [MainThread] [I] [toil.worker] Loaded body Job('clip_vg' kind-clip_vg/instance-9nnrm1hr v4) from description 'clip_vg' kind-clip_vg/instance-9nnrm1hr v4
[2024-08-02T21:36:46+0200] [MainThread] [I] [cactus.shared.common] Running the command ['bash', '-c', 'set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar']
[2024-08-02T21:36:46+0200] [MainThread] [I] [toil-rt] 2024-08-02 21:36:46.628925: Running the command: "bash -c set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar"
[2024-08-02T21:37:51+0200] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2024-08-02T21:37:51+0200] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-join_vg/instance-e0kfa3xp/file-9dec55f5e4394648b3cad119935e4a78/24.vg' to path '/scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg'
Traceback (most recent call last):
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 409, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2845, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2761, in _run
    return self.run(fileStore)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2990, in run
    rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/cactus/refmap/cactus_graphmap_join.py", line 680, in clip_vg
    cactus_call(parameters=cmd, outfile=clipped_path, job_memory=job.memory)
  File "/sw/bioinfo/cactus/2.8.2/rackham/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 910, in cactus_call
    raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command ['bash', '-c', 'set -eo pipefail && clip-vg /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg -f -e CamPar -d MINIGRAPH -u 10000 -a MINIGRAPH -o /scratch/49030143/toilwf-a88df06839555fb5be656276961094cb/3259/0918/tmp3vq5pnlp/24.vg.clip.bed | vg clip -d 1 - -P CamPar | vg clip -sS - -P CamPar'] exited 141: stderr=vg: invalid option -- 'S'
usage: vg [options]
Chop out variation within path intervals of a vg graph

input options: 
    -b, --bed FILE            BED regions corresponding to path intervals of the graph to target
    -r, --snarls FILE         Snarls from vg snarls (recomputed if not given unless -d and -P used).
depth clipping options: 
    -d, --depth N             Clip out nodes and edges with path depth below N
stub clipping options:
    -s, --stubs               Clip out all stubs (nodes with degree-0 sides that aren't on reference)
snarl complexity clipping options: [default mode]
    -n, --max-nodes N         Only clip out snarls with > N nodes
    -e, --max-edges N         Only clip out snarls with > N edges
    -N  --max-nodes-shallow N Only clip out snarls with > N nodes not including nested snarls
    -E  --max-edges-shallow N Only clip out snarls with > N edges not including nested snarls
    -a, --max-avg-degree N    Only clip out snarls with average degree > N
    -l, --max-reflen-prop F   Ignore snarls whose reference traversal spans more than F (0<=F<=1) of the whole reference path
    -L, --max-reflen N        Ignore snarls whose reference traversal spans more than N bp
big deletion edge clipping options:
    -D, --max-deletion-edge N Clip out all edges whose endpoints have distance > N on a reference path
    -c, --context N           Search up to at most N steps from reference paths for candidate deletion edges [1]
general options: 
    -P, --path-prefix STRING  Do not clip out alleles on paths beginning with given prefix (such references must be specified either with -P or -b). Multiple allowed
    -m, --min-fragment-len N  Don't write novel path fragment if it is less than N bp long
    -B, --output-bed          Write BED-style file of affected intervals instead of clipped graph. 
                              Columns 4-9 are: snarl node-count edge-count shallow-node-count shallow-edge-count avg-degree
    -t, --threads N           number of threads to use [default: all available]
    -v, --verbose             Print some logging messages

[2024-08-02T21:37:51+0200] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host r1234.uppmax.uu.se

<=========

glennhickey commented 3 months ago

cactus v2.8.2 includes vg v1.56.0, which has a -S option in vg clip:

stub clipping options:
    -s, --stubs               Clip out all stubs (nodes with degree-0 sides that aren't on reference)
    -S, --stubbify-paths      Clip out all edges necessary to ensure selected reference paths have exactly two stubs

You must have installed cactus incorrectly and/or have the wrong vg on your path. The best way to install Cactus is to follow the BIN-INSTALL instructions on the release page; I suggest you try doing that.
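
A quick way to test the "wrong vg on your path" theory is to check which vg the shell resolves and whether its vg clip usage text lists the option. The following is only a rough sketch: the helper name `has_stubbify_option` is made up for illustration, and it assumes that `vg clip -h` prints the usage text shown above (to stdout or stderr).

```shell
#!/usr/bin/env bash
# Sketch: check whether the vg found first on PATH supports `vg clip -S`.

# Hypothetical helper: succeeds (exit 0) if the usage text on stdin
# lists the --stubbify-paths option.
has_stubbify_option() {
    grep -q -e '--stubbify-paths'
}

if command -v vg >/dev/null 2>&1; then
    # Show which binary the shell actually picks up, and its version.
    echo "vg resolves to: $(command -v vg)"
    vg version | head -n 1
    # Assumption: `vg clip -h` prints the usage text (stdout or stderr).
    if vg clip -h 2>&1 | has_stubbify_option; then
        echo "OK: this vg clip lists -S/--stubbify-paths"
    else
        echo "Mismatch: this vg clip has no -S option; a different vg is probably shadowing the one bundled with cactus"
    fi
else
    echo "vg not found on PATH"
fi
```

If the reported path is not inside the cactus installation, or the version is not the v1.56.0 mentioned above, putting the bundled binaries first on PATH (or reinstalling from the release bundle) would be the next thing to try.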