brettChapman opened 1 year ago
This is a new one for me. The relevant part of the log is
[E::hts_idx_check_range] Reg>
tbx_index_build failed: /cactus/workDir/f0932e5ae3f857bb9f9de2edeb1e5a23/884c/3277/tmp5nikx0gc/merged.vcf.gz
The only explanation I could think of is that the input VCF is somehow corrupt. Are you sure there were no errors upstream of this?
It's probably worth running again with --restart (after making sure you have enough disk space). I'd recommend using --disableCaching --cleanWorkDir never (without --clean always --cleanWorkDir always), which would keep "merged.vcf.gz" around for you to look at for clues.
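Concretely, the rerun would look something like this (a sketch only; keep all of your original options and use your real jobstore path):

# resume from the existing jobstore, keeping work dirs around for inspection
cactus-graphmap-join /cactus/jobStore [your original options] --restart --disableCaching --cleanWorkDir never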
I also strongly suggest upgrading to the latest Cactus release ASAP. I don't recall any changes that were specifically targeted at this issue (which I haven't seen before), but it does fix many other bugs and uses a much newer version of vg, which may help here. The big catch is that you can't run cactus-graphmap-join from versions >= 2.3.0 with output from cactus-align on older versions: you'd need to rerun cactus-align (or cactus-align-batch) too. Also, the interface to cactus-graphmap-join has been simplified, so you'd have to check the docs before using it...
I am getting a similar error, but in my case I believe it is because I'm working with very long chromosomes. I usually have to use a CSI index instead of a TBI.
RuntimeError: Command /usr/bin/time -v tabix -p vcf /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz exited 1: stdout=None, stderr=[E::hts_idx_check_range] Region 536885449..536885452 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
tbx_index_build failed: /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz
Command exited with non-zero status 1
Command being timed: "tabix -p vcf /tmp/20fcb443cab359c7a9dad96a91865805/4dc7/47f6/tmpew8b3s4b/merged.vcf.gz"
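For what it's worth, the workaround I normally use on these long chromosomes is to build the CSI index myself (a sketch):

# TBI can't index coordinates past ~536 Mb (2^29); -C/--csi writes a .csi instead
tabix -C -p vcf merged.vcf.gz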
Goodness, will try to add support soon. In the meantime, I think you'll have to make the VCF yourself if you want one:

- use --gbz instead of --vcf
- vg deconstruct <graph.gbz> -r <graph.snarls> -P <reference sample> -a -t <cores> | bgzip --threads <cores> (best to use the version of vg included in cactus) to make the VCF
- vcfbub --input <graph.vcf.gz> --max-ref-length 100000 --max-level 0 | bgzip to make a simplified VCF (this is what graphmap-join would have done)

VCF indexing would also be an issue for me, because my plant chromosome lengths are too long and I need to use CSI indexing as well. I'll most likely get the same error with cactus-graphmap-join. I'll omit the --vcf flag until there's an update.
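If I do end up making the VCF manually as above, I'd chain those steps with a CSI index, roughly like this (a sketch; the graph/snarls names, reference sample, and core count are placeholders for my data):

# raw VCF from the GBZ, using the vg bundled with cactus
vg deconstruct barley-pg.gbz -r barley-pg.snarls -P Morex_v3 -a -t 16 | bgzip --threads 16 > barley-pg.vcf.gz
# simplified VCF, as graphmap-join would have produced
vcfbub --input barley-pg.vcf.gz --max-ref-length 100000 --max-level 0 | bgzip > barley-pg.simplified.vcf.gz
# CSI index instead of TBI for the long chromosomes
tabix -C -p vcf barley-pg.simplified.vcf.gz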
Don't have tons of time now, but am preparing a new release, and just made tabix failures a warning (instead of a fatal error) via https://github.com/ComparativeGenomicsToolkit/cactus/pull/979/commits/093b9badb3298361c2f89b518672fe0f0a877f21. So this issue will at least be less annoying as of this release...
Thanks, Glenn! I'll create the VCF as you suggested in the meantime.
Hi @glennhickey, I've run Minigraph-Cactus again, and then cactus-graphmap-join, leaving the --vcf option off:
I'm now working with a heavily reduced pangenome size, including only a selection of 5 genomes (from the original 20). I'm running each chromosome separately, then merging them using the join function.
srun -n 1 singularity exec --cleanenv \
--no-home \
--overlay ${JOBSTORE_IMAGE} \
--bind ${CACTUS_SCRATCH}/tmp:/tmp \
${CACTUS_IMAGE} cactus-graphmap-join /cactus/jobStore --indexCores 16 --vg ${original_folder}/*H/*.vg --outDir ${original_folder}/barley-pg --outName barley-pg --reference Morex_v3 --clip 10000 --giraffe full clip filter --chrom-vg --gbz --gfa --disableCaching --workDir=/cactus/workDir --clean always --cleanWorkDir always --defaultDisk 1000G --maxDisk 1000G --maxCores 32 --maxMemory 126G --defaultMemory 126G
I'm using a build from Docker hub dated from February 3rd.
I eventually get this error. It looks like it's struggling to index using vg.
[2023-05-27T22:14:13+0000] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/cactus/jobStore)
Traceback (most recent call last):
File "/home/cactus/cactus_env/bin/cactus-graphmap-join", line 8, in <module>
sys.exit(main())
File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 194, in main
graphmap_join(options)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 226, in graphmap_join
wf_output = toil.start(Job.wrapJobFn(graphmap_join_workflow, options, config, vg_ids, hal_ids))
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1017, in start
return self._runMainLoop(rootJobDescription)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1461, in _runMainLoop
jobCache=self._jobCache).run()
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/leader.py", line 330, in run
raise FailedJobsException(self.jobStore, failed_jobs, exit_code=self.recommended_fail_exit_code)
toil.leader.FailedJobsException: The job store '/cactus/jobStore' contains 4 failed jobs: 'Job' kind-graphmap_join_workflow/instance-5ejmt78h v3, 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v6, 'Job' k>
Log from job "'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v6" follows:
=========>
[2023-05-20T19:23:07+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2023-05-20T19:23:07+0000] [MainThread] [I] [toil] Running Toil version 5.8.0-79792b70098c4c18d1d2c2832b72085893f878d1 on host node-11.
[2023-05-20T19:23:07+0000] [MainThread] [I] [toil.worker] Working on job 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v4
[2023-05-20T19:23:08+0000] [MainThread] [I] [toil.worker] Loaded body Job('make_vg_indexes' kind-make_vg_indexes/instance-7nd327t2 v4) from description 'make_vg_indexes' kind-make_vg_indexes/instance-7nd327>
[2023-05-20T19:28:09+0000] [MainThread] [I] [cactus.shared.common] Running the command ['grep', '-v', '^W _MINIGRAPH_', '/cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barley_pan>
[2023-05-20T19:28:09+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:28:09.659310: Running the command: "grep -v ^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barl>
[2023-05-20T19:28:27+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T19:28:27+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:28:27.372914: Successfully ran: "grep -v '^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/barle>
[2023-05-20T19:36:30+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:36:30.674846: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T19:37:06+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T19:37:06+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:37:06.149221: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T19:45:04+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:45:04.839426: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T19:45:45+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T19:45:45+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:45:45.249368: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T19:49:08+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:49:08.095246: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T19:49:35+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T19:49:35+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:49:35.951295: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T19:56:46+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:56:46.044243: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T19:57:19+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T19:57:19+0000] [MainThread] [I] [toil-rt] 2023-05-20 19:57:19.308827: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T20:04:16+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:04:16.517717: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T20:04:46+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T20:04:46+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:04:46.968221: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T20:12:25+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:25.119664: Running the command: "grep -v ^H\|^W _MINIGRAPH_ /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/>
[2023-05-20T20:12:54+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead."
[2023-05-20T20:12:54+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:54.410958: Successfully ran: "grep -v '^H\|^W _MINIGRAPH_' /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/b>
[2023-05-20T20:12:54+0000] [MainThread] [I] [toil-rt] 2023-05-20 20:12:54.971630: Running the command: "vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gb>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-6vu84z02/file-e8bed1e0ee5f4a009b96c4632f1f35a8/barley_pangenome_chr1H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-47g8e_uq/file-645a4dcb2d084bf0b8df58b2ac15f1c9/barley_pangenome_chr2H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-okavp214/file-ee2c6634bf0d4bbdb697f8b6052aa902/barley_pangenome_chr3H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-kir7bvk5/file-de6f2c7eb0254ad48e3db026f2135bad/barley_pangenome_chr4H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-vgpxtvj0/file-125868b731484765b9d55d0660a281f6/barley_pangenome_chr5H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-i8nqljgx/file-860a3b21c9494ffdae5c5538ea7fc508/barley_pangenome_chr6H.v>
[2023-05-20T20:31:11+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-vg_to_gfa/instance-vzsfyaj6/file-c56a321036874ba1bfc2d82c8e773828/barley_pangenome_chr7H.v>
Traceback (most recent call last):
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/worker.py", line 403, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2727, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2644, in _run
return self.run(fileStore)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2875, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_join.py", line 533, in make_vg_indexes
cactus_call(parameters=['vg', 'gbwt', '-G', merge_gfa_path, '--gbz-format', '-g', gbz_path])
File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/shared/common.py", line 839, in cactus_call
raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command /usr/bin/time -v vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gbz-format -g /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e>
what(): MetadataBuilder: Duplicate path for sample _MINIGRAPH_ contig s1 phase 0 count 0
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_Gr5aXN/stacktrace.txt
Please include the stack trace file in your bug report!
Command exited with non-zero status 134
Command being timed: "vg gbwt -G /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmpf8p4d9px/full.merged.gfa --gbz-format -g /cactus/workDir/63894aefc949535a8e8c971570fec58d/044e/6e63/tmp>
User time (seconds): 596.09
System time (seconds): 42.62
Percent of CPU this job got: 58%
Elapsed (wall clock) time (h:mm:ss or m:ss): 18:16.69
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 96789080
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 68586
Minor (reclaiming a frame) page faults: 25226196
Voluntary context switches: 89310
Involuntary context switches: 1773
Swaps: 0
File system inputs: 44736512
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 134
[2023-05-20T20:31:14+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host node-11
<=========
srun: error: node-11: task 0: Exited with exit code 1
Thanks
I think the issue is in ${original_folder}/*H/*.vg. The wildcard must be picking up multiple versions of the same chromosome, which is triggering the duplicate path error you are seeing.
I'd strongly recommend switching to the latest release and considering the new interface which will make issues like this much more avoidable (albeit at the cost of some modularity).
Hi @glennhickey, ok thanks. Could I simply rerun the join step with the new release, or do I need to rerun everything, including generating each chromosome graph?
Just to be clear: I think the error in question is caused by the wildcards in your command, and not the release version.
But yes, I think you should be able to upgrade to the latest release and rerun join. Compatibility for doing this broke in v2.3.0 but I think both your versions are newer.
Still, there have been a number of fixes, so if it's not too inconvenient, you will probably get at least a slightly better result by rerunning from scratch.
Hi @glennhickey, if the wildcard is the main problem, how should I pass the *.vg graphs to the --vg parameter?
In my new run I am putting all the vg graphs into a single folder and simply passing /vg_graphs/*.vg to the --vg parameter. Would that work?
It's not the wildcard in and of itself that's the problem; it's your use of two of them, ${original_folder}/*H/*.vg, to pull all the vg files from potentially several directories that worries me, since you are getting an error message that would be caused by, say, passing two chr1.vg files to cactus-graphmap-join. You can verify this by looking at the log of your failing run: at the beginning it logs every input file.
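You could also check the glob expansion directly, something along these lines (a sketch):

# show exactly which files the wildcard matches, then flag duplicate basenames
ls ${original_folder}/*H/*.vg
ls ${original_folder}/*H/*.vg | xargs -n1 basename | sort | uniq -d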
Hi @glennhickey, I had a look at the log file, and the *.vg files appear OK and are not repeated, so that was unlikely to be the problem.
I noticed that the new behavior of minigraph-cactus is to order the alignment based on mash distance. Is it possible to disable this behavior and align based on the order of the input guide tree? We have predetermined the best order, and would like to use that order.
Thanks.
I see. Well, if you are able to share data that reproduces that error (preferably with the latest release), I'd be very curious to take a look.
Disable mash and use the input order by setting minigraphSortInput="0" in the cactus_progressive_config.xml. You can either do that directly in your cactus installation, or make a copy, edit it, then pass it with --configFile to cactus-pangenome / cactus-minigraph.
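In practice that's something like the following (a sketch; the config's location depends on your installation, and the cactus-minigraph arguments here are placeholders):

# copy the shipped config, then edit the copy to set minigraphSortInput="0"
cp <path-to-cactus-install>/cactus_progressive_config.xml my_config.xml
# pass the edited config through when building each chromosome graph
cactus-minigraph /cactus/jobStore seqfile_1H.txt chr1H.gfa --reference Morex_v3 --configFile my_config.xml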
Thanks. I'll rerun again without mash distance, and using the latest build. If I have a similar error again I'll share the data with you.
Hi @glennhickey
I've managed to complete the graph with the join method, using my own alignment order.
I've found that, because I'm manually joining the chromosome graphs through this method, I ended up with chr1H as the reference and all other paths (including all the other Morex chromosomes) as haplotypes (I ran vg paths --metadata, roughly as sketched below). I'm now rerunning with --reference Morex_v3_chr1H Morex_v3_chr2H.....etc. Should this now use all chromosomes from Morex as a reference?
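The metadata check was roughly the following (a sketch; the graph name is a placeholder for mine):

# print a table of path metadata, including reference vs haplotype sense
vg paths -x barley-pg.gbz --metadata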
I've been trying to run vg rna to generate a splice graph, and I think it's been crashing (https://github.com/vgteam/vg/issues/3997) because only chromosome 1H of Morex is the reference.
Thanks.
--reference should be based on the sample name in your original seqfile. It will be used to check prefixes of path names in the various vg files (which should look like
The seqfile for chromosome 1H looks like this:
Morex_v3_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Morex_v3_chr1H.fasta
FT11_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/FT11_chr1H.fasta
RGT_Planet_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/RGT_Planet_chr1H.fasta
HOR_13942_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/HOR_13942_chr1H.fasta
Akashinriki_chr1H /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Akashinriki_chr1H.fasta
And for each consecutive chromosome it's the same. I then run join on each of the resulting VG graphs and specify --reference Morex_v3_chr1H Morex_v3_chr2H Morex_v3_chr3H Morex_v3_chr4H .....
Is that right?
Should my seqfile look different? I noticed in the updated documentation that it no longer requires the .0 at the end of each path to specify the haplotype.
For each of the chromosome graphs, before getting to the join stage, I do specify the Morex reference, e.g. --reference Morex_v3_chr1H.
Prior to this I was running the join stage with --reference Morex_v3, but it appears it only gave me Morex_v3_chr1H as the single reference, and all other paths were haplotypes.
I've decided to try rerunning each chromosome with a seqfile like this prior to running join with --reference Morex_v3; hopefully it's correct:
Morex_v3 /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Morex_v3_chr1H.fasta
FT11 /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/FT11_chr1H.fasta
RGT_Planet /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/RGT_Planet_chr1H.fasta
HOR_13942 /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/HOR_13942_chr1H.fasta
Akashinriki /data/pangenome_phase2/minigraph-cactus_splice_graph/1H/Akashinriki_chr1H.fasta
This way each sample name is the same across the VG graphs, instead of each chromosome being its own sample.
In cactus-graphmap-join, as far as I can remember anyway, --reference is treated as a prefix. So in your case --reference Morex should be enough to treat anything beginning with Morex as the reference.
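So, hypothetically, the join invocation would be along these lines (a sketch; the vg file names are placeholders for yours):

cactus-graphmap-join /cactus/jobStore --vg chr1H.vg chr2H.vg chr3H.vg --outDir barley-pg --outName barley-pg --reference Morex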
I'm not quite sure why you seem to have manually split your chromosomes, but I don't see why this should cause issues with join...
Thanks, I'll try using --reference with a prefix, and see how it goes.
I manually split my chromosomes to reduce memory overhead, as I run each chromosome on a different node of my cluster. Scaling up to many more genomes with all chromosomes included would push the memory limits of a single node on my cluster.
Got it. As of this week, Cactus supports Slurm, so if your cluster uses Slurm, then hopefully going forward you won't need to do that manually.
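Roughly speaking, that would look like this (a sketch; double-check the current docs for the exact interface and options):

# end-to-end pangenome build, dispatching jobs through Slurm
cactus-pangenome ./jobstore seqfile.txt --outDir barley-pg --outName barley-pg --reference Morex_v3 --batchSystem slurm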
Hi @glennhickey
Any chance of supporting HTCondor as well, in addition to Slurm?
Understandable, @glennhickey, and thanks for the feedback.
At this point, HTCondor is beginning to feel like a crutch and it looks like we should be looking towards Slurm. I mean, everything out there seems to be supporting Slurm nowadays.
Yeah, sorry, for us, it's really hard to support a cluster environment that we don't have access to.
We just got a Slurm cluster, so we're focusing on that. This is in part selfish: we're moving most of our production from the cloud to our Slurm cluster; but also by necessity: we can only meaningfully debug problems we can reproduce.
Hi
I'm getting an error about tbi indexing during a cactus-graphmap-join run.
I run the following command:
I get this error during the run, but it doesn't kill off the whole job, only one of the toil workers fails:
Thanks for any help you can provide.