Closed francicco closed 2 years ago
This seems to be a similar issue to #688 and (your) #585. As mentioned in #688, it looks like it's crashing when parsing the XML file (`--params`). I can only speculate that it's due to some kind of OS compatibility issue (which OS are you using?) -- if so, the `examples/evolverMammals` test should fail in the same way (does it?).
In recent releases, `cactus_consolidated` comes with debug symbols built in, so you could (as touched on in #688) try to run it in gdb:
mkdir -p ./temp
--disableCaching --cleanWorkDir never --workDir ./temp
'--sequences' 'Mmes /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf2c_fbdz.tmp Mpol /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn5vzfate.tmp' '--speciesTree' '(Mmes:1.0Mpol:1.0)Anc0;' '--logLevel' 'INFO' '--alignments' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp' '--params' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpx2mv_oud.tmp' '--outputFile' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf5nzug92.tmp' '--outputHalFastaFile' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn6zj3ag7.tmp' '--outputReferenceFile' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpcx3kxzz_.tmp' '--referenceEvent' 'Anc0' '--secondaryAlignments' '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpigeb36_j.tmp' '--threads' '256'
gdb cactus_consolidated
then, inside gdb, type `run ARGUMENTS`, where `ARGUMENTS` is what you copied above.
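Toil logs the failing command as a Python list, which cannot be pasted into a shell as-is because several arguments contain spaces, parentheses and semicolons. A minimal sketch of one way to convert it, assuming Python 3.8+ (for `shlex.join`) and that `gdb --args` is an acceptable alternative to typing `run ARGUMENTS` inside gdb. The function name and the example paths are hypothetical.

```python
import ast
import shlex


def gdb_command_from_toil_log(logged_list: str) -> str:
    """Convert the Python-list form of a command (as printed in the Toil log)
    into a shell-quoted `gdb --args ...` command line."""
    argv = ast.literal_eval(logged_list)  # safe parse of the list literal
    if not (isinstance(argv, list) and all(isinstance(a, str) for a in argv)):
        raise ValueError("expected a Python list of strings")
    # shlex.join quotes every argument that the shell would otherwise split
    return shlex.join(["gdb", "--args", *argv])


if __name__ == "__main__":
    # Hypothetical, shortened command; paste the real list from your log instead.
    logged = (
        "['cactus_consolidated', '--sequences', 'A /tmp/a.fa B /tmp/b.fa', "
        "'--speciesTree', '(A:1.0,B:1.0)Anc0;', '--threads', '4']"
    )
    print(gdb_command_from_toil_log(logged))
```

Running the printed line starts gdb with the arguments preloaded, so a plain `run` followed by `bt` after the crash should be enough to get a backtrace.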
Hi Glenn,
Thanks a lot for replying. These are the specs for the cluster I'm using:
BluePebble is a non-homogeneous cluster, meaning that the nodes vary slightly.
Standard Nodes
Approx 170 nodes in use with 24, 28 or 32 cores (majority are 24 cores)
The majority of nodes have 192GB of RAM
Next generation 100 Gbit Ethernet switches (nodes are connected at 10 Gbit or 25 Gbit, no InfiniBand interconnect)
GPU Nodes
18 GPU enabled nodes: NVIDIA RTX2080Ti (each GPU node has 4 cgpu cards and 96GB RAM)
Operating System
GNU/Linux ([CentOS 7](https://www.centos.org/centos-linux/))
Job scheduler
[Slurm](https://slurm.schedmd.com/)
The test data sometimes works, but not always; I wonder whether some nodes behave differently. This time, for example, it failed:
=========>
[2022-06-30T07:25:02+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2022-06-30T07:25:02+0100] [MainThread] [I] [toil] Running Toil version 5.6.0-c34146a6437e4407a61e946e968bcce67a0ebbca on host bp1-compute194.data.bp.acrc.priv.
[2022-06-30T07:25:02+0100] [MainThread] [I] [toil.worker] Working on job 'CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1
[2022-06-30T07:25:03+0100] [MainThread] [I] [toil.worker] Loaded body Job('CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1) from description 'CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1
[2022-06-30T07:25:03+0100] [MainThread] [I] [toil.statsAndLogging] Alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp
[2022-06-30T07:25:03+0100] [MainThread] [W] [root] Deprecated toil method. Please call "logging.getLevelName" directly.
[2022-06-30T07:25:03+0100] [MainThread] [I] [cactus.shared.common] Running the command ['cactus_consolidated', '--sequences', 'simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp', '--speciesTree', '((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;', '--logLevel', 'INFO', '--alignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp', '--params', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp', '--outputFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp', '--outputHalFastaFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp', '--outputReferenceFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp', '--outgroupEvents', 'simHuman_chr6 simMouse_chr6 simRat_chr6', '--referenceEvent', 'Anc2', '--secondaryAlignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp', '--threads', '10']
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-lh6hgfs9/cleanup/file-cbd70502ca24483f8493fa5185664d0a/tmp1jkp3f6n.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-lh6hgfs9/cleanup/file-45eb64e42f46494787f9906043b8fee3/tmps8owdt2f.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-oqake763/file-78d49b6ce21c4e4f8662015782cac285/tmphvkn32y4.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-zfphg50z/file-c81b6fcba7274b37880ff7eab564f761/tmppzrsg6va.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-sgmxpepc/file-b275c643915442eba4ffe7319a87211c/tmpt8b1t9hy.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-58omarjs/file-2605b2eda35242789c03603f32574929/tmphyz4_gwu.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp'
[2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-58omarjs/file-b5d7b8fc7ee140249f94c6b743659ae5/tmpbhhvxwmd.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp'
Traceback (most recent call last):
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 405, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 932, in _runner
super(RoundedJob, self)._runner(*args, jobStore=jobStore,
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2399, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2317, in _run
return self.run(fileStore)
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/pipeline/cactus_workflow.py", line 408, in run
messages = runCactusConsolidated(seqMap=seqMap,
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 184, in runCactusConsolidated
masterMessages = cactus_call(check_output=True, returnStdErr=True, realtimeStderrPrefix='cactus_consolidated({})'.format(referenceEvent),
File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 868, in cactus_call
raise RuntimeError("Command {} signaled {}: {}".format(call, signal.Signals(-process.returncode).name, out))
RuntimeError: Command ['cactus_consolidated', '--sequences', 'simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp', '--speciesTree', '((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;', '--logLevel', 'INFO', '--alignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp', '--params', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp', '--outputFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp', '--outputHalFastaFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp', '--outputReferenceFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp', '--outgroupEvents', 'simHuman_chr6 simMouse_chr6 simRat_chr6', '--referenceEvent', 'Anc2', '--secondaryAlignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp', '--threads', '10'] signaled SIGSEGV: stdout=, stderr=Params file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp
Output file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp
Output hal fasta file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp
Output reference fasta file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp
Sequence files and events: simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp
Alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp
Secondary alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp
Constraint alignments file: (null)
Species tree: ((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;
Outgroup events: simHuman_chr6 simMouse_chr6 simRat_chr6
Reference event: Anc2
[2022-06-30T07:25:03+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host bp1-compute194.data.bp.acrc.priv
<=========
I tried your suggestion to run `cactus_consolidated` in gdb, but I can't find gdb.
Help please! Best, F
Hi,
I found out that `cactus_consolidated` crashes on some nodes of the cluster.
Attached (slurm-2216850.out.gz) is another type of error, which I think might be related to memory, because `cactus_consolidated` was running.
Cheers, F
I'm a little confused. That first error (SIGSEGV when parsing XML) strikes me as some kind of system incompatibility (that I unfortunately do not yet understand enough to debug). The latest error in your attached log (SIGKILL during bar) is indeed almost certainly due to lack of memory. The two errors are unrelated -- I guess you have some nodes in your cluster that are incompatible with the Cactus binaries and some that don't have enough memory to run your data?
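The traceback earlier in the thread derives the signal name with `signal.Signals(-process.returncode).name`: on POSIX systems, Python's `subprocess` reports a child killed by signal N as return code -N. A small sketch of that decoding follows. The function name and the hint strings are illustrative; the hints only restate the diagnosis given in this thread and are not a general rule.

```python
import signal

# Hints restating this thread's diagnosis (illustrative, not authoritative).
HINTS = {
    "SIGSEGV": "crash inside the binary (here: while parsing the XML params file)",
    "SIGKILL": "killed from outside, most likely for running out of memory",
}


def describe_returncode(returncode: int) -> str:
    """Map a subprocess return code to a signal name or a plain exit status."""
    if returncode < 0:
        # A negative return code means the child was terminated by a signal.
        return signal.Signals(-returncode).name
    return f"exit status {returncode}"


if __name__ == "__main__":
    for rc in (-int(signal.SIGSEGV), -int(signal.SIGKILL), 1):
        name = describe_returncode(rc)
        print(rc, name, HINTS.get(name, ""))
```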
> I guess you have some nodes in your cluster that are incompatible with the Cactus binaries and some that don't have enough memory to run your data?
Yes, I came to the same conclusion. What would be the best way to deal with that incompatibility? F
Either not use those nodes or try docker/singularity for the binaries.
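To avoid the incompatible nodes, it helps to know which ones they are. A rough sketch that tallies the hosts named in Toil's "failed job on host" log lines and builds an `--exclude` option for sbatch. The function names are hypothetical, and it assumes the Slurm node name is the short host name before the first dot, which may not hold on every cluster. Toil's documentation lists a `TOIL_SLURM_ARGS` environment variable for passing extra sbatch arguments; whether that suits this setup is untested here.

```python
import re
from collections import Counter

# Matches lines such as:
# "... Exiting the worker because of a failed job on host bp1-compute194.data.bp.acrc.priv"
FAILED_HOST = re.compile(r"failed job on host (\S+)")


def failed_hosts(log_text: str) -> Counter:
    """Count how many failed jobs each host reported in a Toil log."""
    return Counter(m.group(1).rstrip(".") for m in FAILED_HOST.finditer(log_text))


def slurm_exclude_option(hosts) -> str:
    """Build an sbatch --exclude option from fully qualified host names.

    Assumption: Slurm knows each node by its short name (text before the first dot).
    """
    short_names = sorted({h.split(".", 1)[0] for h in hosts})
    return "--exclude=" + ",".join(short_names)


if __name__ == "__main__":
    log = (
        "[2022-06-30T07:25:03+0100] [MainThread] [E] [toil.worker] Exiting the worker "
        "because of a failed job on host bp1-compute194.data.bp.acrc.priv\n"
    )
    counts = failed_hosts(log)
    print(counts)
    print(slurm_exclude_option(counts))
```

A host that fails only with SIGKILL is short on memory rather than incompatible, so it is worth checking the signal before excluding a node.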
That is the node with enough memory (10 genomes of 400 Mb each), and I cannot use docker or singularity: IT doesn't want to install either of them. Would compiling the binaries solve the problem?
Thanks a lot F
If you are able to compile the binaries on that particular node (or on a machine with the same OS/system libraries), I would expect that to solve the problem.
You can also try using this `cactus_consolidated` (after renaming it):
http://public.gi.ucsc.edu/~hickey/debug/cactus_consolidated.pic
It was made by adding `--with-pic` to libxml2's configuration.
Or this one: http://public.gi.ucsc.edu/~hickey/debug/cactus_consolidated.12
It was made by upgrading libxml2 from 2.7.2 to 2.9.12.
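Swapping in one of these builds means renaming the downloaded file to `cactus_consolidated`, making it executable, and putting it where the existing binary lives. A minimal sketch of that step, assuming the file has already been downloaded. The function name and all paths are hypothetical, and the original binary is kept as `cactus_consolidated.orig` so the change can be undone.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def install_replacement(downloaded: Path, bin_dir: Path,
                        name: str = "cactus_consolidated") -> Path:
    """Copy `downloaded` into `bin_dir` under `name`, keeping a one-time backup."""
    target = bin_dir / name
    backup = target.with_name(name + ".orig")
    # Back up the shipped binary only once, so repeated swaps never overwrite it.
    if target.exists() and not backup.exists():
        shutil.move(str(target), str(backup))
    shutil.copy2(downloaded, target)
    # Add execute permission for user, group and others.
    target.chmod(target.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory; real paths depend on your installation.
    root = Path(tempfile.mkdtemp())
    (root / "bin").mkdir()
    (root / "bin" / "cactus_consolidated").write_text("old build")
    (root / "cactus_consolidated.12").write_text("new build")
    print(install_replacement(root / "cactus_consolidated.12", root / "bin"))
```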
Glenn,
Thanks a lot, I'll try those two first (they are already binaries, right?) and let you know.
If they don't work, I'll try to compile it. Can I compile only `cactus_consolidated`?
Can you help me with that? Previous versions of Cactus were so hard to compile...
Cheers, F
@glennhickey I'm so glad to inform you that `cactus_consolidated.12` did work! At least for the test run. Thanks a lot for your support! F
Hi,
I'm still struggling to solve this problem. I have tried several versions of the pre-compiled distribution, but the problem persists. Is there a solution to that?
Thanks a lot, F