ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Cactus_consolidated error #742

Closed francicco closed 2 years ago

francicco commented 2 years ago

Hi,

I'm still struggling to solve this problem. I have tried several versions of the pre-compiled distribution, but the problem persists. Is there a solution?

=========>
        [2022-06-25T23:25:33+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
        [2022-06-25T23:25:33+0100] [MainThread] [I] [toil] Running Toil version 5.6.0-c34146a6437e4407a61e946e968bcce67a0ebbca on host bp1-compute194.data.bp.acrc.priv.
        [2022-06-25T23:25:33+0100] [MainThread] [I] [toil.worker] Working on job 'CactusConsolidated' kind-CactusConsolidated/instance-1ulhngoe v1
        [2022-06-25T23:25:33+0100] [MainThread] [I] [toil.worker] Loaded body Job('CactusConsolidated' kind-CactusConsolidated/instance-1ulhngoe v1) from description 'CactusConsolidated' kind-CactusConsolidated/instance-1ulhngoe v1
        [2022-06-25T23:25:33+0100] [MainThread] [I] [toil.statsAndLogging] Alignments file: /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp
        [2022-06-25T23:25:33+0100] [MainThread] [W] [root] Deprecated toil method.  Please call "logging.getLevelName" directly.
        [2022-06-25T23:25:33+0100] [MainThread] [I] [cactus.shared.common] Running the command ['cactus_consolidated', '--sequences', 'Mmes /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf2c_fbdz.tmp Mpol /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn5vzfate.tmp', '--speciesTree', '(Mmes:1.0,Mpol:1.0)Anc0;', '--logLevel', 'INFO', '--alignments', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp', '--params', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpx2mv_oud.tmp', '--outputFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf5nzug92.tmp', '--outputHalFastaFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn6zj3ag7.tmp', '--outputReferenceFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpcx3kxzz_.tmp', '--referenceEvent', 'Anc0', '--secondaryAlignments', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpigeb36_j.tmp', '--threads', '256']
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-y9a83lsr/cleanup/file-0cab6a9c93bc4ba9b8a186b540ccee21/tmpm_fiis81.tmp' to path '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf2c_fbdz.tmp'
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-y9a83lsr/cleanup/file-58234b49b90f4431ae784bac56b6dffe/tmpomfgpfkt.tmp' to path '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn5vzfate.tmp'
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-8j54ywgg/file-c008f3a611da48bfa900dbc0f49d13d2/tmpp67cncdj.tmp' to path '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp'
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-8j54ywgg/file-94ade84a28fe470b8c91e3a9396231d2/tmpwt827926.tmp' to path '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpigeb36_j.tmp'
        [2022-06-25T23:25:34+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job 'CactusConsolidated' kind-CactusConsolidated/instance-1ulhngoe v1 used 181.73% disk (3.0 GiB [3194167296B] used, 1.6 GiB [1757668671B] requested).
        Traceback (most recent call last):
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 405, in workerScript
            job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 932, in _runner
            super(RoundedJob, self)._runner(*args, jobStore=jobStore,
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2399, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2317, in _run
            return self.run(fileStore)
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/pipeline/cactus_workflow.py", line 408, in run
            messages = runCactusConsolidated(seqMap=seqMap,
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 184, in runCactusConsolidated
            masterMessages = cactus_call(check_output=True, returnStdErr=True, realtimeStderrPrefix='cactus_consolidated({})'.format(referenceEvent),
          File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 868, in cactus_call
            raise RuntimeError("Command {} signaled {}: {}".format(call, signal.Signals(-process.returncode).name, out))
        RuntimeError: Command ['cactus_consolidated', '--sequences', 'Mmes /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf2c_fbdz.tmp Mpol /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn5vzfate.tmp', '--speciesTree', '(Mmes:1.0,Mpol:1.0)Anc0;', '--logLevel', 'INFO', '--alignments', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp', '--params', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpx2mv_oud.tmp', '--outputFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf5nzug92.tmp', '--outputHalFastaFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn6zj3ag7.tmp', '--outputReferenceFile', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpcx3kxzz_.tmp', '--referenceEvent', 'Anc0', '--secondaryAlignments', '/tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpigeb36_j.tmp', '--threads', '256'] signaled SIGSEGV: stdout=, stderr=Params file: /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpx2mv_oud.tmp
        Output file string : /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf5nzug92.tmp
        Output hal fasta file string : /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn6zj3ag7.tmp
        Output reference fasta file string : /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpcx3kxzz_.tmp
        Sequence files and events: Mmes /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpf2c_fbdz.tmp Mpol /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpn5vzfate.tmp
        Alignments file: /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpmxlgglzd.tmp
        Secondary alignments file: /tmp/e7537398fc435fccafb0dc51ddf0f58e/2eef/49c0/tmpigeb36_j.tmp
        Constraint alignments file: (null)
        Species tree: (Mmes:1.0,Mpol:1.0)Anc0;
        Outgroup events: (null)
        Reference event: Anc0

        [2022-06-25T23:25:34+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host bp1-compute194.data.bp.acrc.priv
<=========

Thanks a lot F

glennhickey commented 2 years ago

This seems to be a similar issue to #688 and (your) #585. As mentioned in #688, it looks like it's crashing when parsing the XML file (--params). I can only speculate that it's due to some kind of OS compatibility issue (which OS are you using?) -- if so, the examples/evolverMammals test should fail in the same way (does it?).

In recent releases, cactus_consolidated comes with debug symbols built in, so you could (as touched on in #688) try to run it in gdb.
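As a rough illustration of what running it in gdb could look like (a sketch under assumptions, not the exact commands from #688): the helper below wraps a failing command so that gdb runs it and prints a backtrace when it crashes. `build_gdb_command` and `run_under_gdb` are hypothetical names, the parameter path is a placeholder, and the full argument list should be copied from the "Running the command" line of the log. `-batch`, `-ex` and `--args` are standard gdb options. Toil normally cleans up its temporary files, so the inputs may need to be kept or regenerated first.

```python
import shutil
import subprocess


def build_gdb_command(failing_command):
    """Wrap a command so gdb runs it and prints a backtrace if it crashes."""
    # -batch: exit when finished; -ex: execute a gdb command;
    # --args: everything after it is the program and its arguments.
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args"] + list(failing_command)


def run_under_gdb(failing_command):
    """Run the command under gdb; return gdb's output, or None if gdb is missing."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        build_gdb_command(failing_command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


# Hypothetical, shortened stand-in for the failing command in the log.
example = ["cactus_consolidated", "--params", "/path/to/params.xml", "--threads", "1"]
print(" ".join(build_gdb_command(example)))
```

The backtrace printed after the crash is the part worth posting back to the issue.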

francicco commented 2 years ago

Hi Glenn,

Thanks a lot for replying. These are the specs for the cluster I'm using:

BluePebble is a non-homogeneous cluster, meaning that the nodes vary slightly.

Standard Nodes

Approx 170 nodes in use with 24, 28 or 32 cores (majority are 24 cores)

The majority of nodes have 192GB of RAM

Next generation 100 Gbit Ethernet switches (nodes are connected at 10 Gbit or 25 Gbit, no InfiniBand interconnect)

GPU Nodes

18 GPU enabled nodes: NVIDIA RTX2080Ti (each GPU node has 4 cgpu cards and 96GB RAM)

Operating System

GNU/Linux ([CentOS 7](https://www.centos.org/centos-linux/))

Job scheduler

[Slurm](https://slurm.schedmd.com/)

The test data works sometimes but not always, so I wonder whether some nodes behave differently. This time, for example, it failed:

=========>
    [2022-06-30T07:25:02+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
    [2022-06-30T07:25:02+0100] [MainThread] [I] [toil] Running Toil version 5.6.0-c34146a6437e4407a61e946e968bcce67a0ebbca on host bp1-compute194.data.bp.acrc.priv.
    [2022-06-30T07:25:02+0100] [MainThread] [I] [toil.worker] Working on job 'CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1
    [2022-06-30T07:25:03+0100] [MainThread] [I] [toil.worker] Loaded body Job('CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1) from description 'CactusConsolidated' kind-CactusConsolidated/instance-0eugzohk v1
    [2022-06-30T07:25:03+0100] [MainThread] [I] [toil.statsAndLogging] Alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp
    [2022-06-30T07:25:03+0100] [MainThread] [W] [root] Deprecated toil method.  Please call "logging.getLevelName" directly.
    [2022-06-30T07:25:03+0100] [MainThread] [I] [cactus.shared.common] Running the command ['cactus_consolidated', '--sequences', 'simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp', '--speciesTree', '((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;', '--logLevel', 'INFO', '--alignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp', '--params', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp', '--outputFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp', '--outputHalFastaFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp', '--outputReferenceFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp', '--outgroupEvents', 'simHuman_chr6 simMouse_chr6 simRat_chr6', '--referenceEvent', 'Anc2', '--secondaryAlignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp', '--threads', '10']
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-lh6hgfs9/cleanup/file-cbd70502ca24483f8493fa5185664d0a/tmp1jkp3f6n.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-ProgressiveUp/instance-lh6hgfs9/cleanup/file-45eb64e42f46494787f9906043b8fee3/tmps8owdt2f.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-oqake763/file-78d49b6ce21c4e4f8662015782cac285/tmphvkn32y4.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-zfphg50z/file-c81b6fcba7274b37880ff7eab564f761/tmppzrsg6va.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-TrimAndRecurseOnOutgroups/instance-sgmxpepc/file-b275c643915442eba4ffe7319a87211c/tmpt8b1t9hy.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-58omarjs/file-2605b2eda35242789c03603f32574929/tmphyz4_gwu.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp'
    [2022-06-30T07:25:03+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-mappingQualityRescoring/instance-58omarjs/file-b5d7b8fc7ee140249f94c6b743659ae5/tmpbhhvxwmd.tmp' to path '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp'
    Traceback (most recent call last):
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 405, in workerScript
        job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 932, in _runner
        super(RoundedJob, self)._runner(*args, jobStore=jobStore,
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2399, in _runner
        returnValues = self._run(jobGraph=None, fileStore=fileStore)
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2317, in _run
        return self.run(fileStore)
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/pipeline/cactus_workflow.py", line 408, in run
        messages = runCactusConsolidated(seqMap=seqMap,
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 184, in runCactusConsolidated
        masterMessages = cactus_call(check_output=True, returnStdErr=True, realtimeStderrPrefix='cactus_consolidated({})'.format(referenceEvent),
      File "/user/work/tk19812/software/cactus-bin-v2.1.0/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 868, in cactus_call
        raise RuntimeError("Command {} signaled {}: {}".format(call, signal.Signals(-process.returncode).name, out))
    RuntimeError: Command ['cactus_consolidated', '--sequences', 'simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp', '--speciesTree', '((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;', '--logLevel', 'INFO', '--alignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp', '--params', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp', '--outputFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp', '--outputHalFastaFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp', '--outputReferenceFile', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp', '--outgroupEvents', 'simHuman_chr6 simMouse_chr6 simRat_chr6', '--referenceEvent', 'Anc2', '--secondaryAlignments', '/user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp', '--threads', '10'] signaled SIGSEGV: stdout=, stderr=Params file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpg265ulb5.tmp
    Output file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpcyi5gmn9.tmp
    Output hal fasta file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpuytjrnpl.tmp
    Output reference fasta file string : /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp49qlph8y.tmp
    Sequence files and events: simCow_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp947xc5g7.tmp simDog_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpjlc53zs8.tmp simHuman_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp6l93ispp.tmp simMouse_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpfrgnibon.tmp simRat_chr6 /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmplrz4xzc3.tmp
    Alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmpvhi5g9m3.tmp
    Secondary alignments file: /user/work/tk19812/software/cactus-bin-v2.1.0/fcd3a00f1d8a5630bcdcbb8c15e27cca/7469/ec7e/tmp5o9bllxp.tmp
    Constraint alignments file: (null)
    Species tree: ((simHuman_chr6:0.144018,(simMouse_chr6:0.084509,simRat_chr6:0.091589)mr:0.271974)Anc1:0.020593,(simCow_chr6:0.18908,simDog_chr6:0.16303)Anc2:0.032898)Anc0;
    Outgroup events: simHuman_chr6 simMouse_chr6 simRat_chr6
    Reference event: Anc2

    [2022-06-30T07:25:03+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host bp1-compute194.data.bp.acrc.priv
<=========

I tried your suggestion to run cactus_consolidated under gdb, but I can't find gdb.

Help please! Best, F

francicco commented 2 years ago

Hi,

I found out that cactus_consolidated crashes on some nodes of the cluster.

Attached: slurm-2216850.out.gz. It shows another type of error, which I think might be related to memory, because cactus_consolidated was already running.

Cheers F

glennhickey commented 2 years ago

I'm a little confused. That first error (SIGSEGV when parsing XML) strikes me as some kind of system incompatibility (that I unfortunately do not yet understand enough to debug). The latest error in your attached log (SIGKILL during bar) is indeed almost certainly due to lack of memory. The two errors are unrelated -- I guess you have some nodes in your cluster that are incompatible with the Cactus binaries and some that don't have enough memory to run your data?
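For readers comparing the two failures, here is a minimal sketch of how a signal name such as SIGSEGV or SIGKILL ends up in the error message. On POSIX, Python's subprocess module reports a process killed by a signal as a negative return code, which the traceback above converts with `signal.Signals(-process.returncode).name`. The function name `describe_exit` is an assumption for illustration, and the child process below only simulates a crash by signalling itself.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # On POSIX, death by signal is reported as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)


# Child process that sends SIGSEGV to itself, standing in for a crashing binary.
child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
proc = subprocess.run(child)
print(describe_exit(proc.returncode))

# A SIGKILL, as sent by the kernel or scheduler when memory runs out, looks like this.
print(describe_exit(-int(signal.SIGKILL)))
```

The signal name alone does not say who sent it, so a SIGKILL still has to be checked against the scheduler or kernel logs to confirm an out-of-memory kill.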

francicco commented 2 years ago

> I guess you have some nodes in your cluster that are incompatible with the Cactus binaries and some that don't have enough memory to run your data?

Yes, I came to the same conclusion. What would be the best way to deal with that incompatibility? F

glennhickey commented 2 years ago

Either not use those nodes or try docker/singularity for the binaries.
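One way to avoid specific nodes on a Slurm cluster is sbatch's `--exclude` option, which takes a comma-separated node list. The sketch below only builds such a command line. The helper name `sbatch_command`, the script name `run_cactus.sh`, and the short node name (guessed from the host name in the logs) are all assumptions.

```python
def sbatch_command(script, bad_nodes):
    """Build an sbatch command line that keeps the job off the listed nodes."""
    command = ["sbatch"]
    if bad_nodes:
        # --exclude takes a comma-separated list of node names.
        command.append("--exclude=" + ",".join(bad_nodes))
    command.append(script)
    return command


# Hypothetical values: the node name is guessed from the host in the log, and
# run_cactus.sh stands for whatever batch script launches Cactus.
print(" ".join(sbatch_command("run_cactus.sh", ["bp1-compute194"])))
```

This only helps if the remaining nodes have enough memory for the job.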

francicco commented 2 years ago

That is the node that has enough memory (10 genomes of 400 Mb each), and I cannot use docker or singularity: IT doesn't want to install either of them. Would compiling the binaries solve the problem?

Thanks a lot F

glennhickey commented 2 years ago

If you are able to compile the binaries on that particular node (or on a machine with the same OS/system libraries), I would expect that to solve the problem.

You can also try using this cactus_consolidated (after renaming it): http://public.gi.ucsc.edu/~hickey/debug/cactus_consolidated.pic

It was built by adding --with-pic to libxml2's configuration.

Or this one: http://public.gi.ucsc.edu/~hickey/debug/cactus_consolidated.12

It was built by upgrading libxml2 from 2.7.2 to 2.9.12.

francicco commented 2 years ago

Glenn,

Thanks a lot, I'll try those two first (they are already binaries, right?). I'll let you know. In case they don't work, I'll try to compile it. Can I compile only cactus_consolidated? Can you help me with that? Previous versions of cactus were so hard to compile...

Cheers, F

francicco commented 2 years ago

@glennhickey I'm so glad to inform you that cactus_consolidated.12 did work! At least for the test run. Thanks a lot for your support! F