ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Fail to run Cactus after updating to 2.8.0 #1327

Closed: BitaoQiu closed this issue 6 months ago

BitaoQiu commented 6 months ago

Dear Cactus developers, I encountered this problem after updating to v2.8.0, and it seems to be caused by the RedMask implementation. Could you please provide guidance?

My command: cactus ./js evolverTermites.txt evolverTermites.hal --workDir /home/fr/fr_fr/fr_bq1000/quality/genome_alignment/ --maxMemory 200Gi --maxCores 64 --consMemory 32Gi

Error report:

[2024-03-26T18:57:24+0100] [Thread-1 (daddy)] [E] [toil.batchSystems.singleMachine] Got exit code 1 (indicating failure) from job _toil_worker RedMaskJob file:/gpfs/bwfor/work/ws/fr_bq1000-bq_space/fr_bq1000-quality-1699205402/genome_alignment/two_sp/js kind-RedMaskJob/instance-ullctb9e.
[2024-03-26T18:57:24+0100] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'RedMaskJob' kind-RedMaskJob/instance-ullctb9e v1
Exit reason: None
[2024-03-26T18:57:24+0100] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'RedMaskJob' kind-RedMaskJob/instance-ullctb9e v2
[2024-03-26T18:57:24+0100] [MainThread] [W] [toil.leader] Log from job "kind-RedMaskJob/instance-ullctb9e" follows:
=========>
        [2024-03-26T18:57:17+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
        [2024-03-26T18:57:17+0100] [MainThread] [I] [toil] Running Toil version 6.0.0-0e2a07a20818e593bfdfde3cc51ca4ad809fde96 on host m02n10.
        [2024-03-26T18:57:17+0100] [MainThread] [I] [toil.worker] Working on job 'RedMaskJob' kind-RedMaskJob/instance-ullctb9e v1
        [2024-03-26T18:57:17+0100] [MainThread] [I] [toil.worker] Loaded body Job('RedMaskJob' kind-RedMaskJob/instance-ullctb9e v1) from description 'RedMaskJob' kind-RedMaskJob/instance-ullctb9e v1
        [2024-03-26T18:57:17+0100] [MainThread] [W] [toil.common] XDG_RUNTIME_DIR is set to nonexistent directory /run/user/902086; your environment may be out of spec!
        [2024-03-26T18:57:17+0100] [MainThread] [I] [cactus.shared.common] Running the command ['cactus_softmask2hardmask', '-b', '/home/fr/fr_fr/fr_bq1000/quality/genome_alignment/4ff865a4287555f7bea5e6fcee02704e/212d/6063/tmpcvls01ml/red-in-Csec/Csec.fa']
        [2024-03-26T18:57:17+0100] [MainThread] [I] [toil-rt] 2024-03-26 18:57:17.311902: Running the command: "cactus_softmask2hardmask -b /home/fr/fr_fr/fr_bq1000/quality/genome_alignment/4ff865a4287555f7bea5e6fcee02704e/212d/6063/tmpcvls01ml/red-in-Csec/Csec.fa"
        [2024-03-26T18:57:24+0100] [MainThread] [I] [toil-rt] 2024-03-26 18:57:24.781890: Successfully ran: "cactus_softmask2hardmask -b /home/fr/fr_fr/fr_bq1000/quality/genome_alignment/4ff865a4287555f7bea5e6fcee02704e/212d/6063/tmpcvls01ml/red-in-Csec/Csec.fa" in 7.4545 seconds
        [2024-03-26T18:57:24+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
        [2024-03-26T18:57:24+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-PreprocessSequence/instance-1w9t1j4l/cleanup/file-ae90f432160f4cf98967eb02d9d7b458/tmp6uf_2g47.tmp' to path '/home/fr/fr_fr/fr_bq1000/quality/genome_alignment/4ff865a4287555f7bea5e6fcee02704e/212d/6063/tmpcvls01ml/red-in-Csec/Csec.fa'
        Traceback (most recent call last):
          File "/gpfs/bwfor/home/fr/fr_fr/fr_bq1000/bin/cactus-bin-v2.8.0/venv-cactus-v2.8.0/lib/python3.11/site-packages/toil/worker.py", line 407, in workerScript
            job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
          File "/gpfs/bwfor/home/fr/fr_fr/fr_bq1000/bin/cactus-bin-v2.8.0/venv-cactus-v2.8.0/lib/python3.11/site-packages/cactus/shared/common.py", line 975, in _runner
            super(RoundedJob, self)._runner(*args, jobStore=jobStore,
          File "/gpfs/bwfor/home/fr/fr_fr/fr_bq1000/bin/cactus-bin-v2.8.0/venv-cactus-v2.8.0/lib/python3.11/site-packages/toil/job.py", line 2829, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/gpfs/bwfor/home/fr/fr_fr/fr_bq1000/bin/cactus-bin-v2.8.0/venv-cactus-v2.8.0/lib/python3.11/site-packages/toil/job.py", line 2746, in _run
            return self.run(fileStore)
                   ^^^^^^^^^^^^^^^^^^^
          File "/gpfs/bwfor/home/fr/fr_fr/fr_bq1000/bin/cactus-bin-v2.8.0/venv-cactus-v2.8.0/lib/python3.11/site-packages/cactus/preprocessor/redMasking.py", line 55, in run
            pre_mask_size = int(cactus_call(parameters=['awk', '{sum += $3-$2} END {print sum}', bed_path],
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ValueError: invalid literal for int() with base 10: ''
        [2024-03-26T18:57:24+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host m02n10
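The traceback points at the line in redMasking.py that sums interval lengths from the BED file written by `cactus_softmask2hardmask -b`, which presumably lists the soft-masked (lower-case) regions of the input. If an input has no lower-case bases, that BED file would be empty; awk's `END {print sum}` then prints an empty string because `sum` was never assigned, and `int('')` raises exactly the ValueError shown. The sketch below illustrates a guarded version of that sum in Python. `bed_masked_length` is a hypothetical helper written for illustration, not part of Cactus, and it assumes standard BED columns (chrom, 0-based start, end).

```python
import os
import tempfile


def bed_masked_length(bed_path: str) -> int:
    """Return the total length (sum of end - start) of all intervals in a BED file.

    Hypothetical helper: like awk '{sum += $3-$2} END {print sum}', but it
    returns 0 for an empty file instead of an empty string, and it skips
    blank and BED header/comment lines.
    """
    total = 0
    with open(bed_path) as bed:
        for line in bed:
            fields = line.split()
            # Skip blank lines and BED header/comment lines.
            if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
                continue
            total += int(fields[2]) - int(fields[1])
    return total


if __name__ == "__main__":
    # What the failing code effectively does when awk prints nothing:
    try:
        int("")
    except ValueError as err:
        print(err)  # invalid literal for int() with base 10: ''

    with tempfile.TemporaryDirectory() as tmp:
        empty_bed = os.path.join(tmp, "empty.bed")
        open(empty_bed, "w").close()
        print(bed_masked_length(empty_bed))  # 0

        masked_bed = os.path.join(tmp, "masked.bed")
        with open(masked_bed, "w") as f:
            f.write("chr1\t0\t10\nchr1\t20\t25\n")
        print(bed_masked_length(masked_bed))  # 15
```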
glennhickey commented 6 months ago

There are two workarounds. If your input data are completely unmasked, run the following (the path is relative to your Cactus installation):

sed -e 's/unmask="0"/unmask="1"/' src/cactus/cactus_progressive_config.xml > config-unmask.xml

Then run cactus with --configFile config-unmask.xml.
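If sed is not convenient, the same config edit can be sketched in Python. Like `sed -e 's/unmask="0"/unmask="1"/'` without the `g` flag, it replaces only the first `unmask="0"` on each line and copies everything else unchanged. `make_unmask_config` and the script name in the usage comment are hypothetical; the input path is the same `src/cactus/cactus_progressive_config.xml` relative to the Cactus installation.

```python
import sys


def make_unmask_config(text: str) -> str:
    """Replace the first unmask="0" on each line with unmask="1".

    Mirrors sed -e 's/unmask="0"/unmask="1"/': without the g flag, sed
    makes at most one substitution per line.
    """
    return "".join(
        line.replace('unmask="0"', 'unmask="1"', 1)
        for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python make_unmask_config.py src/cactus/cactus_progressive_config.xml config-unmask.xml
    src_path, dst_path = sys.argv[1], sys.argv[2]
    with open(src_path) as src:
        config = src.read()
    with open(dst_path, "w") as dst:
        dst.write(make_unmask_config(config))
```

The resulting config-unmask.xml is then passed to cactus with --configFile, exactly as with the sed version.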

Otherwise, if some inputs have masking you want to keep, you will have to go into each input assembly and verify that it has at least one lower-case character. You can simply make the first base lower case without changing the results.
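The check and the one-character fix described above could be scripted roughly as follows. This is a minimal sketch assuming plain, uncompressed FASTA files; `has_soft_masking` and `lowercase_first_base` are hypothetical helper names, and the fix writes a copy with only the first sequence letter of the file lower-cased (the output path must differ from the input path).

```python
import sys


def has_soft_masking(fasta_path: str) -> bool:
    """Return True if any sequence line contains a lower-case base."""
    with open(fasta_path) as fasta:
        for line in fasta:
            # Header lines start with '>' and are not sequence, so skip them.
            if not line.startswith(">") and line != line.upper():
                return True
    return False


def lowercase_first_base(fasta_path: str, out_path: str) -> None:
    """Copy a FASTA file, lower-casing only the first sequence letter."""
    done = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not done and not line.startswith(">"):
                for i, ch in enumerate(line):
                    if ch.isalpha():
                        line = line[:i] + ch.lower() + line[i + 1:]
                        done = True
                        break
            dst.write(line)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_masking.py a.fa b.fa ...
    for path in sys.argv[1:]:
        status = "has lower-case bases" if has_soft_masking(path) else "NO lower-case bases"
        print(f"{path}: {status}")
```

Comparing each line with its upper-cased form avoids a per-character Python loop, which matters for genome-sized files.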

Running RepeatMasker on your inputs is always recommended, so that would be even better. Still, even though it's best to mask your input sequences, crashing like this on unmasked data is a bad bug. Thanks for bringing it up; it will definitely be fixed in the next release.

BitaoQiu commented 6 months ago

Many thanks!

(Email reply to Glenn Hickey's comment of 26 Mar 2024, 21:48.)
