ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Assertion `i == paf->query_end' failed #1476

Open macmanes opened 2 months ago

macmanes commented 2 months ago

I'm getting an error that persists after a restart. The command:

TOIL_SLURM_ARGS="--partition=macmanes --exclude=node116,node115" \
cactus $HOME/jobs2 $HOME/final_cactus_input.txt mammals2.hal \
--batchSystem slurm --batchLogsDir batch-logs --coordinationDir $HOME/cactus_jobs2 \
--consCores 40 --doubleMem true --maxMemory 500G --maxJobs=1000 \
--restart --caching false --cleanWorkDir never --workDir work_test/

I added the options --restart --caching false --cleanWorkDir never --workDir work_test/ after the initial failure, which gave the same error.

The error:

 paffy: impl/paf.c:618: increase_alignment_level_counts: Assertion `i == paf->query_end' failed.

Full error message

Log from job "'tile_alignments' kind-tile_alignments/instance-u2e1gpkv v24" follows:
=========>
        [2024-09-08T11:08:30-0400] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
        [2024-09-08T11:08:30-0400] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host node139.rcchpc.
        [2024-09-08T11:08:30-0400] [MainThread] [I] [toil.worker] Working on job 'tile_alignments' kind-tile_alignments/instance-u2e1gpkv v22
        [2024-09-08T11:08:31-0400] [MainThread] [I] [toil.worker] Loaded body Job('tile_alignments' kind-tile_alignments/instance-u2e1gpkv v22) from description 'tile_alignments' kind-tile_alignments/instance-u2e1gpkv v22
        [2024-09-08T11:08:44-0400] [MainThread] [W] [root] Deprecated toil method.  Please call "logging.getLevelName" directly.
        [2024-09-08T11:08:44-0400] [MainThread] [I] [cactus.shared.common] Running the command ['paffy', 'tile', '-i', '/mnt/gpfs01/home/macmaneslab/macmanes/work_test/toilwf-1a33d0b35e15559cae21ef2455d23974/3259/job/tmp28nxipvt/chained_Anc64', '--logLevel', 'INFO']
        [2024-09-08T11:08:44-0400] [MainThread] [I] [toil-rt] 2024-09-08 11:08:44.121346: Running the command: "paffy tile -i /mnt/gpfs01/home/macmaneslab/macmanes/work_test/toilwf-1a33d0b35e15559cae21ef2455d23974/3259/job/tmp28nxipvt/chained_Anc64 --logLevel INFO"
        [2024-09-08T11:24:25-0400] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
        [2024-09-08T11:24:25-0400] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-chain_one_alignment/instance-h6tc5jw7/file-7545362bdee3409c96d4d314d79c9cc5/Anc64-GCF_001704415.2_ARS1.2_vs_GCF_016772045.2_ARS-UI_Ramb_v3.0.chained.paf' to path '/mnt/gpfs01/home/macmaneslab/macmanes/work_test/toilwf-1a33d0b35e15559cae21ef2455d23974/3259/job/tmp28nxipvt/chained_Anc64_0.paf' (gone!)
        [2024-09-08T11:24:25-0400] [MainThread] [C] [toil.worker] Worker crashed with traceback:
        Traceback (most recent call last):
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/toil/worker.py", line 438, in workerScript
            job._runner(jobGraph=None, jobStore=job_store, fileStore=fileStore, defer=defer)
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/toil/job.py", line 2984, in _runner
            returnValues = self._run(jobGraph=None, fileStore=fileStore)
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/toil/job.py", line 2895, in _run
            return self.run(fileStore)
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/toil/job.py", line 3158, in run
            rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/cactus/paf/local_alignment.py", line 415, in tile_alignments
            cactus_call(parameters=['paffy', 'tile', "-i", chained_paf_path, "--logLevel", getLogLevelString()],
          File "/mnt/gpfs01/software/anaconda/colsa/envs/cactus-2.9.0/lib/python3.8/site-packages/cactus/shared/common.py", line 912, in cactus_call
            raise RuntimeError("{}Command {} signaled {}: {}".format(sigill_msg, call, signal.Signals(-process.returncode).name, out))
        RuntimeError: Command ['paffy', 'tile', '-i', '/mnt/gpfs01/home/macmaneslab/macmanes/work_test/toilwf-1a33d0b35e15559cae21ef2455d23974/3259/job/tmp28nxipvt/chained_Anc64', '--logLevel', 'INFO'] signaled SIGABRT: stderr=Input file string : /mnt/gpfs01/home/macmaneslab/macmanes/work_test/toilwf-1a33d0b35e15559cae21ef2455d23974/3259/job/tmp28nxipvt/chained_Anc64
        Output file string : (null)
        paffy: impl/paf.c:618: increase_alignment_level_counts: Assertion `i == paf->query_end' failed.

        [2024-09-08T11:24:25-0400] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host node139.rcchpc
<=========

I'm not sure what the chained_Anc64 file that is the input to paffy is supposed to look like, but it is 9.7G and begins like this:

id=GCF_001704415.2_ARS1.2|NC_030810.1   120038259       11710452        12148948        +       id=GCF_016772045.2_ARS-UI_Ramb_v3.0|NC_056054.1 278617202       11615659        12054262        427294  440248  255     AS:i:39846250   cn:i:1  s1:i:10225188944        cg:Z:42=4I6=1X18=2X23=2X81=1X15=1X94=1X22=1X19=1X4=2X22=1X1=1X25=1X3=1X8=1X8=1X51=1X76=1X30=1X88=4I57=1X56=1X83=9I224=1X98=1X35=1D12=1X87=1X88=1X26=1X1=1X25=1X17=2X136=1X77=1X39=1X23=1X195=1X48=1X14=1X64=1X25=2X3=1X74=1X7=1X6=1X74=1X267=1X12=1X36=1X124=2I151=1X33=1X25=1I57=1X7=1X108=1X101=1X23=1X133=1X242=1X85=1X56=1X52=1X3=1X14=1X115=1X9=1X165=1X12=2X12=1X3=1X48=2I20=1X9=1X4=3I34=1X5=1X3=1X24=1X26=1X5=1X7=1X90=1X54=1X18=1X21=2X33=14D7=1X3=1X5=1X23=1X16=1X253=1X70=1X90=1X185=
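
One way to look for a malformed record in a file like this is to check each line's CIGAR against its coordinates. The sketch below is not paffy's code; it assumes the failed assertion relates to the CIGAR's query length disagreeing with the query interval, and it assumes the minimap2 convention that `I` consumes query bases and `D` consumes target bases (swap the two branches if the tool uses the opposite convention). The function names `cigar_spans` and `check_paf_line` are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operator.
CIGAR_OP = re.compile(r"(\d+)([MIDX=])")


def cigar_spans(cigar):
    """Return (query_bases, target_bases) consumed by a CIGAR string."""
    query = target = consumed = 0
    for match in CIGAR_OP.finditer(cigar):
        length, op = int(match.group(1)), match.group(2)
        if op in "M=X":      # match/mismatch: consumes both sequences
            query += length
            target += length
        elif op == "I":      # ASSUMPTION: insertion consumes query only
            query += length
        elif op == "D":      # ASSUMPTION: deletion consumes target only
            target += length
        consumed += len(match.group(0))
    if consumed != len(cigar):
        raise ValueError("CIGAR contains characters that could not be parsed")
    return query, target


def check_paf_line(line):
    """Return (query_ok, target_ok) for one PAF line, or None without a cg tag."""
    fields = line.rstrip("\n").split("\t")
    q_start, q_end = int(fields[2]), int(fields[3])
    t_start, t_end = int(fields[7]), int(fields[8])
    # Optional tags start at column 13; the CIGAR lives in cg:Z:
    cigar = next((f[5:] for f in fields[12:] if f.startswith("cg:Z:")), None)
    if cigar is None:
        return None
    q_len, t_len = cigar_spans(cigar)
    return (q_len == q_end - q_start, t_len == t_end - t_start)
```

Running `check_paf_line` over every line of the file and printing the first record for which either value is `False` would narrow down where the input goes wrong. The excerpt above is cut off mid-CIGAR, so it cannot be checked this way as pasted.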

glennhickey commented 2 months ago

9.8Gb is a huge PAF. I suspect masking issues in the input are causing a lot of trouble (an age-old problem in Cactus that we are still working on). Did you mask your input with RepeatMasker?

macmanes commented 2 months ago

Yup, the 94 mammals are soft-masked with RepeatMasker (mammal database), and the 4 outgroup birds are treated the same way (but with the Aves database).
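
For anyone wanting to confirm that soft-masking survived into the FASTA files actually passed to Cactus, a rough check is to measure the fraction of lowercase bases per sequence. This is only a sketch: `softmasked_fraction` is a made-up helper, and the result says how much is masked, not whether that amount is sufficient.

```python
def softmasked_fraction(fasta_lines):
    """Map each sequence name to the fraction of its bases that are lowercase."""
    counts = {}  # name -> [lowercase_bases, total_bases]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            counts[name] = [0, 0]
        elif name is not None:
            counts[name][0] += sum(1 for base in line if base.islower())
            counts[name][1] += len(line)
    return {
        seq: (lower / total if total else 0.0)
        for seq, (lower, total) in counts.items()
    }
```

It accepts any iterable of lines, so an open file handle for one of the input genomes can be passed directly, for example `softmasked_fraction(open(path))` with `path` pointing at a FASTA listed in the Cactus input file.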