glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

job is completely failed #117

Open amizeranschi opened 5 years ago

amizeranschi commented 5 years ago

Hi,

I'm trying to align 7 genomes (between 700 Mb and 1 Gb each) on a 32-core machine with 128 GB of RAM. The job ran for several days and recently crashed. Is this due to not having enough resources, or could the reason be something else?

Got message from job at time: 1551036998.35 : Starting caf phase target with index 0 at 1551036889.34 seconds (recursing = 1)
Got message from job at time: 1551036998.35 : Adding an oversize flower for target class <class 'cactus.pipeline.cactus_workflow.CactusCafWrapperLarge'> and stats flower name: 0 total bases: 5841524595 total-ends: 279032 total-caps: 279032 max-end-degree: 1 max-adjacency-length: 104987321 total-blocks: 0 total-groups: 1 total-edges: 139516 total-free-ends: 279032 total-attached-ends: 0 total-chains: 0 total-link groups: 0
The job seems to have left a log file, indicating failure: /export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/jobTree/jobs/t1/job
Reporting file: /export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/jobTree/jobs/t1/log.txt
log.txt:    ---JOBTREE SLAVE OUTPUT LOG---
log.txt:    Traceback (most recent call last):
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/jobTree/src/jobTreeSlave.py", line 271, in main
log.txt:        defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/jobTree/scriptTree/stack.py", line 153, in execute
log.txt:        self.target.run()
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/cactus/pipeline/cactus_workflow.py", line 590, in run
log.txt:        self.runCactusCafInWorkflow(alignmentFile=self.phaseNode.attrib["alignments"])
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/cactus/pipeline/cactus_workflow.py", line 556, in runCactusCafInWorkflow
log.txt:        maxRecoverableChainLength=self.getOptionalPhaseAttrib("maxRecoverableChainLength", int))
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/cactus/shared/common.py", line 292, in runCactusCaf
log.txt:        masterMessages = popenCatch(command, stdinString=flowerNames)
log.txt:      File "/export/home/ncit/external/a.mizeranschi/progressiveCactus/submodules/sonLib/bioio.py", line 212, in popenCatch
log.txt:        raise RuntimeError("Command: %s with stdin string '%s' exited with non-zero status %i" % (command, stdinString, sts))
log.txt:    RuntimeError: Command: cactus_caf --cactusDisk '<st_kv_database_conf type="kyoto_tycoon">           <kyoto_tycoon database_dir="/export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/progressiveAlignment/Anc0/Anc0/Anc0_DB" database_name="Anc0.kch" host="172.16.13.37" in_memory="1" port="1978" snapshot="0" />        </st_kv_database_conf>' --logLevel CRITICAL --alignments /export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/jobTree/jobs/t1/gTD3/tmp_dP6KGDyvLx/alignments.cigar --annealingRounds '128' --deannealingRounds '2 8' --trim '0 0' --minimumTreeCoverage 0.0 --blockTrim 5 --minimumDegree 2 --minimumIngroupDegree 1 --minimumOutgroupDegree 0 --alignmentFilter relaxedSingleCopyOutgroup --lastzArguments '--step=1 --ambiguous=iupac,100,100 --ydrop=3000 --identity=25.0' --minimumSequenceLengthForBlast 30 --maxAdjacencyComponentSizeRatio 50.0  --minLengthForChromosome 1000000 --proportionOfUnalignedBasesForNewChromosome 0.8 --maximumMedianSequenceLengthBetweenLinkedEnds 1000 --realign --realignArguments '--gapGamma 0.0 --matchGamma 0.9 --diagonalExpansion 4 --splitMatrixBiggerThanThis 10 --constraintDiagonalTrim 0 --alignAmbiguityCharacters --splitIndelsLongerThanThis 99' --phylogenyNumTrees 30 --phylogenyRootingMethod 'bestRecon' --phylogenyScoringMethod 'reconCost' --phylogenyBreakpointScalingFactor 1.0 --phylogenySkipSingleCopyBlocks --phylogenyMaxBaseDistance 100 --phylogenyMaxBlockDistance 50   --phylogenyTreeBuildingMethod guidedNeighborJoining,splitDecomposition --phylogenyCostPerDupPerBase 0.00 --phylogenyCostPerLossPerBase 0.02 --referenceEventHeader 'Anc0' --phylogenyDoSplitsWithSupportHigherThanThisAllAtOnce 0.44 --numTreeBuildingThreads 2  --minimumBlockDegreeToCheckSupport 10 --minimumBlockHomologySupport 0.05  --removeRecoverableChains unequalNumberOfIngroupCopies --minimumNumberOfSpecies 1 --phylogenyHomologyUnitType 'chain' --phylogenyDistanceCorrectionMethod 'jukesCantor' --maxRecoverableChainsIterations 5 
--maxRecoverableChainLength 500000 with stdin string ' 1 0 ' exited with non-zero status -9
log.txt:    Exiting the slave because of a failed job on host haswell-wn37.grid.pub.ro
log.txt:    Due to failure we are reducing the remaining retry count of job /export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/jobTree/jobs/t1/job to 0
log.txt:    We have set the default memory of the failed job to 137438953472 bytes
Job: /export/home/ncit/external/a.mizeranschi/cactus-csat/csatWork/jobTree/jobs/t1/job is completely failed
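
For anyone reading the log above, two of its numbers can be decoded with a short script. This is a minimal sketch, not part of progressiveCactus or jobTree. It assumes the "non-zero status -9" follows Python's `subprocess` convention, where a negative return code `-N` means the child was terminated by signal `N`. The helper names `describe_exit_status` and `bytes_to_gib` are hypothetical and exist only for this illustration.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status that follows Python's subprocess convention.

    Assumption: a negative value -N means the child process was
    terminated by signal N (as with subprocess.Popen.returncode on POSIX).
    """
    if status < 0:
        sig = signal.Signals(-status)
        return "killed by signal %d (%s)" % (-status, sig.name)
    if status == 0:
        return "exited normally"
    return "exited with error code %d" % status


def bytes_to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # Status reported for cactus_caf in the log above.
    print(describe_exit_status(-9))
    # Default memory that jobTree reports setting for the failed job.
    print(bytes_to_gib(137438953472))
```

Under that assumption, `cactus_caf` was stopped by SIGKILL rather than exiting on its own, and the memory figure in the log works out to 128 GiB, the same as the machine's total RAM. On Linux, a SIGKILL of a large process is often the kernel's out-of-memory killer or a batch scheduler enforcing a limit, but the log alone does not confirm which; the kernel log (for example via `dmesg`) or the scheduler's accounting would be the place to check.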