glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

Trying to align 4 mice genomes, Cactus crashes #111

Closed. iminkin closed this issue 6 years ago.

iminkin commented 6 years ago

Hi, I am trying to align 4 genomes. Here is the input file:

((NZO_HlLtJ:0.01,CAST_EiJ:0.01):0.01,(129S1_SvImJ:0.01,GRCm38:0.01):0.01):0.01;
129S1_SvImJ /mice/GCA_001624185.1_129S1_SvImJ_v1_genomic.fna
GRCm38 /mice/GCA_000001635.8_GRCm38.p6_genomic.fna
CAST_EiJ /mice/GCA_001624445.1_CAST_EiJ_v1_genomic.fna
NZO_HlLtJ /mice/GCA_001624745.1_NZO_HlLtJ_v1_genomic.fna

Genbank accession IDs are in the file names. I could align two genomes, but with four, Cactus crashes. I also tried different trees/branch lengths; the outcome is the same.
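For context, a seqfile like the one above is just the Newick guide tree on the first line followed by genome-name/FASTA-path pairs. Below is a minimal sketch of parsing that layout and checking that every leaf in the tree has a sequence path; the filename mice4.txt and the regex-based leaf extraction are assumptions for illustration, not part of Cactus.

import re

# Hypothetical filename for the seqfile quoted above (assumption for illustration).
SEQFILE = "mice4.txt"

with open(SEQFILE) as f:
    lines = [line.strip() for line in f if line.strip()]

newick, entries = lines[0], lines[1:]

# Leaf names follow '(' or ',' and precede a ':' -- a rough heuristic that
# works for trees like this one, not a full Newick parser.
leaves = set(re.findall(r'[(,]([^(),:;]+):', newick))

# The remaining lines are "name path" pairs.
paths = dict(line.split(None, 1) for line in entries)

missing = leaves - set(paths)
if missing:
    print("Leaves with no sequence entry:", sorted(missing))
else:
    print("All %d leaves have sequence paths." % len(leaves))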

Cactus.log error message:

The job seems to have left a log file, indicating failure: /research/ium125/progressiveCactus/bin/gmice_4_workdir/jobTree/jobs/t1/job
Reporting file: /research/ium125/progressiveCactus/bin/gmice_4_workdir/jobTree/jobs/t1/log.txt
log.txt: ---JOBTREE SLAVE OUTPUT LOG---
log.txt: File does not exist: /research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc2/Anc2.fa
log.txt:
log.txt: Traceback (most recent call last):
log.txt:   File "/research/ium125/progressiveCactus/submodules/jobTree/src/jobTreeSlave.py", line 271, in main
log.txt:     defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
log.txt:   File "/research/ium125/progressiveCactus/submodules/jobTree/scriptTree/stack.py", line 153, in execute
log.txt:     self.target.run()
log.txt:   File "/research/ium125/progressiveCactus/submodules/cactus/pipeline/cactus_workflow.py", line 432, in run
log.txt:     makeEventHeadersAlphaNumeric=self.getOptionalPhaseAttrib("makeEventHeadersAlphaNumeric", bool, False))
log.txt:   File "/research/ium125/progressiveCactus/submodules/cactus/shared/common.py", line 151, in runCactusSetup
log.txt:     cactusDiskDatabaseString, logLevel, outgroupEvents, makeEventHeadersAlphaNumeric))
log.txt:   File "/research/ium125/progressiveCactus/submodules/sonLib/bioio.py", line 212, in popenCatch
log.txt:     raise RuntimeError("Command: %s with stdin string '%s' exited with non-zero status %i" % (command, stdinString, sts))
log.txt: RuntimeError: Command: cactus_setup /research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc2/Anc2.fa /research/ium125/progressiveCactus/bin/gmice4$
log.txt: <kyoto_tycoon database_dir="/research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc0/Anc0/Anc0_DB" database_name="Anc0.kch" $
log.txt:
log.txt: ' --logLevel CRITICAL with stdin string 'None' exited with non-zero status 1
log.txt: Exiting the slave because of a failed job on host CSE-cbmedg01.psu.edu
log.txt: Due to failure we are reducing the remaining retry count of job /research/ium125/progressiveCactus/bin/gmice_4_workdir/jobTree/jobs/t1/job to 0
log.txt: We have set the default memory of the failed job to 34359738368 bytes
Job: /research/ium125/progressiveCactus/bin/gmice_4_workdir/jobTree/jobs/t1/job is completely failed

t1/log.txt file:

---JOBTREE SLAVE OUTPUT LOG---
File does not exist: /research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc2/Anc2.fa

Traceback (most recent call last):
  File "/research/ium125/progressiveCactus/submodules/jobTree/src/jobTreeSlave.py", line 271, in main
    defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
  File "/research/ium125/progressiveCactus/submodules/jobTree/scriptTree/stack.py", line 153, in execute
    self.target.run()
  File "/research/ium125/progressiveCactus/submodules/cactus/pipeline/cactus_workflow.py", line 432, in run
    makeEventHeadersAlphaNumeric=self.getOptionalPhaseAttrib("makeEventHeadersAlphaNumeric", bool, False))
  File "/research/ium125/progressiveCactus/submodules/cactus/shared/common.py", line 151, in runCactusSetup
    cactusDiskDatabaseString, logLevel, outgroupEvents, makeEventHeadersAlphaNumeric))
  File "/research/ium125/progressiveCactus/submodules/sonLib/bioio.py", line 212, in popenCatch
    raise RuntimeError("Command: %s with stdin string '%s' exited with non-zero status %i" % (command, stdinString, sts))
RuntimeError: Command: cactus_setup /research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc2/Anc2.fa /research/ium125/progressiveCactus/bin/gmice_4_workdir/progress$
<kyoto_tycoon database_dir="/research/ium125/progressiveCactus/bin/gmice_4_workdir/progressiveAlignment/Anc0/Anc0/Anc0_DB" database_name="Anc0.kch" host="127.0.0.1"$
' --logLevel CRITICAL with stdin string 'None' exited with non-zero status 1
Exiting the slave because of a failed job on host CSE-cbmedg01.psu.edu
Due to failure we are reducing the remaining retry count of job /research/ium125/progressiveCactus/bin/gmice_4_workdir/jobTree/jobs/t1/job to 0
We have set the default memory of the failed job to 34359738368 bytes

joelarmstrong commented 6 years ago

Hey Ilya,

Sorry for the delay. This is a bug caused by the 0.01 branch length at the root. Getting rid of that branch length (i.e. ((NZO_HlLtJ:0.01,CAST_EiJ:0.01):0.01,(129S1_SvImJ:0.01,GRCm38:0.01):0.01);) should stop it from crashing. The root cause is that different newick parsers in different languages interpret the meaning of the root branch length differently: some create a "real root" node at a distance of 0.01 above the root, while some don't. The inconsistency between the number of nodes in the two interpretations causes the scheduler to go awry.
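A minimal sketch of that workaround, stripping the branch length attached to the root before handing the tree to Cactus; the helper name and regex are illustrative, not code from this repository.

import re

def strip_root_branch_length(newick):
    # Remove a branch length hanging off the root, e.g. '...):0.01;' -> '...);'.
    # Illustrative helper only, not part of progressiveCactus.
    return re.sub(r':[0-9.eE+-]+\s*;\s*$', ';', newick.strip())

tree = "((NZO_HlLtJ:0.01,CAST_EiJ:0.01):0.01,(129S1_SvImJ:0.01,GRCm38:0.01):0.01):0.01;"
print(strip_root_branch_length(tree))
# ((NZO_HlLtJ:0.01,CAST_EiJ:0.01):0.01,(129S1_SvImJ:0.01,GRCm38:0.01):0.01);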

I have added a quick hack to this distribution to cause it to barf early on detecting a root branch length.
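The actual check lives in the distribution itself; the sketch below only illustrates the fail-fast idea of refusing to run when the root carries a branch length (the function name and error message are assumptions).

import re

def require_no_root_branch_length(newick):
    # Raise early if the root of the Newick string has a branch length attached.
    # Illustrative only -- the real check in progressiveCactus may differ.
    if re.search(r'\):[0-9.eE+-]+\s*;\s*$', newick.strip()):
        raise ValueError("Root branch length detected; drop the ':<length>' before the final ';'.")

require_no_root_branch_length(
    "((NZO_HlLtJ:0.01,CAST_EiJ:0.01):0.01,(129S1_SvImJ:0.01,GRCm38:0.01):0.01):0.01;")
# -> ValueError: Root branch length detected; drop the ':<length>' before the final ';'.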

iminkin commented 6 years ago

Thanks, I think I discovered this issue/fix a long time ago, but it then slipped my mind.