glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

Resuming a job appears to be broken - I am providing debug output. #104

Open jasonsydes opened 6 years ago

jasonsydes commented 6 years ago

Hello there!

Thank you for writing Progressive Cactus! We are very excited to use it.

We've been testing it out over the past few weeks, and we are unable to restart / resubmit a job. This problem appears to have been reported in several other issues (#16, #51 and #38), but it still appears to be broken. In an attempt to help get this problem resolved (because it would be really nice to be able to restart a job immediately instead of having to wait another 30 hours), I dug a little deeper.

In short, the problem is happening on line 214 of src/projectWrapper.py (at least it is on commit 71aa42c):

            if newLine.replace(tempPath, oldPath) != oldLine:
                areSame = False

It's a pesky "_temp" suffix that is causing the problem.

For example, "oldLine" looks like this:

<cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc0/Anc0_experiment.xml" name="Anc0"/>

While "newLine" looks like this:

<cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>

I modified isSameAsExisting() to show where it is dying (see below).

Is this something you might be able to fix?

Thank you again for writing Progressive Cactus!!

Modified version of isSameAsExisting():

    # create a project in a dummy directory.  check if the
    # project xml is the same as the current project.
    # we do this to see if we should start fresh or try to
    # work with the existing project when the overwrite flag is off
    def isSameAsExisting(self, expPath, projPath, fixNames):
        if not os.path.exists(projPath):
            print("DEBUG AAA")
            return False
        oldPath = os.path.dirname(projPath + "/")
        tempPath = "%s_temp" % oldPath
        if os.path.exists(tempPath):
            print("DEBUG BBB")
            system("rm -rf %s" % tempPath)
        cmd = "cactus_createMultiCactusProject.py %s %s --fixNames=%d" % (
            expPath, tempPath, fixNames)
        if len(self.seqFile.outgroups) > 0:
            print("DEBUG CCC")
            cmd += " --outgroupNames " + ",".join(self.seqFile.outgroups)
        if self.options.rootOutgroupDists:
            print("DEBUG DDD")
            cmd += " --rootOutgroupDists %s" % self.options.rootOutgroupDists
            cmd += " --rootOutgroupPaths %s" % self.options.rootOutgroupPaths
        if self.options.root is not None:
            print("DEBUG EEE")
            cmd += " --root %s" % self.options.root
        system(cmd)
        projFilePathNew = os.path.join(tempPath,'%s_temp_project.xml' %
                                       self.alignmentDirName)
        projFilePathOld = os.path.join(oldPath, '%s_project.xml' %
                                       self.alignmentDirName)

        newFile = [line for line in open(projFilePathNew, "r")]
        oldFile = [line for line in open(projFilePathOld, "r")]
        areSame = True
        print("projFilePathOld = {}".format(projFilePathOld))
        print("projFilePathNew = {}".format(projFilePathNew))
        print("DEBUG BEGIN: {}".format(areSame))
        if len(newFile) != len(oldFile):
            areSame = False
            print("DEBUG FFF: {}".format(areSame))
        for newLine, oldLine in zip(newFile, oldFile):
            print("DEBUG GGG: {}".format(areSame))
            if newLine.replace(tempPath, oldPath) != oldLine:
                print("DEBUG HHH1: {}".format(areSame))
                print("DEBUG newLine:\n{}".format(newLine))
                print("DEBUG newLine.replace(tempPath, oldPath):\n{}".format(newLine.replace(tempPath, oldPath)))
                print("DEBUG oldLine:\n{}".format(oldLine))
                areSame = False
                print("DEBUG HHH2: {}".format(areSame))
        system("rm -rf %s" % tempPath)
        print("DEBUG III: {}".format(areSame))
        return areSame

Output from modified version:

Error: Existing project ./cactus/progressiveAlignment not compatible with current input.  Please erase the working directory or rerun with the --overwrite option to start from scratch.

Temporary data was left in: ./cactus
More information can be found in ./cactus/cactus.log

Beginning Alignment
projFilePathOld = ./cactus/progressiveAlignment/progressiveAlignment_project.xml
projFilePathNew = ./cactus/progressiveAlignment_temp/progressiveAlignment_temp_project.xml
DEBUG BEGIN: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG HHH1: True
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG HHH2: False
DEBUG GGG: False
DEBUG HHH1: False
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG HHH2: False
DEBUG GGG: False
DEBUG HHH1: False
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG HHH2: False
DEBUG GGG: False
DEBUG III: False

jasonsydes commented 6 years ago

Here is some better debug output to show the problem.

I have a fix and will be submitting a pull request shortly.
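The mismatch can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the strings are copied from the debug output in this thread, and the helper strip_dot_slash is a hypothetical name that mirrors the "Fix for relative directories" lines in the function below. It shows that the project XML holds an absolute path while tempPath is relative (it starts with "./"), so the substitution finds nothing to replace.

```python
# Values copied from the debug output in this issue.
old_path = "./cactus/progressiveAlignment"
temp_path = "%s_temp" % old_path

new_line = ('<cactus experiment_path="/home/sydes/cactus/'
            'progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>')
old_line = ('<cactus experiment_path="/home/sydes/cactus/'
            'progressiveAlignment/Anc0/Anc0_experiment.xml" name="Anc0"/>')

# The "./"-prefixed string never occurs inside the absolute path,
# so str.replace returns the line unchanged and the comparison fails.
unchanged = new_line.replace(temp_path, old_path)


def strip_dot_slash(path):
    """Hypothetical helper: drop a leading "./" if present."""
    return path[2:] if path.startswith("./") else path


# Without the "./" prefix the relative path is a substring of the
# absolute one, so the "_temp" suffix is removed as intended.
fixed = new_line.replace(strip_dot_slash(temp_path), strip_dot_slash(old_path))

print(unchanged == old_line)
print(fixed == old_line)
```

With these inputs the first comparison is False and the second is True, which matches the "DEBUG newLine.replace(tempPath, oldPath)" lines in the output: the replaced line still contains "progressiveAlignment_temp".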

Updated modified version of isSameAsExisting():

    # create a project in a dummy directory.  check if the
    # project xml is the same as the current project.
    # we do this to see if we should start fresh or try to
    # work with the existing project when the overwrite flag is off
    def isSameAsExisting(self, expPath, projPath, fixNames):
        if not os.path.exists(projPath):
            print("DEBUG AAA")
            return False
        oldPath = os.path.dirname(projPath + "/")
        tempPath = "%s_temp" % oldPath
        # Fix for relative directories
        if oldPath[0:2] == './':
            oldPath = oldPath[2:]
        if tempPath[0:2] == './':
            tempPath = tempPath[2:]
        if os.path.exists(tempPath):
            print("DEBUG BBB")
            system("rm -rf %s" % tempPath)
        cmd = "cactus_createMultiCactusProject.py %s %s --fixNames=%d" % (
            expPath, tempPath, fixNames)
        if len(self.seqFile.outgroups) > 0:
            print("DEBUG CCC")
            cmd += " --outgroupNames " + ",".join(self.seqFile.outgroups)
        if self.options.rootOutgroupDists:
            print("DEBUG DDD")
            cmd += " --rootOutgroupDists %s" % self.options.rootOutgroupDists
            cmd += " --rootOutgroupPaths %s" % self.options.rootOutgroupPaths
        if self.options.root is not None:
            print("DEBUG EEE")
            cmd += " --root %s" % self.options.root
        system(cmd)
        projFilePathNew = os.path.join(tempPath,'%s_temp_project.xml' %
                                       self.alignmentDirName)
        projFilePathOld = os.path.join(oldPath, '%s_project.xml' %
                                       self.alignmentDirName)

        newFile = [line for line in open(projFilePathNew, "r")]
        oldFile = [line for line in open(projFilePathOld, "r")]
        areSame = True
        print("projFilePathOld = {}".format(projFilePathOld))
        print("projFilePathNew = {}".format(projFilePathNew))
        print("DEBUG BEGIN: {}".format(areSame))
        if len(newFile) != len(oldFile):
            areSame = False
            print("DEBUG FFF: {}".format(areSame))
        for newLine, oldLine in zip(newFile, oldFile):
            print("DEBUG GGG: {}".format(areSame))
            if newLine.replace(tempPath, oldPath) != oldLine:
                print("DEBUG HHH1: {}".format(areSame))
                print("DEBUG newLine:\n{}".format(newLine))
                print("DEBUG newLine.replace(tempPath, oldPath):\n{}".format(newLine.replace(tempPath, oldPath)))
                print("DEBUG oldLine:\n{}".format(oldLine))
                print("")
                print("DEBUG tempPath:\n{}".format(tempPath))
                print("DEBUG oldPath:\n{}".format(oldPath))
                areSame = False
                print("DEBUG HHH2: {}".format(areSame))
        system("rm -rf %s" % tempPath)
        print("DEBUG III: {}".format(areSame))
        return areSame

And the output:

Error: Existing project ./cactus/progressiveAlignment not compatible with current input.  Please erase the working directory or rerun with the --overwrite option to start from scratch.

Temporary data was left in: ./cactus
More information can be found in ./cactus/cactus.log

Beginning Alignment
projFilePathOld = ./cactus/progressiveAlignment/progressiveAlignment_project.xml
projFilePathNew = ./cactus/progressiveAlignment_temp/progressiveAlignment_temp_project.xml
DEBUG BEGIN: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG GGG: True
DEBUG HHH1: True
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc0/Anc0_experiment.xml" name="Anc0"/>

DEBUG tempPath:
./cactus/progressiveAlignment_temp
DEBUG oldPath:
./cactus/progressiveAlignment
DEBUG HHH2: False
DEBUG GGG: False
DEBUG HHH1: False
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc1/Anc1_experiment.xml" name="Anc1"/>

DEBUG tempPath:
./cactus/progressiveAlignment_temp
DEBUG oldPath:
./cactus/progressiveAlignment
DEBUG HHH2: False
DEBUG GGG: False
DEBUG HHH1: False
DEBUG newLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG newLine.replace(tempPath, oldPath):
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment_temp/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG oldLine:
    <cactus experiment_path="/home/sydes/cactus/progressiveAlignment/Anc2/Anc2_experiment.xml" name="Anc2"/>

DEBUG tempPath:
./cactus/progressiveAlignment_temp
DEBUG oldPath:
./cactus/progressiveAlignment
DEBUG HHH2: False
DEBUG GGG: False
DEBUG III: False
Command exited with non-zero status 255
    Command being timed: "./bin/runProgressiveCactus.sh cactus/tree_and_seqs ./cactus ./cactus/b00.hal --ktHost=127.0.0.1 --stats --maxThreads 28 --maxMemory=8796093022208 --defaultMemory=3145728000"
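
For comparison, another way to avoid the mismatch would be to resolve both directories to absolute paths before the substitution, rather than stripping a leading "./". This is only a sketch under assumptions: it is not the change in the pull request, the function name project_lines_match is hypothetical, and it assumes the project XML always stores absolute paths without symlink resolution, as in the output above.

```python
import os


def project_lines_match(new_lines, old_lines, temp_path, old_path):
    """Compare two project files line by line, ignoring the temp directory name.

    Hypothetical helper: both directories are made absolute first, so the
    result does not depend on whether the caller passed "./cactus",
    "cactus" or an absolute path.
    """
    temp_abs = os.path.abspath(temp_path)
    old_abs = os.path.abspath(old_path)
    if len(new_lines) != len(old_lines):
        return False
    return all(
        new_line.replace(temp_abs, old_abs) == old_line
        for new_line, old_line in zip(new_lines, old_lines)
    )


if __name__ == "__main__":
    # Build example lines relative to the current directory, which plays
    # the role of /home/sydes in the report.
    template = '<cactus experiment_path="%s" name="Anc0"/>\n'
    cwd = os.getcwd()
    new = [template % os.path.join(
        cwd, "cactus", "progressiveAlignment_temp", "Anc0", "Anc0_experiment.xml")]
    old = [template % os.path.join(
        cwd, "cactus", "progressiveAlignment", "Anc0", "Anc0_experiment.xml")]
    print(project_lines_match(
        new, old, "./cactus/progressiveAlignment_temp", "./cactus/progressiveAlignment"))
```

os.path.abspath normalizes the path against the current working directory, so this only helps when the check runs from the same directory that the project was created from.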