DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Can't import `toil.jobStores.azureJobStore` on Parasol workers #473

Closed by adamnovak 8 years ago

adamnovak commented 8 years ago

I have a Toil script I want to run on ku, via Parasol, to download some stuff from Azure. It cheats and uses the credential-finding function from toil.jobStores.azureJobStore to find the Azure credentials. When I run the script, the nodes are unable to import toil.jobStores.azureJobStore (or indeed any of the Azure storage modules).

To investigate this further, I made a minimal example:

#!/usr/bin/env python2.7
"""
breakToil.py: break Toil on Parasol.

"""

import argparse, sys, os

from toil.job import Job

# This import will fail on the workers
import toil.jobStores.azureJobStore

def parse_args(args):
    """
    Takes in the command-line arguments list (args), and returns a nice argparse
    result with fields for all the options.

    Borrows heavily from the argparse documentation examples:
    <http://docs.python.org/library/argparse.html>
    """

    # Construct the parser (which is stored in parser)
    # Module docstring lives in __doc__
    # See http://python-forum.com/pythonforum/viewtopic.php?f=3&t=36847
    # And a formatter class so our examples in the docstring look good. Isn't it
    # convenient how we already wrapped it to 80 characters?
    # See http://docs.python.org/library/argparse.html#formatter-class
    parser = argparse.ArgumentParser(description=__doc__, 
        formatter_class=argparse.RawDescriptionHelpFormatter)

    # Add the Toil options so the job store is the first argument
    Job.Runner.addToilOptions(parser)

    # The command line arguments start with the program name, which we don't
    # want to treat as an argument for argparse. So we remove it.
    args = args[1:]

    return parser.parse_args(args)

def noop_job(job, options):
    """
    A simple noop job that should never fail.
    """
    print("This job does nothing interesting.")

def main(args):
    """
    Parses command line arguments and does the work of the program.
    "args" specifies the program arguments, with args[0] being the executable
    name. The return value should be used as the program's exit code.
    """

    options = parse_args(args) # This holds the nicely-parsed options object

    # Make a root job
    root_job = Job.wrapJobFn(noop_job, options,
        cores=1, memory="1G", disk="1G")

    # Run it and see how many jobs fail
    failed_jobs = Job.Runner.startToil(root_job,  options)

    if failed_jobs > 0:
        raise Exception("{} jobs failed!".format(failed_jobs))

    print("All jobs completed successfully")

if __name__ == "__main__":
    sys.exit(main(sys.argv))

I can run this fine on ku (the master) with the singleMachine batch system. It also runs just fine on the ku cluster nodes when using the singleMachine batch system. However, when I run it with the Parasol batch system, I get this:

[anovak@ku hgvm]$ rm -Rf break_tree && ./breakToil.py ./break_tree --batchSystem=parasol
No handlers could be found for logger "toil.resource"
INFO:toil.lib.bioio:Logging set at level: INFO
INFO:toil.common:Using the parasol batch system
INFO:toil.jobStores.fileJobStore:Jobstore directory is: /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree
INFO:toil.batchSystems.parasol:Going to sleep for a few seconds to kill any existing jobs
INFO:toil.batchSystems.parasol:Removed any old jobs from the queue
WARNING:toil.batchSystems.parasol:Could not clear sick status of the parasol batch /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree/results.txt
WARNING:toil.batchSystems.parasol:Could not flush the parasol batch /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree/results.txt
INFO:toil.batchSystems.parasol:Reset the results queue
INFO:toil.common:Written the environment for the jobs to the environment file
WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/cluster/home/anovak/.local/lib/python2.7/site-packages', name='toil.job', extension='.pyc').
INFO:toil.leader:Checked batch system has no running jobs and no updated jobs
INFO:toil.leader:Found 1 jobs to start and 0 jobs with successors to run
INFO:toil.leader:Starting the main loop
WARNING:toil.leader:The jobWrapper seems to have left a log file, indicating failure: i/a/jobHez1Wk
WARNING:toil.leader:Reporting file: i/a/jobHez1Wk
WARNING:toil.leader:i/a/jobHez1Wk:      ---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:i/a/jobHez1Wk:      WARNING:toil.resource:Can't find resource for leader path '/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py'
WARNING:toil.leader:i/a/jobHez1Wk:      WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/hive/users/anovak/ga4gh/bake-off/hgvm', name='breakToil', extension='.py')
WARNING:toil.leader:i/a/jobHez1Wk:      Traceback (most recent call last):
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/worker.py", line 284, in main
WARNING:toil.leader:i/a/jobHez1Wk:          fileStore=fileStore)
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1072, in _execute
WARNING:toil.leader:i/a/jobHez1Wk:          returnValues = self.run(fileStore)
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1184, in run
WARNING:toil.leader:i/a/jobHez1Wk:          userFunction = self._getUserFunction()
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1156, in _getUserFunction
WARNING:toil.leader:i/a/jobHez1Wk:          userFunctionModule = self._loadUserModule(self.userFunctionModule)
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 738, in _loadUserModule
WARNING:toil.leader:i/a/jobHez1Wk:          return importlib.import_module(userModule.name)
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/python/lib/python2.7/importlib/__init__.py", line 37, in import_module
WARNING:toil.leader:i/a/jobHez1Wk:          __import__(name)
WARNING:toil.leader:i/a/jobHez1Wk:        File "/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py", line 12, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          import toil.jobStores.azureJobStore
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/jobStores/azureJobStore.py", line 32, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          from azure.storage import (TableService, BlobService, SharedAccessPolicy, AccessPolicy,
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/__init__.py", line 1312, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          from azure.storage.blobservice import BlobService
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/blobservice.py", line 67, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          from azure.storage.storageclient import _StorageClient
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/storageclient.py", line 26, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          from azure.http.httpclient import _HTTPClient
WARNING:toil.leader:i/a/jobHez1Wk:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/http/httpclient.py", line 20, in <module>
WARNING:toil.leader:i/a/jobHez1Wk:          from httplib import (
WARNING:toil.leader:i/a/jobHez1Wk:      ImportError: cannot import name HTTPSConnection
WARNING:toil.leader:i/a/jobHez1Wk:      Exiting the worker because of a failed jobWrapper on host ku-1-32.local
WARNING:toil.leader:i/a/jobHez1Wk:      ERROR:__main__:Exiting the worker because of a failed jobWrapper on host ku-1-32.local
WARNING:toil.leader:i/a/jobHez1Wk:      WARNING:toil.jobWrapper:Due to failure we are reducing the remaining retry count of job i/a/jobHez1Wk to 0
WARNING:toil.leader:i/a/jobHez1Wk:      WARNING:toil.jobWrapper:We have increased the default memory of the failed job to 2147483648 bytes
WARNING:toil.leader:Job: i/a/jobHez1Wk is completely failed
INFO:toil.leader:Only failed jobs and their dependents (1 total) are remaining, so exiting.
INFO:toil.leader:Finished the main loop
INFO:toil.leader:Waiting for stats and logging collator process to finish
INFO:toil.leader:Stats/logging finished collating in 0.285951137543 seconds
Traceback (most recent call last):
  File "./breakToil.py", line 69, in <module>
    sys.exit(main(sys.argv))
  File "./breakToil.py", line 61, in main
    failed_jobs = Job.Runner.startToil(root_job,  options)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 362, in startToil
    return mainLoop(config, batchSystem, jobStore, rootJob)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/leader.py", line 505, in mainLoop
    raise FailedJobsException( config.jobStore, totalFailedJobs )
toil.leader.FailedJobsException: The job store '/hive/users/anovak/ga4gh/bake-off/hgvm/break_tree' contains 1 failed jobs
[anovak@ku hgvm]$ 

Is anyone else able to reproduce this on ku? I do in fact have the Azure modules installed; they seem to be failing to find some important internal Python component.

adamnovak commented 8 years ago

This is happening to me on commit 5dc041253c of Toil.

joelarmstrong commented 8 years ago

Did you compile your own Python? If so, make sure it was compiled with SSL support.

If you're using the system Python, that's had issues with SSL support for ages. Sometimes changing your LD_LIBRARY_PATH so it picks up a more recent version of OpenSSL fixes things.
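
The `ImportError` in the traceback is consistent with this suggestion: in Python 2.7, `httplib` only defines `HTTPSConnection` when `import ssl` succeeds inside the interpreter. A minimal sketch of a check that could be run under each batch system to compare results follows; the function names are hypothetical and are not part of Toil.

```python
def check_ssl_support():
    """Return (ok, detail): whether this interpreter can import the ssl module."""
    try:
        import ssl
    except ImportError as e:
        # The message usually names the missing piece (for example _ssl).
        return False, "import ssl failed: {}".format(e)
    # OPENSSL_VERSION reports which OpenSSL library was actually loaded.
    return True, ssl.OPENSSL_VERSION


def has_https_connection():
    """Return True if the HTTP client module exposes HTTPSConnection."""
    try:
        import httplib as http_client  # Python 2
    except ImportError:
        import http.client as http_client  # Python 3
    return hasattr(http_client, "HTTPSConnection")


if __name__ == "__main__":
    ok, detail = check_ssl_support()
    print("ssl importable: {} ({})".format(ok, detail))
    print("HTTPSConnection available: {}".format(has_https_connection()))
```

Calling these from inside the job function on a Parasol worker, and again under singleMachine, would show whether the `ssl` module itself is the part that fails to load in one setting but not the other.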

adamnovak commented 8 years ago

I did in fact build my own Python, but I'm pretty sure it has SSL support. Like I said, it works fine on the same systems when not running under Parasol.

I modified my script to log some info about the process after an attempt to import the module. Here is the output:
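
The modified script itself is not shown in the thread. As a rough sketch of how such diagnostics might be gathered, assuming a Linux worker where `/proc/self/maps` is readable, the helpers below write out the interpreter path, the environment, and the shared libraries mapped into the process. The names `parse_maps`, `loaded_libraries`, and `describe_process` are hypothetical.

```python
import os
import sys


def parse_maps(lines):
    """Extract sorted, de-duplicated shared-object paths from /proc/<pid>/maps lines."""
    libs = set()
    for line in lines:
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(None, 5)
        if len(parts) == 6 and ".so" in parts[5]:
            libs.add(parts[5].strip())
    return sorted(libs)


def loaded_libraries(maps_path="/proc/self/maps"):
    """Return shared libraries mapped into this process, or [] if unavailable."""
    try:
        with open(maps_path) as maps:
            return parse_maps(maps)
    except IOError:
        # Not Linux, or /proc is not mounted; report nothing rather than fail.
        return []


def describe_process(out=None):
    """Write interpreter, environment, and loaded libraries to out (default stdout)."""
    out = out or sys.stdout
    out.write("Interpreter:\n{}\n".format(sys.executable))
    out.write("Environment:\n{!r}\n".format(dict(os.environ)))
    out.write("Loaded libraries:\n")
    for lib in loaded_libraries():
        out.write(lib + "\n")
```

Calling `describe_process()` from inside the job function, next to the import being tested, would produce output in the same three sections as the logs that follow.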

When it fails:

[anovak@ku hgvm]$ rm -Rf break_tree && ./breakToil.py ./break_tree --batchSystem=parasol
No handlers could be found for logger "toil.resource"
INFO:toil.lib.bioio:Logging set at level: INFO
INFO:toil.common:Using the parasol batch system
INFO:toil.jobStores.fileJobStore:Jobstore directory is: /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree
INFO:toil.batchSystems.parasol:Going to sleep for a few seconds to kill any existing jobs
INFO:toil.batchSystems.parasol:Removed any old jobs from the queue
WARNING:toil.batchSystems.parasol:Could not clear sick status of the parasol batch /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree/results.txt
WARNING:toil.batchSystems.parasol:Could not flush the parasol batch /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree/results.txt
INFO:toil.batchSystems.parasol:Reset the results queue
INFO:toil.common:Written the environment for the jobs to the environment file
WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/cluster/home/anovak/.local/lib/python2.7/site-packages', name='toil.job', extension='.pyc').
INFO:toil.leader:Checked batch system has no running jobs and no updated jobs
INFO:toil.leader:Found 1 jobs to start and 0 jobs with successors to run
INFO:toil.leader:Starting the main loop
WARNING:toil.leader:The jobWrapper seems to have left a log file, indicating failure: h/g/jobCKwJxd
WARNING:toil.leader:Reporting file: h/g/jobCKwJxd
WARNING:toil.leader:h/g/jobCKwJxd:      ---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:h/g/jobCKwJxd:      WARNING:toil.resource:Can't find resource for leader path '/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py'
WARNING:toil.leader:h/g/jobCKwJxd:      WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/hive/users/anovak/ga4gh/bake-off/hgvm', name='breakToil', extension='.py')
WARNING:toil.leader:h/g/jobCKwJxd:      This job does nothing interesting.
WARNING:toil.leader:h/g/jobCKwJxd:      Interpreter:
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/bin/python2.7
WARNING:toil.leader:h/g/jobCKwJxd:      Environment:
WARNING:toil.leader:h/g/jobCKwJxd:      {'SSH_ASKPASS': '/usr/libexec/openssh/gnome-ssh-askpass', 'SSH_CLIENT': '128.114.59.89 59997 22', 'MYSQLINC': '/usr/include/mysql', 'USE_SSL': '1', 'REMOTEHOST': 'ku.local', 'MAVEN_HOME': '/opt/maven', 'LESSOPEN': '|/usr/bin/lesspipe.sh %s', 'CXXFLAGS': '-fmessage-length=80 -fmessage-length=80 ', 'WINDOW': '0', 'SGE_CELL': 'default', 'CVS_RSH': 'ssh', 'LOGNAME': 'anovak', 'USER': 'anovak', 'QTDIR': '/usr/lib64/qt-3.3', 'PATH': '/cluster/home/anovak/build/jdk1.7.0_51/bin:/cluster/home/anovak/.local/bin:/cluster/home/anovak/python/bin:/cluster/home/anovak/build/lastz-distrib-1.03.34/bin:/cluster/software/bin:/hive/groups/recon/local/bin:/cluster/home/anovak/build/jdk1.7.0_51/bin:/cluster/home/anovak/.local/bin:/cluster/home/anovak/python/bin:/cluster/home/anovak/build/lastz-distrib-1.03.34/bin:/cluster/software/bin:/hive/groups/recon/local/bin:/opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gridengine/bin/linux-x64:/cluster/bin/penn/x86_64:/cluster/home/anovak/hive/build/progressiveCactus/bin:/cluster/home/anovak/hive/build/progressiveCactus/submodules/hal/bin:/cluster/bin/x86_64:/cluster/home/anovak/hive/kent/src/parasol/bin:/cluster/home/anovak/bin/x86_64:/cluster/home/anovak/build/bedops/bin:/cluster/home/anovak/build/bedtools2-2.20.1/bin:/cluster/home/anovak/bin:/cluster/home/anovak/build/rlcsa:/cluster/home/anovak/build/sbt/bin:/cluster/home/anovak/build/ropebwt2:/cluster/home/anovak/build/apache-maven-3.2.1/bin:/cluster/home/
anovak/build/cactus2hal/bin:/cluster/home/anovak/build/jellyfish-2.1.3/bin:/cluster/home/anovak/build/mafTools/bin:/cluster/home/anovak/build/bwa-0.7.10:/cluster/home/anovak/build/hal2sg:/cluster/home/anovak/build/mafJoin/bin:/cluster/home/anovak/build/edirect:/cluster/home/anovak/build/vg:/cluster/home/anovak/build/sg2vg:/cluster/home/anovak/bin:/cluster/bin/penn/x86_64:/cluster/home/anovak/hive/build/progressiveCactus/bin:/cluster/home/anovak/hive/build/progressiveCactus/submodules/hal/bin:/cluster/bin/x86_64:/cluster/home/anovak/hive/kent/src/parasol/bin:/cluster/home/anovak/bin/x86_64:/cluster/home/anovak/build/bedops/bin:/cluster/home/anovak/build/bedtools2-2.20.1/bin:/cluster/home/anovak/bin:/cluster/home/anovak/build/rlcsa:/cluster/home/anovak/build/sbt/bin:/cluster/home/anovak/build/ropebwt2:/cluster/home/anovak/build/apache-maven-3.2.1/bin:/cluster/home/anovak/build/cactus2hal/bin:/cluster/home/anovak/build/jellyfish-2.1.3/bin:/cluster/home/anovak/build/mafTools/bin:/cluster/home/anovak/build/bwa-0.7.10:/cluster/home/anovak/build/hal2sg:/cluster/home/anovak/build/mafJoin/bin:/cluster/home/anovak/build/edirect:/cluster/home/anovak/build/vg:/cluster/home/anovak/build/sg2vg', 'PARASOL': '7', 'LD_LIBRARY_PATH': '/cluster/home/anovak/.local/lib64:/cluster/home/anovak/.local/lib:/cluster/software/lib', 'SSH_CONNECTION': '128.114.59.89 59997 132.249.245.78 22', 'LANG': 'en_US.UTF-8', 'QTLIB': '/usr/lib64/qt-3.3/lib', 'TERM': 'screen', 'ECLIPSE_HOME': '/opt/eclipse', 'LIBRARY_PATH': '/cluster/home/anovak/.local/lib64:/cluster/home/anovak/.local/lib', 'MPIHOME': '/opt/openmpi', 'QTINC': '/usr/lib64/qt-3.3/include', 'PDSHROOT': '/opt/pdsh', 'LD_RUN_PATH': '/cluster/home/anovak/.local/lib', 'G_BROKEN_FILENAMES': '1', 'SGE_EXECD_PORT': '537', 'ROCKS_ROOT': '/opt/rocks', 'BLASTMAT': '/opt/bio/ncbi/data', 'GCC_COLORS': '1', 'TMPDIR': '/scratch/tmp', 'PYTHONPATH': 
'/hive/users/anovak/build/progressiveCactus/submodules:/hive/users/anovak/build/progressiveCactus/submodules:', 'SGE_QMASTER_PORT': '536', 'CFLAGS': '-fmessage-length=80 -fmessage-length=80 ', 'JAVA_HOME': '/cluster/home/anovak/build/jdk1.7.0_51', 'HOME': '/cluster/home/anovak', 'MODULESHOME': '/usr/share/Modules', 'SGE_ROOT': '/opt/gridengine', 'JOB_ID': '491662485', 'PS1': '[\\[\\e[7m\\]\\u@\\[\\e[38;5;86m\\]\\h\\[\\e[m\\] \\W]\\[\\e[1;32m\\]$\\[\\e[m\\] ', 'REMOTEUSER': 'root', 'MYSQLLIBS': '/usr/lib64/mysql/libmysqlclient.a -lz', 'HMMER_DB': '/cluster/home/anovak/bio/hmmer/db', 'HOST': 'ku-1-32.local', 'SHELL': '/bin/bash', 'BIOROLL': '/opt/bio', 'PKG_CONFIG_PATH': '/cluster/home/anovak/.local/lib/pkgconfig', 'BASH_FUNC_module()': '() {  eval `/usr/bin/modulecmd bash $*`\n}', 'HISTSIZE': '1000', 'STY': '4316.pts-3.ku', '_': './breakToil.py', 'MODULEPATH': '/usr/share/Modules/modulefiles:/etc/modulefiles', 'TERMCAP': 'SC|screen|VT 100/ANSI X3.64 virtual terminal:\\\n\t:DO=\\E[%dB:LE=\\E[%dD:RI=\\E[%dC:UP=\\E[%dA:bs:bt=\\E[Z:\\\n\t:cd=\\E[J:ce=\\E[K:cl=\\E[H\\E[J:cm=\\E[%i%d;%dH:ct=\\E[3g:\\\n\t:do=^J:nd=\\E[C:pt:rc=\\E8:rs=\\Ec:sc=\\E7:st=\\EH:up=\\EM:\\\n\t:le=^H:bl=^G:cr=^M:it#8:ho=\\E[H:nw=\\EE:ta=^I:is=\\E)0:\\\n\t:li#53:co#133:am:xn:xv:LP:sr=\\EM:al=\\E[L:AL=\\E[%dL:\\\n\t:cs=\\E[%i%d;%dr:dl=\\E[M:DL=\\E[%dM:dc=\\E[P:DC=\\E[%dP:\\\n\t:im=\\E[4h:ei=\\E[4l:mi:IC=\\E[%d@:ks=\\E[?1h\\E=:\\\n\t:ke=\\E[?1l\\E>:vi=\\E[?25l:ve=\\E[34h\\E[?25h:vs=\\E[34l:\\\n\t:ti=\\E[?1049h:te=\\E[?1049l:us=\\E[4m:ue=\\E[24m:so=\\E[3m:\\\n\t:se=\\E[23m:mb=\\E[5m:md=\\E[1m:mr=\\E[7m:me=\\E[m:ms:\\\n\t:Co#8:pa#64:AF=\\E[3%dm:AB=\\E[4%dm:op=\\E[39;49m:AX:\\\n\t:vb=\\Eg:G0:as=\\E(0:ae=\\E(B:\\\n\t:ac=\\140\\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\\\n\t:po=\\E[5i:pf=\\E[4i:k0=\\E[10~:k1=\\EOP:k2=\\EOQ:k3=\\EOR:\\\n\t:k4=\\EOS:k5=\\E[15~:k6=\\E[17~:k7=\\E[18~:k8=\\E[19~:\\\n\t:k9=\\E[20~:k;=\\E[21~:F1=\\E[23~:F2=\\E[24~:F3=\\E[1;2P:\\\n\t:F4=\\E[1;2Q:F5=\\E[1;2R
:F6=\\E[1;2S:F7=\\E[15;2~:\\\n\t:F8=\\E[17;2~:F9=\\E[18;2~:FA=\\E[19;2~:kb=\x7f:K2=\\EOE:\\\n\t:kB=\\E[Z:kF=\\E[1;2B:kR=\\E[1;2A:*4=\\E[3;2~:*7=\\E[1;2F:\\\n\t:#2=\\E[1;2H:#3=\\E[2;2~:#4=\\E[1;2D:%c=\\E[6;2~:%e=\\E[5;2~:\\\n\t:%i=\\E[1;2C:kh=\\E[1~:@1=\\E[1~:kH=\\E[4~:@7=\\E[4~:\\\n\t:kN=\\E[6~:kP=\\E[5~:kI=\\E[2~:kD=\\E[3~:ku=\\EOA:kd=\\EOB:\\\n\t:kr=\\EOC:kl=\\EOD:km:', 'SGE_ARCH': 'linux-x64', 'MACHTYPE': 'x86_64', 'ANT_HOME': '/opt/rocks', 'SSH_TTY': '/dev/pts/3', 'LC_COLLATE': 'C', 'LOADEDMODULES': 'rocks-openmpi', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:', 'OMPI_MCA_btl': 'tcp,self', 'HISTCONTROL': 'ignoredups', 'SHLVL': '2', 'PWD': '/cluster/home/anovak/hive/ga4gh/bake-off/hgvm', 'CPLUS_INCLUDE_PATH': '/cluster/home/anovak/.local/include', 
'ROCKSROOT': '/opt/rocks/share/devel', 'MPICH_PROCESS_GROUP': 'no', 'MAIL': '/var/spool/mail/anovak', 'ROLLSROOT': '/opt/rocks/share/devel/src/roll', '_LMFILES_': '/usr/share/Modules/modulefiles/rocks-openmpi', 'C_INCLUDE_PATH': '/cluster/Traceback (most recent call last):
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/worker.py", line 284, in main
WARNING:toil.leader:h/g/jobCKwJxd:          fileStore=fileStore)
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1072, in _execute
WARNING:toil.leader:h/g/jobCKwJxd:          returnValues = self.run(fileStore)
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1186, in run
WARNING:toil.leader:h/g/jobCKwJxd:          rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
WARNING:toil.leader:h/g/jobCKwJxd:        File "/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py", line 66, in noop_job
WARNING:toil.leader:h/g/jobCKwJxd:          import toil.jobStores.azureJobStore
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/jobStores/azureJobStore.py", line 32, in <module>
WARNING:toil.leader:h/g/jobCKwJxd:          from azure.storage import (TableService, BlobService, SharedAccessPolicy, AccessPolicy,
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/__init__.py", line 1312, in <module>
WARNING:toil.leader:h/g/jobCKwJxd:          from azure.storage.blobservice import BlobService
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/blobservice.py", line 67, in <module>
WARNING:toil.leader:h/g/jobCKwJxd:          from azure.storage.storageclient import _StorageClient
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/storage/storageclient.py", line 26, in <module>
WARNING:toil.leader:h/g/jobCKwJxd:          from azure.http.httpclient import _HTTPClient
WARNING:toil.leader:h/g/jobCKwJxd:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/azure/http/httpclient.py", line 20, in <module>
WARNING:toil.leader:h/g/jobCKwJxd:          from httplib import (
WARNING:toil.leader:h/g/jobCKwJxd:      ImportError: cannot import name HTTPSConnection
WARNING:toil.leader:h/g/jobCKwJxd:      Exiting the worker because of a failed jobWrapper on host ku-1-32.local
WARNING:toil.leader:h/g/jobCKwJxd:      ERROR:__main__:Exiting the worker because of a failed jobWrapper on host ku-1-32.local
WARNING:toil.leader:h/g/jobCKwJxd:      WARNING:toil.jobWrapper:Due to failure we are reducing the remaining retry count of job h/g/jobCKwJxd to 0
WARNING:toil.leader:h/g/jobCKwJxd:      WARNING:toil.jobWrapper:We have increased the default memory of the failed job to 2147483648 bytes
WARNING:toil.leader:h/g/jobCKwJxd:      home/anovak/.local/include', 'BLASTDB': '/cluster/home/anovak/bio/ncbi/db'}
WARNING:toil.leader:h/g/jobCKwJxd:      Loaded libraries:
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libbz2.so.1.0.4
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/binascii.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/grp.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/bz2.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_struct.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libkeyutils.so.1.3
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/zlib.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_socket.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libgssapi_krb5.so.2.2
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/pyexpat.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_heapq.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/select.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/math.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/array.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_locale.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libm-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libkrb5.so.3.3
WARNING:toil.leader:h/g/jobCKwJxd:      /usr/lib64/libcrypto.so.1.0.0
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libresolv-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_multiprocessing.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/resource.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libdl-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/cPickle.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libkrb5support.so.0.1
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/datetime.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libuuid.so.1.3.0
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_collections.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libselinux.so.1
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_json.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_io.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libutil-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libz.so.1.2.3
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_hashlib.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_random.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_bisect.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/fcntl.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libk5crypto.so.3.1
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/itertools.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/ld-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_functools.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/cStringIO.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/time.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libcom_err.so.2.1
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/strop.so
WARNING:toil.leader:h/g/jobCKwJxd:      /usr/lib64/libssl.so.1.0.0
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libc-2.12.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_elementtree.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/operator.so
WARNING:toil.leader:h/g/jobCKwJxd:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_ctypes.so
WARNING:toil.leader:h/g/jobCKwJxd:      /lib64/libpthread-2.12.so
WARNING:toil.leader:Job: h/g/jobCKwJxd is completely failed
INFO:toil.leader:Only failed jobs and their dependents (1 total) are remaining, so exiting.
INFO:toil.leader:Finished the main loop
INFO:toil.leader:Waiting for stats and logging collator process to finish
INFO:toil.leader:Stats/logging finished collating in 0.308463096619 seconds
Traceback (most recent call last):
  File "./breakToil.py", line 92, in <module>
    sys.exit(main(sys.argv))
  File "./breakToil.py", line 84, in main
    failed_jobs = Job.Runner.startToil(root_job,  options)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 362, in startToil
    return mainLoop(config, batchSystem, jobStore, rootJob)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/leader.py", line 505, in mainLoop
    raise FailedJobsException( config.jobStore, totalFailedJobs )
toil.leader.FailedJobsException: The job store '/hive/users/anovak/ga4gh/bake-off/hgvm/break_tree' contains 1 failed jobs

When it works:

[anovak@ku hgvm]$ rm -Rf break_tree && ./breakToil.py ./break_tree --batchSystem=singleMachine
No handlers could be found for logger "toil.resource"
INFO:toil.lib.bioio:Logging set at level: INFO
INFO:toil.common:Using the single machine batch system
INFO:toil.jobStores.fileJobStore:Jobstore directory is: /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (24).
INFO:toil.batchSystems.singleMachine:Setting up the thread pool with 240 workers, given a minimum CPU fraction of 0.100000 and a maximum CPU value of 24.
INFO:toil.common:Written the environment for the jobs to the environment file
WARNING:toil.resource:Can't globalize module ModuleDescriptor(dirPath='/cluster/home/anovak/.local/lib/python2.7/site-packages', name='toil.job', extension='.pyc').
INFO:toil.leader:Checked batch system has no running jobs and no updated jobs
INFO:toil.leader:Found 1 jobs to start and 0 jobs with successors to run
INFO:toil.leader:Starting the main loop
INFO:toil.batchSystems.singleMachine:Executing command: '/cluster/home/anovak/python/bin/python2.7 -E /cluster/home/anovak/.local/lib/python2.7/site-packages/toil/worker.py /hive/users/anovak/ga4gh/bake-off/hgvm/break_tree f/J/jobSTGCMw'.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/cluster/home/anovak/python/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/cluster/home/anovak/python/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 398, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/cluster/home/anovak/python/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/cluster/home/anovak/python/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 398, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting

Exception RuntimeError: RuntimeError('cannot join current thread',) in <bound method FileStore.__del__ of <toil.job.FileStore object at 0x7f9037b7e490>> ignored
WARNING:toil.leader:The jobWrapper seems to have left a log file, indicating failure: f/J/jobSTGCMw
WARNING:toil.leader:Reporting file: f/J/jobSTGCMw
WARNING:toil.leader:f/J/jobSTGCMw:      ---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:f/J/jobSTGCMw:      WARNING:toil.resource:Can't find resource for leader path '/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py'
WARNING:toil.leader:f/J/jobSTGCMw:      WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/hive/users/anovak/ga4gh/bake-off/hgvm', name='breakToil', extension='.py')
WARNING:toil.leader:f/J/jobSTGCMw:      This job does nothing interesting.
WARNING:toil.leader:f/J/jobSTGCMw:      Interpreter:
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/bin/python2.7
WARNING:toil.leader:f/J/jobSTGCMw:      Environment:
WARNING:toil.leader:f/J/jobSTGCMw:      {'SSH_ASKPASS': '/usr/libexec/openssh/gnome-ssh-askpass', 'LOADEDMODULES': 'rocks-openmpi', 'MYSQLINC': '/usr/include/mysql', 'LESSOPEN': '|/usr/bin/lesspipe.sh %s', 'CXXFLAGS': '-fmessage-length=80 -fmessage-length=80 ', 'MACHTYPE': 'x86_64', 'SSH_CLIENT': '128.114.59.89 59997 22', 'BLASTDB': '/cluster/home/anovak/bio/ncbi/db', 'LOGNAME': 'anovak', 'USER': 'anovak', 'MAVEN_HOME': '/opt/maven', 'QTDIR': '/usr/lib64/qt-3.3', 'ECLIPSE_HOME': '/opt/eclipse', 'LD_LIBRARY_PATH': '/cluster/home/anovak/.local/lib64:/cluster/home/anovak/.local/lib:/cluster/software/lib', 'PATH': '/cluster/home/anovak/build/jdk1.7.0_51/bin:/cluster/home/anovak/.local/bin:/cluster/home/anovak/python/bin:/cluster/home/anovak/build/lastz-distrib-1.03.34/bin:/cluster/software/bin:/hive/groups/recon/local/bin:/cluster/home/anovak/build/jdk1.7.0_51/bin:/cluster/home/anovak/.local/bin:/cluster/home/anovak/python/bin:/cluster/home/anovak/build/lastz-distrib-1.03.34/bin:/cluster/software/bin:/hive/groups/recon/local/bin:/opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gridengine/bin/linux-x64:/cluster/bin/penn/x86_64:/cluster/home/anovak/hive/build/progressiveCactus/bin:/cluster/home/anovak/hive/build/progressiveCactus/submodules/hal/bin:/cluster/bin/x86_64:/cluster/home/anovak/hive/kent/src/parasol/bin:/cluster/home/anovak/bin/x86_64:/cluster/home/anovak/build/bedops/bin:/cluster/home/anovak/build/bedtools2-2.20.1/bin:/cluster/home/anovak/bin:/cluster/home/anovak/
build/rlcsa:/cluster/home/anovak/build/sbt/bin:/cluster/home/anovak/build/ropebwt2:/cluster/home/anovak/build/apache-maven-3.2.1/bin:/cluster/home/anovak/build/cactus2hal/bin:/cluster/home/anovak/build/jellyfish-2.1.3/bin:/cluster/home/anovak/build/mafTools/bin:/cluster/home/anovak/build/bwa-0.7.10:/cluster/home/anovak/build/hal2sg:/cluster/home/anovak/build/mafJoin/bin:/cluster/home/anovak/build/edirect:/cluster/home/anovak/build/vg:/cluster/home/anovak/build/sg2vg:/cluster/home/anovak/bin:/cluster/bin/penn/x86_64:/cluster/home/anovak/hive/build/progressiveCactus/bin:/cluster/home/anovak/hive/build/progressiveCactus/submodules/hal/bin:/cluster/bin/x86_64:/cluster/home/anovak/hive/kent/src/parasol/bin:/cluster/home/anovak/bin/x86_64:/cluster/home/anovak/build/bedops/bin:/cluster/home/anovak/build/bedtools2-2.20.1/bin:/cluster/home/anovak/bin:/cluster/home/anovak/build/rlcsa:/cluster/home/anovak/build/sbt/bin:/cluster/home/anovak/build/ropebwt2:/cluster/home/anovak/build/apache-maven-3.2.1/bin:/cluster/home/anovak/build/cactus2hal/bin:/cluster/home/anovak/build/jellyfish-2.1.3/bin:/cluster/home/anovak/build/mafTools/bin:/cluster/home/anovak/build/bwa-0.7.10:/cluster/home/anovak/build/hal2sg:/cluster/home/anovak/build/mafJoin/bin:/cluster/home/anovak/build/edirect:/cluster/home/anovak/build/vg:/cluster/home/anovak/build/sg2vg', 'LANG': 'en_US.UTF-8', 'QTLIB': '/usr/lib64/qt-3.3/lib', 'TERM': 'screen', 'SHELL': '/bin/bash', 'CVS_RSH': 'ssh', 'LIBRARY_PATH': '/cluster/home/anovak/.local/lib64:/cluster/home/anovak/.local/lib', 'MPIHOME': '/opt/openmpi', 'BASH_FUNC_module()': '() {  eval `/usr/bin/modulecmd bash $*`\n}', 'QTINC': '/usr/lib64/qt-3.3/include', 'PDSHROOT': '/opt/pdsh', 'LD_RUN_PATH': '/cluster/home/anovak/.local/lib', 'G_BROKEN_FILENAMES': '1', 'SGE_EXECD_PORT': '537', 'HISTSIZE': '1000', 'BLASTMAT': '/opt/bio/ncbi/data', 'GCC_COLORS': '1', '_': './breakToil.py', 'SSH_CONNECTION': '128.114.59.89 59997 132.249.245.78 22', 'JAVA_HOME': 
'/cluster/home/anovak/build/jdk1.7.0_51', 'HOME': '/cluster/home/anovak', 'MODULESHOME': '/usr/share/Modules', 'SGE_ROOT': '/opt/gridengine', 'CFLAGS': '-fmessage-length=80 -fmessage-length=80 ', 'PS1': '[\\[\\e[7m\\]\\u@\\[\\e[38;5;86m\\]\\h\\[\\e[m\\] \\W]\\[\\e[1;32m\\]$\\[\\e[m\\] ', 'PYTHONPATH': '/hive/users/anovak/build/progressiveCactus/submodules:/hive/users/anovak/build/progressiveCactus/submodules:', 'MYSQLLIBS': '/usr/lib64/mysql/libmysqlclient.a -lz', 'HMMER_DB': '/cluster/home/anovak/bio/hmmer/db', 'OMPI_MCA_btl': 'tcp,self', 'BIOROLL': '/opt/bio', 'MAIL': '/var/spool/mail/anovak', 'PKG_CONFIG_PATH': '/cluster/home/anovak/.local/lib/pkgconfig', 'USE_SSL': '1', 'STY': '4316.pts-3.ku', 'TMPDIR': '/scratch/tmp', 'MODULEPATH': '/usr/share/Modules/modulefiles:/etc/modulefiles', 'TERMCAP': 'SC|screen|VT 100/ANSI X3.64 virtual terminal:\\\n\t:DO=\\E[%dB:LE=\\E[%dD:RI=\\E[%dC:UP=\\E[%dA:bs:bt=\\E[Z:\\\n\t:cd=\\E[J:ce=\\E[K:cl=\\E[H\\E[J:cm=\\E[%i%d;%dH:ct=\\E[3g:\\\n\t:do=^J:nd=\\E[C:pt:rc=\\E8:rs=\\Ec:sc=\\E7:st=\\EH:up=\\EM:\\\n\t:le=^H:bl=^G:cr=^M:it#8:ho=\\E[H:nw=\\EE:ta=^I:is=\\E)0:\\\n\t:li#53:co#133:am:xn:xv:LP:sr=\\EM:al=\\E[L:AL=\\E[%dL:\\\n\t:cs=\\E[%i%d;%dr:dl=\\E[M:DL=\\E[%dM:dc=\\E[P:DC=\\E[%dP:\\\n\t:im=\\E[4h:ei=\\E[4l:mi:IC=\\E[%d@:ks=\\E[?1h\\E=:\\\n\t:ke=\\E[?1l\\E>:vi=\\E[?25l:ve=\\E[34h\\E[?25h:vs=\\E[34l:\\\n\t:ti=\\E[?1049h:te=\\E[?1049l:us=\\E[4m:ue=\\E[24m:so=\\E[3m:\\\n\t:se=\\E[23m:mb=\\E[5m:md=\\E[1m:mr=\\E[7m:me=\\E[m:ms:\\\n\t:Co#8:pa#64:AF=\\E[3%dm:AB=\\E[4%dm:op=\\E[39;49m:AX:\\\n\t:vb=\\Eg:G0:as=\\E(0:ae=\\E(B:\\\n\t:ac=\\140\\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\\\n\t:po=\\E[5i:pf=\\E[4i:k0=\\E[10~:k1=\\EOP:k2=\\EOQ:k3=\\EOR:\\\n\t:k4=\\EOS:k5=\\E[15~:k6=\\E[17~:k7=\\E[18~:k8=\\E[19~:\\\n\t:k9=\\E[20~:k;=\\E[21~:F1=\\E[23~:F2=\\E[24~:F3=\\E[1;2P:\\\n\t:F4=\\E[1;2Q:F5=\\E[1;2R:F6=\\E[1;2S:F7=\\E[15;2~:\\\n\t:F8=\\E[17;2~:F9=\\E[18;2~:FA=\\E[19;2~:kb=\x7f:K2=\\EOE:\\\n\t:kB=\\E[Z:kF=\\E[1;2B:kR=\\E[1
;2A:*4=\\E[3;2~:*7=\\E[1;2F:\\\n\t:#2=\\E[1;2H:#3=\\E[2;2~:#4=\\E[1;2D:%c=\\E[6;2~:%e=\\E[5;2~:\\\n\t:%i=\\E[1;2C:kh=\\E[1~:@1=\\E[1~:kH=\\E[4~:@7=\\E[4~:\\\n\t:kN=\\E[6~:kP=\\E[5~:kI=\\E[2~:kD=\\E[3~:ku=\\EOA:kd=\\EOB:\\\n\t:kr=\\EOC:kl=\\EOD:km:', 'SGE_ARCH': 'linux-x64', '_LMFILES_': '/usr/share/Modules/modulefiles/rocks-openmpi', 'ANT_HOME': '/opt/rocks', 'SSH_TTY': '/dev/pts/3', 'LC_COLLATE': 'C', 'HOSTNAME': 'ku.sdsc.edu', 'SGE_CELL': 'default', 'HISTCONTROL': 'ignoredups', 'SHLVL': '2', 'PWD': '/cluster/home/anovak/hive/ga4gh/bake-off/hgvm', 'WINDOW': '0', 'CPLUS_INCLUDE_PATH': '/cluster/home/anovak/.local/include', 'ROCKSROOT': '/opt/rocks/share/devel', 'MPICH_PROCESS_GROUP': 'no', 'ROCKS_ROOT': '/opt/rocks', 'ROLLSROOT': '/opt/rocks/share/devel/src/roll', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wa
v=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:', 'C_INCLUDE_PATH': '/cluster/home/anovak/.local/include', 'SGE_QMASTER_PORT': '536'}
WARNING:toil.leader:f/J/jobSTGCMw:      Loaded liTraceback (most recent call last):
WARNING:toil.leader:f/J/jobSTGCMw:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/worker.py", line 284, in main
WARNING:toil.leader:f/J/jobSTGCMw:          fileStore=fileStore)
WARNING:toil.leader:f/J/jobSTGCMw:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1072, in _execute
WARNING:toil.leader:f/J/jobSTGCMw:          returnValues = self.run(fileStore)
WARNING:toil.leader:f/J/jobSTGCMw:        File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 1186, in run
WARNING:toil.leader:f/J/jobSTGCMw:          rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
WARNING:toil.leader:f/J/jobSTGCMw:        File "/hive/users/anovak/ga4gh/bake-off/hgvm/breakToil.py", line 68, in noop_job
WARNING:toil.leader:f/J/jobSTGCMw:          raise RuntimeError("Load succeeded")
WARNING:toil.leader:f/J/jobSTGCMw:      RuntimeError: Load succeeded
WARNING:toil.leader:f/J/jobSTGCMw:      Exiting the worker because of a failed jobWrapper on host ku.sdsc.edu
WARNING:toil.leader:f/J/jobSTGCMw:      ERROR:__main__:Exiting the worker because of a failed jobWrapper on host ku.sdsc.edu
WARNING:toil.leader:f/J/jobSTGCMw:      WARNING:toil.jobWrapper:Due to failure we are reducing the remaining retry count of job f/J/jobSTGCMw to 0
WARNING:toil.leader:f/J/jobSTGCMw:      WARNING:toil.jobWrapper:We have increased the default memory of the failed job to 2147483648 bytes
WARNING:toil.leader:f/J/jobSTGCMw:      braries:
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libbz2.so.1.0.4
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/binascii.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/grp.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/bz2.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_struct.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/zlib.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_socket.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_elementtree.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/pyexpat.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_heapq.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/select.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/math.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/array.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libm-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/.local/lib64/libcrypto.so.1.0.0
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_multiprocessing.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/resource.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libdl-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/cPickle.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/datetime.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libuuid.so.1.3.0
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_collections.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_ssl.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_json.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_io.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libutil-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libz.so.1.2.3
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_hashlib.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_random.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_bisect.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/fcntl.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/strop.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/itertools.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/ld-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_functools.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/cStringIO.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/time.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_locale.so
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libc-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/operator.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/python/lib/python2.7/lib-dynload/_ctypes.so
WARNING:toil.leader:f/J/jobSTGCMw:      /cluster/home/anovak/.local/lib64/libssl.so.1.0.0
WARNING:toil.leader:f/J/jobSTGCMw:      /lib64/libpthread-2.12.so
WARNING:toil.leader:f/J/jobSTGCMw:      Load succeeded
WARNING:toil.leader:Job: f/J/jobSTGCMw is completely failed
INFO:toil.leader:Only failed jobs and their dependents (1 total) are remaining, so exiting.
INFO:toil.leader:Finished the main loop
INFO:toil.leader:Waiting for stats and logging collator process to finish
INFO:toil.leader:Stats/logging finished collating in 0.264318943024 seconds
Traceback (most recent call last):
  File "./breakToil.py", line 92, in <module>
    sys.exit(main(sys.argv))
  File "./breakToil.py", line 84, in main
    failed_jobs = Job.Runner.startToil(root_job,  options)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/job.py", line 362, in startToil
    return mainLoop(config, batchSystem, jobStore, rootJob)
  File "/cluster/home/anovak/.local/lib/python2.7/site-packages/toil/leader.py", line 505, in mainLoop
    raise FailedJobsException( config.jobStore, totalFailedJobs )
toil.leader.FailedJobsException: The job store '/hive/users/anovak/ga4gh/bake-off/hgvm/break_tree' contains 1 failed jobs
adamnovak commented 8 years ago

Not sure what's wrong with markdown today, but it looks like it is loading different libraries under Parasol. Instead of /cluster/home/anovak/.local/lib64/libssl.so.1.0.0, for example, I am getting /usr/lib64/libssl.so.1.0.0.

However, my environment looks to be the same in both cases.

Is Toil maybe not setting up os.environ early enough to control where the Python process loads its libraries from? And is it not getting the right settings to begin with, due to login/non-login shell weirdness and/or Parasol just su-ing to me without setting up my environment first?

adamnovak commented 8 years ago

OK, so I went and looked at https://users.soe.ucsc.edu/~kent/src/parasol/parasol/paraNode/paraNode.c and found this for the code where Parasol sets up your user:

if ((grandChildId = fork()) == 0)
    {
    int newStdin, newStdout, newStderr, execErr;
    char *homeDir;

    /* Change to given user and dir. */
    changeUid(user, &homeDir);
    chdir(dir);
    umask(umaskVal); 

    /* Redirect standard io.  There has to  be a less
     * cryptic way to do this. Close all open files, then
     * open in/out/err in order so they have descriptors
     * 0,1,2. */
    logClose();
    close(socketHandle);
    close(connectionHandle);
    close(0);
    close(1);
    close(2);
    open(in, O_RDONLY);
    open(out, O_WRONLY | O_CREAT, 0666);
    open(err, O_WRONLY | O_CREAT, 0666);

    /* Update environment. */
        {
        struct hash *hash = environToHash(environ);
        hashUpdate(hash, "JOB_ID", jobIdString);
        hashUpdate(hash, "USER", user);
        hashUpdate(hash, "HOME", homeDir);
        hashUpdate(hash, "HOST", hostName);
        hashUpdate(hash, "PARASOL", "1");
        updatePath(hash, userPath, homeDir, sysPath);
        environ = hashToEnviron(hash);
        freeHashAndVals(&hash);
        }

    if ((execErr = execvp(exe, params)) < 0)
        {
        perror("");
        warn("Error execlp'ing %s %s", exe, params);
        }
    exit(execErr);
    }

Parasol just takes the environment it was running with as root, fixes up a few variables related to the user, and jumps right into your process. Even if Toil later updates os.environ with settings copied from my master's environment, I don't think changing os.environ["LD_LIBRARY_PATH"] at that point will affect where libraries are looked for.

joelarmstrong commented 8 years ago

Yeah, the dlopen man page says:

If, at the time that the program was started, the environment variable LD_LIBRARY_PATH was defined to contain a colon-separated list of directories, then these are searched. (As a security measure this variable is ignored for set-user-ID and set-group-ID programs.)

So there's no os.environ tomfoolery that can get rid of this problem, even if the import is delayed.
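
To make that concrete, here is a minimal sketch (Python 3, not Toil code) of the one place where the variable can still take effect: the environment handed to a new process when it is launched. The helper name `run_with_library_path` and the example directory are made up for illustration.

```python
import os
import subprocess
import sys


def run_with_library_path(argv, library_dirs, base_env=None):
    """Start argv as a new process with LD_LIBRARY_PATH set at launch.

    Hypothetical helper, not part of Toil. The dynamic linker reads
    LD_LIBRARY_PATH when a program starts, so the value has to be in the
    environment passed to the new process; assigning to os.environ inside
    an already running interpreter is too late for that interpreter.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Put the requested directories first so they win over inherited ones.
    parts = list(library_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


if __name__ == "__main__":
    # "/example/lib64" is only an example value.
    result = run_with_library_path(
        [sys.executable, "-c",
         "import os; print(os.environ['LD_LIBRARY_PATH'])"],
        ["/example/lib64"])
    print(result.stdout.strip())
```

The child sees the variable from its very first instruction, which is what a delayed import in the parent cannot achieve.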

Toil should manually propagate the environment variables that matter to Python, such as PATH, PYTHONPATH, and LD_LIBRARY_PATH. I think this should just be done when writing the command to issue to the batch system, rather than messing around with #441. (That setenv functionality is important as well, but this blocks us from being able to use Toil on practically any system you don't manage yourself.)
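
A rough sketch of what "done when writing the command" could look like, in Python 3. The function name `prefix_command_with_env` and the default list are assumptions for illustration, not Toil's API. It also assumes the batch system hands the command string to a POSIX shell; if the command is exec'ed directly, the quoting would not be undone.

```python
import os
import shlex

# Assumed default: the three variables named above. Not a list Toil defines.
DEFAULT_NAMES = ("PATH", "PYTHONPATH", "LD_LIBRARY_PATH")


def prefix_command_with_env(command, names=DEFAULT_NAMES, environ=None):
    """Return command prefixed with `env NAME=value ...`.

    Only names that are actually set in environ are included, so an
    unset variable on the leader is left alone on the worker.
    """
    environ = os.environ if environ is None else environ
    assignments = [
        "{}={}".format(name, shlex.quote(environ[name]))
        for name in names
        if name in environ
    ]
    if not assignments:
        return command
    return "env " + " ".join(assignments) + " " + command


if __name__ == "__main__":
    # Example values only; a real caller would pass the leader's os.environ.
    leader_env = {"PATH": "/usr/bin:/bin",
                  "LD_LIBRARY_PATH": "/home/me/my libs"}
    print(prefix_command_with_env("python worker.py", environ=leader_env))
```

Because `env` runs before the interpreter starts, the dynamic linker sees the leader's LD_LIBRARY_PATH at program start.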

adamnovak commented 8 years ago

Hiram sends the following recommendation:

Make your jobList refer to a shell script that can do anything it wants:

runOne arg1 arg2 ... argN {check out exists+ resultFile}

Where runOne is a shell script that runs the operations with the given
arguments.  You can make the shell script include your HOME/.bashrc:

#!/bin/bash
set -beEu -o pipefail
source ~/.bashrc
... proceed with commands using arguments ...

The Parasol batch system could potentially ship a Bash script that loads up the user's .bashrc or .bash_profile before running Python for the actual job.

joelarmstrong commented 8 years ago

What if they use zsh? :)

I think it's best to replicate the exact same behavior on every batch system. The single-machine batch system uses the same environment for each process, so rather than assuming the proper variables are all in ~/.bash_profile, ~/.bashrc, or whatever the zsh equivalent is, we should just propagate any of the currently set environment variables that could affect the main Python process. (Separately from environ.pickle, for obvious reasons :))

hannes-ucsc commented 8 years ago

So is the problem that LD_LIBRARY_PATH is not propagated?

> The Parasol batch system could potentially ship a Bash script that loads up the user's .bashrc or .bash_profile before running Python for the actual job.

Too invasive and [ba]sh-specific, as @joelarmstrong points out.

> we should just propagate any of the currently set environment variables that could affect the main Python process

It is hard to decide which environment variables should be propagated. I think the best solution is to make the environment manipulation a responsibility of the concrete batch system class. Then have a user-configurable set of environment variables to be propagated. The set could have a reasonable default.
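
One possible shape for this, as a hedged Python 3 sketch. Every class, attribute, and method name here is hypothetical and none of it is existing Toil code. The base class holds the configurable set and its default, and each concrete batch system decides how to apply the values.

```python
import os
import subprocess


class EnvPropagatingBatchSystem(object):
    """Hypothetical base class; the names are illustrative only."""

    # Assumed "reasonable default"; the real choice is still open.
    default_propagated_names = frozenset(
        ["PATH", "PYTHONPATH", "LD_LIBRARY_PATH"])

    def __init__(self, propagated_names=None, leader_environ=None):
        names = (self.default_propagated_names
                 if propagated_names is None else propagated_names)
        environ = os.environ if leader_environ is None else leader_environ
        # Snapshot the leader's values once; unset names are skipped.
        self.propagated_env = {n: environ[n] for n in names if n in environ}

    def worker_environment(self, node_environ):
        """Overlay the propagated values on the node's own environment."""
        env = dict(node_environ)
        env.update(self.propagated_env)
        return env

    def issue_job(self, argv):
        raise NotImplementedError("concrete batch systems apply the env")


class LocalSubprocessBatchSystem(EnvPropagatingBatchSystem):
    """Concrete example: applies the environment via subprocess."""

    def issue_job(self, argv):
        return subprocess.call(argv, env=self.worker_environment(os.environ))


if __name__ == "__main__":
    # Example values only.
    bs = LocalSubprocessBatchSystem(
        propagated_names=["LD_LIBRARY_PATH"],
        leader_environ={"LD_LIBRARY_PATH": "/x/lib", "HOME": "/h"})
    print(bs.worker_environment({"HOME": "/root"}))
```

A Parasol-style subclass would override `issue_job` differently, for example by writing the values into the command it submits, since it cannot pass an environment dictionary to the node.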

Relates to #441. @cket are you following this?

hannes-ucsc commented 8 years ago

Depends on #441
Depends on #547