dask / dask-drmaa

Deploy Dask on DRMAA clusters
BSD 3-Clause "New" or "Revised" License

Modify workerTemplate for particular submissions #9

Open · mrocklin opened this issue 7 years ago

mrocklin commented 7 years ago

Currently we create a single job template that is used for all workers.

import os
import socket
import sys

import drmaa
from distributed import LocalCluster


class DRMAACluster(object):
    def __init__(self, **kwargs):
        # Local scheduler with no workers of its own; DRMAA-submitted workers connect to it
        self.local_cluster = LocalCluster(n_workers=0, **kwargs)
        self.session = drmaa.Session()
        self.session.initialize()

        # One shared template used for every worker submission
        self.worker_template = self.session.createJobTemplate()
        self.worker_template.remoteCommand = os.path.join(sys.exec_prefix, 'bin', 'dask-worker')
        self.worker_template.jobName = 'dask-worker'
        self.worker_template.args = ['%s:%d' % (socket.gethostname(), self.local_cluster.scheduler.port)]
        self.worker_template.outputPath = ':/%s/out' % os.getcwd()
        self.worker_template.errorPath = ':/%s/err' % os.getcwd()
        self.worker_template.workingDirectory = os.getcwd()

        self.workers = set()

This job template is easily accessible and so provides a nice and familiar release valve for expert users of python-drmaa who want to customize their setup.
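
For example, an expert user could reach into the shared template and adjust it directly before starting workers. The attributes below are standard drmaa JobTemplate fields, but the particular values (an SGE-style memory request and an environment variable) are only illustrative:

cluster = DRMAACluster()
# nativeSpecification and jobEnvironment are ordinary drmaa JobTemplate attributes;
# the values here are just examples of the kind of customization an expert user might want
cluster.worker_template.nativeSpecification = '-l h_vmem=4G'
cluster.worker_template.jobEnvironment = {'OMP_NUM_THREADS': '1'}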

However, as we submit slightly different kinds of worker jobs, we'll want to modify this template a bit, for example by specifying extra native specifications or by adding extra keywords to the dask-worker command.

def start_workers(self, n=1, ...):  # want to add extra arguments here
    # ... and safely create a new worker template here for submission

I'm unable to find a way to copy-and-append an existing job template, which leads me to instead consider storing all of the job template attributes (args, native spec, command, error path, etc.) bare, instead of within a job template object. Then we'll create a new job template for each call to start_workers.

class DRMAACluster(object):
    def __init__(self, **kwargs):
        self.local_cluster = LocalCluster(n_workers=0, **kwargs)
        self.session = drmaa.Session()
        self.session.initialize()

        # self.worker_template = self.session.createJobTemplate()
        self.remoteCommand = os.path.join(sys.exec_prefix, 'bin', 'dask-worker')
        self.jobName = 'dask-worker'
        self.args = ['%s:%d' % (socket.gethostname(), self.local_cluster.scheduler.port)]
        self.outputPath = ':/%s/out' % os.getcwd()
        self.errorPath = ':/%s/err' % os.getcwd()
        self.workingDirectory = os.getcwd()

        self.workers = set()

    def start_workers(self, args, ...):
        wt = self.session.createJobTemplate()
        wt.args = self.args + args
        ...
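
Spelled out a bit more, that per-call template construction might look roughly like the sketch below. This is untested; the extra_args and native_specification parameters and the cleanup via deleteJobTemplate are my guesses at how it could work, not settled API:

def start_workers(self, n=1, extra_args=(), native_specification=None):
    # Build a throwaway template for this submission from the stored attributes
    wt = self.session.createJobTemplate()
    wt.remoteCommand = self.remoteCommand
    wt.jobName = self.jobName
    wt.args = list(self.args) + list(extra_args)
    wt.outputPath = self.outputPath
    wt.errorPath = self.errorPath
    wt.workingDirectory = self.workingDirectory
    if native_specification:
        wt.nativeSpecification = native_specification
    try:
        # Submit n copies of the worker job and remember their ids
        ids = self.session.runBulkJobs(wt, 1, n, 1)
        self.workers.update(ids)
    finally:
        # Templates are cheap to recreate, so discard this one after submission
        self.session.deleteJobTemplate(wt)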

Does anyone see a better way?

cc @davidr

nevermindewe commented 7 years ago

@mrocklin I think that keeping the job template attributes inside Python makes more sense, since the python drmaa library does a lot of wrapping around the C drmaa library. If, for some reason, the drmaa session dies and has to be reinitialized, the drmaa job templates might be lost (speculation).
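
If that ever happened, the bare attributes on the cluster object would be enough to recover: tear down the session, start a new one, and let the next start_workers() call build a fresh template. A hypothetical helper (reset_session is not an existing method, purely an illustration):

def reset_session(self):
    # Hypothetical recovery path: replace a dead session with a new one.
    # The template attributes live on the cluster object, so nothing else needs
    # rebuilding; the next start_workers() call creates a fresh template anyway.
    try:
        self.session.exit()
    except Exception:
        pass  # the old session may already be unusable
    self.session = drmaa.Session()
    self.session.initialize()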

mrocklin commented 7 years ago

Yeah, in #10 we now generate a new job template every time we launch new workers.