Open dkmichaels opened 7 years ago
That should be doable. When deploying dask from the command line, this is done with the --interface keyword (https://stackoverflow.com/questions/43881157/how-do-i-use-an-infiniband-network-with-dask). We probably just need to expose this through the dask-drmaa interface or, better yet, help users pass through any option.
On Wed, May 24, 2017 at 12:13 AM, dkmichaels notifications@github.com wrote:
Using SGE and wanting to initialize dask fully from python.
My main node is dual-homed, with 1G and 10G interfaces. The 10G is the one that my SGE cluster uses.
from dask_drmaa import DRMAACluster
from dask.distributed import Client

In [9]: cluster = DRMAACluster(hostname='master-10g')
INFO:dask_drmaa.core:Start local scheduler at master-10g

In [10]: cluster.scheduler_address
Out[10]: 'tcp://10.22.150.194:37386'  # this is the master-1g IP, not the one I want
Meanwhile, the workers are spinning trying to connect to the 1G IP:
tail worker.23523.1.err
distributed.worker - INFO - Trying to connect to scheduler: tcp://10.22.150.194:37386
Can this be extended to allow one to specify the scheduler interface / hostname / IP to give to the workers?
Here's the workaround I hacked together -- suggestions for improvement welcome:
Replace these lines (note the first line has no effect in the current code):
def create_job_template(...):
    ...
    args = template['args']
    args = [self.scheduler_address] + template['args']
    ...
with:
    # replace the scheduler's 1G IP with its 10G IP
    args = [self.scheduler_address.replace('10.22.150.194', '10.22.250.1')]
    args = args + template['args']
Hardcoding IPs allows me to proceed with my testing, but this is really a hack.
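A slightly less brittle variant of the same hack, as a sketch only: swap the host part of the scheduler address and keep whatever port the scheduler picked, so the 1G IP does not have to be hardcoded. `with_host` is a hypothetical helper, not part of dask-drmaa, and it assumes the address has the `tcp://host:port` form shown above.

```python
from urllib.parse import urlsplit


def with_host(address, new_host):
    """Return `address` with its host swapped for `new_host`.

    Hypothetical helper (not part of dask-drmaa). The scheme and port are
    kept, so the scheduler's current IP does not need to be hardcoded.
    """
    parts = urlsplit(address)
    if parts.port is None:
        return f"{parts.scheme}://{new_host}"
    return f"{parts.scheme}://{new_host}:{parts.port}"


# Address reported by the scheduler above, pointed at the 10G IP instead.
print(with_host('tcp://10.22.150.194:37386', '10.22.250.1'))
# tcp://10.22.250.1:37386
```

Inside the patched method this would read `args = [with_host(self.scheduler_address, '10.22.250.1')] + template['args']`. The 10G IP is still supplied by hand, so this remains a workaround, not a fix.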
Sometimes using the nativeSpecification argument to DRMAA resolves issues like this. Would need to play around on your cluster and/or ask admins to know for sure.