JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Machinefile nonuniform install locations #23

Closed dahlend closed 5 months ago

dahlend commented 9 years ago

There is little documentation for the format of machinefiles (not that it is very complex).

Currently machinefiles take only the number of procs, the host, and ssh flags, and assume that the install location is the same across systems (not the case for some of my work). The only real way around the problem is to not use a host file. It doesn't seem unreasonable to add some sort of option to the machinefile (I could do it pretty easily after trying to track this thing down).

It's just a little more string parsing: a "dir=" and an "exename=" option.

amitmurthy commented 9 years ago

It would be great to have a PR.

For starters, handling dir / exename is good - you will need to call addprocs separately for each unique dir / exename combination in client.jl.
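The "one addprocs call per unique dir / exename combination" idea can be sketched as follows. This is a minimal illustration, not the code in client.jl: the host table, `group_by_install`, and `launch_all` are hypothetical names, while `addprocs` with a vector of `(machine_spec, count)` tuples and the `dir` and `exename` keywords is the existing Distributed API.

```julia
using Distributed

# Hypothetical host table: each entry carries its own install location.
hosts = [
    (spec = "alice@node1", count = 2, dir = "/home/alice",  exename = "/opt/julia/bin/julia"),
    (spec = "alice@node2", count = 4, dir = "/home/alice",  exename = "/opt/julia/bin/julia"),
    (spec = "bob@node3",   count = 1, dir = "/scratch/bob", exename = "/usr/local/bin/julia"),
]

# Group machine specs by their (dir, exename) pair.
function group_by_install(hosts)
    groups = Dict{Tuple{String,String},Vector{Tuple{String,Int}}}()
    for h in hosts
        machines = get!(groups, (h.dir, h.exename), Tuple{String,Int}[])
        push!(machines, (h.spec, h.count))
    end
    return groups
end

# One addprocs call per group; returns the ids of all new workers.
# Not called here, since it needs reachable SSH hosts.
function launch_all(hosts)
    pids = Int[]
    for ((dir, exename), machines) in group_by_install(hosts)
        append!(pids, addprocs(machines; dir = dir, exename = exename))
    end
    return pids
end
```

With the table above, `group_by_install(hosts)` yields two groups, so `launch_all(hosts)` would issue two `addprocs` calls.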

If you are up to it, handling exeflags and sshflags would be a bonus. Note that these flags may contain spaces and = symbols.

dahlend commented 9 years ago

I see someone attempted this before.

JuliaLang/julia#9347

The question then comes down to an acceptable format for the machinefile:

[n*] [user@]host [bind_addr][:port] [*dir=PATH] [*exename=EXE] [*exeflags=FLAGS] [*sshflags=FLAGS]

The n stays in place for backwards compatibility, everything after the host can appear in any order, and the * makes the options easy to separate. Thoughts?

This means editing the SSHManager and adding some additional parsing.
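A rough sketch of what parsing one line of the proposed format might look like. `parse_machine_line` is a hypothetical helper, not part of Distributed, and it assumes options are introduced by a space followed by `*`. Splitting each option on the first `=` only keeps later `=` signs and spaces inside the value, but a value that itself contains " *" would still be split incorrectly.

```julia
# Hypothetical parser for the proposed line format; not part of Distributed.
function parse_machine_line(line::AbstractString)
    parts = split(strip(line), " *")        # options are introduced by " *"
    hostspec = strip(parts[1])              # "[n*] [user@]host [bind_addr][:port]"

    # Optional leading "n*" process count.
    count = 1
    m = match(r"^(\d+)\*\s*(.*)$", hostspec)
    if m !== nothing
        count = parse(Int, m.captures[1])
        hostspec = m.captures[2]
    end

    # Split each option on the first '=' only, so values may contain '='.
    opts = Dict{Symbol,String}()
    for opt in parts[2:end]
        kv = split(opt, '='; limit = 2)
        length(kv) == 2 || error("malformed option: $opt")
        opts[Symbol(strip(kv[1]))] = String(strip(kv[2]))
    end
    return (count = count, host = String(hostspec), opts = opts)
end

entry = parse_machine_line("4*alice@node1 *dir=/opt/julia/bin *exeflags=--project=/srv/app -O3")
```

Here `entry.opts[:exeflags]` keeps the whole string `--project=/srv/app -O3`, including the embedded `=` and the space.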

dahlend commented 9 years ago

Ah yes, I missed your comment about the spaces and =. I need to think about this a bit more.

rened commented 9 years ago

Just a thought, but what about a Dict literal?

Dict(:n => 10, :host => "server", :username => "abc", :bind_addr => ... etc)

We wouldn't need to come up with a new syntax and parser, and the literals could also easily be created programmatically (via repr) if necessary. You even get syntax highlighting for free! ;-) If parsing a line (simply via eval(parse(one_line))) fails, we could still check whether it is in the current (then old) format and show a deprecation warning.
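A minimal sketch of the Dict-literal idea with a fallback to the old format. `read_entry` and the `:legacy` key are hypothetical, and `Meta.parse` is the current spelling of the `parse` call mentioned above. Note that `eval` runs whatever code the line contains, so this only makes sense for trusted files.

```julia
# Hypothetical reader for one machinefile line; not part of Distributed.
function read_entry(line::AbstractString)
    try
        value = eval(Meta.parse(line))   # evaluates arbitrary code: trusted files only
        value isa AbstractDict && return value
    catch
        # Not valid or not evaluable Julia: fall through to the legacy format.
    end
    @warn "legacy machinefile syntax; consider a Dict literal" line
    return Dict(:legacy => String(strip(line)))
end

new_style = read_entry("Dict(:n => 10, :host => \"server\")")
old_style = read_entry("2*alice@node1")
```

Because `repr` of a Dict is itself a Dict literal, an entry built in code can be written to the file with `repr(d)` and read back with `read_entry`.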

dahlend commented 9 years ago

JuliaLang/julia#7616 JuliaLang/julia#7589

These discussions are relevant.

The OpenMPI hostfile standard is limited and doesn't allow for the bells and whistles I'm suggesting.

amitmurthy commented 9 years ago

Practically speaking, both exeflags and sshflags would probably be the same across all hosts. They can thus be specified once each, on separate lines, and only as global definitions.

dir and exename can be specified both globally as well as per host definition. If defined at the host level, it will override the global value for that particular line.

bind_addr can be supported as bind_addr=host:<port>. We should also make bind_addr a keyword arg in the ssh addprocs.

dahlend commented 9 years ago

Something along the lines of:

[exeflags FLAGS]
[sshflags FLAGS]
[dir PATH]
[exename EXE]
[n*] [user@]host1 [bind_addr][:port] [*dir PATH] [*exename EXE] 
[n*] [user@]host2 [bind_addr][:port] [*dir PATH] [*exename EXE] 
...

This doesn't seem unreasonable.

rened commented 9 years ago

In my setting the sshflags could easily differ between hosts. What about not assuming that anything is constant, and instead allowing every flag to be specified at the top as a default, with each host line overriding whatever it wants? Since we are changing the format because it is too inflexible, we should rather not impose new restrictions.
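The "defaults at the top, overridable per host" semantics map directly onto `merge`, where later arguments take precedence. A small sketch, with hypothetical host names and option values:

```julia
# Hypothetical option tables; the keys mirror addprocs keyword names.
defaults = Dict(:dir => "/home/shared", :exename => "julia", :sshflags => "-p 22")

host_overrides = Dict(
    "node1" => Dict{Symbol,String}(),   # no overrides: uses every default
    "node2" => Dict(:exename => "/opt/julia/bin/julia", :sshflags => "-p 2222"),
)

# merge gives precedence to its last argument, so host values win over defaults.
effective = Dict(host => merge(defaults, opts) for (host, opts) in host_overrides)
```

In this sketch node2 keeps the default `dir` but gets its own `exename` and `sshflags`, while node1 ends up with exactly the defaults.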

dahlend commented 9 years ago

OK, now to implement...

[exeflags FLAGS]
[sshflags FLAGS]
[dir PATH]
[exename EXE]
[n*] [user@]host1 [bind_addr][:port] [*dir PATH] [*exename EXE] [*sshflags FLAGS] [*exeflags FLAGS]
[n*] [user@]host2 [bind_addr][:port] [*dir PATH] [*exename EXE] [*sshflags FLAGS] [*exeflags FLAGS]
...

I wanted a fixed format before I spent time on it.

amitmurthy commented 9 years ago

How about trying to mimic Julian code syntax, as well as the addprocs call, as much as possible? For example, using backticks for exeflags and sshflags.

[exeflags=`FLAGS`]
[sshflags=`FLAGS`]
[dir="PATH"]
[exename="EXE"]
[n*] [user@]host1 [bind_addr="bind_addr[:port]"] [dir="PATH"] [exename="EXE"] [sshflags=`FLAGS`] [exeflags=`FLAGS`]
[n*] [user@]host2 [bind_addr="bind_addr[:port]"] [dir="PATH"] [exename="EXE"] [sshflags=`FLAGS`] [exeflags=`FLAGS`]
...
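For context on why backticks suit flags that contain spaces and = symbols: in Julia a backtick literal is a `Cmd`, which is already split into separate arguments. The flag values below are made up for illustration, and `exec` is the `Cmd` field holding the argument vector, read here only for inspection.

```julia
# Example flag values are hypothetical.
sshflags = `-p 2222 -o StrictHostKeyChecking=no`
exeflags = `--project=/srv/app -O3`

# A Cmd stores its arguments as a vector of strings; '=' needs no special handling.
ssh_args = sshflags.exec
exe_args = exeflags.exec

# An interpolated string stays one argument even if it contains spaces.
path = "/srv/my app"
one_arg = `--project=$path`.exec
```

Values of this type can be passed directly as the `sshflags` and `exeflags` keywords of `addprocs`.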

vtjnash commented 5 months ago

I think the decision was that addprocs in a Julia script is already more flexible than this would be, which makes it simpler than inventing a new format.