dahlend closed this issue 5 months ago
Will be great to have a PR.
For starters, handling `dir`/`exename` is good - you will need to call `addprocs` separately for each unique `dir`/`exename` combination in `client.jl`.
If you are up to it, handling `exeflags` and `sshflags` would be a bonus. Note that these flags may have spaces and `=` symbols in them.
I see someone attempted this before.
JuliaLang/julia#9347
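The idea of one `addprocs` call per unique `dir`/`exename` combination could look roughly like the sketch below. The `entries` list and the helper `group_by_launch_config` are hypothetical names used only for illustration; the actual `addprocs` call is left commented out because it needs reachable SSH hosts.

```julia
# Hypothetical parsed entries: (machine spec, worker count, dir, exename).
entries = [
    ("user@host1", 2, "/opt/julia/bin", "julia"),
    ("user@host2", 1, "/opt/julia/bin", "julia"),
    ("user@host3", 4, "/home/user/julia", "julia-debug"),
]

# Group machines by their (dir, exename) combination.
function group_by_launch_config(entries)
    groups = Dict{Tuple{String,String},Vector{Tuple{String,Int}}}()
    for (machine, n, dir, exename) in entries
        push!(get!(groups, (dir, exename), Tuple{String,Int}[]), (machine, n))
    end
    return groups
end

groups = group_by_launch_config(entries)

# One addprocs call per unique combination (needs reachable hosts, so not run here):
# using Distributed
# for ((dir, exename), machines) in groups
#     addprocs(machines; dir=dir, exename=exename)
# end
```

Each group holds `(machine, count)` tuples, which is one of the machine specification forms the SSH variant of `addprocs` accepts.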
The question then comes down to an acceptable format for the machinefile:
[n*] [user@]host [bind_addr][:port] [*dir=PATH] [*exename=EXE] [*exeflags=FLAGS] [*sshflags=FLAGS]
The `n*` prefix remains in place for backwards compatibility, everything after the host can appear in any order, and `*` is used for easy separation. Thoughts?
This means editing the `SSHManager` and adding some additional parsing.
Ah yes, I missed your comment about the spaces and `=`. I need to think about this a bit more, maybe.
Just a thought, but what about a Dict literal?
Dict(n => 10, host => "server", username => "abc", bind_addr => ... etc)
We wouldn't need to come up with a new syntax and parser, and the literals could also easily be created programmatically (via `repr`) if necessary. You even get syntax highlighting for free! ;-)
If parsing the file (simply via `eval(parse(one_line))`) fails, we could still check whether it is in the current (then old) format and show a deprecation warning.
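A minimal sketch of that parse-then-fall-back idea, under a few assumptions: it uses `Meta.parse` (the spelling of `parse` for code in Julia 1.x), the keys are written as symbols so the literal evaluates without predefined variables, and `parse_dict_line` is a hypothetical helper name. Evaluating lines from a file runs arbitrary code, so the sketch only evaluates a top-level `Dict(...)` call.

```julia
# Hypothetical helper: returns a Dict for a Dict-literal line, or `nothing`
# when the line should be handed to the old-format parser instead.
function parse_dict_line(line::AbstractString)
    try
        ex = Meta.parse(line)
        # Only evaluate expressions that are a top-level `Dict(...)` call.
        if ex isa Expr && ex.head == :call && ex.args[1] == :Dict
            return eval(ex)
        end
    catch
        # Parse or evaluation error: fall through to the old format.
    end
    return nothing
end

new_line = "Dict(:n => 10, :host => \"server\", :username => \"abc\")"
old_line = "10*abc@server"

spec = parse_dict_line(new_line)     # a Dict with Symbol keys
legacy = parse_dict_line(old_line)   # nothing, so try the old parser and warn
```

A `nothing` result is where the old-format parser and the deprecation warning would come in.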
JuliaLang/julia#7616 JuliaLang/julia#7589
These discussions are relevant.
The OpenMPI hostfile standard is limited; it doesn't allow for the bells and whistles I'm suggesting.
Practically speaking, both `exeflags` and `sshflags` would probably be the same across all hosts. They can thus be specified individually on separate lines. `exeflags` and `sshflags` can only be global definitions.
`dir` and `exename` can be specified both globally and per host. If defined at the host level, the value overrides the global value for that particular line.
`bind_addr` can be supported as `bind_addr=host:<port>`. We should also make `bind_addr` a keyword arg in the ssh `addprocs`.
Something along the lines of
[exeflags FLAGS]
[sshflags FLAGS]
[dir PATH]
[exename EXE]
[n*] [user@]host1 [bind_addr][:port] [*dir PATH] [*exename EXE]
[n*] [user@]host2 [bind_addr][:port] [*dir PATH] [*exename EXE]
...
This doesn't seem unreasonable.
In my setting the `sshflags` could easily be different for the different hosts. What about not assuming anything about what is constant, and instead allowing each and every flag to be specified at the top as a default, with each host line overriding whatever it wants?
As we are changing the format because it is too inflexible, let's rather not impose new restrictions.
OK, now to implement...
[exeflags FLAGS]
[sshflags FLAGS]
[dir PATH]
[exename EXE]
[n*] [user@]host1 [bind_addr][:port] [*dir PATH] [*exename EXE] [*sshflags FLAGS] [*exeflags FLAGS]
[n*] [user@]host2 [bind_addr][:port] [*dir PATH] [*exename EXE] [*sshflags FLAGS] [*exeflags FLAGS]
...
Wanted a fixed format before I spent time on it.
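For illustration, parsing the format above might look something like the following sketch. It is not the eventual implementation: `parse_machinefile`, `split_option`, and `OPTION_KEYS` are hypothetical names, the count is assumed to be written without a space (`2*user@host`), and a line is treated as a global default when its first word is one of the option names.

```julia
# Option names taken from the format above.
const OPTION_KEYS = ("dir", "exename", "sshflags", "exeflags")

# Split "key value..." once, so the value keeps any spaces or `=` it contains.
function split_option(s::AbstractString)
    parts = split(strip(s), r"\s+"; limit=2)
    value = length(parts) == 2 ? String(parts[2]) : ""
    return String(parts[1]), value
end

function parse_machinefile(lines)
    defaults = Dict{String,String}()
    hosts = []
    for raw in lines
        line = strip(raw)
        isempty(line) && continue
        if first(split(line)) in OPTION_KEYS
            # Global default line, e.g. "exename julia".
            key, value = split_option(line)
            defaults[key] = value
            continue
        end
        # Host line: cut at each " *key " marker; the first field is the host spec.
        fields = split(line, r"\s+\*(?=(?:dir|exename|sshflags|exeflags)\s)")
        m = match(r"^(\d+)\s*\*\s*(.+)$", fields[1])
        if m === nothing
            n, host = 1, String(fields[1])
        else
            n, host = parse(Int, m.captures[1]), String(m.captures[2])
        end
        opts = copy(defaults)   # start from the defaults seen so far
        for field in fields[2:end]
            key, value = split_option(field)
            opts[key] = value   # per-host override
        end
        push!(hosts, (n=n, host=host, opts=opts))
    end
    return hosts
end

machinefile = """
sshflags -o Port=22
dir /opt/julia/bin
exename julia
2*alice@host1 *dir /home/alice/julia/bin
bob@host2 *sshflags -i /home/bob/.ssh/id_rsa -o Port=2222 *exename julia-debug
"""

hosts = parse_machinefile(split(machinefile, '\n'))
```

Splitting only at ` *key ` markers is what lets flag values keep their spaces and `=` signs. A value that itself contains something like ` *dir ` would still confuse this sketch.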
How about trying to mimic Julian code syntax, as well as the `addprocs` call, as much as possible? For example, using backticks for `exeflags` and `sshflags`.
[exeflags=`FLAGS`]
[sshflags=`FLAGS`]
[dir="PATH"]
[exename="EXE"]
[n*] [user@]host1 [bind_addr="bind_addr[:port]"] [dir="PATH"] [exename="EXE"] [sshflags=`FLAGS`] [exeflags=`FLAGS`]
[n*] [user@]host2 [bind_addr="bind_addr[:port]"] [dir="PATH"] [exename="EXE"] [sshflags=`FLAGS`] [exeflags=`FLAGS`]
...
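One reason backticks are attractive here: in Julia a backtick literal is a `Cmd`, which already splits its contents into separate arguments, so flags with spaces or `=` need no custom quoting rules. A small sketch with made-up flag values:

```julia
# Backtick literals produce Cmd objects; the flag values are illustrative only.
sshflags = `-i /home/user/.ssh/id_rsa -o Port=2222`
exeflags = `--project=/srv/app -O3`

# The `exec` field holds the individual arguments after word splitting.
ssh_args = sshflags.exec
exe_args = exeflags.exec
```

`addprocs` takes `sshflags` and `exeflags` as `Cmd` values, so values written this way in a machinefile would map directly onto the keyword arguments.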
I think the decision was that `addprocs` in a Julia script is already more flexible than this would be, which makes it simpler than inventing a new format.
There is little documentation for the format of machinefiles (not that it is very complex).
Currently machinefiles take the number of procs, host, and ssh flags only, and assume that the install location is the same across systems (not the case for some of my work). The only real way to solve the problem is to not use a host file. It doesn't seem unreasonable to add some sort of option to the machinefile (I could do it pretty easily after trying to track this thing down).
It's just a little more string parsing: just a "dir=" and an "exename=" option.