flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Start Flux in multiple nodes without Slurm #1554

Closed marceloamaral closed 4 years ago

marceloamaral commented 6 years ago

The manual shows only two options to start a flux cluster. The first one is to start a local cluster (only one machine) for testing. The second one is by using Slurm to deploy flux on several machines.

However, I am trying to start a flux instance without Slurm, on a cloud infrastructure. The Slurm command appears to simply start a flux instance on each machine:

srun --pty --mpi=none -N64 src/cmd/flux start

So my first, naive approach was to run "flux start" on each machine. That didn't work: the instances do not know about each other.

My question is: how are the instances discovered and connected? Is there a "master" IP to be set?

Also, after starting it, how do I connect to rank 0 to submit jobs?

garlick commented 6 years ago

This is barely supported, using a config file that contains static URIs for each node in your cluster. One can get a flux instance wired up this way, but the current code will not tolerate missing nodes or nodes that go down. (As mentioned earlier, that work is not done yet but is coming soon.)

See the config file and systemd unit file in flux-core/etc if you still want to try this.

If the FLUX_URI environment variable is not set, commands will connect to a compiled-in URI, so if a "system instance", as we call it, is running on the local node, then one should be able to submit jobs to it. To submit jobs from outside the instance, you can use an ssh-based URI.

Note also that jobs can only run as the user who started flux.

garlick commented 6 years ago

Note the config file bootstrap support was added in pr #1320, where there is some additional explanation in the first comment.

marceloamaral commented 6 years ago

Thanks @garlick

For a basic test I'm trying with only two machines.

I changed flux/local/etc/flux/conf.d/boot.conf on both machines to:

session-id = "system"
mcast-endpoint = "epgm://eth0;229.0.0.1:8050"
tbon-endpoints = [
    "tcp://IP-MACHINE1:8020", # rank 0
    "tcp://IP-MACHINE2:8020", # rank 1
]

After that, I ran "flux start" on both machines, but flux still finds only one host...

I have also tried running "flux start" on one machine and the following command on the other machine:

/flux/local/bin/flux broker -Sbroker.rundir=/run/flux -Sboot.method=config -Sboot.config_file=/flux/local/etc/flux/conf.d/boot.conf sleep inf

However, this command gives me an error:

broker: local address not found in tbon-endpoints array
broker: invalid rank -1 size 2: Invalid argument
broker: bootstrap failed

Am I missing something?

garlick commented 6 years ago

The bootstrap code is trying to find a local IP in the tbon-endpoints array and failing. Can you make sure the IP in the array matches a local interface?
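To make the lookup described here concrete, the following is a minimal sketch of that kind of rank inference. It is an illustration only, not the broker's actual code: the function name infer_rank and the idea of passing the local addresses in as a set are assumptions made for the example.

```python
from urllib.parse import urlparse


def infer_rank(tbon_endpoints, local_addresses):
    """Return (rank, size).

    rank is the index of the first endpoint whose host is one of this
    machine's addresses, or -1 if no endpoint matches.
    """
    size = len(tbon_endpoints)
    for rank, uri in enumerate(tbon_endpoints):
        host = urlparse(uri).hostname
        if host in local_addresses:
            return rank, size
    return -1, size


endpoints = ["tcp://192.2.133.181:8020", "tcp://192.2.133.182:8020"]

# A machine that owns the second address would come out as rank 1 of 2.
print(infer_rank(endpoints, {"127.0.0.1", "192.2.133.182"}))

# A machine whose interfaces match no endpoint gets rank -1, which is the
# situation behind "invalid rank -1 size 2".
print(infer_rank(endpoints, {"127.0.0.1", "10.0.0.5"}))
```

In a container, the address visible on the interface is often not the address other machines use to reach it, which would produce exactly this mismatch.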

You can also explicitly set "rank" and "size" in the config file to force it. See comment in src/broker/boot_config.h.

Both machines will need to start the broker directly using options like you have above (avoid flux-start for this).

marceloamaral commented 6 years ago

Thanks again @garlick!

My configuration was missing the rank and size parameters; adding them fixed the error.
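Since each machine needs the same endpoint list but its own rank, the per-machine files can be generated rather than edited by hand. A minimal sketch, assuming only the keys that appear in this thread (session-id, tbon-endpoints, rank, size); the function name render_boot_conf and the default port 8020 are choices made for the example, not part of Flux:

```python
def render_boot_conf(session_id, hosts, rank, port=8020):
    """Render boot.conf text for one rank.

    hosts is the ordered list of addresses; its order defines the ranks.
    """
    if not 0 <= rank < len(hosts):
        raise ValueError(f"rank {rank} out of range for {len(hosts)} hosts")
    lines = [f'session-id = "{session_id}"', "tbon-endpoints = ["]
    for i, host in enumerate(hosts):
        lines.append(f'    "tcp://{host}:{port}", # rank {i}')
    lines.append("]")
    lines.append(f"rank = {rank}")   # explicit rank, so no address lookup is needed
    lines.append(f"size = {len(hosts)}")
    return "\n".join(lines) + "\n"


hosts = ["IP-MACHINE1", "IP-MACHINE2"]
for rank in range(len(hosts)):
    print(f"# --- boot.conf for rank {rank} ---")
    print(render_boot_conf("system", hosts, rank))
```

Only the rank line differs between the generated files, which makes it easy to spot a copy-paste mistake such as two machines both claiming rank 0.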

Now I am getting the following error:

broker: flux_open enclosing instance: No such file or directory

When starting the broker I defined the following environment variables, based on the ones set by the flux start command:

FLUX_CONNECTOR_PATH=/flux/local/lib/flux/connectors
FLUX_WRECK_LUA_PATTERN=/flux/local/etc/wreck/lua.d/*.lua
FLUX_SEC_DIRECTORY=/root/.flux
LUA_PATH=/flux/local/share/lua/5.1/?.lua;/flux/flux-sched/rdl/?.lua;;;
FLUX_WREXECD_PATH=/flux/local/libexec/flux/wrexecd
LUA_CPATH=/flux/local/lib/lua/5.1/?.so;/flux/flux-sched/rdl/?.so;;;
FLUX_MODULE_PATH=/flux/local/lib/flux/modules:/flux/flux-sched/sched
FLUX_EXEC_PATH=/flux/local/libexec/flux/cmd
FLUX_URI=local:///tmp/flux-zmboFD
FLUX_PMI_LIBRARY_PATH=/flux/local/lib/flux/libpmi.so
FLUX_RC1_PATH=/flux/local/etc/flux/rc1
FLUX_RC3_PATH=/flux/local/etc/flux/rc3

garlick commented 6 years ago

FLUX_URI is causing the error with the enclosing instance.

The other environment vars should not be required. Flux should use compiled-in values for all that. What happens if you don't set them?
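If the broker is launched from a script, one way to avoid inheriting a stale FLUX_URI is to build the child environment explicitly. A minimal sketch; the helper name broker_env is made up for the example, and the sample values are the ones posted above:

```python
import os


def broker_env(env=None, drop=("FLUX_URI",)):
    """Return a copy of env (default: os.environ) without the names in drop."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k not in drop}


inherited = {
    "PATH": "/usr/bin:/bin",
    "FLUX_URI": "local:///tmp/flux-zmboFD",
    "FLUX_RC1_PATH": "/flux/local/etc/flux/rc1",
}

# Drop only FLUX_URI, the variable identified as the cause of the error.
clean = broker_env(inherited)
print(sorted(clean))

# Optionally drop the other hand-set variables as well, so that the
# compiled-in defaults are used.
minimal = broker_env(inherited, drop=("FLUX_URI", "FLUX_RC1_PATH"))
print(sorted(minimal))
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the broker, leaving the calling shell's environment untouched.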

I see you are starting the broker as "root"? That's probably a bit risky, although maybe not so much in your cloud environment? This version of Flux cannot switch user, so it means all jobs will run as root.

dongahn commented 6 years ago

Can we use MPICH's hydra to launch the flux instance here? My guess is that it provides a PMI library, which would use ssh or rsh to launch processes. @trws did this on our Sierra system.

garlick commented 6 years ago

> Can we use MPICH's hydra to launch the flux instance here? My guess is that provides PMI library which would use ssh or rsh to launch processes.

I'm not sure if that would help @marceloamaral as I think he wants a standalone flux? But that's true, it's a way to launch flux without getting any slurm on you :-)

marceloamaral commented 6 years ago

@garlick It worked without the env variables

But I am having another problem now:

flux submit -N2 -n2 sleep 30
submit: flux.rpc: Function not implemented

Some other information:

---- machine rank 0

flux dmesg
2018-06-28T15:05:46.645594Z broker.debug[0]: insmod connector-local
2018-06-28T15:05:56.645871Z broker.info[0]: wireup: 1/2 (incomplete) 10.0s

flux module list
Module               Size    Digest  Idle  S  Nodeset
connector-local      1550896 8B8CE6F    0  R  0

boot.conf
session-id = "hype"
tbon-endpoints = [
    "tcp://192.2.133.181:8020", # rank 0
    "tcp://192.2.133.182:8020", # rank 1
]
rank = 0
size = 2

---- machine rank 1

flux dmesg
2018-06-28T15:17:16.302629Z broker.debug[1]: insmod connector-local

flux module list
Module               Size    Digest  Idle  S  Nodeset
connector-local      1550896 8B8CE6F    0  R  1

boot.conf
session-id = "hype"
tbon-endpoints = [
    "tcp://192.2.133.181:8020", # rank 0
    "tcp://192.2.133.182:8020", # rank 1
]
rank = 1
size = 2
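The "wireup: 1/2 (incomplete)" line in the rank 0 log is the key symptom here: only one of the two brokers has joined. A small sketch for pulling that status out of saved flux dmesg output, assuming the log format is exactly as it appears in this thread (the regular expression and the function name wireup_status are illustrative):

```python
import re

# Matches e.g. "wireup: 1/2 (incomplete) 10.0s"
WIREUP_RE = re.compile(r"wireup: (\d+)/(\d+) \((complete|incomplete)\)")


def wireup_status(dmesg_text):
    """Return (connected, size, complete) from the last wireup line,
    or None if the text contains no wireup line."""
    matches = WIREUP_RE.findall(dmesg_text)
    if not matches:
        return None
    connected, size, state = matches[-1]
    return int(connected), int(size), state == "complete"


rank0_log = """\
2018-06-28T15:05:46.645594Z broker.debug[0]: insmod connector-local
2018-06-28T15:05:56.645871Z broker.info[0]: wireup: 1/2 (incomplete) 10.0s
"""

status = wireup_status(rank0_log)
print(status)
if status is not None and not status[2]:
    print(f"still waiting for {status[1] - status[0]} broker(s)")
```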

It is not a real root user since I am running inside a container. The container uses a user namespace, so the container root user is actually the regular user in the bare-metal machine.

@dongahn, the standalone version would be better for me, but it is indeed helpful to know how flux is being started and how the configurations are being created in this approach.

dongahn commented 6 years ago

@marceloamaral: btw, which cloud infrastructure are you using? And what project? This would be good information to have.

dongahn commented 6 years ago

For the submit problem, did you install flux-sched as well?

marceloamaral commented 6 years ago

@dongahn I am using IBM Cloud Private, which uses open source Kubernetes.

I have installed both flux-core and flux-sched, and also exported the environment variables (LUA* and FLUX_MODULE_PATH).

Note that flux start with the loopback IP works for me. The problem is deploying on more than one node...

trws commented 6 years ago

Just to flesh out what Dong already mentioned: hydra is a pretty easy way to get this going, since all it really needs is ssh or rsh to bootstrap across nodes. The issue that you're having now usually means that sched hasn't been loaded. You can check with flux module list whether it's there or not, but my bet would be that either the rc script isn't running to load it or the load is somehow failing.
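The check suggested here can be scripted against saved flux module list output. A minimal sketch, assuming the whitespace-separated column layout shown earlier in this thread; the function name loaded_modules is made up for the example:

```python
def loaded_modules(module_list_text):
    """Return the set of module names found in `flux module list` output.

    Assumes the first whitespace-separated field of each row is the module
    name and that the header row starts with "Module".
    """
    names = set()
    for line in module_list_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


# Output posted above for the broker started from the config file.
listing = """\
Module Size Digest Idle S Nodeset
connector-local 1550896 8B8CE6F 0 R 0
"""

modules = loaded_modules(listing)
print("sched loaded:", "sched" in modules)
```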

marceloamaral commented 6 years ago

So the problem is indeed that the sched module is not loaded.

When I start flux with "flux start", I can see all the modules:

Module               Size    Digest  Idle  S  Nodeset
kvs                  2059216 068BEA1    0  S  0
job                  1655536 916630E    5  S  0
barrier              1525752 AC7E5B1    5  S  0
connector-local      1550896 8B8CE6F    0  R  0
content-sqlite       1531704 47B1436    5  S  0
resource-hwloc       1544296 14139FD    5  S  0
aggregator           1540736 B9BC068    5  S  0
cron                 1670512 1071A1A    0  S  0
sched                 547400 2C24CEA    5  S  0
userdb               1527152 BD65EAC    5  S  0

However, when I only run "flux broker -Sboot.method=config -Sboot.config_file=/flux/flux-core/etc/boot.conf", the list of modules is only:

Module               Size    Digest  Idle  S  Nodeset
connector-local      1550896 8B8CE6F    0  R  0

Is the command "flux broker" supposed to load all the modules? If yes, does it require additional configuration to do that? If not, what do I need to do to load/start those other modules?

Additionally, when I run "flux start", the "flux hostlist" command shows the local host, as expected. However, when I run "flux broker" directly, the "flux hostlist" command hangs and does not show anything...

@trws could you please point me to any documentation on how to use hydra? It is not clear to me how to start it...

trws commented 6 years ago

Hydra is the standard mpiexec provided with MPICH; they have some documentation on their wiki. If you use it to launch flux broker … as though it were an MPI job, you'll get the same wireup behavior we describe with slurm. It may be worth trying that to see if it takes care of this. We may have an unknown bug in the config file setup, or perhaps something is being blocked in a way we don't expect.
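One way to rule out the "something is being blocked" possibility is to test, from the rank 1 machine, whether the rank 0 endpoint accepts TCP connections at all. The sketch below uses only the Python standard library; the helper names parse_endpoint and can_connect are made up for the example, and a successful connect only shows that the port is reachable, not that the process listening on it is a healthy broker.

```python
import socket
from urllib.parse import urlparse


def parse_endpoint(uri):
    """Split a "tcp://host:port" endpoint into (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a tcp://host:port endpoint: {uri!r}")
    return parsed.hostname, parsed.port


def can_connect(uri, timeout=3.0):
    """Return True if a TCP connection to the endpoint succeeds in time."""
    host, port = parse_endpoint(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(parse_endpoint("tcp://192.2.133.181:8020"))
```

Calling can_connect("tcp://192.2.133.181:8020") on the rank 1 machine while the rank 0 broker is running would then distinguish a network or firewall problem from a bootstrap problem inside Flux.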


marceloamaral commented 6 years ago

thanks @trws

According to the documentation, Slurm runs flux start on all machines:

srun --pty --mpi=none -N64 src/cmd/flux start

You are also suggesting running the "flux broker" command with hydra, which will in turn run the command on the machines.

What I don't understand is why running it manually on each machine does not give the same behavior. In the end it is just running the commands, isn't it?

My problem is that when I run only "flux start", it sees only the local machine, even though I have updated local/etc/flux/conf.d/boot.conf to list two machines. On the other hand, when I run only "flux broker", it does not load all the modules...

garlick commented 6 years ago

The modules are loaded when the rc1 script executes on rank 0. Maybe somehow rc1 did not get installed, or it is failing in some way?

grondo commented 6 years ago

@marceloamaral, two debugging commands might be helpful here:

This might show why rc1 is apparently run in one case and not the other.

It is the flux(1) command driver itself that should be setting the correct path to rc1, so it isn't immediately obvious why it didn't run in your second case.

marceloamaral commented 6 years ago

@grondo, please find the logs below:

-- with flux start

FLUX_CONNECTOR_PATH=/flux/local/lib/flux/connectors
FLUX_WRECK_LUA_PATTERN=/flux/local/etc/wreck/lua.d/*.lua
FLUX_SEC_DIRECTORY=/root/.flux
FLUX_WREXECD_PATH=/flux/local/libexec/flux/wrexecd
FLUX_MODULE_PATH=/flux/local/lib/flux/modules:/flux/flux-sched/sched
FLUX_EXEC_PATH=/flux/local/libexec/flux/cmd
FLUX_URI=local:///tmp/flux-nGeNgm
FLUX_PMI_LIBRARY_PATH=/flux/local/lib/flux/libpmi.so
FLUX_RC1_PATH=/flux/local/etc/flux/rc1
FLUX_RC3_PATH=/flux/local/etc/flux/rc3

-- with flux broker

It does not give me a shell, so I ran it in the background and exported FLUX_URI pointing at the directory it created in /tmp. As more or less expected, since I am not dropped into a new shell, no new environment variables are set... It might be that the flux broker command is not starting properly...

For further info, my flux broker command and config file:

flux broker -Sboot.method=config -Sboot.config_file=/flux/flux-core/etc/boot.conf

session-id = "hype"
tbon-endpoints = [
    "tcp://192.2.133.185:8020", # rank 0
    "tcp://192.2.133.183:8020", # rank 1
]
rank = 0
size = 2

I have also tried exporting those variables before and after starting the flux broker, but it does not change anything...

Is there a way to force flux start to load my broker configuration file?

trws commented 6 years ago

I think there may be some disconnect here, perhaps the docs are not completely up to date. When I run flux for my own testing with hydra I normally do so like this:

$ mpiexec.hydra -n <whatever> flux broker bash

This launches and bootstraps the broker and then runs a single process on the same node as the master flux broker, subsequently dropping into a shell with no pty. Slurm can offer a pty here, unfortunately hydra cannot, but you're still in a shell so you have the variables and can start a tmux or screen session or similar that can be connected to from another terminal that does have a pty.

Running start with either hydra or slurm I believe is no longer supported, it should be flux broker under a launcher like that, and while it runs the broker on every node it runs the argument to the broker on only one.

grondo commented 6 years ago

> -- with flux broker it does not get a shell, but I ran it in background and exported the FLUX_URI with the created directory in /tmp

As @trws noted, the rank 0 flux-broker should spawn a $SHELL by default if you don't pass an argument to flux broker or flux start.

The fact that flux broker -Sboot.method=config -Sboot.config_file=/flux/flux-core/etc/boot.conf doesn't spawn a shell indicates a fundamental problem to me: either one of the brokers doesn't know it is rank 0, or we're stuck actually attempting to execute rc1.

> Running start with either hydra or slurm I believe is no longer supported, it should be flux broker under a launcher like that,

Actually I believe when running under any environment where PMI is detected, flux start will just exec flux-broker. The only difference is that broker options have to be passed through flux-start cmd via -o, --broker-opts=OPTS,..., so flux broker is possibly more convenient... (docs should maybe be updated?)

marceloamaral commented 6 years ago

Thanks @trws and @grondo, and sorry for the delay in replying.

To run flux, I am using the following command:

flux broker -Sboot.method=config -Sboot.config_file=/flux/boot2.conf --setattr=log-stderr-level=10 bash

If I configure the flux broker with only one IP, such as the local IP, it works:

session-id = "hype"
tbon-endpoints = [
    "tcp://192.2.133.181:8020", # rank 0
]
rank = 0
size = 1

The problem appears when I set two IPs, that is, for rank 0 and rank 1. In this case, the command stalls and does not do anything...

session-id = "hype"
tbon-endpoints = [
    "tcp://192.2.133.181:8020", # rank 0
    "tcp://192.2.133.183:8020", # rank 1
]
rank = 0
size = 2

I have also tried to run Flux with the flux start command. Running flux start with --broker-opts didn't work for me either; the command fails with an error:

flux start -o,--setattr=log-stderr-level=7 -o, --broker-opts=-Sboot.method=config,-Sboot.config_file=/flux/boot.conf

2018-07-09T13:55:46.364080Z broker.debug[0]: insmod connector-local
2018-07-09T13:55:46.364113Z broker.info[0]: wireup: 1/1 (complete) 0.0s
2018-07-09T13:55:46.364173Z broker.info[0]: Run level 1 starting
2018-07-09T13:55:46.401341Z broker.debug[0]: insmod barrier
2018-07-09T13:55:46.434091Z broker.debug[0]: insmod content-sqlite
2018-07-09T13:55:46.436636Z broker.debug[0]: content backing store: enabled content-sqlite
2018-07-09T13:55:46.470609Z broker.debug[0]: insmod kvs
2018-07-09T13:55:46.515131Z broker.debug[0]: insmod aggregator
2018-07-09T13:55:46.547727Z broker.debug[0]: insmod resource-hwloc
2018-07-09T13:55:46.564200Z broker.debug[0]: insmod job
2018-07-09T13:55:46.593905Z broker.debug[0]: insmod cron
2018-07-09T13:55:46.594616Z cron.info[0]: synchronizing cron tasks to event hb
2018-07-09T13:55:46.618104Z broker.debug[0]: insmod userdb
2018-07-09T13:55:46.624925Z resource-hwloc.debug[0]: loaded
2018-07-09T13:55:46.626392Z broker.info[0]: rc1: running /flux/local/etc/flux/rc1.d/01-enclosing-instance
2018-07-09T13:55:46.631653Z broker.info[0]: rc1: running /flux/local/etc/flux/rc1.d/02-hostlist
2018-07-09T13:55:46.781667Z broker.info[0]: rc1: running /flux/local/etc/flux/rc1.d/sched-start
2018-07-09T13:55:46.797948Z broker.debug[0]: insmod sched
2018-07-09T13:55:46.798394Z sched.info[0]: sched comms module starting
2018-07-09T13:55:46.798996Z sched.debug[0]: loaded: sched.fcfs
2018-07-09T13:55:46.799020Z sched.info[0]: sched.fcfs plugin loaded
2018-07-09T13:55:46.799035Z sched.debug[0]: LUA_PATH /flux/local/share/lua/5.1/?.lua;/flux/flux-sched/rdl/?.lua;;;
2018-07-09T13:55:46.799049Z sched.debug[0]: LUA_CPATH /flux/local/lib/lua/5.1/?.so;/flux/flux-sched/rdl/?.so;;;
2018-07-09T13:55:46.799065Z sched.info[0]: start to read resources
2018-07-09T13:55:46.810582Z sched.info[0]: resrc constructed using hwloc
2018-07-09T13:55:46.810644Z sched.info[0]: loaded resrc
2018-07-09T13:55:46.810659Z sched.info[0]: resources loaded
2018-07-09T13:55:46.811273Z sched.info[0]: events registered
2018-07-09T13:55:46.812454Z broker.info[0]: Run level 1 Exited (rc=0) 0.4s
2018-07-09T13:55:46.812472Z broker.info[0]: Run level 2 starting
2018-07-09T13:55:47.026142Z broker.err[0]: Run level 2 Exec Failure (rc=134) 231869.6s
2018-07-09T13:55:47.026272Z broker.info[0]: Run level 3 starting
**flux-broker: runlevel_set_level 2: No such file or directory**
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at src/zsys.c:471
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at src/zsys.c:472
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'REP' socket created at src/zauth.c:80
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'ROUTER' socket created at overlay.c:472
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at module.c:571
E: (flux-broker) 18-07-09 13:55:47 [239]dangling 'PAIR' socket created at shmem.c:259
flux-broker: src/zsock_option.inc:2906: zsock_set_sndtimeo: Assertion `rc == 0 || zmq_errno () == ETERM' failed.
Aborted (core dumped)

grondo commented 6 years ago

> 2018-07-09T13:55:47.026142Z broker.err[0]: Run level 2 Exec Failure (rc=134) 231869.6s

Looks like your initial program failed to execute. It doesn't look like you set an explicit initial program, so perhaps try again with an explicit final argument to flux start, e.g. /bin/true or /bin/bash? Maybe the default fallback shell wasn't found for some reason.
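As a side note on reading rc=134: under the common shell convention, an exit status above 128 means the process was terminated by signal (status - 128), and 134 - 128 = 6, which is SIGABRT on Linux. That would be consistent with the "Aborted (core dumped)" line in the log, although whether the broker reports rc using this convention is an assumption here. A minimal sketch of the decoding (the function name describe_exit is made up for the example):

```python
import signal


def describe_exit(rc):
    """Describe an exit status using the shell convention that values
    above 128 mean 'terminated by signal (rc - 128)'."""
    if rc > 128:
        signum = rc - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {rc}"


print(describe_exit(134))
print(describe_exit(0))
```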

garlick commented 4 years ago

Old issue, closing.