coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

Fleet hangs up during start command with STDERR to STDOUT redirection #1364

Open iby opened 9 years ago

iby commented 9 years ago

fleetctl hangs and never times out when I try to start a service with STDERR redirected to STDOUT. This happens with a unit that has a misconfigured MachineID parameter, or one set to %m. When I change it to a proper machine ID, it works as expected. I need the redirection to keep things simple inside Python.

fleetctl start confd@node01.service 2>&1

Without the redirection, it simply exits without any message. I'm running stable version 0.10.2.
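For calling fleetctl from Python, here is a minimal sketch of how the same redirection could be done without a shell, with a timeout so that a hung command cannot block the caller. The helper name `run_merged` and the timeout value are assumptions for illustration only; nothing here changes how fleetctl itself behaves.

```python
import subprocess
import sys


def run_merged(cmd, timeout_seconds):
    """Run cmd with stderr folded into stdout (hypothetical helper).

    Returns (returncode, output). If the command does not finish within
    timeout_seconds, the child is killed and (None, partial_output) is
    returned instead.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as the shell's 2>&1
            timeout=timeout_seconds,
            text=True,
        )
    except subprocess.TimeoutExpired as expired:
        # The captured output may be None or bytes here, so normalise it.
        partial = expired.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial
    return completed.returncode, completed.stdout
```

It could be called as `run_merged(["fleetctl", "start", "confd@node01.service"], 30)`; a return code of `None` would then mean the command was still running when the timeout expired.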

# confd@.service

[Unit]

Description = Confd service.

Requires = etcd2.service
Requires = docker.service

After = etcd2.service
After = docker.service

[Service]

# Let the process take a while to start up (for first-run Docker containers) and change
# KillMode from "control-group" to "none" to let Docker remove work correctly.

TimeoutStartSec = 5
KillMode = none
EnvironmentFile = /etc/environment

ExecStartPre = /usr/bin/docker ps --all etcd | xargs /usr/bin/docker rm --force
ExecStart = /usr/bin/docker run \
                --name 'confd' \
                --net 'host' \
                --volume '/docker/confd/configuration.toml:/etc/confd/confd.toml' \
                --volume '/docker/confd/configuration:/etc/confd/conf.d' \
                --volume '/docker/confd/output:/var/confd/output' \
                --volume '/docker/confd/template:/etc/confd/templates' \
                ianbytchek/confd

ExecStop = /usr/bin/docker stop confd

[X-Fleet]

MachineID = %m

Also, when I try this manually for the first time, fleetctl goes into an endless loop; this happens on stable-alpha. It really seems that the %m parameter doesn't work as expected.

$ fleetctl --debug start confd@node01.service
2015/09/28 17:42:23 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:23 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:23 DEBUG fleetctl.go:605: Found Unit(confd@node01.service) in Registry, no need to recreate it
2015/09/28 17:42:23 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:23 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:23 DEBUG fleetctl.go:715: Setting Unit(confd@node01.service) target state to launched
2015/09/28 17:42:23 DEBUG http.go:28: HTTP PUT http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:23 DEBUG http.go:31: HTTP PUT http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 204 No Content
2015/09/28 17:42:23 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:23 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:23 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:23 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:24 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:24 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:24 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:24 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:25 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:25 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:25 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:25 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:26 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:26 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:26 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:26 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:27 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:27 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:42:27 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:42:27 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK

When I run it for the second time:

$ fleetctl --debug start confd@node01.service
2015/09/28 17:43:09 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:43:10 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:43:10 DEBUG fleetctl.go:605: Found Unit(confd@node01.service) in Registry, no need to recreate it
2015/09/28 17:43:10 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json
2015/09/28 17:43:10 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/units/confd%40node01.service?alt=json 200 OK
2015/09/28 17:43:10 DEBUG fleetctl.go:711: Unit(confd@node01.service) already launched, skipping.

At the same time it cannot find any units…

$ fleetctl --debug list-units
2015/09/28 18:06:01 DEBUG http.go:28: HTTP GET http://domain-sock/fleet/v1/state?alt=json
2015/09/28 18:06:01 DEBUG http.go:31: HTTP GET http://domain-sock/fleet/v1/state?alt=json 200 OK
UNIT    MACHINE ACTIVE  SUB

iby commented 9 years ago

Looks like this might be related to #964?

jonboulle commented 8 years ago

I'm a bit confused by MachineID = %m - can you help me understand what you're trying to achieve/expect that to do?

We only support a subset of substitution parameters in the X-Fleet section - see here. Perhaps we could update that doc to clarify?

At the same time it cannot find any units…

In case you haven't figured out yet - list-unit-files is what you're after. list-units will show the state of scheduled units, of which there are none in this case because the unit will never successfully be scheduled.

jonboulle commented 8 years ago

BTW, you might be interested in this native Python client for the fleet API: https://github.com/cnelson/python-fleet

iby commented 8 years ago

The fleet library looks interesting, will dig into that later, thanks!

With MachineID = %m in X-Fleet, I expected fleet to launch the instance / unit on the machine where I invoke the command. I tried this when I had just started working with fleet, then realised that it probably doesn't make much sense, because the machine ID only becomes known after fleet decides which host the unit should be deployed to.

Still, I haven't figured out how to launch a unit instance on a particular machine. Yes, I could just use systemd for that, but I don't want to deal with systemd directly; I want to use fleet. I use it for all container launching, it's great as a centralised command point, and I don't want to deviate into another tool.

My current solution involves replacing the %m in MachineID = %m with the value from /etc/machine-id after Ansible deploys the unit file to a host. This enables fleet to launch that unit on the current machine. It'd be great if we could tell fleet to launch units on the local host by simply giving it MachineID = %m; I think it makes perfect sense given that we can already use explicit machine IDs.
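The patching step described above could look roughly like the following Python sketch. It is not the actual Ansible task; the function names `pin_machine_id` and `read_machine_id` are hypothetical, and the regular expression assumes the option is written on its own line as in the unit file shown earlier.

```python
import re

# Matches a line such as "MachineID = %m", tolerating spaces or tabs around "=".
_MACHINE_ID_LINE = re.compile(r"^([ \t]*MachineID[ \t]*=[ \t]*)%m[ \t]*$", re.MULTILINE)


def pin_machine_id(unit_text, machine_id):
    """Return unit_text with 'MachineID = %m' replaced by an explicit ID.

    Hypothetical helper: only the MachineID option is rewritten; any other
    use of %m in the unit is left untouched.
    """
    return _MACHINE_ID_LINE.sub(lambda match: match.group(1) + machine_id, unit_text)


def read_machine_id(path="/etc/machine-id"):
    """Read the local machine ID from the file mentioned in the thread."""
    with open(path) as handle:
        return handle.read().strip()
```

On a host, `pin_machine_id(unit_text, read_machine_id())` would produce the text to write back before submitting the unit to fleet.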