Fleetctl gets stuck sometimes when called from a subprocess

iby commented 8 years ago

I've been observing a consistent bad pattern for a couple of days now. I'm not sure whether this is a bug, but I thought it was worth raising.

Today I reported kelseyhightower/confd#346. In a nutshell, when I use a fleetctl command inside a confd reload_cmd, it blocks for good. The same command runs perfectly fine in a shell, and running it there even unblocks confd if it is hanging at that moment.

Separately, I also use fleetctl a lot through custom Ansible modules, where I start a subprocess in Python that invokes some fleetctl commands. When I'm debugging new modules, I run the same Ansible command every few minutes, and roughly 1 time in 3 it hangs forever without failing until I kill it. Normally the execution takes only a few seconds.
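One way to keep such a hang from blocking the caller forever is to put a timeout around the subprocess call. The following is a minimal sketch, assuming Python 3.7+. `run_with_timeout` is a hypothetical helper, not part of fleetctl or Ansible, and detaching stdin is only a guess at one possible cause of the hang, not something confirmed in this thread.

```python
import subprocess


def run_with_timeout(argv, timeout_s=30.0):
    """Run argv; return (returncode, stdout, stderr), or None on timeout.

    Hypothetical helper for illustration only.
    """
    try:
        done = subprocess.run(
            argv,
            # Assumption: give the child an empty stdin so it can never
            # block on input inherited from the parent process.
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            # subprocess.run() kills the child once the timeout expires.
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode, done.stdout, done.stderr
```

Here `argv` would be whichever fleetctl invocation is hanging, passed as a list of arguments. A `None` result then marks a run that would otherwise have blocked, which makes it possible to count how often the hang occurs.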

Those two cases seem to be the same: I assume confd also runs a subprocess with whatever command is specified in reload_cmd. I tried using --debug with fleetctl, but it didn't give any hints about what's going wrong. I have a strong feeling this is related to the start command. My unit files look like this:

[Unit]
Description=MongoDB (%i release) clustering service.
Requires=etcd2.service
Requires=mongo-db-node02@%i.service
After=etcd2.service
After=mongo-db-node02@%i.service
BindsTo=mongo-db-node02@%i.service

[Service]
EnvironmentFile=/etc/environment
SyslogIdentifier=%p@%i
Type=oneshot
ExecStart=/usr/bin/bash -c '…'
ExecStop=/usr/bin/bash -c '…'

[X-Fleet]
MachineID=70ee5130c69c481bb63c26e496f6f574

Is there anything in the logic that might be causing this, in particular when the command is run from a Go or Python subprocess? Happy to provide any further details.

jonboulle commented 8 years ago

This is really hard to tell without further information. If you can reproduce it consistently, could you add --debug to the fleetctl commands so we can try to get a little more detail?

iby commented 8 years ago

I know that it must be. That's the problem: I did try the --debug option, but it didn't reveal anything. Is there any other technique to debug it, or a way to enable debug-level logging? I'll put together a separate example, but the issue is intermittent.

jonboulle commented 8 years ago

Sorry, I missed that. There's no other verbosity tweak on the client side. You could check the fleetd logs to try to get more information, but this sounds like a client-side thing, so I'm not sure that'll help.

jonboulle commented 8 years ago

Do you get no output at all with --debug when it hangs?
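To answer that question reliably from a subprocess, the child's output can be redirected to a file, so that anything written before the hang survives the kill. The following is a rough sketch, assuming Python 3. `capture_until_timeout` is a hypothetical helper name, and the command passed in would be the hanging fleetctl call with --debug added.

```python
import subprocess
import tempfile


def capture_until_timeout(argv, timeout_s=30.0):
    """Return (timed_out, output): what the child wrote before it
    exited on its own or was killed after timeout_s seconds.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,                # write straight to the file, no pipe
            stderr=subprocess.STDOUT,  # merge stderr into the same file
        )
        try:
            proc.wait(timeout=timeout_s)
            timed_out = False
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed child
            timed_out = True
        log.seek(0)
        return timed_out, log.read().decode(errors="replace")
```

Writing to a file rather than a pipe is a deliberate choice here: the partial output is available even when the child never exits, and a full pipe buffer cannot itself become a second source of blocking.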

coreos/fleet issue #1381