apache / trafficcontrol

Apache Traffic Control is an Open Source implementation of a Content Delivery Network
https://trafficcontrol.apache.org/
Apache License 2.0

Grovetcconfig error when grove service isn't running. #3167

Open jhg03a opened 5 years ago

jhg03a commented 5 years ago

Found while doing a clean install of grove and initial run of grovetcconfig before grove is started for the first time.

From grovetcconfig:

"2018-12-27T05:29:40.443771743Z Config updates are required to '/etc/grove/grove.cfg'",
"2018-12-27T05:29:40.564246709Z Error restarting grove service (but successfully updated config file): exit status 1"

Note that in this case the exit code of grovetcconfig itself is 2, not the "exit status 1" reported in the message.

Which can be observed manually with:

sudo service grove reload
Reloading grove configuration (via systemctl):  Job for grove.service invalid.
                                                           [FAILED]

Also observed:

Dec 27 05:32:53 cdn-ec-01 systemd[1]: Unit grove.service cannot be reloaded because it is inactive.

jrushford commented 5 years ago

@jhg03a So after this has happened and you then start grove, can you run 'service grove reload' or re-run grovetccfg successfully? The issue is in the rc script, /etc/init.d/grove. If grove isn't running the reload function calls a 'failure' function that marks the failure.

jhg03a commented 5 years ago

If you start the grove service it works, but that's still not OK. grovetcconfig shouldn't try to reload the service if it isn't started, nor should it try to start it. If it's stopped, it's likely for a reason. In my case it was because the initial install wasn't complete yet. I'm specifically interested in getting an exit code of 0 if things worked correctly, which entails trying to HUP only when appropriate; otherwise you get errors back from systemd.

rob05c commented 5 years ago

See https://github.com/apache/trafficcontrol/pull/3195

jhg03a commented 5 years ago

To summarize an out-of-band conversation: The main concern is the incongruity between the error message and the exit code, coupled with the lack of documentation. In this case an exit code of 2 means only the reload failed, which is expected for a grove that has never been started. Additional documentation as consts inside the code and a clearer error message are sufficient. PR #3195 goes one step further by exposing a switch that can be passed to skip the reload entirely, so that any form of automation only needs to care about exit code 0.