Closed valexby closed 3 weeks ago
Based on LP#1947585, the workaround is to run sudo systemctl restart ceilometer-agent if the service is not active.
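A minimal sketch of that workaround in Python, in case it helps the discussion. It only assumes the standard systemctl is-active and systemctl restart commands; the function name restart_if_inactive and the injectable run parameter are hypothetical, not existing COU code, and the unit name may need to be ceilometer-agent-compute depending on the deployment.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code. It is injectable so the
# logic can be exercised without touching systemd (hypothetical design choice).
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command locally and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def restart_if_inactive(service: str = "ceilometer-agent", run: Runner = _run) -> bool:
    """Restart `service` only if systemd reports it as not active.

    Returns True if a restart was issued, False if the service was already active.
    """
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    if run(["systemctl", "is-active", "--quiet", service]) == 0:
        return False
    if run(["sudo", "systemctl", "restart", service]) != 0:
        raise RuntimeError(f"failed to restart {service}")
    return True
```

In COU the command would presumably be executed on the nova-compute unit's machine rather than locally, so the runner is the part that would change.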
Comment from @valexby
Without this being fixed on the COU side or backported on the nova-compute side, managed solutions will hit this issue roughly 500 nodes * 4 OpenStack releases = ~2000 times during future upgrades.
It looks to me like LP#1947585 was backported all the way back to Ussuri, but somehow the fix isn't working for older releases. I think we should just add the workaround within COU.
There is a limitation in the implementation: how can COU know whether a ceilometer-agent unit is related to nova-compute as a subordinate?
The subordinate information is lost when we transform the original Juju status data into COU's Application class. This creates an awkward situation: I cannot confirm whether a ceilometer-agent unit is present on the same machine.
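For illustration, here is a rough sketch of how the subordinate relationship could be read from the raw status data before it is transformed. It assumes the layout of juju status --format json, where each principal unit carries a "subordinates" mapping and each application a "charm-name" field; the function name find_subordinates is hypothetical and this does not touch COU's actual Application class.

```python
from typing import Any, Dict, List


def find_subordinates(
    status: Dict[str, Any],
    principal_charm: str = "nova-compute",
    subordinate_charm: str = "ceilometer-agent",
) -> Dict[str, List[str]]:
    """Map each principal unit to the subordinate units of the given charm.

    `status` is assumed to be the parsed output of `juju status --format json`.
    """
    apps = status.get("applications", {})
    result: Dict[str, List[str]] = {}
    for app in apps.values():
        if app.get("charm-name") != principal_charm:
            continue
        for unit_name, unit in (app.get("units") or {}).items():
            # A subordinate unit is named "<application>/<number>"; resolve its
            # application to check the charm, since applications can be renamed.
            result[unit_name] = [
                sub_name
                for sub_name in (unit.get("subordinates") or {})
                if apps.get(sub_name.split("/")[0], {}).get("charm-name")
                == subordinate_charm
            ]
    return result
```

A nova-compute unit that maps to an empty list has no ceilometer-agent subordinate on its machine, so the restart workaround could be skipped for it.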
There will be two options:
Now I prefer option 1 because:
Any feedback is welcome. I will start the implementation next Tuesday (6/11).
The subordinate information is lost when we transform the original Juju status data into COU's Application class.
Is anything stopping us from adding the subordinate information?
Your argument around going for option 1 makes sense. :+1: It seems a little strange to me that logic for controlling the services is spread over the machine charm and the subordinate, but that's how it is I guess. :thinking:
Hi,
Because of the known bug LP#1947585, ceilometer-agent-compute might be down in many cases after a nova-compute release upgrade. COU fails to complete the upgrade in such a case, as the final Juju resume action fails. Given that the bugfix for nova-compute wasn't backported to Ussuri-Wallaby and the nova bug hasn't had any activity for a year, maybe we could make COU start ceilometer-agent after the nova release upgrade if it is down. That is a natural thing to do for a human operator upgrading a cloud.