Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0
541 stars 372 forks source link

[BUG] cloud-init service status checking log is confusing #1772

Open yuxisun1217 opened 4 years ago

yuxisun1217 commented 4 years ago

Describe the bug: A clear and concise description of what the bug is. WALA reports these 2 logs if cloud-init-local.service is disabled, which may not work as expected: INFO Daemon Error getting cloud-init enabled status from systemctl: Command '['systemctl', 'is-enabled', 'cloud-init-local.service']' returned non-zero exit status 1. INFO Daemon Error getting cloud-init enabled status from service: Command '['service', 'cloud-init', 'status']' returned non-zero exit status 4.

Steps: Create a VM with WALA installed and check waagent.log

Distro and WALinuxAgent details (please complete the following information):

Additional context

def _cloud_init_is_enabled_systemd():
    try:
        systemctl_output = subprocess.check_output([
            'systemctl',
            'is-enabled',
            'cloud-init-local.service'
        ], stderr=subprocess.STDOUT).decode('utf-8').replace('\n', '')

        unit_is_enabled = systemctl_output == 'enabled'
    except Exception as exc:
        logger.info('Error getting cloud-init enabled status from systemctl: {0}'.format(exc))
        unit_is_enabled = False

The "systemctl is-enabled cloud-init-local.service will return 1 if status is "disabled". It will raise exception in subprocess.check_output and report 'Error getting cloud-init enabled status' which isn't correct.

Besides,

def cloud_init_is_enabled():
    unit_is_enabled = _cloud_init_is_enabled_systemd() or _cloud_init_is_enabled_service()

It seems that the "_cloud_init_is_enabled_service()" function calls "service cloud-init status" command, which seems doesn't check whether the service is enabled but check the service running status. Not sure if it is by design. If not, please consider chkconfig.

Thanks!

Br, Yuxin Sun

trstringer commented 4 years ago

Thanks for bringing this to our attention!

It seems that the "_cloud_init_is_enabled_service()" function calls "service cloud-init status" command, which seems doesn't check whether the service is enabled but check the service running status. Not sure if it is by design.

This is by design.

The "systemctl is-enabled cloud-init-local.service will return 1 if status is "disabled". It will raise exception in subprocess.check_output and report 'Error getting cloud-init enabled status' which isn't correct.

So it's the error message that you don't like? Just to be clear. Do you think something more like "Cloud-init is either not found or not enabled"?

Thanks again!

yuxisun1217 commented 4 years ago

Hi @trstringer ,

So it's the error message that you don't like? Just to be clear. Do you think something more like "Cloud-init is either not found or not enabled"? The "Error getting cloud-init enabled status" sounds like running command failed. It should not include "run command successful and return disabled" situation.

In the code there's a line: unit_is_enabled = systemctl_output == 'enabled' But it seems that only when "systemctl is-enabled cloud-init-local.service" outputs "enabled" the exit code is 0(please correct me if I'm wrong). So IMO the "systemctl_output == 'enabled' is useless here.

Thanks!

trstringer commented 4 years ago

I think the answer to that is yes and no. Looking at the man pages for systemctl you can see that the is-enabled command technically would return zero for enabled units, but there are a handful of statuses which aren't directly enabled that would still return zero (static, indirect, transient, and generated). I think in this instance it is ok to be explicit on checking the outputted status from this command.

yuxisun1217 commented 4 years ago

I think the answer to that is yes and no. Looking at the man pages for systemctl you can see that the is-enabled command technically would return zero for enabled units, but there are a handful of statuses which aren't directly enabled that would still return zero (static, indirect, transient, and generated). I think in this instance it is ok to be explicit on checking the outputted status from this command.

@trstringer Thank you so much for you explanation! Now I understand that. The only question left is that IMO the "Error getting cloud-init enabled status" is confusing. Please help to check if need to make it clearer. It's just a suggestion. Thanks again!