Azure / azure-init

A minimal provisioning agent designed for Azure Linux VMs.
MIT License

Disable provisioning with password #57

Closed dongsupark closed 4 months ago

dongsupark commented 4 months ago

Password authentication by itself is not as secure as SSH key authentication. For better security, we should disable password authentication. So if the user has enabled password authentication in Azure, azure-init now simply fails to provision.
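A minimal sketch of the check described above, in Rust. It is not the code from this PR: the names `ProvisionError` and `check_password` are hypothetical, and the sketch assumes the password arrives as a plain string where an empty value means that only an SSH key was configured.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not taken from libazureinit.
#[derive(Debug, PartialEq)]
enum ProvisionError {
    NonEmptyPassword,
}

impl fmt::Display for ProvisionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProvisionError::NonEmptyPassword => {
                write!(f, "password provisioning is not supported; use an SSH key")
            }
        }
    }
}

impl std::error::Error for ProvisionError {}

/// Accept an empty password (SSH key only) and reject anything else.
fn check_password(password: &str) -> Result<(), ProvisionError> {
    if password.is_empty() {
        Ok(())
    } else {
        Err(ProvisionError::NonEmptyPassword)
    }
}

fn main() {
    // Placeholder inputs; a real agent would read the value from its provisioning data.
    for candidate in ["", "example-password"] {
        match check_password(candidate) {
            Ok(()) => println!("empty password: provisioning may continue"),
            Err(e) => eprintln!("refusing to provision: {e}"),
        }
    }
}
```

Returning an error instead of silently ignoring the password keeps the failure visible to the caller, which can then decide how to report it.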

Clean up unnecessary functions in libazureinit, like mount_media, unmount_media, allow_password_authentication.

Fixes https://github.com/Azure/azure-init/issues/52.

Testing done

Tested manually.

dongsupark commented 4 months ago

Tested this PR manually in Azure, with the help of the existing script demo/image_creation.sh. (Some tweaks were needed to run that script, though.)

Provisioning with an SSH key seems to work well; at least there is no regression. On that machine, the systemd unit azure-init.service runs well when an empty password is given.

I have not yet managed to test the failure case, where a non-empty password is given.

dongsupark commented 4 months ago

> I have not yet managed to test the failure case, where a non-empty password is given.

I have now also tested that scenario by passing --admin-password **** to the az vm create command. It works as expected.

anhvoms commented 4 months ago

> Tested this PR manually in Azure, with the help of the existing script demo/image_creation.sh. (Some tweaks were needed to run that script, though.)
>
> Provisioning with an SSH key seems to work well; at least there is no regression. On that machine, the systemd unit azure-init.service runs well when an empty password is given.
>
> I have not yet managed to test the failure case, where a non-empty password is given.

You can try to deploy a VM with boot diagnostics enabled, using a password. The deployment will fail in about 20 minutes due to an OS Provisioning timeout (because the agent returned an error). Assuming the output from azure-init makes it to the serial console (if not, we should make any err/warn/info output go to the console to aid debugging), we should see the error about the non-empty password.

dongsupark commented 4 months ago

> You can try to deploy a VM with boot diagnostics enabled, using a password. The deployment will fail in about 20 minutes due to an OS Provisioning timeout (because the agent returned an error). Assuming the output from azure-init makes it to the serial console (if not, we should make any err/warn/info output go to the console to aid debugging), we should see the error about the non-empty password.

Thanks for the tip. It turns out azure-init simply returned success even on failure, so the azure-init systemd unit looked OK. With the latest commit I pushed, the azure-init systemd unit now fails when it should fail.

However, provisioning itself seems to complete even when the systemd unit has failed. That's why I could not reproduce the 20-minute timeout on provisioning failure. I assume it has something to do with the issue you created a few days ago, right? I think that's all I can do for now with this PR.
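The exit-status fix mentioned in this comment can be sketched roughly as follows. This is an illustration, not the actual commit: `provision` and `exit_status` are hypothetical stand-ins, and the only assumption about systemd is its standard behavior of marking a unit as failed when its main process exits with a non-zero status.

```rust
use std::process::ExitCode;

/// Hypothetical stand-in for the real provisioning routine.
fn provision(password: &str) -> Result<(), String> {
    if password.is_empty() {
        Ok(())
    } else {
        Err("non-empty password given".to_string())
    }
}

/// Map the provisioning result to a process exit status:
/// 0 on success, 1 on any error.
fn exit_status(result: &Result<(), String>) -> u8 {
    match result {
        Ok(()) => 0,
        Err(_) => 1,
    }
}

fn main() -> ExitCode {
    // Placeholder input; a real agent would read it from its provisioning data.
    let result = provision("");
    if let Err(e) = &result {
        // Report the reason on stderr before exiting with a failure status.
        eprintln!("provisioning failed: {e}");
    }
    ExitCode::from(exit_status(&result))
}
```

Propagating the error into the exit status, rather than logging it and returning success, is what lets the service manager see the failure.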

anhvoms commented 4 months ago

> You can try to deploy a VM with boot diagnostics enabled, using a password. The deployment will fail in about 20 minutes due to an OS Provisioning timeout (because the agent returned an error). Assuming the output from azure-init makes it to the serial console (if not, we should make any err/warn/info output go to the console to aid debugging), we should see the error about the non-empty password.
>
> Thanks for the tip. It turns out azure-init simply returned success even on failure, so the azure-init systemd unit looked OK. With the latest commit I pushed, the azure-init systemd unit now fails when it should fail.
>
> However, provisioning itself seems to complete even when the systemd unit has failed. That's why I could not reproduce the 20-minute timeout on provisioning failure. I assume it has something to do with the issue you created a few days ago, right? I think that's all I can do for now with this PR.

I think we should investigate why provisioning didn't time out. Likely there's another entity/agent in the VM that was reporting health. It would be good to know what it is and whether that is expected. We might want to remove such noise from our testing. It shouldn't block this PR, however. See #61.