awslabs / aws-greengrass-labs-certificate-rotator

Greengrass component and companion cloud backend for rotating the core device certificate and private key
Apache License 2.0
12 stars 2 forks source link

[Feature request] If unable to rotate, do not restart #61

Open IK-Adrian opened 1 month ago

IK-Adrian commented 1 month ago

If the rotation process fails, the rotator shouldn't restart Greengrass and see if miraculously everything worked. Instead, it should probably fail the job.

https://github.com/awslabs/aws-greengrass-labs-certificate-rotator/blob/573a0f3d755b629cbd67160acef5f3d09879a3f4/artifacts/state_creating_certificate.py#L11-L20

gregbreen commented 1 month ago

It's not letting it restart to "see if miraculously everything worked". The rotate() method should never fail. It is essentially an insane failure, because the rotate() method should not suffer exceptions in practice. If it does fail, this on_rx_message() doesn't know how far it progressed before it failed. It's similar to rotate() being interrupted by an unexpected loss of power. You may have seen some comments in the code about losing power at "an inopportune moment". The component is designed to recover from a spontaneous reset or power loss during the rotate() method. I'm letting it restart here, to go through the same recovery process, rather than build special recovery handling for an exception that should never occur in practice.

Anyway, you're right that it would be better to fail the job right here, and take the recovery actions right here. Low priority, but I might do it.

IK-Adrian commented 1 month ago

I understand that this is a almost-never-happens case, but taking into account that by this point we know something went wrong, restarting the device when not needed may just be a loss of service without any real reason for it. Maybe using a less broad aproach for the try ... except ... block can allow the rotate method itself to go back to a valid state before returning, and therefore we can just call fail the job here and go to StateIdle.