Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License

Weird issues with MySQL replicas #93

Closed slavizh closed 1 year ago

slavizh commented 1 year ago

Describe the bug: With MySQL you can create a server and a replica server. The problem is that a replica server deployed via Bicep can be created the first time, but if you try to deploy it a second time the resource fails. This is described here: https://learn.microsoft.com/en-us/azure/templates/microsoft.dbformysql/servers?pivots=deployment-language-bicep. So, to have a successful deployment, you have to remove the resource from the configuration afterwards, and in that situation stacks will delete the replica server. Probably the only way to overcome this is to deploy the MySQL server and the replica first in detach mode, then remove the replica server from the configuration and re-deploy in detach mode so that the replica is detached rather than deleted.

I tried that scenario and, while doing so, stumbled upon the following issues:

Issue 1: I removed the replica and then re-deployed the stack in delete mode. I got the following output:

DeploymentId                : /subscriptions/<sub id>/providers/Microsoft.Resources/deployments/lz-mysql-database-2023-02-16-14-33-23-8bd0c
Resources                   : /subscriptions/<sub id>/resourceGroups/lz-mysql-accp
                              /subscriptions/<sub id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql023
                              /subscriptions/<sub id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql023/securityAlertPolicies/Default
                              /subscriptions/<sub id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql02-rep2/securityAlertPolicies/Default
DeletedResources            : /subscriptions/<sub id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql02-rep2

Notice the strange thing: the replica server is deleted, but its child resource securityAlertPolicies is not reported as deleted. That might be a bug in stacks, as I do not know how a child resource can exist if the parent is not there.

Issue 2: When I re-deployed the stack after the replica server was deleted, without any change in the configuration, I got this error:

ResourcesCleanupAction      : delete
ResourceGroupsCleanupAction : delete
DenySettingsMode            : denyDelete
Location                    : westeurope
CreationTime(UTC)           : 16.2.2023 г. 13:30:14
DeploymentId                : /subscriptions/<Sub Id>/providers/Microsoft.Resources/deployments/lz-mysql-database-2023-02-16-14-44-22-26894
Resources                   : /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp
                              /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql023
                              /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql023/securityAlertPolicies/Default
                              /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql02-rep2/securityAlertPolicies/Default
FailedResources             : {
                                id: /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql02-rep2/securityAlertPolicies/Default
                                error: The Resource 'Microsoft.DBforMySQL/servers/lz-mysql02-rep2' under resource group 'lz-mysql-accp' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix
                              },
                              {
                                id: /subscriptions/<Sub Id>/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers/lz-mysql02-rep2/securityAlertPolicies/Default
                                error: The Resource 'Microsoft.DBforMySQL/servers/lz-mysql02-rep2' under resource group 'lz-mysql-accp' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix
                              }
Error                       : DeploymentFailed - At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.

Issue 3: I even got an error when trying to delete the deployment stack: Remove-AzSubscriptionDeploymentStack: Long running operation failed with status 'failed'. Additional Info:'One or more stages of deploymentStack deletion failed. Correlation id: 'ec4cb327-6161-442e-b9ae-0bbc914bc57f''

Re-running the remove command finally removed it. So it seems things become ugly in that situation.

If you need the templates or more info, let me know and I can provide those in private.

dantedallag commented 1 year ago

Hey @slavizh, looking into this more. When you ran into issue 1, were there any entity errors associated with the stack?

slavizh commented 1 year ago

@dantedallag I think not but I will reproduce and let you know to be 100% sure.

slavizh commented 1 year ago

I think I have found the main issue that causes the two issues. It is related to the template's logic. The second time, when we set deployed to true, the replica server resource (Microsoft.DBforMySQL/servers) is not deployed, but the child resource (Microsoft.DBforMySQL/servers/securityAlertPolicies) is still deployed so that we have idempotency on that resource at least. In the end the deployment goes fine, but Microsoft.DBforMySQL/servers/securityAlertPolicies is still present as a managed resource, whereas Microsoft.DBforMySQL/servers is removed after the deployment. This explains issue 1. I think issue 2 is then caused mostly because the resource Microsoft.DBforMySQL/servers/securityAlertPolicies is no longer present and cannot be deleted. There could also be some deny resources that cannot be deleted, or something similar.

In summary, the issue is mostly caused by how the template is written. But it also raises the question of whether the documentation for deployment stacks should advise being careful when including child resources without the parent while using the DenyDelete and DeleteAll modes, since someone deleting the parent resource could cause problems with the stack afterwards.

Another thing I have noticed: if the deployment stack is managing the resource Microsoft.DBforMySQL/servers/securityAlertPolicies but not its parent, shouldn't I be unable to delete the parent? I tried, and I successfully deleted the parent even though -DenySettingsMode DenyDelete -DenySettingsApplyToChildScopes were applied and should apply to the child resource.

I am not sure whether deployment stacks should have some logic to prevent this from happening in the first place by failing at the beginning, meaning that it fails if it finds that you will delete a parent resource while child resources of that parent stay managed.

Specifically for your question: it does not give an error.
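
To illustrate the kind of up-front check meant here, below is a minimal Python sketch. It is not how Deployment Stacks works internally. The names `find_orphans`, `preflight` and `OrphanedChildError` are hypothetical, `SUB` is a placeholder subscription ID, and the sketch assumes that parent/child relationships can be inferred from resource ID prefixes.

```python
class OrphanedChildError(Exception):
    """Raised when a planned deletion would leave managed children without a parent."""


def find_orphans(keep_managed, to_delete):
    """Map each managed child ID to the to-be-deleted parent it sits under."""
    orphans = {}
    for child in keep_managed:
        for parent in to_delete:
            # The trailing "/" keeps "lz-mysql02" from matching "lz-mysql023".
            prefix = parent.lower().rstrip("/") + "/"
            if child.lower().startswith(prefix):
                orphans[child] = parent
    return orphans


def preflight(keep_managed, to_delete):
    """Fail before doing anything if a child would stay managed without its parent."""
    orphans = find_orphans(keep_managed, to_delete)
    if orphans:
        details = "; ".join(f"{c} (parent {p})" for c, p in orphans.items())
        raise OrphanedChildError(
            f"children would stay managed without a parent: {details}"
        )


# Hypothetical IDs modelled on the output above ("SUB" is a placeholder).
BASE = "/subscriptions/SUB/resourceGroups/lz-mysql-accp/providers/Microsoft.DBforMySQL/servers"
keep_managed = [
    f"{BASE}/lz-mysql023",
    f"{BASE}/lz-mysql023/securityAlertPolicies/Default",
    f"{BASE}/lz-mysql02-rep2/securityAlertPolicies/Default",
]
to_delete = [f"{BASE}/lz-mysql02-rep2"]

try:
    preflight(keep_managed, to_delete)
except OrphanedChildError as err:
    print(err)
```

With the IDs from this issue, the check would flag lz-mysql02-rep2/securityAlertPolicies/Default, because its parent is scheduled for deletion while the child stays in the managed list.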

dantedallag commented 1 year ago

Here is my response to your first 3 initial issues, in the context of the latest comment you provided:

@slavizh Yes, I think you are correct. I looked into the problem, and I do think that there is a stacks bug here where child resources can continue to exist in managed resources after the parent resource is deleted. I put in an issue for a fix for this. The expected behavior is that the child resource is also marked as deleted, as it will be deleted when the parent is deleted.

For Issue 2, I think that the failure is on deployment and not deletion (lz-mysql02-rep2/securityAlertPolicies still exists in the template and trying to deploy it fails, because the parent is deleted). This is expected behavior, considering the issue mentioned above that got us to this state.

Issue 3 may be caused by attempting to remove a deny assignment for a resource that doesn't exist. I need to spend a little more time on this one...
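
The expected behavior for Issue 1 can be sketched as follows. This is a rough Python illustration under stated assumptions, not the actual stacks fix: `is_descendant` and `expand_deleted` are hypothetical helper names, `SUB` is a placeholder subscription ID, and children are identified purely by resource ID prefix.

```python
def is_descendant(child_id: str, parent_id: str) -> bool:
    """True if child_id is nested under parent_id (compared case-insensitively)."""
    # The trailing "/" keeps "lz-mysql02" from matching "lz-mysql023".
    return child_id.lower().startswith(parent_id.lower().rstrip("/") + "/")


def expand_deleted(managed, deleted):
    """Move every managed descendant of a deleted resource into the deleted list."""
    still_managed, now_deleted = [], list(deleted)
    for rid in managed:
        if any(is_descendant(rid, d) for d in deleted):
            now_deleted.append(rid)
        else:
            still_managed.append(rid)
    return still_managed, now_deleted


# Hypothetical IDs modelled on the Issue 1 output ("SUB" is a placeholder).
RG = "/subscriptions/SUB/resourceGroups/lz-mysql-accp"
SERVERS = f"{RG}/providers/Microsoft.DBforMySQL/servers"
managed = [
    RG,
    f"{SERVERS}/lz-mysql023",
    f"{SERVERS}/lz-mysql023/securityAlertPolicies/Default",
    f"{SERVERS}/lz-mysql02-rep2/securityAlertPolicies/Default",
]
deleted = [f"{SERVERS}/lz-mysql02-rep2"]

still_managed, now_deleted = expand_deleted(managed, deleted)
for rid in now_deleted:
    print("deleted:", rid)
```

Applied to the Issue 1 output, this would report lz-mysql02-rep2/securityAlertPolicies/Default under DeletedResources next to its parent instead of leaving it under Resources.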

dantedallag commented 1 year ago

Another thing I have noticed: if the deployment stack is managing the resource Microsoft.DBforMySQL/servers/securityAlertPolicies but not its parent, shouldn't I be unable to delete the parent? I tried, and I successfully deleted the parent even though -DenySettingsMode DenyDelete -DenySettingsApplyToChildScopes were applied and should apply to the child resource.

@snarkywolverine do you have any thoughts on this specific deny assignments scenario?

slavizh commented 1 year ago

I think that with -DenySettingsMode DenyDelete -DenySettingsApplyToChildScopes you should not be able to delete the parent, as the child resources should have deny assignments. Imagine a situation where a SQL logical server is managed by one stack and a database on that SQL logical server is managed by another. If I delete the SQL logical server, that would delete the SQL databases, which should not be possible.

If I understand correctly, if you use only -DenySettingsMode DenyDelete, deny assignments are applied only to the top-level resources among the deployed resources, whereas if I use -DenySettingsMode DenyDelete -DenySettingsApplyToChildScopes, deny assignments are applied to the child resources as well. Or maybe I am wrong in this assumption, and the description of the property DenySettingsApplyToChildScopes (Deny settings will be applied to child Azure management scopes) refers only to tenant, management group, subscription and resource group scopes.

dantedallag commented 1 year ago

@slavizh Have you had a chance to try out this scenario with the newest release?

slavizh commented 1 year ago

@dantedallag I haven't had a chance, but please proceed with closing this one. If I see any issues, I will open a new issue based on the latest release.