Azure / azure-monitor-baseline-alerts

Azure Monitor Baseline Alerts
MIT License

[Question/Feedback]: Editing Action group gives API error #268

Closed woutermation closed 4 days ago

woutermation commented 1 week ago

Description

When I try to edit an action group the following error is shown:

The api-version query parameter (?api-version=) is required for all requests. (Code: MissingApiVersionParameter)

Deployment ran via latest Parameter file in Azure Devops via:

az deployment mg create --template-uri https://raw.githubusercontent.com/Azure/azure-monitor-baseline-alerts/main/patterns/alz/alzArm.json --location $(location) --management-group-id $(ManagementGroupPrefix) --parameters azure/management/amba.json --verbose

As it's a brownfield environment I've run the following commands afterwards:

.\Start-AMBARemediation.ps1 -managementGroupName $managementManagementGroup -policyName Alerting-Management
.\Start-AMBARemediation.ps1 -managementGroupName $connectivityManagementGroup -policyName Alerting-Connectivity
.\Start-AMBARemediation.ps1 -managementGroupName $identityManagementGroup -policyName Alerting-Identity
.\Start-AMBARemediation.ps1 -managementGroupName $LZManagementGroup -policyName Alerting-LandingZone
.\Start-AMBARemediation.ps1 -managementGroupName $pseudoRootManagementGroup -policyName Alerting-ServiceHealth
.\Start-AMBARemediation.ps1 -managementGroupName $pseudoRootManagementGroup -policyName Notification-Assets

Cleanup script fails also when trying to remove the settings, only option to remove the AG is to remove the complete Resource Group.

This happens on multiple tenants, am I doing something wrong?

Brunoga-MS commented 1 week ago

Hello @woutermation , thanks for your feedback. I am looking at it now and trying to repro your issue. I will update you as soon as possible.

Thanks, Bruno.

Brunoga-MS commented 1 week ago

Hello @woutermation , I tried to repro your issue but did not get any error. I was able to change the ag-AMBA-SH*** action group by editing the existing configuration and by adding new notifications. Could you please describe in more detail what you were trying to do?

Thanks, Bruno.

woutermation commented 1 week ago

Hi @Brunoga-MS, thanks for the reply. I ran these commands on 2 tenants, both with the same result. Both are ALZ-bicep deployments. I'll try to test this later today on a dev/test sub with a single management group etc.

woutermation commented 1 week ago

image

Brunoga-MS commented 1 week ago

Hi @woutermation , is this happening as a result of an Azure Landing Zone deployment using the ALZ portal accelerator, or of just AMBA-ALZ using one of the documented deployment methods? We still do not have Bicep as a deployment method in AMBA-ALZ.

Thanks, Bruno.

woutermation commented 1 week ago

Hi Bruno, the environments are both deployed with https://github.com/azure/alz-bicep and have multiple subscriptions etc. AMBA was deployed via https://azure.github.io/azure-monitor-baseline-alerts/patterns/alz/deploy/Deploy-with-Azure-Pipelines/ and I only set:

ALZMonitorActionGroupEmail
ALZMonitorResourceGroupLocation = "westeurope"
managementSubscriptionId
*ManagementGroup to reflect my MG groups

Brunoga-MS commented 1 week ago

@woutermation: this is really strange and unexpected and needs deeper investigation. I'll keep you posted as soon as I have news.

Thanks, Bruno.

woutermation commented 1 week ago

Happy to share more verbose logging or do a call/screenshare if needed.

Brunoga-MS commented 1 week ago

It would be great to have verbose logging. Please share ...

woutermation commented 1 week ago

Hi Bruno, Did some more deep-diving:

I first cleaned the whole environment and then redeployed. When the suppression rule remediation kicks in, it also fails:

image

{
    "status": "Failed",
    "error": {
        "code": "DeploymentFailed",
        "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
        "details": [
            {
                "code": "BadRequest",
                "message": "{\r\n  \"error\": {\r\n    \"code\": \"InvalidResourceNameFormat\",\r\n    \"message\": \"At least one resource name segment is invalid according to the Resource Provider specification.\"\r\n  }\r\n}"
            }
        ]
    }
}

What's the best way to share some more logging without me having to blur all my (customer) subscription IDs?

Brunoga-MS commented 1 week ago

Hi @woutermation , unfortunately there's no other way to send logging info without sharing too much (unless you obscure the critical and sensitive data). However, I did some more investigation and tried to repro your issue, but I did not get any failure. Looking at your first error, I suspect it was a temporary portal issue. To confirm my suspicion, I would ask you to do the following:

  1. Clean the environment.
  2. Deploy AMBA, making sure you set all the parameters you need.
  3. Once deployed, ensure all the deployments are green. To check, go to your pseudoRootManagementGroup and look at the deployments. You should see something like this:
image
  4. Once this is done, and once the policy compliance shows not-compliant, run the remediation using the remediation script. Ensure the remediation completes successfully and then try to edit the action group.
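
The "all deployments are green" check in step 3 could also be scripted rather than eyeballed in the portal. The following is only a rough sketch under assumptions: `all_succeeded` is a hypothetical helper (not part of AMBA), and it assumes the provisioning states arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AMBA): succeed only if stdin contains at
# least one non-empty line and every non-empty line is exactly "Succeeded".
all_succeeded() {
  seen=0
  while IFS= read -r state || [ -n "$state" ]; do
    if [ -z "$state" ]; then
      continue
    fi
    seen=1
    # Any other state (Failed, Running, Canceled, ...) means "not green".
    [ "$state" = "Succeeded" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Assumed usage; the variable is a placeholder for your management group ID:
# az deployment mg list --management-group-id "$pseudoRootManagementGroup" \
#   --query "[].properties.provisioningState" -o tsv | all_succeeded \
#   && echo "all deployments succeeded"
```

The helper deliberately treats empty input as a failure, so a wrong management group ID does not read as "all green".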

Is there any direct contact of yours that you can share?

Thanks, Bruno.

woutermation commented 6 days ago

Hi @Brunoga-MS

  1. Done
  2. Done
  3. Deployments are all green
  4. Previously I didn't wait for the compliance to show up as not-compliant but just ran the remediation immediately so I had good hopes this time

But still the same error, as this is a brown-field environment I need the remediation. I'll start with a clean deployment again and wait for the compliance to finish, then I will install a new Storage Account or something and check what that does.

You can contact me on my business teams; * remove email

I'll let you know what a clean/new resource does as soon as possible.

Regards, Wouter

woutermation commented 6 days ago

I'm seeing the alert being created when creating a new resource, but the Action Groups are only created when running the remediation, and then I'm still hitting the API error. Deploy Azure Monitor Baseline Alerts for Service Health is being remediated, all green ticks, and I'm getting all the emails saying that I've been added to the group etc., but I'm still not able to edit the action group :-(

What am I doing wrong... Can it be something with an existing policy or something that is conflicting?

woutermation commented 5 days ago

Hi @Brunoga-MS , Thanks for the call yesterday, did a deployment with PowerShell and my admin account, instead of Pipeline and AppReg, same outcome. Did a fresh deployment on my PAYG azure environment, 1 MG etc, all working as expected.

So it's probably a policy that is set via the alz-bicep landingzone deployment (as we expected), now the only step is to find out which one as the errors are a bit unclear ;-)

Brunoga-MS commented 5 days ago

Hello @woutermation , thanks for confirming that the code is fine. One question: could you please check and let me know about the subscription offer you are using for your customer? Are they the classic offerings (Azure Plan, Enterprise Agreement, etc.) or CSP?

Thanks, Bruno.

woutermation commented 5 days ago

Hi @Brunoga-MS , both customers have multiple CSP subscriptions; my own subscription was MSDN based. I'm currently disabling one policy at a time and cleaning up/re-deploying everything. Hopefully it's not the last one that's the troublesome one.

woutermation commented 4 days ago

Finally found the problem: when a subscription name contains a #, the action groups and the action processing rules fail to install during remediation.
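
Given that finding, a quick pre-flight check on subscription names before running the remediation might look like the sketch below. It is a minimal illustration, not part of AMBA: `find_hash_names` is a hypothetical helper, the sample names are made up, and it only looks for `#`, the one character confirmed as a problem in this thread (other characters may be unsupported as well).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AMBA): read one subscription name per line
# from stdin and print every name that contains a '#'.
find_hash_names() {
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      *'#'*) printf '%s\n' "$name" ;;
    esac
  done
}

# Example with made-up names; only the second one is printed.
printf '%s\n' 'Contoso Prod' 'Contoso #1 Dev' | find_hash_names
```

With the Azure CLI, the names could presumably be fed in with `az account list --query "[].name" -o tsv | find_hash_names`; an empty result would mean no visible subscription name contains a `#`.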

Brunoga-MS commented 4 days ago

Happy that you found the problem and the solution. For your reference, the public documentation "Naming rules and restrictions for Azure resources" lists the supported characters for all resource names. To proactively help others who might fall into the same situation, I am going to add this to our AMBA documentation. Given that, I will close this one as resolved.

Thanks once again for your precious help in investigating this issue.

Bruno.