cedricbraekevelt opened 2 years ago
This looks to be an issue with the resource provider not being able to handle the simultaneous requests. I'd recommend opening a support case to have them look into it, but in the meantime, you can add a @batchSize()
decorator to the resource so it doesn't do everything all at once, like so:
@batchSize(20)
resource logAlerts 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = [for logAlert in logAlertsArray: { ... } ]
Would be curious to see if that solves it.
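For context, a slightly fuller sketch of what the decorated loop might look like (the resource body and the `logAlertsArray` parameter shape are hypothetical placeholders, not from the original template):

```bicep
@batchSize(20) // deploy at most 20 instances of this resource in parallel
resource logAlerts 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = [for logAlert in logAlertsArray: {
  name: logAlert.name
  location: logAlert.location
  properties: logAlert.properties
}]
```

With no decorator, Bicep deploys all loop iterations in parallel; `@batchSize(n)` serializes the loop into batches of `n`, and `@batchSize(1)` makes it fully sequential.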
I have seen resources that cannot handle parallel creation/updating beyond a certain number. In most cases these limits are not documented, and they are unlikely to be fixed unless they become a blocking issue for quite a large number of customers. There are even resources that do not allow more than one instance of the same type to be created/updated at the same time. As always, the fix is what Alex recommends, and ideally those teams should document these limits, either in the Azure limits document for all services or in the ARM/API docs for the RP.
Having a similar issue while deploying 100+ analytic rules with a loop. Tried @batchSize(1) just to see if it helps, but unfortunately not :(. @alex-frankel
Leaving a note as I'm investigating whether this is something I'm doing in the code during preprocessing of the input data, but for now I wish I had more info in the error message.
EDIT: By shrinking the input to only 10 records I was able to get a deployment to go through (well, fail with an actual error message), so I guess batchSize is not going to help here. One workaround would be to split the deployment into smaller parts, which is not optimal for IaC.
Updated the title to make this more discoverable
@Kaloszer shared the document below, which states a limit of 50 per deployment.
49 (due to the analytics rule limitation of max 50 at a time! NB: https://learn.microsoft.com/en-us/azure/sentinel/import-export-analytics-rules#:~:text=You%20can%20import%20up%20to%2050%20analytics%20rules%20from%20a%20single%20ARM%20template%20file.)
See at the bottom:
There may be a way to use a module to work around the limit. A module is equal to a deployment each time it is called, so you could pass in fewer than 50 items per iteration of the module.
Maybe defer to @anthony-c-martin on a better way to slice() this array into smaller chunks or create a better lambda?
I think my math works on this one, as below.
// main.bicep
param maxSubListSize int = 3

var list = [
  {
    name: 'Evie'
    age: 5
    interests: ['Ball', 'Frisbee']
  }
  {
    name: 'Casper'
    age: 3
    interests: ['Other dogs']
  }
  {
    name: 'Indy'
    age: 2
    interests: ['Butter']
  }
  {
    name: 'Kira'
    age: 8
    interests: ['Rubs']
  }
  {
    name: 'IndyDad'
    age: 10
    interests: ['Butter']
  }
  {
    name: 'KiraMum'
    age: 12
    interests: ['Rubs']
  }
]

var listLength = length(list)
// start index of each chunk: 0, maxSubListSize, 2 * maxSubListSize, ...
var startIndexes = filter(range(0, listLength), item => item % maxSubListSize == 0)
// each chunk is a range of indexes into `list`; the last chunk may be shorter
var chunks = [for item in startIndexes: listLength >= item + maxSubListSize ? range(item, maxSubListSize) : range(item, listLength % maxSubListSize)]

module group 'foo2.bicep' = [for (items, index) in chunks: {
  name: 'group-${index}'
  params: {
    myArray: [for i in items: list[i]]
  }
}]

output startIndexes array = startIndexes
output chunks array = chunks
output mychunks array = [for (items, index) in chunks: group[index].outputs.myArray]

// foo2.bicep
param myArray array
output myArray array = myArray
I tried this out with up to 320 rules at once and saw some 429s (throttling); however, they were retried within the deployment, so I couldn't get it to fail.
Adding sample code anyway to split up the deployments into batches of 50, which is the documented limit.
// main.bicep
param maxSubListSize int = 50

var ruleCount = 320
var ruleNameBase = 'testRule'
var ruleDefaultsTest = {
  location: 'eastus'
  alertDescription: 'New alert created via template'
  alertSeverity: 3
  isEnabled: true
  resourceId: resourceGroup().id
  query: 'AzureActivity | where OperationName == "Validate Deployment" | where Level == "Error"'
  metricMeasureColumn: 'AggregatedValue'
  operator: 'GreaterThan'
  threshold: '25'
  timeAggregation: 'Count'
}

var rules = [for (item, index) in range(1, ruleCount): union({ alertName: '${ruleNameBase}${item}' }, ruleDefaultsTest)]
var listLength = length(rules)
var startIndexes = filter(range(0, listLength), item => item % maxSubListSize == 0)
var chunks = [for item in startIndexes: listLength >= item + maxSubListSize ? range(item, maxSubListSize) : range(item, listLength % maxSubListSize)]

module group 'scheduledQuery.bicep' = [for (items, index) in chunks: {
  name: 'group-${index}'
  params: {
    alerts: [for i in items: rules[i]]
  }
}]

output startIndexes array = startIndexes
output chunks array = chunks
// output mychunks array = [for (items, index) in chunks: group[index].outputs.alerts]
// output TestRules array = rules
output TestRulesLength int = length(rules)
// scheduledQuery.bicep
param alerts array

// defaults
param autoMitigate bool = false
param checkWorkspaceAlertsStorageConfigured bool = false
param resourceIdColumn string = 'id'
param numberOfEvaluationPeriods int = 1
param minFailingPeriodsToAlert int = 1
param windowSize string = 'PT1H'
param evaluationFrequency string = 'PT5M'
param muteActionsDuration string = 'PT5M'

resource queryRule 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = [for alert in alerts: {
  name: alert.alertName
  location: alert.location
  tags: {}
  properties: {
    description: alert.alertDescription
    severity: alert.alertSeverity
    enabled: alert.isEnabled
    scopes: [
      alert.resourceId
    ]
    evaluationFrequency: evaluationFrequency
    windowSize: windowSize
    criteria: {
      allOf: [
        {
          query: alert.query
          // metricMeasureColumn: alert.metricMeasureColumn
          // resourceIdColumn: resourceIdColumn
          dimensions: []
          operator: alert.operator
          threshold: alert.threshold
          timeAggregation: alert.timeAggregation
          failingPeriods: {
            numberOfEvaluationPeriods: numberOfEvaluationPeriods
            minFailingPeriodsToAlert: minFailingPeriodsToAlert
          }
        }
      ]
    }
    muteActionsDuration: muteActionsDuration
    autoMitigate: autoMitigate
    checkWorkspaceAlertsStorageConfigured: checkWorkspaceAlertsStorageConfigured
    actions: {
      actionGroups: [
        // actionGroupId
      ]
      customProperties: {
        key1: 'value1'
        key2: 'value2'
      }
    }
  }
}]

output alerts array = alerts
Example of a deployment, with chunks of 50 each:
[
[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49
],
[
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
],
[
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114,
115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144,
145, 146, 147, 148, 149
],
[
150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
195, 196, 197, 198, 199
],
[
200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214,
215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229,
230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244,
245, 246, 247, 248, 249
],
[
250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264,
265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279,
280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299
],
[
300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
315, 316, 317, 318, 319
]
]
@Bnetworx let me know if you are still interested in trying this; I know it was from January. Given the docs mention a max of 50, I am not sure this is a bug. That said, it seems like they may have improved the SLA for this. Either way, I would just add the batching of deployments; then you stay in control and can easily modify it in the future, or just leave it at 50 per deployment.
@brwilkinson I've now checked again and was able to deploy 54 ARs without chunking; however, going higher than that, validation would time out and I'd be presented with:
New-AzResourceGroupDeployment: 13:57:37 - Error: Code=; Message=The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
New-AzResourceGroupDeployment: 13:57:37 - Error: Code=; Message=A task was canceled.
New-AzResourceGroupDeployment: 13:57:37 - Error: Code=; Message=A task was canceled.
After implementing the chunking you provided, I was able to run the deployment fully without cutting out any ARs; however, I still got occasional rate limit errors:
Status Message: Rate limit of 200 per 30 seconds is exceeded (Code:BadRequest)
Status Message: Rate limit of 200 per 30 seconds is exceeded (Code:BadRequest)
Status Message: Rate limit of 200 per 30 seconds is exceeded (Code:BadRequest)
{
"status": "Failed",
"error": {
"code": "BadRequest",
"message": "Rate limit of 200 per 30 seconds is exceeded"
}
}
These deployments did not attempt to retry and just died. Is there something I'm missing that I should include in my Bicep deployment file to make these retry?
To resolve this issue I added additional batching on top of the chunked group deployment, and this seems to have fixed it. 😂
@batchSize(1)
module group 'scheduledQuery.bicep' = [for (items, index) in chunks: {
  name: 'group-${index}'
  params: {
    alerts: [for i in items: rules[i]]
  }
}]
@Kaloszer glad it's working.
Not sure if this is a common requirement (needing to slice() an array),
or if there is a simpler syntax to cover this need?
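For what it's worth, a rough sketch of how the chunking could be written with slice() instead of the startIndexes/range() lambda (untested; assumes the same `rules`, `maxSubListSize`, and `scheduledQuery.bicep` as in the sample above):

```bicep
// number of chunks, rounding up (Bicep's / is integer division on ints)
var chunkCount = (length(rules) + maxSubListSize - 1) / maxSubListSize
// slice(array, start, end) takes elements in [start, end); min() guards the last, shorter chunk
var ruleChunks = [for i in range(0, chunkCount): slice(rules, i * maxSubListSize, min((i + 1) * maxSubListSize, length(rules)))]

module group 'scheduledQuery.bicep' = [for (chunk, index) in ruleChunks: {
  name: 'group-${index}'
  params: {
    alerts: chunk
  }
}]
```

This yields the rule objects directly per chunk, so the module loop no longer needs to index back into `rules`.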
This is kind of a workaround for the @batchSize(1) handling not working for this particular case (I suppose there might be more cases, but I haven't found any that are similar). In the end, more features never hurt, as long as they don't introduce more bugs :D.
On the other hand, shouldn't the root cause (rate limiting) here be fixed by the provider? Not sure whether that's the problem here.
@Kaloszer hopefully when this feature moves out of preview they will be able to scale to handle more throughput and remove the documented limit.
Hi @brwilkinson ,
Thank you for circling back on this issue; however, we are (sadly) no longer using Azure Monitor, and as such are currently not running into such issues, I'm afraid. Other resources typically don't need to be deployed that many times (in our customer base, anyway).
Thank you @Bnetworx and @Kaloszer for the follow up and feedback.
Bicep version: Bicep CLI version 0.4.1124 (66c84c8ee5)
Describe the bug: I'm creating Log Analytics scheduled query rules (Log Alerts V2) in a for loop. I've made a template for this and am providing the necessary parameters through a JSON data file. All query rules are defined there; there are 72 items. When I run the for loop, nothing happens. No deployment starts in Azure, and I only get these errors:
To Reproduce: Create 'Microsoft.Insights/scheduledQueryRules@2021-08-01' in a loop with more than 66 items and the deployment will fail. There are no parent loops or anything like that; this code is only run ONCE.
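A minimal repro might look like the sketch below (the rule name, query, and thresholds are placeholder values, not the original template):

```bicep
// 72 rules in a single undecorated loop, all deployed in parallel
resource repro 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = [for i in range(0, 72): {
  name: 'repro-rule-${i}'
  location: resourceGroup().location
  properties: {
    enabled: true
    severity: 3
    scopes: [resourceGroup().id]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT1H'
    criteria: {
      allOf: [
        {
          query: 'AzureActivity | where Level == "Error"'
          operator: 'GreaterThan'
          threshold: 25
          timeAggregation: 'Count'
          failingPeriods: {
            numberOfEvaluationPeriods: 1
            minFailingPeriodsToAlert: 1
          }
        }
      ]
    }
  }
}]
```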
Additional context
Log Alert V2 definition:
One of my data items (of which I have 72)