Closed cmendible closed 2 months ago
As mentioned in #248:
For large environments, azqr attempts too many concurrent requests when querying diagnostic settings, and the scan fails.
If you hit this issue, please use the -s flag to set a subscription ID and, if needed, the -g flag to specify a resource group, in order to reduce the number of scanned services.
@red-erik can you check whether the preview version v.2.0.0-preview.5 works for you?
Thanks!
Hello, scanning the whole env I receive this:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send, 2 minutes]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000088b8, {0xc03271c000, 0x42b93, 0x48400})
    D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc03254e000?, {0xc03271c000?, 0xc00014e050?, 0x22b1c00?})
    D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc02f77d808)
    D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc000144008, 0x40, 0x40})
    D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc000110140?, 0x4?, 0x2039c96?})
    D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000110120, 0x2, 0x2})
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
    D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
    D:/a/azqr/azqr/cmd/main.go:11 +0xf
Will check with a single sub
Regards, Red.
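A note on the trace above: "goroutine 1 [chan send]" together with "all goroutines are asleep" means the main goroutine was blocked sending on a channel while no other goroutine was left to receive. The sketch below is not the code in diagnostics_settings.go; it is a generic fan-out pattern that avoids this class of deadlock by sending from a separate goroutine, closing each channel on the sending side, and draining results in the caller. `batches`, `processAll`, and their parameters are hypothetical names, and `len(b)` stands in for a real batch query.

```go
package main

import "sync"

// batches splits ids into slices of at most `size` elements.
// Hypothetical helper for illustration.
func batches(ids []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

// processAll fans batches out to a fixed number of workers and returns
// how many ids were processed.
func processAll(ids []string, batchSize, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan []string)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs { // loop ends once jobs is closed
				results <- len(b) // stand-in for a real batch query
			}
		}()
	}

	// Send from a separate goroutine so the caller is free to receive results.
	go func() {
		for _, b := range batches(ids, batchSize) {
			jobs <- b
		}
		close(jobs) // lets the workers' range loops finish
	}()

	// Close results only after every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for n := range results {
		total += n
	}
	return total
}
```

The key property is that every send has a receiver that is guaranteed to be running: workers receive jobs, and the caller receives results until the channel is closed.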
You can also try running with the flag --azqr=false. This will scan using only the APRL rules and should run without issues.
With a single sub I have this:
2024-08-21T09:13:13+02:00 INF Scanning subscriptions for Resource Count per Subscription and Type
2024-08-21T09:13:13+02:00 INF Generating Report: azqr_action_plan_2024_08_21_T091134.xlsx
2024-08-21T09:13:13+02:00 INF Skipping ImpactedResources. No data to render
2024-08-21T09:13:13+02:00 INF Skipping ResourceTypes. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Services. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Advisor. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Defender. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Costs. No data to render
2024-08-21T09:13:14+02:00 INF Scan completed.
Only one Excel file was produced: no pbit, no subfolder, etc.
Checking a slightly bigger one:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000083f0, {0xc0043b4000, 0x2225, 0x2c00})
    D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc00368a000?, {0xc0043b4000?, 0xc0000920a0?, 0x22b1c00?})
    D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc000053808)
    D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc0000ae008, 0x40, 0x40})
    D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc0000581c0?, 0x4?, 0x2039c96?})
    D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000058180, 0x4, 0x4})
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
    C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
    D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
    D:/a/azqr/azqr/cmd/main.go:11 +0xf
--azqr=false seems to be working
@red-erik version v.2.0.0-preview.6 greatly improves how azqr handles scans with a high number of resources.
I ran some tests: scanning diagnostic settings for 260,000 resources can take about 20 minutes. After that, each subscription scan can take another 2 to 3 minutes.
Please note that using a Managed Identity or a Service Principal instead of Azure CLI improves performance due to token caching.
Hello, running with ".\azqr.exe scan --mask=false -c=false -f" I receive this:
2024-08-22T11:03:34+02:00 INF Scanning subscriptions for Diagnostic Settings
2024-08-22T11:03:48+02:00 FTL Failed to get diagnostic settings error="POST https://management.azure.com/batch\n--------------------------------------------------------------------------------\nRESPONSE 429: 429 Too Many Requests\nERROR CODE: TenantRequestsThrottled\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"TenantRequestsThrottled\",\n \"message\": \"Number of 'read' requests for tenant actor 'xxxx-xxxxx-xxxxx-xxxxx-xxxxx' exceeded. Please try again after '5' seconds after additional tokens are available. Refer to https://aka.ms/arm-throttling for additional information.\"\n }\n}\n--------------------------------------------------------------------------------\n"
Regards, Red.
It only works with --azqr=false
I'll keep testing; I didn't get throttled yesterday while I was running some tests.
Disabling azqr works because that also disables the diagnostic settings checks.
I'll release 2.0.0-preview.7 shortly.
Preview 7 seems to be working fine with all required parameters (without disabling azqr). Thanks.
Red.
That is great news! @red-erik thanks for your feedback! Out of curiosity, how long did the scan take?
Started:
2024-08-26T09:45:37+02:00 INF Scanning subscriptions for Microsoft.Automation/automationAccounts
Now:
2024-08-26T10:50:24+02:00 INF Generating Report: azqr_action_plan_2024_08_26_T094535.xlsx
and I'm still waiting for the file. Should the pbit be generated too?
The pbit file is no longer generated and populated automatically (it was a maintenance nightmare). But you can now run:
azqr pbi -p .
which will create the pbit file in the current folder. Then open it and select the xlsx result from the scan as the source for the dashboard.
Expected Behavior
Scanning large environments should work without issues
Actual Behavior
Scanning large environments exits with the following error:
FTL Failed to get diagnostic settings error="Post \"https://management.azure.com/batch?api-version=2020-06-01\": dial tcp [2603:1030:a0c::10]:443: bind: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full."