Azure / azqr

Azure Quick Review
https://azure.github.io/azqr
MIT License

Scanning large environments exits with: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. #249

Closed: cmendible closed this issue 2 months ago

cmendible commented 2 months ago

Expected Behavior

Scanning large environments should work without issues

Actual Behavior

Scanning large environments exits with the following exception: FTL Failed to get diagnostic settings error="Post "https://management.azure.com/batch?api-version=2020-06-01": dial tcp [2603:1030:a0c::10]:443: bind: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full."

cmendible commented 2 months ago

As mentioned in #248:

For large environments, azqr attempts too many concurrent requests when querying diagnostic settings, and those requests fail.

If you hit this issue, please use the -s flag to set a subscription ID and, if needed, the -g flag to specify a resource group, in order to reduce the number of scanned services.
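To illustrate the "too many concurrent requests" problem described above, here is a minimal Go sketch that bounds concurrency with a buffered channel used as a semaphore. `scanAll` and `fetchDiagnosticSettings` are hypothetical names (the latter stands in for one ARM call and only records how many calls run at once); the limit passed in is an assumption, not a value taken from azqr.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchDiagnosticSettings is a hypothetical placeholder for one ARM call.
// It records the current and peak number of concurrent calls.
func fetchDiagnosticSettings(id string, inFlight, peak *int64) string {
	n := atomic.AddInt64(inFlight, 1)
	defer atomic.AddInt64(inFlight, -1)
	for {
		p := atomic.LoadInt64(peak)
		if n <= p || atomic.CompareAndSwapInt64(peak, p, n) {
			break
		}
	}
	return "settings-for-" + id
}

// scanAll calls fetchDiagnosticSettings for every ID with at most
// maxConcurrent calls in flight. It returns results in input order and
// the peak concurrency that was observed.
func scanAll(ids []string, maxConcurrent int) ([]string, int64) {
	results := make([]string, len(ids))
	sem := make(chan struct{}, maxConcurrent) // one slot per allowed call
	var wg sync.WaitGroup
	var inFlight, peak int64
	for i, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent calls are running
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = fetchDiagnosticSettings(id, &inFlight, &peak)
		}(i, id)
	}
	wg.Wait()
	return results, peak
}

func main() {
	ids := make([]string, 100)
	for i := range ids {
		ids[i] = fmt.Sprintf("res-%d", i)
	}
	results, peak := scanAll(ids, 8)
	fmt.Println(len(results), peak <= 8)
}
```

Acquiring the slot before starting the goroutine, rather than inside it, also keeps the number of live goroutines close to the cap instead of spawning one per resource up front.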

cmendible commented 2 months ago

@red-erik, can you check whether preview version v.2.0.0-preview.5 works for you?

Thanks!

red-erik commented 2 months ago

Hello, scanning the whole env I receive this:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send, 2 minutes]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000088b8, {0xc03271c000, 0x42b93, 0x48400})
	D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc03254e000?, {0xc03271c000?, 0xc00014e050?, 0x22b1c00?})
	D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc02f77d808)
	D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc000144008, 0x40, 0x40})
	D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc000110140?, 0x4?, 0x2039c96?})
	D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000110120, 0x2, 0x2})
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
	D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
	D:/a/azqr/azqr/cmd/main.go:11 +0xf

Will check with a single subscription.

Regards, Red.
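Both stack traces in this thread show goroutine 1 blocked on a `chan send` at diagnostics_settings.go:57, with the runtime reporting that no goroutine is left to receive. The azqr source is not shown here, so the exact cause is an assumption; a common way to hit this error is sending on an unbuffered (or full) channel from the same goroutine that is meant to read from it later. Below is a minimal Go sketch of a fan-in pattern that avoids it: workers send, a helper goroutine closes the channel after `wg.Wait()`, and the caller ranges over the channel. `collect` and the `":checked"` results are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts one goroutine per batch and gathers their results.
// The caller receives while the workers send, so no send blocks forever.
func collect(batches [][]string) []string {
	out := make(chan string) // unbuffered is safe here: we receive concurrently
	var wg sync.WaitGroup
	for _, batch := range batches {
		wg.Add(1)
		go func(b []string) {
			defer wg.Done()
			for _, id := range b {
				out <- id + ":checked" // hypothetical per-resource result
			}
		}(batch)
	}
	// Close the channel only after every sender has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	var results []string
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results) // arrival order is nondeterministic
	return results
}

func main() {
	fmt.Println(collect([][]string{{"a", "b"}, {"c"}}))
}
```

By contrast, executing `out <- x` in the main goroutine before any receiver exists on an unbuffered channel triggers the same "all goroutines are asleep - deadlock!" fatal error seen above.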

cmendible commented 2 months ago

You can also try running with the --azqr=false flag; this will scan using only the APRL rules and should run without issues.

red-erik commented 2 months ago

With a single sub I have this:

2024-08-21T09:13:13+02:00 INF Scanning subscriptions for Resource Count per Subscription and Type
2024-08-21T09:13:13+02:00 INF Generating Report: azqr_action_plan_2024_08_21_T091134.xlsx
2024-08-21T09:13:13+02:00 INF Skipping ImpactedResources. No data to render
2024-08-21T09:13:13+02:00 INF Skipping ResourceTypes. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Services. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Advisor. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Defender. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Costs. No data to render
2024-08-21T09:13:14+02:00 INF Scan completed.

Only one Excel file was produced: no pbit file, no subfolder, etc.

Checking a slightly bigger one.

red-erik commented 2 months ago

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000083f0, {0xc0043b4000, 0x2225, 0x2c00})
	D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc00368a000?, {0xc0043b4000?, 0xc0000920a0?, 0x22b1c00?})
	D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc000053808)
	D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc0000ae008, 0x40, 0x40})
	D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc0000581c0?, 0x4?, 0x2039c96?})
	D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000058180, 0x4, 0x4})
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
	C:/Users/runneradmin/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
	D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
	D:/a/azqr/azqr/cmd/main.go:11 +0xf

red-erik commented 2 months ago

--azqr=false seems to be working

cmendible commented 2 months ago

@red-erik, version v.2.0.0-preview.6 significantly improves how azqr handles scans with a high number of resources.

I ran some tests: scanning diagnostic settings for 260,000 resources can take about 20 minutes. After that, each subscription scan can take about 2 or 3 more minutes.
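For a sense of scale, 260,000 resources in about 20 minutes is roughly 260000 / 1200 ≈ 217 resources per second. The error URLs in this thread show requests going to the ARM batch endpoint (`https://management.azure.com/batch`), which presumably bundles several per-resource requests into one call. The hypothetical Go sketch below shows only the chunking step plus that throughput arithmetic; `chunk`, `throughput`, and the batch size of 20 are illustrative assumptions, not azqr's actual code or values.

```go
package main

import "fmt"

// chunk splits ids into consecutive slices of at most size elements.
// Each slice would become one request to the batch endpoint.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		panic("size must be positive")
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// throughput returns resources per second for a scan of n resources
// that took the given number of minutes.
func throughput(n int, minutes float64) float64 {
	return float64(n) / (minutes * 60)
}

func main() {
	ids := make([]string, 45)
	for i := range ids {
		ids[i] = fmt.Sprintf("res-%d", i)
	}
	fmt.Println(len(chunk(ids, 20))) // 3 batches: 20 + 20 + 5
	fmt.Printf("%.0f resources/s\n", throughput(260000, 20))
}
```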

Please note that using a Managed Identity or a Service Principal instead of the Azure CLI improves performance due to token caching.

red-erik commented 2 months ago

Hello, running with ".\azqr.exe scan --mask=false -c=false -f" I receive this:

2024-08-22T11:03:34+02:00 INF Scanning subscriptions for Diagnostic Settings
2024-08-22T11:03:48+02:00 FTL Failed to get diagnostic settings error="POST https://management.azure.com/batch\n--------------------------------------------------------------------------------\nRESPONSE 429: 429 Too Many Requests\nERROR CODE: TenantRequestsThrottled\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"TenantRequestsThrottled\",\n \"message\": \"Number of 'read' requests for tenant actor 'xxxx-xxxxx-xxxxx-xxxxx-xxxxx' exceeded. Please try again after '5' seconds after additional tokens are available. Refer to https://aka.ms/arm-throttling for additional information.\"\n }\n}\n--------------------------------------------------------------------------------\n"

Regards, Red.

red-erik commented 2 months ago

It only works with --azqr=false

cmendible commented 2 months ago

I'll keep testing; I didn't get throttled yesterday while running some tests.

Disabling azqr works because it also disables the diagnostic settings checks.

I'll release 2.0.0-preview.7 shortly.

red-erik commented 2 months ago

Preview 7 seems to be working fine with all required parameters (without disabling azqr). Thanks.

Red.

cmendible commented 2 months ago

That is great news! @red-erik thanks for your feedback! Out of curiosity how long did the scan take?

red-erik commented 2 months ago

Started:
2024-08-26T09:45:37+02:00 INF Scanning subscriptions for Microsoft.Automation/automationAccounts

Now:
2024-08-26T10:50:24+02:00 INF Generating Report: azqr_action_plan_2024_08_26_T094535.xlsx

I'm still waiting for the file. Should the pbit file be generated too?

cmendible commented 2 months ago

The pbit file is no longer generated and populated automatically (it was a maintenance nightmare). But you can now run:

azqr pbi -p .

which will create the pbit file in the current folder. Then open it and select the xlsx result from the scan as the source for the dashboard.