Azure / azure-media-migration

Azure Media Migration Tool
https://github.com/Azure/azure-media-migration
MIT License

TooManyRequests - migration tool has been throttled #237

Closed wbira closed 7 months ago

wbira commented 7 months ago

Hello! I ran the analyze command of the migration tool against our staging Media Services account, which contains around 30k assets. After about 12k assets I got the following exception. I'm a bit worried, because our production Media Services account has around 140k assets. Is there any way to increase the limits (and if so, which limit should be raised), or to handle this error more gracefully, e.g. by implementing exponential backoff or some kind of batching? I ran this job on an Azure Container Instance with standard CPU/memory settings (4 CPUs, 16 GB RAM).

14:14:02 dbug: AMSMigrate.Ams.AssetAnalyzer[0] Analyzing Assets: 12000/0 Assets
Unhandled exception: Azure.RequestFailedException: The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits
Status: 429
ErrorCode: TooManyRequests

Content:
{"error":{"code":"TooManyRequests","message":"The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
Retry-After: 36
x-ms-client-request-id: f438a892-05f1-4a5b-b902-a812d8ffa3ef
x-ms-request-id: 5ae539f4-1ac6-46fa-93d8-5b1cb10a7489
Strict-Transport-Security: REDACTED
Server: Microsoft-Azure-Storage-Resource-Provider/1.0,Microsoft-HTTPAPI/2.0 Microsoft-HTTPAPI/2.0
x-ms-ratelimit-remaining-subscription-reads: REDACTED
x-ms-correlation-request-id: REDACTED
x-ms-routing-request-id: REDACTED
X-Content-Type-Options: REDACTED
Date: Wed, 21 Feb 2024 14:17:48 GMT
Connection: close
Content-Length: 226
Content-Type: application/json
Expires: -1

   at Azure.ResourceManager.Storage.StorageAccountsRestOperations.GetPropertiesAsync(String subscriptionId, String resourceGroupName, String accountName, Nullable`1 expand, CancellationToken cancellationToken)
   at Azure.ResourceManager.Storage.StorageAccountCollection.GetAsync(String accountName, Nullable`1 expand, CancellationToken cancellationToken)
   at Azure.ResourceManager.Storage.StorageExtensions.GetStorageAccountAsync(ResourceGroupResource resourceGroupResource, String accountName, Nullable`1 expand, CancellationToken cancellationToken)
   at AMSMigrate.Ams.AzureResourceProvider.GetStorageAccountAsync(MediaServicesAccountResource account, MediaAssetResource asset, CancellationToken cancellationToken) in /src/ams/AzureResourceProvider.cs:line 72
   at AMSMigrate.Ams.AssetAnalyzer.<>c__DisplayClass4_2.<<MigrateAsync>b__2>d.MoveNext() in /src/ams/AssetAnalyzer.cs:line 231
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__54`1.<<ForEachAsync>b__54_0>d.MoveNext()
--- End of stack trace from previous location ---
   at AMSMigrate.Ams.BaseMigrator.MigrateInParallel[T](IAsyncEnumerable`1 values, IEnumerable`1 filteredList, Func`3 processItem, Int32 batchSize, CancellationToken cancellationToken) in /src/ams/BaseMigrator.cs:line 61
   at AMSMigrate.Ams.AssetAnalyzer.MigrateAsync(CancellationToken cancellationToken) in /src/ams/AssetAnalyzer.cs:line 222
   at AMSMigrate.Program.AnalyzeAssetsAsync(InvocationContext context, AnalysisOptions analysisOptions, CancellationToken cancellationToken) in /src/Program.cs:line 231
   at AMSMigrate.Program.<>c__DisplayClass0_0.<<Main>b__0>d.MoveNext() in /src/Program.cs:line 44
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AMSMigrate.Program.SetupDependencies(InvocationContext context, Func`2 next) in /src/Program.cs:line 150
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()
pohhsu commented 7 months ago

Hi Waldemar,

Thanks for reporting this issue. Just wanted to give you a quick update on our investigation.

The exception comes from the Azure Storage resource provider and originates from this call:

var resource = await rg.GetStorageAccountAsync(asset.Data.StorageAccountName, cancellationToken: cancellationToken);
return GetStorageAccount(resource);

(See the GetStorageAccountAsync documentation.)

Apparently, there is a limit on the number of storage account management operations that one can perform (https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling). In particular, we think that you hit the throttling limit of 800 requests per 5 minutes.

We don't believe there is any way to raise these limits.
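To put the suspected limit in perspective, here is a rough back-of-the-envelope estimate in Python. It assumes one storage management read per asset (an assumption, not something measured) and the suspected limit of 800 reads per 5 minutes; the function name and parameters are made up for illustration.

```python
def estimated_minutes(asset_count, calls_per_asset=1, limit=800, window_minutes=5):
    """Approximate run time in minutes if calls are paced at exactly the limit.

    Assumes every call counts against the same 5-minute read window.
    """
    total_calls = asset_count * calls_per_asset
    return total_calls * window_minutes / limit


print(estimated_minutes(30_000))   # staging account size from the report
print(estimated_minutes(140_000))  # production account size from the report
```

Under these assumptions the production account would need roughly 875 minutes (about 14.5 hours) of paced calls, which is why reducing the number of management calls matters more than simply slowing down.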

Because this storage operation is used pervasively in the code, it is difficult to fully address the issue in the short run. Where possible, we'll try to modify the code to cache and reuse the results of calls to the storage management API, but even so, we're not sure we can fully avoid hitting this throttling limit (unless we retry after sleeping for 5 minutes).
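The caching and retry ideas can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's C# code and not what any PR actually implements: `ThrottledError`, `call_with_retry`, and `StorageAccountCache` are hypothetical names, and `ThrottledError` stands in for a 429 response whose `Retry-After` header (36 seconds in the log above) gives the wait time.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 TooManyRequests response."""

    def __init__(self, retry_after=None):
        super().__init__("TooManyRequests")
        self.retry_after = retry_after  # seconds from the Retry-After header, if any


def call_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(); when throttled, wait and retry up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's hint
            else:
                delay = base_delay * 2 ** attempt  # exponential backoff
            sleep(delay)


class StorageAccountCache:
    """Keep one lookup result per storage account name (not thread-safe)."""

    def __init__(self, fetch, sleep=time.sleep):
        self._fetch = fetch  # callable that performs the real management call
        self._sleep = sleep
        self._cache = {}

    def get(self, account_name):
        if account_name not in self._cache:
            self._cache[account_name] = call_with_retry(
                lambda: self._fetch(account_name), sleep=self._sleep
            )
        return self._cache[account_name]
```

With a cache like this, the number of management calls depends on the number of distinct storage accounts rather than the number of assets. A parallel implementation would also need locking or a concurrent map, which this sketch leaves out.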

In the interim, there are two mitigations.

The first is the -b flag,

 -b, --batch-size <batch-size>                                          Batch size for parallel processing. [default: 5]

which you can try setting to 1.

This should reduce the rate at which assets are analyzed / migrated, because the tool will process one asset at a time.

The other mitigation is to use

 -cs, --creation-time-start    The earliest creation time of the selected assets in UTC, format is yyyy-MM-ddThh:mm:ssZ, the hh:mm:ss is optional.
 -ce, --creation-time-end      The latest creation time of the selected assets in UTC, format is yyyy-MM-ddThh:mm:ssZ, the hh:mm:ss is optional.

to limit the number of assets per run. You can partition the assets you want to migrate by creation date and run the tool multiple times, each time with a smaller date range.
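As a rough illustration of the partitioning, the Python sketch below splits an overall period into consecutive windows and prints a -cs / -ce pair for each one. The dates, the 30-day window size, and the helper names are made up for the example. Adjacent windows share a boundary timestamp, and whether the tool treats the end time as inclusive is not stated here, so check for assets created exactly on a boundary.

```python
from datetime import datetime, timedelta


def date_windows(start, end, days):
    """Yield (window_start, window_end) pairs covering start..end in steps of `days`."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        yield cursor, window_end
        cursor = window_end


def to_flag_value(moment):
    """Format a datetime in the yyyy-MM-ddThh:mm:ssZ form named in the help text."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical overall range; replace with the creation dates of your assets.
for window_start, window_end in date_windows(datetime(2023, 1, 1), datetime(2023, 4, 1), 30):
    print("-cs", to_flag_value(window_start), "-ce", to_flag_value(window_end))
```

Each printed pair can be appended to a separate run of the tool, so that every run only touches the assets created in that window.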

We apologize for the inconvenience.

pohhsu commented 7 months ago

Hi Waldemar,

We committed a PR to help alleviate the issue (https://github.com/Azure/azure-media-migration/pull/239). Feel free to try it out to see if it helps with your issue.

wbira commented 7 months ago

Hello @pohhsu, today I managed to run the analyze command again on our staging Media Services account, and it went through:

2024-02-22 12:28:07.878 +00:00 [DBG] Analyzing Assets: 31894/0 Assets
2024-02-22 12:28:07.888 +00:00 [DBG] Finished analysis of assets for account: xxxxxx. Time taken "00:19:21.9585590"

I have a feeling that your change in PR #239 also improved performance a lot! Tomorrow I will run a full migration to check whether the assets command also works (if it succeeds, I will close the ticket), but it seems that the issue is resolved (at least for analyze). Thank you for the quick response and support!

wbira commented 7 months ago

Everything works fine. Thank you @pohhsu for your help!