Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

OnlineRequestSettings doesn't support expiring stale requests #36296

Open mgoldey opened 1 week ago

mgoldey commented 1 week ago

This issue does not stem from the operating system.

Describe the bug

With an Azure AI ML Studio OnlineDeployment hosted inside an Azure AI ML Studio OnlineEndpoint configured to time requests out after 30,000 ms (request_timeout_ms), requests are still processed up to 25 minutes later, even though the initial client connection has already been dropped.

Coupled with almost any retry behavior in the tooling that consumes the endpoint, this quickly causes outages when the system is under load: cancelled requests keep occupying workers while retries keep adding new ones.

To Reproduce

Steps to reproduce the behavior:

  1. Create an Azure ML Studio OnlineEndpoint and OnlineDeployment
  2. Set the OnlineRequestSettings to something like the following (e.g. as the request_settings argument to the deployment):
        request_settings=OnlineRequestSettings(
            request_timeout_ms=30_000,
            max_concurrent_requests_per_instance=30,
        ),
  3. Concurrently submit a large amount of load, e.g. 1,000 requests that each take ~10 seconds, at a concurrency well above what the instance can handle (a load-generation sketch follows this list).
  4. Watch the deployment logs after all client requests have been cancelled; processing continues long after the connections were dropped.
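For reference, a minimal load-generation sketch along these lines, assuming a key-authenticated endpoint; the endpoint URL, key, and payload below are placeholders, not values from this report:

    # Load-generation sketch (assumptions: key auth, JSON scoring payload).
    # ENDPOINT_URL, API_KEY, and PAYLOAD are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
    API_KEY = "<endpoint-key>"
    HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    PAYLOAD = {"data": "example input that takes ~10 s to score"}

    def send_one(i: int) -> str:
        try:
            # Client-side timeout mirrors request_timeout_ms (30 s); slow requests
            # are abandoned here, but the deployment keeps processing them.
            r = requests.post(ENDPOINT_URL, json=PAYLOAD, headers=HEADERS, timeout=30)
            return f"{i}: {r.status_code}"
        except requests.exceptions.Timeout:
            return f"{i}: client timeout (connection dropped)"

    # Keep far more requests in flight than max_concurrent_requests_per_instance allows.
    with ThreadPoolExecutor(max_workers=200) as pool:
        for result in pool.map(send_one, range(1000)):
            print(result)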

Expected behavior

I would expect Azure AI ML Studio deployments to stop processing requests whose client connection has been dropped. There should be a way to enforce request expiration on the OnlineRequestSettings object so that stale requests are discarded instead of being processed.
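To make the ask concrete, a hypothetical shape for such a setting (request_expiration_ms is not an existing parameter of OnlineRequestSettings; it only illustrates the requested behavior):

    # Hypothetical illustration only: request_expiration_ms does NOT exist today
    # in azure.ai.ml.entities.OnlineRequestSettings.
    request_settings=OnlineRequestSettings(
        request_timeout_ms=30_000,
        max_concurrent_requests_per_instance=30,
        request_expiration_ms=30_000,  # drop requests older than this instead of running them
    ),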

Screenshots N/A

Additional context

It's unclear whether job expiration is a configurable setting anywhere for Azure AI ML Studio endpoints/deployments, regardless of the SDK language in use. If it is, this feature needs to be accessible through the Python SDK. If it isn't, please tell me where to make that bug report. It's also unclear whether the load balancer in front of the deployment queues the stale requests or whether the gunicorn worker process is somehow holding them in memory. Here's an example of how one could solve this problem if the worker is at fault.
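A minimal sketch, assuming the worker is a Flask app behind gunicorn and that the client supplies a deadline in a custom header; "X-Request-Deadline" and do_expensive_inference are illustrative names, not anything Azure ML endpoints provide today:

    # Reject stale requests at the worker before doing any scoring work.
    # Assumes a Flask app behind gunicorn and a client-supplied deadline header.
    import time

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.before_request
    def drop_expired_requests():
        # "X-Request-Deadline" (unix epoch seconds) is illustrative; the client would set it.
        deadline = request.headers.get("X-Request-Deadline")
        if deadline is not None and time.time() > float(deadline):
            # The caller has already given up; fail fast instead of doing the work.
            return jsonify({"error": "request expired before processing"}), 504

    @app.route("/score", methods=["POST"])
    def score():
        result = do_expensive_inference(request.get_json())  # placeholder for the real scoring code
        return jsonify(result)

Even with a worker-side check like this, the request stands: expiration should be enforceable on the platform side, since stale requests may be queued in front of the worker where application code cannot see them.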

github-actions[bot] commented 6 days ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.