Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

OnlineRequestSettings doesn't support expiring stale requests #36296

Open mgoldey opened 1 week ago

mgoldey commented 1 week ago

This issue does not stem from the operating system.

Describe the bug

With an Azure AI ML Studio OnlineDeployment hosted inside an Azure AI ML Studio OnlineEndpoint configured to time requests out after 30,000 ms (request_timeout_ms), requests are still processed up to 25 minutes later, even though the initial client connection has already been dropped.

Coupled with almost any retry behavior in the tooling that consumes the endpoint, this quickly causes outages when the system is under load: cancelled requests keep occupying workers while retries keep adding new ones.

To Reproduce

Steps to reproduce the behavior:

  1. Create an Azure ML Studio OnlineEndpoint and OnlineDeployment
  2. Set the OnlineRequestSettings to something like the following (e.g. as the request_settings argument to the deployment):
        request_settings=OnlineRequestSettings(
            request_timeout_ms=30_000,
            max_concurrent_requests_per_instance=30,
        ),
  3. Concurrently submit a large amount of load, e.g. 1,000 requests that each take ~10 seconds, at a concurrency well above what the instance can handle (a load-generation sketch follows this list).
  4. Watch the deployment logs after all client requests have been cancelled; processing continues long after the connections were dropped.
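For reference, a minimal load-generation sketch along these lines, assuming a key-authenticated endpoint; the endpoint URL, key, and payload below are placeholders, not values from this report:

    # Load-generation sketch (assumptions: key auth, JSON scoring payload).
    # ENDPOINT_URL, API_KEY, and PAYLOAD are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
    API_KEY = "<endpoint-key>"
    HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    PAYLOAD = {"data": "example input that takes ~10 s to score"}

    def send_one(i: int) -> str:
        try:
            # Client-side timeout mirrors request_timeout_ms (30 s); slow requests
            # are abandoned here, but the deployment keeps processing them.
            r = requests.post(ENDPOINT_URL, json=PAYLOAD, headers=HEADERS, timeout=30)
            return f"{i}: {r.status_code}"
        except requests.exceptions.Timeout:
            return f"{i}: client timeout (connection dropped)"

    # Keep far more requests in flight than max_concurrent_requests_per_instance allows.
    with ThreadPoolExecutor(max_workers=200) as pool:
        for result in pool.map(send_one, range(1000)):
            print(result)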

Expected behavior

I would expect Azure AI ML Studio deployments to stop processing requests whose client connection has been dropped. There should be a way to enforce request expiration on the OnlineRequestSettings object so that stale requests are discarded instead of being processed.
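To make the ask concrete, a hypothetical shape for such a setting (request_expiration_ms is not an existing parameter of OnlineRequestSettings; it only illustrates the requested behavior):

    # Hypothetical illustration only: request_expiration_ms does NOT exist today
    # in azure.ai.ml.entities.OnlineRequestSettings.
    request_settings=OnlineRequestSettings(
        request_timeout_ms=30_000,
        max_concurrent_requests_per_instance=30,
        request_expiration_ms=30_000,  # drop requests older than this instead of running them
    ),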

Screenshots N/A

Additional context

It's unclear whether job expiration is a configurable setting anywhere for Azure AI ML Studio endpoints/deployments, regardless of the SDK language in use. If it is, this feature needs to be accessible through the Python SDK. If it isn't, please tell me where to make that bug report. It's also unclear whether the load balancer in front of the deployment queues the stale requests or whether the gunicorn worker process is somehow holding them in memory. Here's an example of how one could solve this problem if the worker is at fault.
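A minimal sketch, assuming the worker is a Flask app behind gunicorn and that the client supplies a deadline in a custom header; "X-Request-Deadline" and do_expensive_inference are illustrative names, not anything Azure ML endpoints provide today:

    # Reject stale requests at the worker before doing any scoring work.
    # Assumes a Flask app behind gunicorn and a client-supplied deadline header.
    import time

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.before_request
    def drop_expired_requests():
        # "X-Request-Deadline" (unix epoch seconds) is illustrative; the client would set it.
        deadline = request.headers.get("X-Request-Deadline")
        if deadline is not None and time.time() > float(deadline):
            # The caller has already given up; fail fast instead of doing the work.
            return jsonify({"error": "request expired before processing"}), 504

    @app.route("/score", methods=["POST"])
    def score():
        result = do_expensive_inference(request.get_json())  # placeholder for the real scoring code
        return jsonify(result)

Even with a worker-side check like this, the request stands: expiration should be enforceable on the platform side, since stale requests may be queued in front of the worker where application code cannot see them.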

github-actions[bot] commented 6 days ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.