Not sure if this is already possible or what is needed here, but it would be useful to emit telemetry measuring the time between the host sending a request to the worker and the worker actually starting work on that request.
Especially for single-threaded language workers like Python and Node, this would indicate that a function is blocking others from executing, causing requests to queue up between the host and the worker. We could use this telemetry to build detectors and perhaps apply heuristics to dynamically alter concurrency within a single instance.
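To make the idea concrete, here's a minimal sketch of what measuring that gap could look like. This is purely illustrative, not the actual host/worker protocol: it assumes the host stamps each invocation with an enqueue time and the worker records when it begins executing, with the difference being the dispatch latency this issue proposes emitting.

```python
import time

def dispatch_latency_ms(enqueued_at: float, started_at: float) -> float:
    """Time a request spent waiting between host dispatch and worker
    start, in milliseconds. Both timestamps come from the same
    monotonic clock (hypothetical; the real protocol may differ)."""
    return (started_at - enqueued_at) * 1000.0

# Hypothetical flow: the host records when it sends the request...
enqueued = time.monotonic()
# ...the request then sits behind a blocking function on a
# single-threaded worker before this invocation finally starts:
started = time.monotonic()
latency = dispatch_latency_ms(enqueued, started)
```

Emitting `latency` per invocation would let a detector flag sustained growth in this value as a sign the worker's event loop is blocked.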
If combined with a metric like CPU utilization, we could make recommendations to the customer. In a Python app, long latency with low CPU suggests the app could benefit from switching the code to async or increasing the number of worker processes. Long latency with high CPU usually indicates a CPU-bound workload whose concurrency needs to be dialed down.
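The heuristic described above could be sketched roughly like this. All names and thresholds here are illustrative assumptions, not an existing API:

```python
def recommend(latency_ms: float, cpu_percent: float,
              latency_threshold_ms: float = 500.0,
              cpu_threshold: float = 80.0) -> str:
    """Toy decision rule combining dispatch latency with CPU
    utilization (thresholds are made up for illustration)."""
    if latency_ms > latency_threshold_ms:
        if cpu_percent < cpu_threshold:
            # Requests queue up while the CPU is idle: likely
            # blocking I/O on a single-threaded worker.
            return "switch blocking code to async or add worker processes"
        # Requests queue up and the CPU is saturated: CPU-bound.
        return "CPU-bound workload: dial down per-instance concurrency"
    return "no action"
```

A real detector would presumably look at sustained percentiles rather than single samples, but the shape of the rule would be similar.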
@cgillum @anirudhgarg I think we were talking about this?