Open peter-resnick opened 7 months ago
I'd like to ask whether you've solved this problem yet. Thanks.
Hi @ShengJinhao -- Thanks for bringing this up. I will assign this issue to myself and have a look at the underlying problem.
Do you have a reproducible example you can share with me regarding this behavior?
Yes. But how should I share it with you? The problem I encountered is this: after sending a large number of requests, I received responses at first, but then stopped receiving them entirely. When I checked the logs, I found the same problem as above.
Hi MLServer -
To start off, this is an awesome tool and the team has done impressive work to get to this point.
I'm currently using MLServer in a high-throughput, low-latency system where we use gRPC to perform inferences. We have added an asynchronous capability to our inference client, which sends many requests to the gRPC server at once (typically about 25). We have a timeout set on our client, and we first started seeing a number of `DEADLINE_EXCEEDED` responses. I started to look into the model servers themselves to figure out why the server had started to exceed deadlines (we hadn't experienced this very often in the past), and it looks like the process loop is actually being restarted due to messages being lost. We see the following traceback:
where the dispatcher is trying to check on a given message, but it has been lost. Once this error occurs, all of the rest of our parallel inference requests fail with the same exception (with a different `message_id`, obviously).
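For reference, the client pattern that triggers this is roughly the following. This is a minimal sketch rather than our production client: the import path for the generated dataplane stubs, the model name, the tensor shape, and the 2-second deadline are all placeholders, so adjust them to your own setup.

```python
# Minimal repro sketch: fire N concurrent ModelInfer calls over a single
# grpc.aio channel with a client-side deadline.
# The import paths, model name and tensor contents below are assumptions.
import asyncio

import grpc
from mlserver.grpc import dataplane_pb2 as pb
from mlserver.grpc import dataplane_pb2_grpc as pb_grpc


def build_request(model_name: str) -> pb.ModelInferRequest:
    # Single FP32 input tensor; replace with whatever your model expects.
    return pb.ModelInferRequest(
        model_name=model_name,
        inputs=[
            pb.ModelInferRequest.InferInputTensor(
                name="input-0",
                datatype="FP32",
                shape=[1, 4],
                contents=pb.InferTensorContents(fp32_contents=[0.1, 0.2, 0.3, 0.4]),
            )
        ],
    )


async def main(server: str = "localhost:8081", n_requests: int = 25) -> None:
    async with grpc.aio.insecure_channel(server) as channel:
        stub = pb_grpc.GRPCInferenceServiceStub(channel)
        # Launch all calls concurrently, each with a 2s deadline (placeholder).
        calls = [
            stub.ModelInfer(build_request("my-model"), timeout=2.0)
            for _ in range(n_requests)
        ]
        results = await asyncio.gather(*calls, return_exceptions=True)
        for i, res in enumerate(results):
            if isinstance(res, grpc.aio.AioRpcError):
                print(f"request {i}: {res.code().name}")  # e.g. DEADLINE_EXCEEDED
            else:
                print(f"request {i}: ok ({len(res.outputs)} outputs)")


if __name__ == "__main__":
    asyncio.run(main())
```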
I took a look at the source code, and it looks like when `process_response.result()` is called, the logic has a blanket exception handler for anything that isn't an `asyncio.CancelledError` and assumes that the process loop has crashed, so it restarts it by scheduling a new task. It's not immediately clear (to me, at least) whether this is really what should be happening: I don't see any signal from the server that the processing loop actually crashed; it just seems to be confused about which message it's supposed to be getting. A rough paraphrase of what I think is happening is sketched at the end of this comment.

As a note about our system setup, we have these deployed into Kubernetes (as is our client app) as a deployment with between 10 and 15 pods at any given time, with the environment variable `MLSERVER_PARALLEL_WORKERS=16`. We are also using a `grpc.aio.insecure_channel(server)` pattern to manage the gRPC interactions on the client side.
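For context, my (paraphrased) reading of the dispatcher's response loop is something like the sketch below. This is not the actual MLServer code, and the class and attribute names are approximations; it is only meant to illustrate the "blanket except, then reschedule the loop" behaviour I'm describing.

```python
# Paraphrased sketch of the behaviour described above -- NOT the actual
# MLServer source; class/attribute names here are approximate.
import asyncio
from typing import Dict


class ResponseLoopSketch:
    def __init__(self, response_queue: asyncio.Queue):
        self._response_queue = response_queue
        # Futures for in-flight requests, keyed by message_id.
        self._pending: Dict[str, asyncio.Future] = {}

    def start(self) -> None:
        # Must be called from within a running event loop.
        loop_task = asyncio.create_task(self._process_responses())
        loop_task.add_done_callback(self._restart_on_error)

    async def _process_responses(self) -> None:
        while True:
            response = await self._response_queue.get()
            # If the message_id is unknown (e.g. its entry was dropped),
            # this raises and takes the whole loop down.
            future = self._pending[response.message_id]
            future.set_result(response)

    def _restart_on_error(self, task: asyncio.Task) -> None:
        try:
            task.result()
        except asyncio.CancelledError:
            pass  # shutting down: don't restart
        except Exception:
            # Blanket handler: any other error is treated as a crashed
            # loop, and a fresh task is scheduled in its place.
            self.start()
```

If that's roughly right, a single lost `message_id` takes the loop down, and every in-flight request behind it never gets resolved, which would match the cascade of failures we're seeing.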