SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Get metrics from the Queues in the inference in parallel/pool #728

Closed alvarorsant closed 1 year ago

alvarorsant commented 2 years ago

Hi, we would like to expose the number of elements in the inference pool's request queue as a metric, for performance monitoring. This data would help tune the system by increasing or decreasing the number of parallel workers and pods in OpenShift. What do you think?

adriangonz commented 2 years ago

Hey @alvarorsant ,

That's a pretty cool idea.

Right now, there are a number of queues which would be valuable to expose through metrics. Off the top of my head, I can think of the following points (the order represents when they are hit in the processing flow):

  1. AsyncIO "task list" within the main process. This is technically not a queue, but it's a point where requests may get held up initially while the main process gets a free slot to process them.
  2. Queue for adaptive batching, waiting for requests to form a full batch. This will only be present if adaptive batching is enabled though.
  3. Queue for inference pool workers, waiting for a worker to pick up a new request.
  4. AsyncIO "task list" within the workers themselves. Again, this is technically not a queue, but it's also another place where requests may get held up waiting for the worker to be free to run inference.

Ideally, it would be useful to instrument all 4 points, so that we know how many requests are currently being held at each one. Additionally, we could also have a 5th overarching count of how many requests are currently being processed within MLServer (i.e. all requests currently between steps 1. and 4.).
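As a hedged sketch of that idea (all names here are hypothetical and not MLServer's actual API), the per-point counts could be tracked as simple gauges, incremented when a request enters a stage and decremented when it leaves:

```python
import threading


class QueueDepthGauge:
    """Thread-safe counter tracking in-flight requests at one point."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value = 0
        self._lock = threading.Lock()

    def inc(self) -> None:
        with self._lock:
            self._value += 1

    def dec(self) -> None:
        with self._lock:
            self._value -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


# Hypothetical gauges for the four points above.
gauges = {
    "main_asyncio_tasks": QueueDepthGauge("main_asyncio_tasks"),
    "adaptive_batch_queue": QueueDepthGauge("adaptive_batch_queue"),
    "worker_pool_queue": QueueDepthGauge("worker_pool_queue"),
    "worker_asyncio_tasks": QueueDepthGauge("worker_asyncio_tasks"),
}


def total_in_flight() -> int:
    # The 5th, overarching count: all requests currently held at any point.
    return sum(g.value for g in gauges.values())
```

In a real deployment these would likely be Prometheus gauges rather than plain counters, but the inc/dec pattern around each stage would be the same.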

It would be great to hear your thoughts on that approach @alvarorsant .

alvarorsant commented 1 year ago

We have already found points 2 and 3, but could you specify the exact location of points 1 and 4 in the code? We are indeed preparing a PR for you :) and would need precise pointers.

Thanks in advance.

adriangonz commented 1 year ago

Hey @alvarorsant ,

That's great to hear! Thanks a lot for taking the lead in this one! :rocket:

For points 1. and 4., this would be within Python's own runtime. When called, asyncio coroutines in Python all get converted into "tasks" and scheduled for execution in a list, which the event loop iterates through. The longer this task list becomes, the longer it takes to execute each scheduled task.

I haven't looked much into this one TBH, but there seem to be a few suggestions here on how to approach it:

https://splunktool.com/how-can-i-measure-the-length-of-an-asyncio-event-loop

i.e. mainly doing something like:

len(asyncio.all_tasks(asyncio.get_running_loop()))
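A minimal, self-contained illustration of that suggestion (the function names here are hypothetical; in practice the count would be exported as a metric rather than returned):

```python
import asyncio


async def slow_op(delay: float) -> None:
    await asyncio.sleep(delay)


def pending_task_count() -> int:
    # Number of not-yet-finished tasks scheduled on the running event
    # loop, including the task executing this call.
    return len(asyncio.all_tasks(asyncio.get_running_loop()))


async def main() -> int:
    # Schedule a few concurrent tasks to lengthen the task list.
    tasks = [asyncio.create_task(slow_op(0.01)) for _ in range(3)]
    count = pending_task_count()  # 3 scheduled tasks + main() itself
    await asyncio.gather(*tasks)
    return count


print(asyncio.run(main()))  # → 4
```

A metrics endpoint could sample this value periodically to approximate how backed up the main process (point 1) or a worker (point 4) currently is.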
miguelopind commented 1 year ago

@adriangonz We opened this PR: https://github.com/SeldonIO/MLServer/pull/769, in which we added the metrics described in points 2 and 3 (Batch and Pool).

alvarorsant commented 1 year ago

Hi @adriangonz, I am continuing @miguelopind's task. I opened this PR: https://github.com/SeldonIO/MLServer/pull/826, implementing points 2, 3 and 4. Regarding point 1, when you said 'AsyncIO "task list" within the main process', what exactly do you mean? Where is that metric supposed to be located? Is it possible to monitor several processes as labels in the main process?

Thanks in advance.

BTW, let me know if the PR is OK for you or if it needs any modifications.

adriangonz commented 1 year ago

Hey @alvarorsant,

Regarding your question, did you have a look at my previous comment (https://github.com/SeldonIO/MLServer/issues/728#issuecomment-1266739392)? Your main questions seem covered there, but please do let me know if there are any extra points you'd like to clarify.

Thanks for that PR BTW :+1: What should happen with the previous one though (https://github.com/SeldonIO/MLServer/pull/769)? If that one is not relevant anymore, could you sync with @miguelopind and remove it?

alvarorsant commented 1 year ago

Hello, @miguelopind left the company several days ago and handed the task over to me, but I don't have sufficient permissions to close https://github.com/SeldonIO/MLServer/pull/769. I don't know if you have the rights to do it.

Thanks a lot.

miguelopind commented 1 year ago

Hello @alvarorsant I can do it if you want, just tell me.

alvarorsant commented 1 year ago

Yes, @miguelopind, can you close it? I have already opened a new PR gathering all the changes.

Thanks!

miguelopind commented 1 year ago

@alvarorsant done

alvarorsant commented 1 year ago

Hi Adrian, I've opened a new PR https://github.com/SeldonIO/MLServer/pull/860 incorporating the changes you suggested (focused only on the batch queue and the request queue, with several tests).

adriangonz commented 1 year ago

Hey @alvarorsant ,

Thanks for making those changes.

Out of curiosity, is there any reason why you didn't update the existing PR instead? Either way, if #860 is now the up-to-date one, could you remove #826?

adriangonz commented 1 year ago

Fixed by #860