awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

Update documentation to establish the difference between backend time and backend response time #1014

Open sachanub opened 1 year ago

sachanub commented 1 year ago

There has been some confusion recently about the difference between backend time and backend response time. We need to update the documentation, or add comments, to make the distinction between the two clear.

Backend response time: Time taken by a backend worker process to handle a request from the frontend worker thread.

Backend time: Total time from when the frontend worker thread schedules a client request to when the response is sent to the client. This includes the backend response time above.
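To make the relationship between the two definitions concrete, here is a minimal Python sketch. It is not Multi Model Server's actual implementation: `backend_handle`, `serve`, the delay parameters, and the metric key names are all hypothetical, chosen only to show that the backend response time interval is nested inside the backend time interval.

```python
import time


def backend_handle(payload):
    """Hypothetical stand-in for the backend worker handling one request."""
    time.sleep(0.02)  # pretend this is model inference
    return payload.upper()


def serve(payload, pre_delay=0.01, post_delay=0.01):
    """Hypothetical frontend worker thread; delays are assumed overheads."""
    # Backend time starts when the frontend schedules the client request.
    scheduled_at = time.perf_counter()
    time.sleep(pre_delay)  # assumed frontend work before dispatching

    # Backend response time covers only the backend worker's handling.
    sent_at = time.perf_counter()
    result = backend_handle(payload)
    received_at = time.perf_counter()

    time.sleep(post_delay)  # assumed frontend work before replying
    # Backend time ends when the response goes back to the client.
    responded_at = time.perf_counter()

    return {
        "result": result,
        "backend_response_time_ms": (received_at - sent_at) * 1000,
        "backend_time_ms": (responded_at - scheduled_at) * 1000,
    }


if __name__ == "__main__":
    metrics = serve("hello")
    print(metrics)
```

In this sketch the gap between the two numbers is whatever the frontend spends before dispatching to the backend and after receiving its result, so backend time can never be smaller than backend response time.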