🚀 Feature
A default optimization that LitServe can provide is to map the `decode_request` function over the batch with a ThreadPool when dynamic batching is enabled. This is useful for IO-bound work such as image loading.

I ran a quick test with a ResNet-152 image-classification model and observed the following throughput (requests per second) gain with the thread pool:
Motivation
Pitch
Alternatives
Additional context