
Lightning-AI / LitServe

Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.

https://lightning.ai/docs/litserve

Apache License 2.0


Map `decode_request` during dynamic batching using a threadpool #166

Open aniketmaurya opened 4 months ago

aniketmaurya commented 4 months ago

🚀 Feature

LitServe could offer a default optimization: when dynamic batching is enabled, map the `decode_request` function over the requests in a batch using a thread pool. This is useful when decoding is I/O-bound, for example when loading images.
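To make the idea concrete, here is a minimal sketch of mapping a decode step over a batch with Python's `concurrent.futures.ThreadPoolExecutor`. It is not LitServe's implementation: the `decode_request` body below is a hypothetical stand-in that sleeps to simulate I/O, and `decode_batch` is a helper name invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def decode_request(request: dict) -> str:
    # Hypothetical stand-in for a user's decode_request; the sleep simulates
    # I/O-bound work such as fetching or reading an image.
    time.sleep(0.05)
    return request["input"].upper()


def decode_batch(requests, executor=None):
    # Hypothetical helper: decode every request in a batch, concurrently
    # when an executor is supplied and sequentially otherwise.
    if executor is None:
        return [decode_request(r) for r in requests]
    # executor.map preserves input order, so outputs line up with requests.
    return list(executor.map(decode_request, requests))


if __name__ == "__main__":
    batch = [{"input": f"img-{i}"} for i in range(8)]

    start = time.perf_counter()
    decode_batch(batch)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    with ThreadPoolExecutor(max_workers=8) as pool:
        start = time.perf_counter()
        decode_batch(batch, pool)
        print(f"threadpool: {time.perf_counter() - start:.2f}s")
```

Threads help here only because the simulated work releases the GIL while waiting; a CPU-bound decode step would likely see little benefit from this approach.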

I ran a quick test with a ResNet-152 image classification model and observed a throughput gain (requests per second) with the thread pool.


grumpyp commented 2 months ago

Hi @aniketmaurya, have you already thought about how to implement this?

I'd be interested in implementing it.