epwalsh / batched-fn

🦀 Rust server plugin for deploying deep learning models with batched prediction
https://crates.io/crates/batched-fn
Apache License 2.0
19 stars · 2 forks

[Question] How to benchmark the execution? #17

Closed · Yevgnen closed this issue 2 years ago

Yevgnen commented 2 years ago

Thanks for the great package!

When tuning max_batch_size and max_delay, how can I benchmark or record the GPU execution time for a batch? Due to my limited knowledge of Rust, I don't know where to put the timing code. It seems straightforward for a single request, but I have no idea how to do it for a batch.

Thanks.

epwalsh commented 2 years ago

Hi, you might want to start by putting timing code in the handler closure, e.g. right here: https://github.com/epwalsh/batched-fn/blob/93b3ffcf0becd30838389e70b329c3262b792b46/examples/example.rs#L36-L38

That will tell you how long it takes to process each batch at a given batch size (see the sketch below).

In general (for deep learning models, at least) I would set max_batch_size as large as you can without running out of memory, and max_delay small relative to the time it takes to process a batch.
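For example, here's a rough sketch of what that could look like (`Input`, `Output`, and `Model` are placeholders for your own types, and the config numbers are made up just to illustrate the relationship):

```rust
use batched_fn::{batched_fn, Batch};
use std::time::Instant;

async fn predict(input: Input) -> Output {
    let batch_predict = batched_fn! {
        handler = |batch: Batch<Input>, model: &Model| -> Batch<Output> {
            let batch_size = batch.len();
            let start = Instant::now();
            let output = model.predict(batch);
            // This is the number to watch while tuning:
            eprintln!("processed batch of size {} in {:?}", batch_size, start.elapsed());
            output
        };
        config = {
            max_batch_size: 16, // as large as your (GPU) memory allows
            max_delay: 5,       // milliseconds; keep this small relative to the per-batch time
        };
        context = {
            model: Model::load(),
        };
    };
    batch_predict(input).await.unwrap()
}
```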

Yevgnen commented 2 years ago

Hi, thanks for the suggestion. That helps! One more question: what about channel_cap?

epwalsh commented 2 years ago

When you set channel_cap, batched_fn! will internally use a bounded flume channel instead of an unbounded one. As a result, calls to your batched fn might return Error::Full.

One reason you might want to use this feature is to return 503 errors when your server is getting too many requests at once. In particular, set channel_cap to some number greater than max_batch_size, then catch Error::Full in your server code and return a 503 error when that happens.
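For example (a framework-agnostic sketch; this assumes a `predict` function like the one above, except that it returns the `Result` instead of unwrapping it):

```rust
// Map the result of the batched call to an HTTP status code and body.
async fn handle_request(input: Input) -> (u16, String) {
    match predict(input).await {
        Ok(output) => (200, format!("{:?}", output)),
        // The bounded channel is at capacity, i.e. the server is overloaded:
        Err(batched_fn::Error::Full) => (503, "too many requests, try again later".to_string()),
        // Catch-all for any other error:
        Err(err) => (500, format!("{:?}", err)),
    }
}
```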

epwalsh commented 2 years ago

I realize the documentation about channel_cap has been lacking, so here's this: https://github.com/epwalsh/batched-fn/pull/19

Yevgnen commented 2 years ago

Thanks for the explanation!

epwalsh commented 2 years ago

You're welcome!