Closed Yevgnen closed 2 years ago
Hi, you might want to start by putting timing code in the handler
closure, e.g. right here: https://github.com/epwalsh/batched-fn/blob/93b3ffcf0becd30838389e70b329c3262b792b46/examples/example.rs#L36-L38
That will tell you how long it takes to process each batch at a given batch size.
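A minimal sketch of that timing pattern, with a toy doubling function standing in for the model inference inside the handler closure (the function name and inputs here are illustrative, not from batched-fn):

```rust
use std::time::Instant;

// Hypothetical stand-in for the batched-fn handler closure: the real one
// would run a model forward pass over the whole batch. The timing pattern
// (Instant::now() before, elapsed() after) is the part that carries over.
fn handle_batch(batch: Vec<i32>) -> Vec<i32> {
    let start = Instant::now();
    let outputs: Vec<i32> = batch.iter().map(|x| x * 2).collect(); // model inference here
    eprintln!("batch_size={} took {:?}", outputs.len(), start.elapsed());
    outputs
}

fn main() {
    assert_eq!(handle_batch(vec![1, 2, 3]), vec![2, 4, 6]);
}
```

Logging both the batch size and the elapsed time lets you see how per-batch latency scales as batches fill up, which is exactly what you need for tuning.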
In general (for deep learning models, at least) I would set `max_batch_size` as large as you can without running out of memory, and `max_delay` small relative to the time it takes to process a batch.
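For reference, those knobs live in the `config` block of the `batched_fn!` macro. This is a sketch based on my reading of the crate's README, with illustrative values and placeholder `Input`/`Output`/`Model` types; check the crate docs for the exact current syntax:

```rust
// Sketch only: Input, Output, and Model are placeholder types.
let batch_predict = batched_fn! {
    handler = |batch: Vec<Input>, model: &Model| -> Vec<Output> {
        // time this body to measure per-batch latency
        model.predict_all(batch)
    };
    config = {
        max_batch_size: 16, // as large as memory allows
        max_delay: 50,      // small relative to per-batch processing time
    };
    context = {
        model: Model::load(),
    };
};
```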
Hi, thanks for the suggestion. That helps! I have one more question: how about `channel_cap`?
When you set `channel_cap`, `batched_fn!` will internally use a bounded flume channel instead of an unbounded one. As a result, calls to your batched fn might return `Error::Full`.
One reason you might want to use this feature is to return 503 errors when your server is getting too many requests at once. In particular, set `channel_cap` to some number greater than `max_batch_size`, then catch `Error::Full` in your server code and return a 503 error when that happens.
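The shape of that pattern can be sketched with std's bounded channel standing in for batched-fn's internal flume channel. Everything here (`submit`, the status codes, the capacity of 1) is illustrative; in real server code you would match on the `Error::Full` returned by the batched fn itself:

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Map a full queue to a 503-style status code instead of blocking.
// try_send fails fast with TrySendError::Full when the bounded channel
// is at capacity, which is the same behavior batched-fn surfaces as
// Error::Full when channel_cap is set.
fn submit(tx: &SyncSender<u32>, req: u32) -> Result<(), u16> {
    match tx.try_send(req) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err(503), // queue full: shed load
        Err(TrySendError::Disconnected(_)) => Err(500),
    }
}

fn main() {
    let (tx, _rx) = sync_channel::<u32>(1); // capacity 1 for illustration
    assert_eq!(submit(&tx, 1), Ok(()));
    assert_eq!(submit(&tx, 2), Err(503)); // capacity reached -> 503
}
```

Shedding load this way keeps tail latency bounded: a client gets a fast 503 it can retry, rather than a request that sits in an ever-growing queue.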
I realize the documentation about `channel_cap` has been lacking, so here's this: https://github.com/epwalsh/batched-fn/pull/19
Thanks for the explanation!
You're welcome!
Thanks for the great package!
When tuning `max_batch_size` and `max_delay`, I wonder how I can benchmark or record the GPU execution time for a batch? I don't know where to put the timing code due to my limited knowledge of Rust. It might be easy for a single request, but I have no idea how to do it for a batch. Thanks.