andreas-solti opened this issue 3 years ago (status: Open)
First, thanks for creating this great and high-performance framework! I've looked through the open and closed issues and couldn't find an existing request for this.
Description
It would be really cool to have automatic batching of inference requests in the engine. The feature would dynamically wrap similar-sized inputs into a batch, run a single forward pass, and unwrap the results, based on a configurable maximum wait time and preferred batch size (see the sketch below for the intended behavior).
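To illustrate the requested behavior, here is a minimal user-level sketch of dynamic batching in Python, not the proposed engine-level implementation. The names (`DynamicBatcher`, `preferred_batch_size`, `max_wait_ms`) are hypothetical, and the "model" is a plain callable rather than an actual MXNet module.

```python
# Illustrative sketch only: collect single requests into a batch until either
# the preferred batch size is reached or the maximum wait time elapses, run
# one batched forward pass, then hand each caller back its own result.
import queue
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor

import numpy as np


class DynamicBatcher:
    def __init__(self, batch_predict, preferred_batch_size=8, max_wait_ms=5):
        self._batch_predict = batch_predict      # callable: (N, ...) -> (N, ...)
        self._preferred_batch_size = preferred_batch_size
        self._max_wait = max_wait_ms / 1000.0
        self._requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, x):
        """Submit a single input; blocks until its result is available."""
        fut = Future()
        self._requests.put((x, fut))
        return fut.result()

    def _loop(self):
        while True:
            # Block for the first request, then keep collecting until the
            # preferred batch size is reached or max_wait_ms has elapsed.
            x, fut = self._requests.get()
            batch, futures = [x], [fut]
            deadline = time.monotonic() + self._max_wait
            while len(batch) < self._preferred_batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    x, fut = self._requests.get(timeout=timeout)
                except queue.Empty:
                    break
                batch.append(x)
                futures.append(fut)
            # One forward pass over the whole batch, then unwrap per request.
            outputs = self._batch_predict(np.stack(batch))
            for out, f in zip(outputs, futures):
                f.set_result(out)


if __name__ == "__main__":
    # Toy "model": doubles its (batch, features) input.
    batcher = DynamicBatcher(lambda batch: batch * 2,
                             preferred_batch_size=4, max_wait_ms=10)
    # Concurrent callers get batched together transparently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(batcher.predict,
                                [np.ones(3) * i for i in range(8)]))
    print([r.tolist() for r in results])
```

Done inside the engine, the same idea could apply below the Python layer, so callers would not need any wrapper like this.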
References
Expected Value
A large speedup is expected for practical use in high-load inference settings where many users need to be served. If batching were implemented in the engine directly, it should be much faster than the currently available (best?) solution, the multi-model-server. The latter adds the overhead of a Java server, HTTP calls, and Python-based batching.