NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[RMP] Dynamic Batching support at serving time #906

Status: Open. Opened by EvenOldridge 1 year ago.


Problem:

Customers with high traffic volumes want to trade some latency for higher throughput by grouping incoming requests into dynamic batches.

Goal:

Leverage Triton's built-in dynamic batching to support dynamically batched requests when serving Merlin models.
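To make the latency/throughput trade-off concrete, here is a minimal toy model of dynamic batching. It is an illustrative sketch, not Triton's scheduler and not a Merlin API: the function `form_dynamic_batches` and the `Batch` class are hypothetical. The two parameters are loosely modeled on the knobs Triton exposes in a model configuration (`max_batch_size` and `max_queue_delay_microseconds` inside the `dynamic_batching` block). A batch opens when its first request arrives and is dispatched either as soon as it is full or once the queue delay has elapsed, whichever comes first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    request_ids: List[int]   # indices into the arrival list
    dispatch_time_us: float  # when the batch would be sent to the model


def form_dynamic_batches(arrivals_us: List[float], max_batch_size: int,
                         max_queue_delay_us: float) -> List[Batch]:
    """Hypothetical toy dynamic batcher (not Triton's actual implementation).

    Requests must be sorted by arrival time. A batch opens when its first
    request arrives and is dispatched as soon as it holds max_batch_size
    requests, or when max_queue_delay_us has passed since it opened.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    batches: List[Batch] = []
    i, n = 0, len(arrivals_us)
    while i < n:
        deadline = arrivals_us[i] + max_queue_delay_us
        ids = [i]
        i += 1
        # Add requests that arrive before the deadline, up to the size cap.
        while i < n and len(ids) < max_batch_size and arrivals_us[i] <= deadline:
            ids.append(i)
            i += 1
        # A full batch leaves when its last request arrives; a partial batch
        # waits until the queue-delay deadline expires.
        full = len(ids) == max_batch_size
        dispatch = arrivals_us[ids[-1]] if full else deadline
        batches.append(Batch(ids, dispatch))
    return batches


if __name__ == "__main__":
    arrivals = [0, 20, 40, 300, 310, 1000]  # made-up arrival times in microseconds
    for b in form_dynamic_batches(arrivals, max_batch_size=4, max_queue_delay_us=100):
        waits = [b.dispatch_time_us - arrivals[r] for r in b.request_ids]
        print(b.request_ids, b.dispatch_time_us, waits)
```

Raising the queue delay lets more requests share a batch (fewer, larger model invocations, so better throughput) at the cost of up to that much extra latency per request, which is exactly the trade-off described in the problem statement.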

New Functionality

Systems

Examples

Constraints:

Within Triton

Starting Point: