NVIDIA Merlin is an open-source library for building end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
License: Apache License 2.0
[RMP] Dynamic Batching support at serving time #906
Problem:
Customers with high traffic volumes want to trade some latency for higher throughput by grouping incoming requests into dynamic batches at serving time.
Goal:
Leverage Triton's dynamic batching capabilities to support dynamic batches when serving Merlin models.
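To make the latency/throughput trade-off concrete, here is a minimal Python simulation of a dynamic batching policy. It assumes a simplified scheduler: a batch is dispatched as soon as it is full, or once its oldest request has waited a fixed queue delay, whichever happens first. This is not Triton's actual scheduler, which has additional options such as preferred batch sizes. The function `dynamic_batches` and its parameters are hypothetical, named to loosely mirror Triton's `max_batch_size` and `dynamic_batching { max_queue_delay_microseconds }` model-configuration fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    request_ids: List[int]  # indices of the requests grouped into this batch
    dispatch_time_us: int   # simulated time at which the batch is sent to the model


def dynamic_batches(
    arrivals_us: List[int], max_batch_size: int, max_queue_delay_us: int
) -> List[Batch]:
    """Hypothetical, simplified dynamic batcher (not Triton's implementation).

    `arrivals_us` must be sorted arrival times in microseconds. A batch is
    dispatched when it reaches `max_batch_size` requests, or when its oldest
    request has waited `max_queue_delay_us`, whichever comes first.
    """
    batches: List[Batch] = []
    current: List[int] = []
    deadline = 0
    for req_id, t in enumerate(arrivals_us):
        # The open batch timed out before this request arrived: flush it.
        if current and t > deadline:
            batches.append(Batch(current, deadline))
            current = []
        # The first request of a new batch starts the queue-delay timer.
        if not current:
            deadline = t + max_queue_delay_us
        current.append(req_id)
        # A full batch is dispatched immediately, without waiting for the timer.
        if len(current) == max_batch_size:
            batches.append(Batch(current, t))
            current = []
    # Whatever is left is dispatched when its timer expires.
    if current:
        batches.append(Batch(current, deadline))
    return batches


if __name__ == "__main__":
    arrivals = [0, 10, 20, 500, 510, 2000]
    for b in dynamic_batches(arrivals, max_batch_size=4, max_queue_delay_us=100):
        print(b.request_ids, b.dispatch_time_us)
    # [0, 1, 2] 100
    # [3, 4] 600
    # [5] 2100
```

Raising the queue delay lets more requests share a model invocation (fewer, larger batches, so higher throughput), at the cost of each request potentially waiting up to that delay before being processed (higher latency).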
New Functionality
Models
Transformers4Rec
NVTabular
Dataloader
Systems
Examples
Constraints:
Within Triton
Starting Point: