NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[RMP] Quick-start for ranking models training pipeline #827

Open gabrielspmoreira opened 1 year ago

gabrielspmoreira commented 1 year ago

Problem:

Merlin provides documentation and a number of example notebooks showing how to use tools like NVTabular, Dataloader and Merlin Models. To build a pipeline for training and evaluation, a data scientist has to analyze that material, copy and paste code snippets that demonstrate the API, and glue that code together into scripts for experimentation and benchmarking. It may also be unclear to users which advanced API options in Merlin Models can be exposed as hyperparameters and potentially improve model accuracy.

Goal:

This RMP provides a Quick-start for building ranking model training pipelines.
It addresses the ranking models part of the larger RMP NVIDIA-Merlin/models#732, in particular steps 4-7 of the Data Scientist journey when experimenting with Merlin Models.

The Quick-start for ranking is composed of:

Template scripts

Documentation
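
As a rough illustration of the template-script idea (not the actual scripts from the linked PR), the sketch below shows how model options could be exposed as command-line hyperparameters using only the Python standard library. All argument names, defaults, and model choices here are hypothetical, and the call into Merlin Models is intentionally left out.

```python
import argparse


def parse_int_list(value: str) -> list:
    """Parse a comma-separated string such as "256,128" into [256, 128]."""
    return [int(item) for item in value.split(",")]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments: names and defaults are assumptions for illustration.
    parser = argparse.ArgumentParser(description="Ranking training script (sketch)")
    parser.add_argument("--train_path", required=True, help="Preprocessed training data")
    parser.add_argument("--eval_path", required=True, help="Preprocessed evaluation data")
    parser.add_argument("--model", choices=["mlp", "dcn", "dlrm"], default="mlp")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--embedding_dim", type=int, default=64)
    parser.add_argument("--mlp_layers", type=parse_int_list, default=[128, 64])
    parser.add_argument("--dropout", type=float, default=0.0)
    return parser


def parse_hparams(argv=None) -> dict:
    """Return the hyperparameters as a plain dict, ready to log or pass to a tuner."""
    return vars(build_parser().parse_args(argv))


# Example invocation with an explicit argument list instead of sys.argv.
hparams = parse_hparams(
    ["--train_path", "train/", "--eval_path", "eval/", "--mlp_layers", "256,128", "--lr", "0.01"]
)
print(hparams)
```

Keeping the hyperparameters in a flat dict makes it straightforward to drive the same script from a hyperparameter search tool or a benchmarking loop.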

Constraints:

Starting Point:

The ranking training scripts we have developed for the MTL research project.

Tasks:

PR: https://github.com/NVIDIA-Merlin/models/pull/988

Tenrec dataset experiments

Documentation

Deployment and inference with Triton

Testing

Blog post

rnyak commented 1 year ago

The tasks below are handled by https://github.com/NVIDIA-Merlin/Merlin/pull/966.

https://github.com/NVIDIA-Merlin/Merlin/issues/919
https://github.com/NVIDIA-Merlin/Merlin/issues/912
https://github.com/NVIDIA-Merlin/Merlin/issues/913
https://github.com/NVIDIA-Merlin/Merlin/issues/914