NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0
722 stars, 112 forks

Creating integration tests for quick-start for ranking #1015

Closed gabrielspmoreira closed 1 year ago

gabrielspmoreira commented 1 year ago

Closes #667.

This PR creates integration tests for the quick-start for ranking scripts. The tests cover preprocessing the TenRec dataset with different options and training ranking models on the preprocessed data.

Preprocessing tests

Model building, training and evaluation tests

Data setup

These integration tests require a 10M-row sample of the TenRec dataset, which is available in this internal Google Drive (tenrec_ci.zip). The data needs to be downloaded to the CI machine and uncompressed to /raid/data/tenrec_ci/, following the standard location of our other CI datasets (e.g. /raid/data/lastfm/preprocessed).

P.S. If needed, the path to the TenRec sample data can be overridden with the CI_TENREC_DATA_PATH env variable.
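To illustrate the path override described above, here is a minimal sketch of how a test could resolve the data directory. Only the CI_TENREC_DATA_PATH variable and the /raid/data/tenrec_ci/ default come from this PR description; the helper names `resolve_tenrec_data_path` and `tenrec_data_available` are hypothetical and are not necessarily what the tests in this PR use.

```python
import os

# Default location named in the PR description.
DEFAULT_TENREC_DATA_PATH = "/raid/data/tenrec_ci/"


def resolve_tenrec_data_path(env=None):
    """Return the TenRec sample directory (hypothetical helper).

    Uses CI_TENREC_DATA_PATH when it is set and non-empty,
    otherwise falls back to the default CI path.
    """
    env = os.environ if env is None else env
    return env.get("CI_TENREC_DATA_PATH") or DEFAULT_TENREC_DATA_PATH


def tenrec_data_available(env=None):
    """Return True when the resolved directory exists on this machine."""
    return os.path.isdir(resolve_tenrec_data_path(env))
```

A test module could call `tenrec_data_available()` up front and skip the integration tests when the sample has not been downloaded, so that local runs without the data do not fail.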

github-actions[bot] commented 1 year ago

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-1015