NVIDIA-Merlin / NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate the terabyte-scale datasets used to train deep-learning-based recommender systems.
Apache License 2.0

[FEA] Update statistics of an existing workflow #597

Open EvenOldridge opened 3 years ago

EvenOldridge commented 3 years ago

In addition to workflow.fit() and workflow.transform(), we need a workflow.update() that takes an already-fitted workflow and updates its statistics with new data, without refitting from scratch.

For most ops this should be very straightforward, but it may require us to capture additional data alongside the statistics, such as the number of entries they were computed from.

This feature is a key part of continuous training of recommender systems.
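
To illustrate why the number of entries matters: a stored mean and standard deviation cannot be combined with statistics from a new batch unless the row counts are known. Below is a minimal sketch in plain Python, not NVTabular code. `RunningStats` and its methods are hypothetical names, and the merge uses the standard pairwise formula for combining means and sums of squared deviations.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical container for the state a normalizing op would need to keep."""
    count: int = 0      # number of entries seen so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # sum of squared deviations from the mean

    @classmethod
    def from_values(cls, values):
        values = list(values)
        n = len(values)
        if n == 0:
            return cls()
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2)

    def merge(self, other):
        """Combine two sets of statistics without revisiting the raw data."""
        n = self.count + other.count
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningStats(n, mean, m2)

    @property
    def std(self):
        # Population standard deviation; 0.0 when no data has been seen.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


# Statistics from the original fit, then an update with a new batch.
fitted = RunningStats.from_values([1.0, 2.0, 3.0, 4.0])
new_batch = RunningStats.from_values([5.0, 6.0])
updated = fitted.merge(new_batch)
```

The merged result matches what a full refit over all six values would give, which is the property an update step would need. Without `count`, the two means could not be weighted correctly.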

gabrielspmoreira commented 3 years ago

Issue #798 addresses the incremental pre-processing of categorical features.
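
For context, here is a rough sketch of what incrementally updating a categorical encoding could look like in principle. It is not taken from #798 or from NVTabular. `update_vocab` is a hypothetical helper, and it assumes the simplest policy: existing categories keep their ids and unseen categories are appended.

```python
def update_vocab(vocab, new_values, start_index=1):
    """Return a copy of `vocab` extended with unseen categories.

    Existing ids are never changed, so embeddings trained against the
    old mapping remain valid. New categories get the next free ids.
    """
    updated = dict(vocab)
    next_id = max(updated.values(), default=start_index - 1) + 1
    for value in new_values:
        if value not in updated:
            updated[value] = next_id
            next_id += 1
    return updated


# Mapping from the original fit, then an update with new data.
fitted_vocab = {"a": 1, "b": 2}
updated_vocab = update_vocab(fitted_vocab, ["b", "c", "d", "c"])
```

Frequency-based policies (for example, dropping rare categories) would additionally need per-category counts to be stored, which is the same kind of extra state mentioned in the issue description.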