NVIDIA-Merlin / NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Apache License 2.0

[FEA] Incremental Pre-processing #798

Open gabrielspmoreira opened 3 years ago

gabrielspmoreira commented 3 years ago

Is your feature request related to a problem? Please describe.
It is common in industry to train recommender system models incrementally, i.e., taking a model trained on past data and fine-tuning it with new data. In such cases, new values of existing categorical features need to be encoded as contiguous ids on top of the existing ones, and their embeddings are appended to the pre-trained embedding tables.

Describe the solution you'd like
NVTabular should support incremental pre-processing by keeping the previous mapping between raw values and encoded values for categorical features (so that they match the positions of the pre-trained embeddings) and assigning contiguous ids to new values, continuing after the existing ones.
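
To make the request concrete, here is a minimal sketch of the behavior in plain Python. It is not NVTabular's API; the names `extend_mapping` and `encode`, the reserved id `0` for unknown values, and the starting id of `1` are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List


def extend_mapping(
    mapping: Dict[Hashable, int],
    new_values: Iterable[Hashable],
    start_index: int = 1,  # assumption: id 0 is reserved for unknown values
) -> Dict[Hashable, int]:
    """Return a copy of `mapping` with unseen values appended as contiguous ids.

    Existing entries keep their ids, so rows of a pre-trained embedding
    table still line up with the values they were trained on.
    """
    extended = dict(mapping)
    # Continue numbering right after the largest id already in use.
    next_id = max(extended.values(), default=start_index - 1) + 1
    for value in new_values:
        if value not in extended:
            extended[value] = next_id
            next_id += 1
    return extended


def encode(
    mapping: Dict[Hashable, int],
    values: Iterable[Hashable],
    unknown_id: int = 0,
) -> List[int]:
    """Encode raw values with `mapping`, sending unseen values to `unknown_id`."""
    return [mapping.get(value, unknown_id) for value in values]


# Mapping fitted on the original training data (hypothetical values).
old_mapping = {"a": 1, "b": 2, "c": 3}

# New data contains one known value ("b") and two unseen ones ("d", "e").
new_mapping = extend_mapping(old_mapping, ["b", "d", "e", "d"])

# Number of embedding rows that would need to be appended to the table.
rows_to_append = len(new_mapping) - len(old_mapping)

encoded = encode(new_mapping, ["a", "e", "z"])
```

With a mapping extended this way, the embedding table only grows by `rows_to_append` new rows at the end, and none of the existing rows have to be moved or re-indexed.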

gabrielspmoreira commented 3 years ago

Issue #597 addresses the incremental update of statistics for numerical features.

yuanqingz commented 3 years ago

Hi @karlhigley, what's the expected release date of v0.9?

BlakeB415 commented 11 months ago

Any progress on this?

rnyak commented 11 months ago

@BlakeB415 we don't have the bandwidth to work on this feature right now, so there is no progress to report.