NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

updates notebooks for multistage with subgraphs #1022

Closed jperez999 closed 1 year ago

jperez999 commented 1 year ago

This PR updates the multi-stage recsys notebooks to use subgraphs. There are quite a few changes. The major one is a shift in what information is stored in the feature store: it now holds raw item and raw user information. These values have not been preprocessed, so when they are retrieved in the systems graph they must go through a preprocessing step.

In the first notebook, we add usage of the Subgraph operator and create two subgraphs, one for item and one for user. We also create another subgraph for the item categorification, so that we can categorify the item_features separately when they are used to retrieve item embeddings.

In the second notebook, the use of subgraphs forces the ensemble to introduce NVT workflows that preprocess the data after it is retrieved, for both users and items. This is where we introduce the usage of subworkflows, which are based on subgraph.
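To illustrate the flow described above, here is a minimal, library-free sketch in plain Python. It does not use the Merlin, NVTabular, or Feast APIs; `NamedSubgraph`, `fit_categorify`, and the toy `feature_store` dict are hypothetical names invented for this illustration. It only shows the idea: preprocessing steps are fitted once and grouped under a name, the feature store keeps raw values, and the same named group is re-applied to raw features after they are retrieved at serving time.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[dict], dict]


def fit_categorify(column: str, rows: List[Row]) -> Step:
    """Learn a value -> integer id mapping for one column.

    Id 0 is reserved for values not seen while fitting (an assumption of
    this sketch, not a statement about any library's behavior).
    """
    vocab = {v: i + 1 for i, v in enumerate(sorted({r[column] for r in rows}))}

    def transform(row: dict) -> dict:
        out = dict(row)
        out[column] = vocab.get(row[column], 0)
        return out

    return transform


class NamedSubgraph:
    """Hypothetical stand-in: a named, reusable chain of preprocessing steps."""

    def __init__(self, name: str, steps: List[Step]):
        self.name = name
        self.steps = steps

    def __call__(self, row: dict) -> dict:
        for step in self.steps:
            row = step(row)
        return row


# Raw training data (toy values).
raw_items = [
    {"item_id": "a", "item_category": "hats"},
    {"item_id": "b", "item_category": "shoes"},
]
raw_users = [{"user_id": "u1"}, {"user_id": "u2"}]

# One subgraph per entity, fitted once at training time.
subgraphs = {
    "item": NamedSubgraph(
        "item",
        [
            fit_categorify("item_id", raw_items),
            fit_categorify("item_category", raw_items),
        ],
    ),
    "user": NamedSubgraph("user", [fit_categorify("user_id", raw_users)]),
}

# The toy feature store keeps RAW rows, keyed by raw id.
feature_store = {
    "item": {r["item_id"]: r for r in raw_items},
    "user": {r["user_id"]: r for r in raw_users},
}


def serve(entity: str, raw_id: str) -> dict:
    """Retrieve raw features, then preprocess them with the matching subgraph."""
    raw_row = feature_store[entity][raw_id]
    return subgraphs[entity](raw_row)


processed_item = serve("item", "b")
processed_user = serve("user", "u1")
unseen_item = subgraphs["item"]({"item_id": "zzz", "item_category": "hats"})
```

Keeping raw values in the store means the preprocessing logic lives in one place and can be reused on whichever path retrieves the features, at the cost of an extra transform step at request time.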

This PR depends on the following PRs:
https://github.com/NVIDIA-Merlin/core/pull/349
https://github.com/NVIDIA-Merlin/systems/pull/372
https://github.com/NVIDIA-Merlin/core/pull/350
https://github.com/NVIDIA-Merlin/core/pull/353
https://github.com/NVIDIA-Merlin/systems/pull/378

github-actions[bot] commented 1 year ago

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-1022

rnyak commented 1 year ago

@jperez999 we need to update the unit test as well. I can push it to this PR.