NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[RMP] Update GTC Recommender to leverage Merlin Systems and new Merlin capabilities #887

Open EvenOldridge opened 1 year ago

EvenOldridge commented 1 year ago

Problem:

The GTC Recommender was built with custom code and shortcuts. We would like to leverage Merlin to make deploying the GTC Recommender much easier.

Definition of Done

New Functionality

Models

Transformers4Rec

NVTabular

Dataloader

Systems

Deliverables

Constraints:

Starting Point:

Existing GTC Recommender is our foundation for this work.

viswa-nvidia commented 1 year ago

@angmc, please add the systems-related dev list from the Slack thread here.

angmc commented 1 year ago

To the question of which functionality in this project is niche and may not be immediately needed, I would say it's anything related to the catalog swapping. This strategy worked because item/user ids were not used as inputs and because training the model would not have yielded significant improvements since the items being predicted were new. I don't think this is a common implementation for customers. A lot of what we did relied on a Python back-end, so it would have to change. Now the model would have to be modified outside of triton, in the automation script, jit traced and repackaged as an ensemble and then deployed.

I don't believe any individual feature was too difficult to work around, but the workarounds may not give the best user experience. I don't believe support for pre-trained embeddings and the use of Categorify is a niche problem.

In post processing, the issue where we saw better throughput using pandas instead of cudf needs to be further explored. Pre-built post processing features or best practices can help prevent low throughput for customers.

bschifferer commented 1 year ago

@angmc @karlhigley @EvenOldridge

About: Operator for mapping between key values (Likely a modification to categorify to support an existing mapping)

NVTabular has a parameter called vocab (https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/categorify.py#L208). Unfortunately, our documentation (the inline docstring) doesn't explain what it does. But I think we can provide an existing mapping table to the Categorify op, so we might be able to use the current NVT operator. The question is: how can we exchange the mapping table during serving?
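To make the idea of "an existing mapping" concrete, here is a minimal, library-agnostic sketch in plain Python. It does not call NVTabular and is not how Categorify is implemented. The names `build_lookup`, `encode` and `OOV_ID`, the example keys, and the convention that id 0 is reserved for unseen keys are all assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, List

OOV_ID = 0  # assumption: id 0 is reserved for keys missing from the mapping


def build_lookup(keys: Iterable[Hashable], start: int = 1) -> Dict[Hashable, int]:
    """Assign contiguous ids to keys in the order given (first occurrence wins)."""
    lookup: Dict[Hashable, int] = {}
    for key in keys:
        if key not in lookup:
            lookup[key] = start + len(lookup)
    return lookup


def encode(values: Iterable[Hashable], lookup: Dict[Hashable, int]) -> List[int]:
    """Map raw keys to ids using an existing table; unseen keys fall back to OOV_ID."""
    return [lookup.get(value, OOV_ID) for value in values]


# Two hypothetical catalogs: the second one replaces the first at serving time.
old_catalog = build_lookup(["session-a", "session-b", "session-c"])
new_catalog = build_lookup(["session-x", "session-b"])

request = ["session-b", "session-x", "session-z"]
print(encode(request, old_catalog))  # [2, 0, 0]
print(encode(request, new_catalog))  # [2, 1, 0]
```

In this sketch, exchanging the mapping table amounts to replacing the dictionary that `encode` reads from. How to do the equivalent inside a deployed workflow is the open question raised above.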

bschifferer commented 1 year ago

@angmc @karlhigley @EvenOldridge @viswa-nvidia

About: Merging pre-trained embeddings in the dataloaders (https://github.com/NVIDIA-Merlin/Merlin/issues/211); merging pre-trained embeddings at serving time (https://github.com/NVIDIA-Merlin/Merlin/issues/211)

I would not use pre-trained embeddings as input features to the model. Currently, we initialise an embedding table and load the weights of the pre-trained embeddings into it. As the GTC recommender has only 5000 items, it is not necessary to use pre-trained embeddings as input features. I think using pre-trained embeddings as input features will worsen the latency/throughput numbers, so I would not use them in this use case.

As @angmc wrote ("Now the model would have to be modified outside of triton, in the automation script, jit traced and repackaged as an ensemble and then deployed.") - I think the process should be that the automation script updates the embedding table outside of Triton and we keep the current architecture loading the pre-trained embeddings into an embedding table.

A missing piece of the current proposal: Using pre-trained embedding vectors as input features isn't sufficient. We use weight-tying in the output layer to get the item scores. If we use pre-trained embedding vectors as input features, we do not have all item embeddings available for the weight-tying operation. There are two solutions:

  1. We still initialise an embedding table and load the pre-trained embeddings as its weights -> then introducing pre-trained embeddings as input features only adds complexity, because we still need the same automation as we have right now
  2. The model returns the output of the transformer before weight-tying and Merlin Systems does an ANN look-up. This adds the complexity of splitting the model, initializing an ANN index, etc. (as defined in RMP https://github.com/NVIDIA-Merlin/Merlin/issues/898). I am not sure if that is too much scope for this ticket.

As the GTC recommender has only 5000 items, the easiest way is to use the pre-trained embeddings as an embedding table rather than as input features. That would still work with Systems.
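The weight-tying argument can be illustrated with a small, pure-Python sketch. It is a toy under stated assumptions, not the GTC recommender code: the helper names (`dot`, `score_items`, `top_k`), the 2-dimensional embeddings and the `hidden` vector standing in for the transformer output are all hypothetical. It shows that scoring needs the full item embedding table, and that swapping the catalog means swapping that table.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_items(hidden: Vector, item_table: Sequence[Vector]) -> List[float]:
    """Weight tying: the score of item i is hidden . item_table[i]."""
    return [dot(hidden, row) for row in item_table]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores (exhaustive search, fine for small catalogs)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical pre-trained item embeddings loaded as the embedding table.
old_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Catalog swap: a new table for a new set of items, same embedding dimension.
new_table = [[0.0, 2.0], [1.0, 1.0]]

hidden = [0.2, 0.8]  # stand-in for the transformer output of one session

print(top_k(score_items(hidden, old_table), k=2))  # [1, 2]
print(top_k(score_items(hidden, new_table), k=1))  # [0]
```

With a catalog of a few thousand items, the exhaustive scoring in `score_items` stays cheap. Option 2 above would replace `top_k` over all scores with an approximate nearest-neighbour look-up on `hidden`.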

karlhigley commented 1 year ago

It seems to me that, assuming we're going to try to do this migration, we should be looking for a set of Merlin functionality which:

  1. Is sufficiently general that customers could apply it to their own use cases
  2. Can build a recommender that's broadly similar to the current GTC recommender

I don't think that we want to replicate the GTC recommender exactly—especially if it has quirks that we don't expect to reflect customer use cases—so I think we're kinda looking to make that system and our functionality meet in the middle somewhere.