Open EvenOldridge opened 1 year ago
@angmc, please add the systems-related dev list here from the Slack thread
To the question of what in this project is niche and may not be immediately needed, I would say it's anything related to the catalog swapping. This strategy worked because item/user ids were not used as inputs and because training the model would not have yielded significant improvements, since the items being predicted were new. I don't think this is a common implementation for customers. A lot of what we did relied on a Python back-end, so it would have to change. Now the model would have to be modified outside of triton, in the automation script, jit traced and repackaged as an ensemble and then deployed.
I don't believe any individual feature was too difficult to work around, but the workarounds may not give the best user experience. I don't believe support for pre-trained embeddings and the use of Categorify is a niche problem.
In post-processing, the issue where we saw better throughput using pandas instead of cuDF needs to be explored further. Pre-built post-processing features or best practices could help prevent low throughput for customers.
@angmc @karlhigley @EvenOldridge
About: Operator for mapping between key values (Likely a modification to categorify to support an existing mapping)
NVTabular has a parameter called vocab (https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/categorify.py#L208). Unfortunately, our documentation (the inline docstring) doesn't explain what it does. But I think we can provide an existing mapping table to the Categorify op, so we might be able to use the current NVT operator. The question is: how can we exchange the mapping table during serving?
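To make the question concrete, here is a minimal pure-Python sketch of encoding keys with an existing mapping table and then exchanging that table without touching the encoding logic. It is not the NVTabular API: `build_mapping`, `encode`, `UNKNOWN_INDEX`, and the session names are hypothetical, and reserving index 0 for unseen keys is an assumption made for the example.

```python
from typing import Dict, Hashable, Iterable, List

# Assumption for this sketch: index 0 is reserved for keys missing from the mapping.
UNKNOWN_INDEX = 0


def build_mapping(vocab: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign contiguous indices starting at 1, in the order the vocabulary lists its keys."""
    return {key: idx for idx, key in enumerate(vocab, start=1)}


def encode(keys: Iterable[Hashable], mapping: Dict[Hashable, int]) -> List[int]:
    """Look up each key in the supplied mapping; unseen keys fall back to UNKNOWN_INDEX."""
    return [mapping.get(key, UNKNOWN_INDEX) for key in keys]


# Mapping table that existed when the model was deployed (hypothetical keys).
old_mapping = build_mapping(["session_a", "session_b"])
# Mapping table for a new catalog, exchanged at serving time.
new_mapping = build_mapping(["session_c", "session_b", "session_d"])

encoded_old = encode(["session_b", "session_x"], old_mapping)
encoded_new = encode(["session_b", "session_c"], new_mapping)
```

Because the mapping is passed in as data rather than baked into the operator, exchanging it amounts to handing `encode` a different table. Whether the serving path can reload such a table without redeploying is exactly the open question above.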
@angmc @karlhigley @EvenOldridge @viswa-nvidia
About: Merging pre-trained embeddings in the dataloaders (https://github.com/NVIDIA-Merlin/Merlin/issues/211) and merging pre-trained embeddings at serving time (https://github.com/NVIDIA-Merlin/Merlin/issues/211)
I would not use pre-trained embeddings as input features to the model. Currently, we initialise an embedding table and load the pre-trained embedding weights into it. As the GTC recommender has only 5000 items, it is not necessary to use pre-trained embeddings as input features. I think using them as input features would worsen latency and throughput, so I would not do it in this use case.
As @angmc wrote ("Now the model would have to be modified outside of triton, in the automation script, jit traced and repackaged as an ensemble and then deployed.") - I think the process should be that the automation script updates the embedding table outside of Triton and we keep the current architecture loading the pre-trained embeddings into an embedding table.
A missing piece of the current proposal: using pre-trained embedding vectors as input features isn't sufficient. We use weight-tying in the output layer to get the item scores. If we use pre-trained embedding vectors as input features, we do not have all item embeddings available for the weight-tying operation. There are two solutions:
As the GTC recommender has only 5000 items, the easiest way is to use the pre-trained embeddings as an embedding table rather than as input features, which would still work with systems.
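The weight-tying point can be illustrated with a small sketch. It assumes the usual form of weight tying, where an item's score is the dot product of the model's final hidden state with that item's embedding row. The function name, vectors, and table sizes are made up for the example; this is not the GTC recommender's code.

```python
from typing import List


def item_scores(hidden: List[float], item_embeddings: List[List[float]]) -> List[float]:
    """Score every item as the dot product of the hidden state with its embedding row.

    This needs the full embedding table: an item without a row gets no score.
    """
    return [sum(h * e for h, e in zip(hidden, row)) for row in item_embeddings]


# Hypothetical pre-trained table: one row per item, embedding dimension 2.
item_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
hidden = [2.0, 3.0]  # hypothetical final hidden state of the model

scores = item_scores(hidden, item_embeddings)

# Swapping the catalog means swapping the table; the scoring code is unchanged.
new_catalog_embeddings = [
    [0.5, 0.5],
    [2.0, 0.0],
]
new_scores = item_scores(hidden, new_catalog_embeddings)
```

If the embeddings arrived only as per-row input features, the output layer would see just the vectors for the items in the current batch, not the whole table it needs to score every candidate.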
It seems to me that, assuming we're going to try to do this migration, we should be looking for a set of Merlin functionality which:
I don't think that we want to replicate the GTC recommender exactly—especially if it has quirks that we don't expect to reflect customer use cases—so I think we're kinda looking to make that system and our functionality meet in the middle somewhere.
Problem:
The GTC Recommender was built with custom code and shortcuts. We would like to leverage Merlin to make deploying the GTC Recommender much easier.
Definition of Done
New Functionality
Models
Transformers4Rec
NVTabular
Dataloader
Systems
Deliverables
Constraints:
Starting Point:
The existing GTC Recommender is our foundation for this work.