NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[QST] #447

Closed shahjaidev closed 2 years ago

shahjaidev commented 2 years ago

❓ Questions & Help

Details

If Transformers4Rec is used at training time (offline), does this necessitate that the model must do real time inference when deployed?

To put the question a bit more concretely: say I train a transformer on a dataset that consists of sequences of user item clicks (within the same session).

To do next-item prediction at deployment, is it necessary to run inference on the trained transformer?

The alternative (computationally cheaper) would be to utilize the trained transformer to pre-compute item embeddings (specifically, context vectors) for every item id. At inference time, simply run ANN given the current item's embedding. These item embeddings would be generated by passing 1-id sequences to the transformer and taking the resulting context vector. Is this a reasonable idea?
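A minimal sketch of this cheaper alternative (all names hypothetical; a fixed random linear map stands in for the trained transformer's forward pass): precompute one context vector per item by encoding length-1 sequences offline, then answer online queries with a nearest-neighbor search over that table.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d_model = 100, 16

# Stand-in for the trained transformer: an embedding table followed by a
# fixed linear map (real code would run the exported model's forward pass).
item_emb = rng.normal(size=(n_items, d_model))
W = rng.normal(size=(d_model, d_model))

def context_vector(item_ids):
    """Encode a sequence of item ids; return the last position's vector."""
    h = item_emb[item_ids] @ W          # (seq_len, d_model)
    return h[-1]

# Offline: precompute a context vector per item from 1-id sequences.
index = np.stack([context_vector([i]) for i in range(n_items)])
index /= np.linalg.norm(index, axis=1, keepdims=True)

# Online: given the current item, retrieve nearest items by cosine similarity.
def recommend(current_item, k=5):
    q = context_vector([current_item])
    q /= np.linalg.norm(q)
    scores = index @ q
    return np.argsort(-scores)[:k]

top = recommend(3)  # the query item itself ranks first by construction
```

Note the limitation raised below: each index entry here sees only a length-1 sequence, so none of the transformer's cross-item attention is actually exercised when building the table.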

What would be an alternative if one wanted to use transformers offline to train on sequences but pre-compute embeddings for each item so that online inference is cheap?

I’m a bit hesitant to convince myself of the second approach, because embeddings that come from a transformer are by nature context dependent, and the whole premise of using the transformer was so that the current item’s context vector could attend to the previously observed items.

Would the context vectors resulting from 1-element sequences be any more powerful than what one would get by simply running CBOW Word2Vec on clicked-item sequences?

sararb commented 2 years ago

Hi @shahjaidev, thank you for your question!

The proposed architecture in our Transformers4Rec paper used the weight-tying technique to link the input item embeddings and the outputs returned by the NextItemPredictionTask. By training with this technique, the model reuses the item embeddings table as the weights of the output layer, so that the user sequence representation and the item embeddings lie in a compatible vector space. During inference, you can therefore export the item embeddings table for the ANN index, but you also need to export the Transformer block to generate the user's representation.
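In code, weight-tying amounts to reusing the input embedding matrix as the output projection. A sketch of the idea (not the library's actual classes; the user representation would really come from the exported Transformer block):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d_model = 50, 8

# Single table shared between the input layer and the output layer.
item_emb = rng.normal(size=(n_items, d_model))

def next_item_scores(user_repr):
    """Weight-tying: logits are dot products against the same embedding
    table used for the inputs, so user representations and item
    embeddings live in one compatible vector space."""
    return user_repr @ item_emb.T       # (n_items,)

user_repr = rng.normal(size=d_model)    # placeholder for the Transformer output
scores = next_item_scores(user_repr)
top_item = int(np.argmax(scores))
```

Because scoring is just a dot product against `item_emb`, the same table can be exported as-is into a maximum-inner-product ANN index.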

A typical pipeline would be:

  1. Train a session-based, transformer-based model with weight-tying.
  2. Export the item-embeddings table for the ANN index.
  3. Export the Transformer block for the user tower.
  4. During inference:

    • Build the user's representation based on the sequences of user-item clicks
    • Do an ANN-based search to retrieve top-k items to recommend.
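The inference steps above could look roughly like this (a hedged sketch: exact brute-force search stands in for a real ANN index such as Faiss, and mean pooling stands in for the exported Transformer block):

```python
import numpy as np

rng = np.random.default_rng(42)
n_items, d_model = 200, 32

# Step 2: exported item-embedding table, normalized for the ANN index.
item_emb = rng.normal(size=(n_items, d_model))
index = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)

# Steps 3-4a: build the user representation from the clicked sequence.
# (Stand-in: mean pooling; the real user tower is the Transformer block.)
def user_representation(clicked_ids):
    return index[clicked_ids].mean(axis=0)

# Step 4b: top-k retrieval (exact here; an ANN index approximates this).
def retrieve_top_k(user_repr, k=10):
    scores = index @ user_repr
    return np.argsort(-scores)[:k]

session = [5, 17, 42]
top_k = retrieve_top_k(user_representation(session))
```

The key point is that only the small user tower runs per request; the per-item scoring is delegated to the precomputed index.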

For more details about how to deploy a recommender system, you can check our example notebooks here: we particularly showcase how to set up Faiss for ANN.

Please let us know if that answers your question!

rnyak commented 2 years ago

@shahjaidev closing this issue for now. You can reopen it if you have further questions.