NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0
1.08k stars, 142 forks

[QST] Can we create vocabulary/item embeddings from a list of items? #709

Closed ardulat closed 1 year ago

ardulat commented 1 year ago

❓ Questions & Help

Details

Hi! I am using Transformers4Rec for KDD Cup, and I was wondering if there is a way to create a vocabulary or item embeddings before training a model?

A little more context: in KDD Cup, we have a train sessions file containing sessions with item lists. We also have a product file listing all the product details. However, the problem is that the test sessions contain some items that do not appear in the train sessions (but are in the product list). Can we use the product list with all the information to create item embeddings before training a model?

rnyak commented 1 year ago

@ardulat Hello. Can you clarify this question?

"Can we use a product list with all the information to create item embeddings before training a model?"

TF4Rec models create initial (random) embeddings. If you do not train the model, these embeddings will be meaningless.
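To illustrate the point about random initialization, here is a minimal sketch in plain Python (it does not use the TF4Rec API). It builds a vocabulary from the full product list and allocates one random row per item. The helper names, the sample product ids, and the choice to reserve index 0 for padding are all assumptions made for the example.

```python
import random


def build_item_vocab(product_ids):
    """Map each unique product id to a contiguous index, starting at 1.

    Index 0 is left unused here so it can serve as a padding id (an assumption).
    """
    vocab = {}
    for pid in product_ids:
        if pid not in vocab:
            vocab[pid] = len(vocab) + 1
    return vocab


def init_random_embeddings(num_rows, dim, seed=0):
    """Create a table of small random vectors, one row per index."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_rows)]


# Hypothetical data: the product file lists three items (one duplicated),
# but only two of them occur in the train sessions.
product_ids = ["B001", "B002", "B003", "B002"]
train_item_ids = {"B001", "B002"}

vocab = build_item_vocab(product_ids)
table = init_random_embeddings(len(vocab) + 1, dim=4)

# Items that never occur in training keep their random initial row.
never_trained = [pid for pid in vocab if pid not in train_item_ids]
```

Building the vocabulary from the product list gives every test item a row in the table, but a row for an item such as "B003" stays at its random starting value unless something else (for example side information) informs it.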

On the other hand, you can feed the product sequence and product side information to a TF4Rec model, train it for a couple of epochs, and then extract item embeddings or hidden state embeddings out of the Trainer.

If you have product descriptions or other text data, you can use any NLP model (e.g. BERT, GPT-2) to generate item embeddings. TF4Rec won't create embeddings from raw text; you need to encode it into a numerical representation first (if that's your question?).
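As a rough sketch of the "encode first" step, the snippet below assumes the text vectors have already been produced by some encoder and only shows how to line them up with item ids. It uses plain Python and no TF4Rec calls. `build_pretrained_matrix`, the sample ids, and the tiny 2-dimensional vectors are hypothetical.

```python
def build_pretrained_matrix(vocab, text_vectors, dim):
    """Return a list of rows ordered by vocabulary index.

    Row 0 (padding) and items without a text vector are filled with zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for pid, idx in vocab.items():
        vec = text_vectors.get(pid)
        if vec is None:
            continue  # no description available for this item
        if len(vec) != dim:
            raise ValueError(f"vector for {pid!r} has length {len(vec)}, expected {dim}")
        matrix[idx] = list(vec)
    return matrix


# Hypothetical inputs: an item vocabulary and precomputed text embeddings.
vocab = {"B001": 1, "B002": 2, "B003": 3}
text_vectors = {"B001": [0.1, 0.2], "B003": [0.5, 0.6]}

matrix = build_pretrained_matrix(vocab, text_vectors, dim=2)
```

In PyTorch, a matrix like this can be turned into an embedding layer with `torch.nn.Embedding.from_pretrained`. How to wire it into a TF4Rec model is not covered by this sketch.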

rnyak commented 1 year ago

@ardulat I am closing this due to low activity. Please reopen if you have further questions.