NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[FEA] How can it be extended to handle next basket recommendation ? #268

Open tim5go opened 2 years ago

tim5go commented 2 years ago

❓ Questions & Help

Details

As titled, how can Transformers4Rec be extended to handle next basket recommendation, not just item recommendation?

sararb commented 2 years ago

Thank you for the great question! Transformers4Rec currently supports three prediction tasks:

Additionally, we plan to add an item-level classification task to predict for multiple items in the same session at once.

I hope that answers your question?

tim5go commented 2 years ago

Hi @sararb, first of all, thanks for your kind reply. Actually, I'm not sure we are on the same page. By basket recommendation, I mean an action (possibly Add or Purchase) on a group of items (A, B, C, D).

Is that related to what you mentioned:

"item-level classification task to predict for multiple items in the same session at once"

Furthermore, can I use the prediction probability score from Binary Classification / Regression for item ranking?

sararb commented 2 years ago

Thank you for the clarification! The latter point, "item-level classification task to predict for multiple items in the same session at once", is the best fit for the task you are describing. There is an open feature request and we plan to support it as soon as possible.

The first two points of my previous answer re-frame the problem in the following way:

Regarding your question "can I use the prediction probability score from Binary Classification / Regression for item ranking?", do you mean the probability scores returned by the model for each item in (A, B, C, D) when the item-level classification task is used?

tim5go commented 2 years ago

@sararb Yes, you're right. I'm referring to the probability scores returned by the model for each item in (A, B, C, D), and I plan to use these scores to rank product items.

You may ask why I don't use the top-k item predictions directly for item ranking. The reason is that the business scenario I am dealing with has the following unique characteristics: 1) Only a small portion of the products is on sale at any given time, and the product catalog changes dynamically over time. 2) The contexts associated with the product items are very important for customers' decisions, and these contexts also change dynamically over time.

As a result, top-k item prediction would be problematic in my case, since it doesn't guarantee that the items it predicts are actually on sale.

sararb commented 2 years ago

@tim5go

That's definitely a challenging and interesting problem!

Regarding your question about item ranking: if the candidate items (A, B, C, D) share exactly the same prior context P (i.e. the same sequence of interactions is provided to the model for generating the predictions), you could use the probability score given by the classifier for ranking, as this score represents how likely the user is to purchase A (or B, C, D) given the context P.
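
To make the idea concrete, here is a minimal, framework-agnostic sketch of ranking a fixed candidate set by classifier probability under one shared context. It is an illustration only, not Transformers4Rec API: `rank_candidates`, `toy_score_fn`, and the probability values are hypothetical stand-ins for a trained model that returns P(purchase | context, item).

```python
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    context: Sequence[str],
    candidates: Sequence[str],
    score_fn: Callable[[Sequence[str], str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate against the SAME context, then sort descending."""
    scored = [(item, score_fn(context, item)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score_fn(context: Sequence[str], item: str) -> float:
    """Hypothetical stand-in for a trained classifier (made-up probabilities)."""
    toy_probs = {"A": 0.10, "B": 0.70, "C": 0.40, "D": 0.05}
    return toy_probs[item]


# The shared context P: one interaction sequence reused for all candidates.
context_p = ["x1", "x2", "x3"]
# Only items that are currently on sale are scored, so the ranking can
# never contain an item that is unavailable.
on_sale = ["A", "B", "C", "D"]

ranking = rank_candidates(context_p, on_sale, toy_score_fn)
print([item for item, _ in ranking])
```

Because the candidate list is supplied by the caller, the ranking is restricted to the items on sale at scoring time, which is the constraint described earlier in the thread.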

Another solution you might explore is the weight-tying technique, which shares the weights between the item embeddings and the output layer (this technique is currently supported in NextItemPredictionTask and will be included in item-level classification as well). You then define a ranking function as the dot product between the sequence representation learned by the model and the embedding of each item in (A, B, C, D). The score should measure the similarity between the context of interactions and the intention to purchase a given product. An open question is how to represent the sequence of interactions effectively (max/min pooling over the hidden representations of the interactions, or the first/last hidden state).
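
A minimal sketch of that ranking function, written in plain Python so it does not depend on any particular framework. Everything here is assumed for illustration: `pool`, `rank_by_similarity`, the toy hidden states, and the toy item embeddings are hypothetical, and in practice the hidden states and the tied embedding table would come from the trained model.

```python
from typing import Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pool(hidden_states: List[Vector], how: str = "last") -> Vector:
    """Collapse per-interaction hidden states into one sequence vector."""
    if how == "last":
        return hidden_states[-1]
    if how == "first":
        return hidden_states[0]
    if how == "max":  # element-wise max over the sequence dimension
        return [max(col) for col in zip(*hidden_states)]
    if how == "min":  # element-wise min over the sequence dimension
        return [min(col) for col in zip(*hidden_states)]
    raise ValueError(f"unknown pooling strategy: {how}")


def rank_by_similarity(
    hidden_states: List[Vector],
    item_embeddings: Dict[str, Vector],
    candidates: List[str],
    how: str = "last",
) -> List[str]:
    """Rank candidates by dot product with the pooled sequence vector."""
    seq = pool(hidden_states, how)
    scores = {item: dot(seq, item_embeddings[item]) for item in candidates}
    return sorted(candidates, key=lambda item: scores[item], reverse=True)


# Toy values: three interactions, hidden size 2 (made up for illustration).
hidden_states = [[0.2, 0.1], [0.9, -0.3], [0.4, 0.8]]
# With weight tying, these would be rows of the shared item embedding table.
item_embeddings = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [1.0, 1.0],
    "D": [-1.0, 0.5],
}
candidates = ["A", "B", "C", "D"]

print(rank_by_similarity(hidden_states, item_embeddings, candidates, how="last"))
print(rank_by_similarity(hidden_states, item_embeddings, candidates, how="max"))
```

On these toy values the two pooling strategies order A and B differently, which illustrates why the choice of sequence representation is an open question worth testing on real data.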

These are some possible ideas but I am curious to get your feedback about how you solve such a problem :)