NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

https://nvidia-merlin.github.io/Transformers4Rec/main

Apache License 2.0


[FEA] How can it be extended to handle next basket recommendation ? #268

Open tim5go opened 2 years ago

tim5go commented 2 years ago

❓ Questions & Help

Details

As titled, how can Transformers4Rec be extended to handle next basket recommendation, not just item recommendation?

sararb commented 2 years ago

Thank you for the great question! Transformers4Rec currently supports three prediction tasks:

NextItemPrediction: given a sequence of past interactions, predict the next item the user will interact with. If you have access to interaction types, you can extend this task to predict the next item for a given action (purchase or add, for next basket prediction), taking into account the past sequence of items and their event types. The output layer remains the same, i.e., logits over the item catalog. (P.S.: you may want to add the purchase event as a contextual vector to your prediction head; see Figure 1 of our SIGIR’21 challenge paper for more detail on how to define this contextual vector.)
Binary Classification / Regression tasks: these two tasks are applied to the whole session. For example, you can use Binary Classification to predict whether the user will purchase a given item based on the sequence of interactions within a session.

Additionally, we plan to add an item-level classification task to predict for multiple items in the same session at once.

I hope that answers your question.

tim5go commented 2 years ago

Hi @sararb First of all, thanks for your kind reply. Um... actually, I'm not sure if we are on the same page. When I said basket recommendation, I really mean an action (possibly Add or Purchase) on a group of items (A, B, C, D)

Is it related to what you said:

"item-level classification task to predict for multiple items in the same session at once"

Furthermore, can I use the prediction probability score from Binary Classification / Regression for item ranking?

sararb commented 2 years ago

Thank you for the clarification! The latter point, "item-level classification task to predict for multiple items in the same session at once", is the best fit for the task you are describing. There is an open feature request, and we plan to support it as soon as possible.

The first two points of my previous answer re-frame the problem as follows:

In the 1st point, the task would be: given the session’s past interactions, what will the next purchased items be? The model returns probability scores for all items in the catalog, and you can retrieve the top-k items the user is most likely to purchase, but they won’t necessarily match the list (A, B, C, D).
In the 2nd point, the task would be: given product A and the session’s interactions, predict whether A will be purchased. This expands the list (A, B, C, D) into 4 separate inputs, and the model returns 4 independent scores.
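To make the 1st point concrete, here is a minimal sketch in plain Python. The catalog, the logit values, and the helper `top_k_items` are all made up for illustration and are not part of the Transformers4Rec API; a real model would produce the logits.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_items(item_ids, logits, k):
    """Hypothetical helper: return the k item ids with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(item_ids, probs), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]

# Made-up catalog and logits for a single session (assumption, not real output)
catalog = ["A", "B", "C", "D", "E", "F"]
logits = [2.0, 0.5, 1.5, -1.0, 3.0, 1.0]
basket = {"A", "B", "C", "D"}

top4 = top_k_items(catalog, logits, k=4)
overlap = basket.intersection(top4)  # items of the basket that made the top-k
```

With these toy numbers the top-4 list contains items outside the basket, which is the mismatch described in the 1st point.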

Regarding your question "can I use the prediction probability score from Binary Classification / Regression for item ranking?", do you mean the probability scores returned by the model for each item in (A, B, C, D) when the item-level classification task is used?

tim5go commented 2 years ago

@sararb Yes, you're right. I'm referring to probability scores returned by the model for each item in (A, B, C, D), and planning to use these scores for product item ranking.

You may ask why I don't use the top-k item predictions right away for item ranking. The reason is that the business scenario I am dealing with has the following unique characteristics: 1) Only a small portion of products is for sale at a given time, and the product catalog changes dynamically over time. 2) The contexts associated with the product items are very important for customers' decisions, and these contexts also change dynamically over time.

As a result, top-k item prediction would be problematic in my case, as it doesn't guarantee that the predicted items are actually for sale.

sararb commented 2 years ago

@tim5go

That's definitely a challenging and interesting problem!

Regarding your question about item ranking: if the candidate items (A, B, C, D) share exactly the same context P (i.e., the same sequence of interactions is provided to the model to generate the predictions), you can use the probability score given by the classifier for ranking, as this score represents how likely the user is to purchase A (or B, C, D) given the context P.
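As a rough illustration of this idea, the sketch below ranks the candidates by their independent scores under one shared context. Everything here is an assumption for illustration: `rank_candidates` is a hypothetical helper, and `toy_purchase_probability` is a stand-in for a trained binary classifier (it ignores the context and looks up made-up logits).

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rank_candidates(context, candidates, score_fn):
    """Score every candidate against the SAME context and sort by score."""
    scored = [(item, score_fn(context, item)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up logits standing in for a trained classifier's output (assumption)
TOY_LOGITS = {"A": 0.2, "B": 1.3, "C": -0.7, "D": 0.9}

def toy_purchase_probability(context, item):
    """Toy stand-in: a real model would condition on the context."""
    return sigmoid(TOY_LOGITS[item])

session = ["x1", "x2", "x3"]       # shared context P (hypothetical item ids)
candidates = ["A", "B", "C", "D"]  # e.g. only the items currently for sale

ranking = rank_candidates(session, candidates, toy_purchase_probability)
ranked_items = [item for item, _ in ranking]
```

Because only the supplied candidates are scored, the ranking can never contain an item outside that list.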

Another solution you might explore is the weight-tying technique, which shares the weights between the item embeddings and the output layer (we currently support this technique in NextItemPredictionTask, and it will be included in item-level classification as well). You then define a ranking function as the dot product between the sequence representation learned by the model and the embedding of each item in (A, B, C, D). The score should measure the similarity between the context of interactions and the intention to purchase a given product. An open question is how to represent the sequence of interactions effectively (max/min pooling over the hidden representations of the interactions, or the first/last hidden state).
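A minimal sketch of such a ranking function, in plain Python with made-up 2-dimensional vectors. The hidden states and item embeddings are invented for illustration (a real model would produce them), and the helper names are hypothetical, not Transformers4Rec functions. Two of the pooling options mentioned above are shown.

```python
def last_state(hidden_states):
    """Use the last hidden state as the sequence representation."""
    return hidden_states[-1]

def max_pool(hidden_states):
    """Element-wise max over all hidden states of the sequence."""
    return [max(column) for column in zip(*hidden_states)]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_dot_product(seq_repr, item_embeddings):
    """Rank items by similarity between the sequence and each item embedding."""
    scores = {item: dot(seq_repr, emb) for item, emb in item_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up hidden states, one vector per interaction (assumption)
hidden_states = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.3]]

# Made-up embeddings for the candidate items (assumption)
item_embeddings = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [0.5, 0.5],
    "D": [-1.0, 0.2],
}

rank_last = rank_by_dot_product(last_state(hidden_states), item_embeddings)
rank_max = rank_by_dot_product(max_pool(hidden_states), item_embeddings)
```

With these toy vectors the two pooling choices produce different rankings, which is why the choice of sequence representation matters.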

These are some possible ideas but I am curious to get your feedback about how you solve such a problem :)