RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

Final issue on fine-tuning general model BPR #1882

Open SergeyPetrakov opened 1 year ago

SergeyPetrakov commented 1 year ago

Thank you for your great work on RecBole. I followed https://github.com/RUCAIBox/RecBole/discussions/1636, https://github.com/RUCAIBox/RecBole/issues/1854 and https://github.com/RUCAIBox/RecBole/issues/1871, but I still have some misunderstanding about fine-tuning a trained model for new items and users. Let's look at a small example with a general recommender model, BPR.

This is the code from your library:

# -*- coding: utf-8 -*-
# @Time   : 2020/6/25
# @Author : Shanlei Mu
# @Email  : slmu@ruc.edu.cn

# UPDATE:
# @Time   : 2020/9/16
# @Author : Shanlei Mu
# @Email  : slmu@ruc.edu.cn

r"""
BPR
################################################
Reference:
    Steffen Rendle et al. "BPR: Bayesian Personalized Ranking from Implicit Feedback." in UAI 2009.
"""

import torch
import torch.nn as nn

from recbole.model.abstract_recommender import GeneralRecommender
from recbole.model.init import xavier_normal_initialization
from recbole.model.loss import BPRLoss
from recbole.utils import InputType

class BPR(GeneralRecommender):
    r"""BPR is a basic matrix factorization model that be trained in the pairwise way."""
    input_type = InputType.PAIRWISE

    def __init__(self, config, dataset):
        super(BPR, self).__init__(config, dataset)

        # load parameters info
        self.embedding_size = config["embedding_size"]

        # define layers and loss
        self.user_embedding = nn.Embedding(self.n_users, self.embedding_size)
        self.item_embedding = nn.Embedding(self.n_items, self.embedding_size)
        self.loss = BPRLoss()

        # parameters initialization
        self.apply(xavier_normal_initialization)

    def get_user_embedding(self, user):
        r"""Get a batch of user embedding tensor according to input user's id.

        Args:
            user (torch.LongTensor): The input tensor that contains user's id, shape: [batch_size, ]

        Returns:
            torch.FloatTensor: The embedding tensor of a batch of user, shape: [batch_size, embedding_size]
        """
        return self.user_embedding(user)

    def get_item_embedding(self, item):
        r"""Get a batch of item embedding tensor according to input item's id.

        Args:
            item (torch.LongTensor): The input tensor that contains item's id, shape: [batch_size, ]

        Returns:
            torch.FloatTensor: The embedding tensor of a batch of item, shape: [batch_size, embedding_size]
        """
        return self.item_embedding(item)

    def forward(self, user, item):
        user_e = self.get_user_embedding(user)
        item_e = self.get_item_embedding(item)
        return user_e, item_e

    def calculate_loss(self, interaction):
        user = interaction[self.USER_ID]
        pos_item = interaction[self.ITEM_ID]
        neg_item = interaction[self.NEG_ITEM_ID]

        user_e, pos_e = self.forward(user, pos_item)
        neg_e = self.get_item_embedding(neg_item)
        pos_item_score, neg_item_score = torch.mul(user_e, pos_e).sum(dim=1), torch.mul(
            user_e, neg_e
        ).sum(dim=1)
        loss = self.loss(pos_item_score, neg_item_score)
        return loss

    def predict(self, interaction):
        user = interaction[self.USER_ID]
        item = interaction[self.ITEM_ID]
        user_e, item_e = self.forward(user, item)
        return torch.mul(user_e, item_e).sum(dim=1)

    def full_sort_predict(self, interaction):
        user = interaction[self.USER_ID]
        user_e = self.get_user_embedding(user)
        all_item_e = self.item_embedding.weight
        score = torch.matmul(user_e, all_item_e.transpose(0, 1))
        return score.view(-1)

As far as I understand, after the training procedure, saving the model to 'saved/BPR.pth', and then loading it via this code:

from recbole.quick_start import load_data_and_model

model_path = 'saved/BPR.pth'

# load trained model, corresponding config and data used
config, model, dataset, train_data, valid_data, test_data = load_data_and_model(
    model_file = model_path
)

the model will contain all the information about the trained item and user embeddings, which can be obtained using the get_item_embedding and get_user_embedding functions above. However, if I want to add some new users and items, I would have to implement a new method like:

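def add_new_users(self, user_tokens_list_trained, user_tokens_list_new):
    # Look up the internal ids of the already-trained users from their tokens.
    uid_series = torch.tensor(self.dataset.token2id(self.dataset.uid_field, user_tokens_list_trained))
    trained_embeddings = self.model.get_user_embedding(uid_series)
    # Randomly initialise one row per new user, matching the trained embedding size.
    new_user_emb = nn.init.xavier_normal_(
        nn.Parameter(torch.empty(len(user_tokens_list_new), trained_embeddings.shape[1]))
    )
    # Trained rows first, new rows appended after them.
    user_emb = torch.cat([trained_embeddings, new_user_emb])
    return user_emb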

and, before training, replace the previous embeddings with the ones returned by add_new_users. Could you please tell me whether this is the right approach, or what could be done better? One more simple question: where can I get the list of all user tokens and user ids?

Thank you very much in advance!

BishopLiu commented 1 year ago

@SergeyPetrakov Your method is feasible, but don't forget to also update the embeddings inside BPR. You can get all user tokens with dataset.field2id_token[dataset.uid_field] and all user ids with dataset.field2token_id[dataset.uid_field].values().
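
For illustration, here is a minimal sketch of what changing the inner embeddings could look like in plain PyTorch. It is not RecBole code: the helper name extend_embedding and the toy sizes are hypothetical, and it assumes the new users or items simply receive the ids that follow the existing ones.

```python
import torch
import torch.nn as nn


def extend_embedding(old: nn.Embedding, n_new: int) -> nn.Embedding:
    """Hypothetical helper: return a larger table whose first rows are copied from `old`."""
    n_old, dim = old.weight.shape
    new = nn.Embedding(n_old + n_new, dim)
    # Randomly initialise every row, then overwrite the first n_old rows
    # with the trained weights so only the appended rows stay random.
    nn.init.xavier_normal_(new.weight)
    with torch.no_grad():
        new.weight[:n_old] = old.weight
    return new


# Toy stand-in for a trained table: 5 users, embedding size 4.
trained = nn.Embedding(5, 4)
extended = extend_embedding(trained, n_new=3)
```

With a loaded BPR model, the result would presumably be assigned back to the layer, for example model.user_embedding = extend_embedding(model.user_embedding, n_new), and likewise for model.item_embedding. Because this creates new parameter tensors, an optimizer built before the swap will not update them and has to be re-created. The dataset's token-to-id mapping would also have to cover the new tokens, which this sketch does not handle.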