RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

How can I predict for new user using sequential model? #1314

Closed ys201810 closed 2 years ago

ys201810 commented 2 years ago

Hi! Thank you so much for your wonderful project.

I'm trying to write prediction code for a new user with a sequential model. "New user" means a user who did not appear during training.

At first, I thought I could use the full_sort_topk function described in the docs and issues. But full_sort_topk needs uid_series and test_data, and a new user has neither.

So I wrote the code below, but the results are strange. Something is probably wrong with my code.

So, do you have prediction code for new users? Or could you please point out what is wrong with my code?

import numpy as np
import torch

from recbole.data import create_dataset
from recbole.data.interaction import Interaction
from recbole.utils import get_model, init_seed

trained_model_file = '/path/to/seqential_trained_model_file.pth'
checkpoint = torch.load(trained_model_file)

config = checkpoint["config"]
dataset = create_dataset(config)

model = get_model(config["model"])(config, dataset)
model.load_state_dict(checkpoint["state_dict"])
model.load_other_parameter(checkpoint.get("other_parameter"))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # inference mode: disables dropout so scores are deterministic

tokens = ['199', '200']

pad_length = 50
item_length = len(tokens)
padded_item_sequence = torch.nn.functional.pad(torch.tensor(dataset.token2id(dataset.iid_field, tokens)),
                                                   (0, pad_length - item_length), "constant", 0)

input_interaction = Interaction(
    {
        "item_id_list": padded_item_sequence.reshape(1, -1),
        "item_length": torch.tensor([item_length]),
    }
)
input_interaction = input_interaction.to(device)

scores = model.full_sort_predict(input_interaction)
scores = scores.view(-1, dataset.item_num)
scores[:, 0] = -np.inf
topk_score, topk_iid_list = torch.topk(scores, 30)

predicted_score_list = topk_score.tolist()[0]
predicted_item_list = dataset.id2token(dataset.iid_field, topk_iid_list.tolist()[0]).tolist()

recommended_items = {
    "score_list": predicted_score_list,
    "item_list": predicted_item_list
}

print(recommended_items)

Thanks

Wicknight commented 2 years ago

@ys201810 Hello, thanks for your attention to RecBole! There is no fixed way to handle the cold-start situation in RecBole. There are simple ways to deal with it, such as embedding the features of the new user, measuring similarity by vector distance to find the closest existing user, and replacing the new user with that user. You adopted a similar idea here, and I don't think there is any problem with your code.

As for the abnormal results, my preliminary guess is that the token sequence you use is relatively short, or that you are not using the user's actual item interaction sequence to embed the user. You can try using the actual item interaction sequence, or use another embedding method for the user. Alternatively, you can use the method I mentioned above to find the closest user. I hope my suggestions are helpful to you!
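
The "find the closest user" idea could be sketched roughly as follows. This is only a minimal illustration and not part of RecBole: the function name closest_known_user and the toy vectors are hypothetical, and it assumes you already have one embedding vector per training user plus one vector for the new user, however you choose to compute them. Cosine similarity is used here as one possible choice of vector distance.

```python
import numpy as np


def closest_known_user(new_user_vec, known_user_vecs):
    """Return (row index of the most similar known user, all cosine similarities).

    Hypothetical helper, not a RecBole API. Rows of known_user_vecs are
    embeddings of users seen during training.
    """
    new_user_vec = np.asarray(new_user_vec, dtype=float)
    known_user_vecs = np.asarray(known_user_vecs, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero vectors

    # Normalize both sides so the dot product equals cosine similarity.
    new_unit = new_user_vec / (np.linalg.norm(new_user_vec) + eps)
    known_unit = known_user_vecs / (
        np.linalg.norm(known_user_vecs, axis=1, keepdims=True) + eps
    )

    similarities = known_unit @ new_unit
    return int(np.argmax(similarities)), similarities


# Toy data: three known users with 3-dimensional embeddings (made-up values).
known = np.array(
    [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.7, 0.7, 0.0],
    ]
)
new_user = np.array([0.9, 0.1, 0.0])

best_index, sims = closest_known_user(new_user, known)
print(best_index, sims)
```

The returned index would then be mapped to the corresponding training user, whose id can be passed to whatever prediction routine you already use for known users.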

ys201810 commented 2 years ago

@Wicknight Thanks for the quick and polite answer!

The idea of finding similar users among the training users and using their predicted results is very helpful.

Sorry, I found a mistake in my conversion from field_id to real_id. After fixing it, I got appropriate results.

Thank you so much for your support.

Again, thanks a lot for the great project and support.
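
For readers who hit the same kind of id-conversion mistake, here is a small self-contained sketch of the general pattern. It is an assumption-laden toy, not RecBole's actual implementation: build_vocab and the vocabulary ordering are hypothetical. The only points it relies on are that raw item tokens are remapped to internal integer ids, that id 0 is reserved for padding, and that model outputs are internal ids which must be mapped back to tokens before being shown.

```python
PAD_TOKEN = "[PAD]"  # internal id 0 is reserved for padding


def build_vocab(raw_tokens):
    """Build toy token<->id mappings (hypothetical helper, not a RecBole API)."""
    id_to_token = [PAD_TOKEN] + sorted(set(raw_tokens))
    token_to_id = {token: idx for idx, token in enumerate(id_to_token)}
    return token_to_id, id_to_token


token_to_id, id_to_token = build_vocab(["199", "200", "305"])

# Input side: raw tokens -> internal ids (what the model consumes).
history_tokens = ["199", "200"]
history_ids = [token_to_id[token] for token in history_tokens]

# Output side: internal ids -> raw tokens (what the user should see).
predicted_ids = [3, 1]  # pretend these came from a top-k over model scores
predicted_tokens = [id_to_token[idx] for idx in predicted_ids]

print(history_ids, predicted_tokens)
```

Mixing up the two directions, or treating a raw token such as "199" as if it were already an internal id, typically yields recommendations that look plausible in shape but point at the wrong items.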