MobileTeleSystems / RecTools

RecTools - library to build Recommendation Systems easier and faster than ever before
Apache License 2.0

calc_metrics: Appropriate method for predicting for train / test? #146

Closed dataversenomad closed 5 months ago

dataversenomad commented 5 months ago

Your Question

Hi team. I was wondering what the correct way is to predict for train (specifically), given that I have already trained the model.

    dataset = dataset
    top_k = 6

    model = RandomModel()
    model.fit(dataset)
    recos = model.recommend(
        users=train_users,
        dataset=dataset,
        k=top_k,
        filter_viewed=True,
    )
    metric_values = calc_metrics(metrics, recos, train, train, catalog)

Is this the correct approach?

It makes sense that for test it should be:

    recos = model.recommend(
        users=test_users,
        dataset=dataset,
        k=top_k,
        filter_viewed=True,
    )
    metric_values = calc_metrics(metrics, recos, test, train, catalog)

I appreciate your help in advance.

Operating System

No response

Python Version

No response

RecTools version

0.5.0

feldlime commented 5 months ago

Hi @dataversenomad, it's a good question, thanks.

In general, your approach is correct: to evaluate metrics on the train set, you should pass interactions=train.
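To make the role of the interactions argument concrete, here is a minimal pure-Python sketch. It does not use RecTools: precision_at_k is a hypothetical toy helper, and the users, items and recommendations are made up by hand. It only illustrates that the same recommendations get a different score depending on which interactions are treated as ground truth.

```python
from collections import defaultdict


def precision_at_k(recos, interactions, k):
    """Toy precision@k (hypothetical helper, not the RecTools implementation).

    recos: dict mapping user -> ranked list of recommended items
    interactions: iterable of (user, item) pairs used as ground truth
    """
    relevant = defaultdict(set)
    for user, item in interactions:
        relevant[user].add(item)

    per_user = []
    for user, items in recos.items():
        hits = len(set(items[:k]) & relevant[user])
        per_user.append(hits / k)
    return sum(per_user) / len(per_user)


# Hand-made toy data (assumption: not taken from the issue).
train = [("u1", "a"), ("u1", "b"), ("u2", "a")]
test = [("u1", "c"), ("u2", "b")]
recos = {"u1": ["a", "c"], "u2": ["b", "d"]}

# Same recommendations, different ground truth.
precision_on_train = precision_at_k(recos, train, k=2)
precision_on_test = precision_at_k(recos, test, k=2)
print(precision_on_train, precision_on_test)
```

In this toy setup the recommendations were not filtered against train, which is why they can hit train items at all.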

The only tricky part is prev_interactions=train: in your case you're passing the current interactions rather than the previous ones. As a result, all metrics that use this argument (NoveltyMetric, PopularityMetric, SerendipityMetric) will use information from the future, which will affect their values. This is quite a common problem when estimating metrics on the train set; you just need to keep it in mind.
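The effect of prev_interactions described above can be sketched with a toy novelty score. This is a simplified stand-in written in pure Python, not the RecTools metric: mean_inv_user_freq is a hypothetical helper that averages -log2 of the share of users who interacted with each recommended item, and the history, train and recos data are invented for illustration.

```python
import math
from collections import defaultdict


def mean_inv_user_freq(recos, prev_interactions):
    """Toy novelty score (hypothetical helper, not the RecTools implementation).

    For every recommended item, take -log2(share of users who interacted
    with it in prev_interactions), then average over all recommendations.
    Assumes every recommended item occurs in prev_interactions.
    """
    users_per_item = defaultdict(set)
    all_users = set()
    for user, item in prev_interactions:
        users_per_item[item].add(user)
        all_users.add(user)

    scores = []
    for items in recos.values():
        for item in items:
            share = len(users_per_item[item]) / len(all_users)
            scores.append(-math.log2(share))
    return sum(scores) / len(scores)


# Interactions that happened before the train period (assumed toy data):
# item "c" was seen by only one of three users.
history = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "c")]
# Interactions of the train period itself: "c" has become popular.
train = [("u1", "c"), ("u2", "c"), ("u3", "c"), ("u1", "a")]
recos = {"u2": ["c"], "u3": ["c"]}

novelty_with_history = mean_inv_user_freq(recos, history)
novelty_with_train = mean_inv_user_freq(recos, train)
print(novelty_with_history, novelty_with_train)
```

With the true history, item "c" looks novel (the score is log2(3)); with train passed as prev_interactions, the same recommendations score zero, because the metric already "knows" that everyone interacted with "c" during the evaluated period.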

Does that answer your question?

dataversenomad commented 5 months ago

It totally makes sense. I really appreciate all your help.

feldlime commented 5 months ago

Great, closing then.