RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.35k stars, 603 forks

Question related to context-aware recommendation (CARS) #1136

Closed SlokomManel closed 2 years ago

SlokomManel commented 2 years ago

I have been testing several context-aware recommendation (CARS) algorithms, e.g., FM, but in most cases side information (gender, age, occupation, country) does not improve recommendation performance. Is that common, or is there a bug or problem somewhere? I have tested several datasets: ML1M, ML100K, and LastFM.

Thank you.

Ethan-TZ commented 2 years ago

@SlokomManel Hello, thanks for your interest in RecBole! There may be several reasons for this. First, make sure your labeling method is consistent with mainstream practice. E.g., for the MovieLens datasets, ratings of 1 and 2 are mapped to 0, ratings of 4 and 5 are mapped to 1, and ratings of 3 are removed. Second, for CARS, not every feature field improves model performance, because the fields differ in importance. Generally speaking, fields with higher cardinality (e.g., user-id or item-id) affect performance more. I hope this helps.
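The labeling rule described above can be sketched in a few lines of plain Python. This is only an illustration, not RecBole code: the function name `binarize_ratings` and the `(user_id, item_id, rating)` tuple layout are assumptions made for the example.

```python
def binarize_ratings(interactions):
    """Map explicit ratings to binary labels.

    `interactions` is assumed to be an iterable of
    (user_id, item_id, rating) tuples with integer ratings 1..5.
    Ratings 1-2 become label 0, ratings 4-5 become label 1,
    and rating 3 is dropped.
    """
    labeled = []
    for user_id, item_id, rating in interactions:
        if rating == 3:
            continue  # ambiguous rating: remove the interaction
        label = 1 if rating >= 4 else 0
        labeled.append((user_id, item_id, label))
    return labeled


# Hypothetical toy data, not taken from any real dataset.
sample = [(1, 10, 5), (1, 11, 3), (2, 10, 1), (2, 12, 4), (3, 11, 2)]
labeled_sample = binarize_ratings(sample)
```

If the labels produced by your own pipeline differ from this scheme, results may not be comparable with those reported in papers that use it.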

SlokomManel commented 2 years ago

Thanks for your explanation. I am trying to reproduce some papers, for example https://alexiskz.files.wordpress.com/2016/06/km1212-karatzoglou.pdf . Using the same dataset ("Frappe") and the same splitting ratio [0.75, 0.05, 0.2] with the MostPop algorithm, my results are:

06 Feb 19:31 INFO test result: {'map@5': 0.1228, 'map@10': 0.138, 'precision@5': 0.2066, 'precision@10': 0.247, 'recall@5': 0.101, 'recall@10': 0.227, 'mrr@5': 0.3307, 'mrr@10': 0.3668, 'ndcg@5': 0.2062, 'ndcg@10': 0.2649, 'hit@5': 0.6386, 'hit@10': 0.883, 'itemcoverage@5': 0.005, 'itemcoverage@10': 0.0063, 'giniindex@5': 0.998, 'giniindex@10': 0.9968, 'shannonentropy@5': 0.1204, 'shannonentropy@10': 0.1109}

vs

[Image: Screenshot 2022-02-06 at 19 34 12]

P.S. In the paper, the calculation of MAP (equation 40) takes the contextual information into account! Do you think we need to include contextual side information in the implemented top-K metrics?

Ethan-TZ commented 2 years ago

@SlokomManel This paper enumerates all combinations of contextual information (for Frappe, K = 7 × 2 × 3 = 42) when calculating MAP. That is, it computes MAP under each context combination and then takes the mean. So of course you need to include contextual side information in the implemented top-K metrics.
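A rough sketch of what "mean of MAP over context combinations" could look like is given below. It is not the paper's equation 40 and not RecBole's metric implementation: the function names, the `(user, context)` dictionary keys, and the choice to normalize average precision by `min(len(relevant), k)` are all assumptions made for illustration.

```python
from collections import defaultdict


def average_precision_at_k(ranked_items, relevant_items, k):
    """AP@k for one ranked list (normalization by min(|relevant|, k) is an assumption)."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant_items), k)
    return precision_sum / denom if denom else 0.0


def context_averaged_map(recommendations, ground_truth, k):
    """Compute MAP@k inside each context, then average over contexts.

    Both arguments are assumed to be dicts keyed by (user, context);
    `recommendations` maps to ranked item lists, `ground_truth` to sets
    of relevant items.
    """
    ap_by_context = defaultdict(list)
    for (user, context), relevant in ground_truth.items():
        if not relevant:
            continue  # skip pairs with no relevant test items
        ranked = recommendations.get((user, context), [])
        ap_by_context[context].append(average_precision_at_k(ranked, relevant, k))
    if not ap_by_context:
        return 0.0
    per_context_map = [sum(aps) / len(aps) for aps in ap_by_context.values()]
    return sum(per_context_map) / len(per_context_map)


# Hypothetical toy example with two contexts, "c1" and "c2".
recs = {
    ("u1", "c1"): ["a", "b", "c"],
    ("u2", "c1"): ["b", "a", "c"],
    ("u1", "c2"): ["c", "a", "b"],
}
truth = {
    ("u1", "c1"): {"a"},
    ("u2", "c1"): {"a"},
    ("u1", "c2"): {"a", "b"},
}
score = context_averaged_map(recs, truth, k=3)
```

The key difference from a context-free MAP is the grouping step: each user is ranked and evaluated once per context, so a model that ignores context produces the same list in every group.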