PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems
https://cornac.preferred.ai
Apache License 2.0

[FEATURE] Init an evaluation with a predefined global_uid_map and global_iid_map #581

Open lthoang opened 8 months ago

lthoang commented 8 months ago

Description

Currently, global_uid_map and global_iid_map are reset whenever an evaluation method is built. https://github.com/PreferredAI/cornac/blob/f2d44cec7272f01d344c007312d51bc3644968b9/cornac/eval_methods/base_method.py#L646C36-L646C36

Expected behavior with the suggested feature

If global_uid_map or global_iid_map is provided, the evaluation method uses it instead of rebuilding the dictionary.
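A minimal sketch of the requested behavior, in plain Python rather than cornac's actual code. The helper name `resolve_id_map` and its parameters are hypothetical, and appending unseen IDs to a provided map (instead of rejecting them) is an assumption made for illustration:

```python
from collections import OrderedDict


def resolve_id_map(raw_ids, predefined_map=None):
    """Return a raw-id -> index map (hypothetical helper, not cornac API).

    If predefined_map is given, its existing indices are kept as-is.
    Unseen raw ids are appended with the next free index (an assumption).
    """
    # Copy so the caller's dictionary is never mutated.
    id_map = OrderedDict() if predefined_map is None else OrderedDict(predefined_map)
    for raw_id in raw_ids:
        # setdefault only assigns an index when raw_id is not mapped yet.
        id_map.setdefault(raw_id, len(id_map))
    return id_map


# Without a predefined map, indices follow first appearance.
fresh = resolve_id_map(["a", "b", "a"])

# With a predefined map, existing indices are preserved.
reused = resolve_id_map(["b", "c"], predefined_map={"x": 0, "b": 1})
```

With a map that already covers every item, the number of mapped items stays the same no matter which subset of the data is passed in.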


tqtg commented 8 months ago

Could you give an example of why we would want to build an eval method from pre-built uid/iid maps when the train/val/test datasets are provided?

lthoang commented 8 months ago

@tqtg Take the Streaming Session-based Recommendation (SSR) scenario as an example. The dataset is split chronologically with a 60:40 ratio, and the latter 40% is then split into 5 folds (8% each). A model trained on the first 60% is validated on the first fold and then tested on the next fold. After that, the most recent test fold is added to the training data and the model is tested on the following fold. This process repeats until the last fold has been tested. Because the training data grows over time, the number of items increases, which makes it difficult to compare results across test folds.
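The split described above can be sketched as index ranges over a chronologically sorted interaction list. This is one reading of the protocol, not code from the SSR implementation or cornac: the function name `streaming_steps` is hypothetical, and it assumes validation happens only at the first step and that each later training range covers everything before its test fold.

```python
def streaming_steps(n_interactions, train_pct=60, n_folds=5):
    """Return (start, end) index ranges for each evaluation step.

    Hypothetical helper: ranges are half-open and refer to positions in a
    chronologically sorted list of interactions. Requires n_folds >= 2.
    """
    train_end = n_interactions * train_pct // 100
    fold_size = (n_interactions - train_end) // n_folds
    # Fold boundaries; the last fold absorbs any remainder.
    bounds = [train_end + i * fold_size for i in range(n_folds)] + [n_interactions]

    # First step: train on the initial portion, validate on fold 0, test on fold 1.
    steps = [{
        "train": (0, bounds[0]),
        "val": (bounds[0], bounds[1]),
        "test": (bounds[1], bounds[2]),
    }]
    # Later steps: earlier folds join the training data, then test on the next fold.
    for k in range(2, n_folds):
        steps.append({
            "train": (0, bounds[k]),
            "val": None,
            "test": (bounds[k], bounds[k + 1]),
        })
    return steps


steps = streaming_steps(1000)
```

Each step trains on a longer prefix of the data, which is why the item set seen during training keeps growing from one test fold to the next.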

If global_iid_map is given, the evaluation process will rank the same number of items at every step. Hence, it helps us compare performance on different metrics across test folds. The SSR implementation also specifies the number of users and items here.

tqtg commented 8 months ago

Looking at the way they do evaluation in the paper, I don't think it can be accommodated that simply. After each step, they don't retrain the model but only fine-tune it with the additional 8% of test data from the previous step. We don't have a clear path to support this yet. Let's take a step back and think about the whole evaluation scheme first before trying to fix this small thing.