RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[🐛BUG] Contextual model with temporal order tries to embed `timestamp` #694

Closed: deklanw closed this issue 3 years ago

deklanw commented 3 years ago
dataset: ml-100k
eval_setting: TO_LS,full
model: DCN
metrics: ["Recall", "MRR", "NDCG", "Hit", "MAP", "Precision"]
topk: [10, 20]
valid_metric: Precision@10
state: DEBUG
group_by_user: True
training_neg_sample_num: 1
threshold: null
load_col:
  inter: ["user_id", "item_id", "timestamp"]
  user: ["user_id", "age", "gender", "occupation"]
  item: ["item_id", "release_year", "class"]

Truncated call stack:

~/anaconda3/lib/python3.8/site-packages/recbole/model/abstract_recommender.py in embed_input_fields(self, interaction)
    372         print(interaction)
    373         for field_name in self.float_field_names:
--> 374             if len(interaction[field_name].shape) == 2:
    375                 float_fields.append(interaction[field_name])
    376             else:

~/anaconda3/lib/python3.8/site-packages/recbole/data/interaction.py in __getitem__(self, index)
    103     def __getitem__(self, index):
    104         if isinstance(index, str):
--> 105             return self.interaction[index]
    106         else:
    107             ret = {}

KeyError: 'timestamp'

I'm guessing this happens because `timestamp` is removed from the Interaction, but this part of the code

https://github.com/RUCAIBox/RecBole/blob/master/recbole/model/abstract_recommender.py#L370-L381

does not exclude `timestamp` from the float fields, so it still tries to look it up.
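To make the failure mode concrete, here is a simplified, hypothetical model of the loop above in plain Python. It assumes, based on the maintainer's reply that full-sort mode only supports `user_id` and `item_id` from the .inter file, that the full-sort batch is built from user and item features only, so an .inter-only column like `timestamp` never reaches the model even though it was registered as a float field. `collect_float_fields` is an illustrative name, not RecBole's API.

```python
# Hypothetical, simplified model of the failing loop in embed_input_fields.
# Assumption: during full-sort evaluation the batch only holds user and item
# features, so an .inter-only column such as `timestamp` is absent.
from typing import Dict, List


def collect_float_fields(batch: Dict[str, list], float_field_names: List[str]) -> List[list]:
    float_fields = []
    for field_name in float_field_names:
        # A plain dict lookup, like Interaction.__getitem__ with a str index,
        # raises KeyError when the field is missing from the batch.
        float_fields.append(batch[field_name])
    return float_fields


# `timestamp` was registered as a float field because it was loaded from .inter ...
float_field_names = ["timestamp"]
# ... but the (assumed) full-sort batch carries no .inter-only columns.
full_sort_batch = {"user_id": [1], "item_id": [10]}

try:
    collect_float_fields(full_sort_batch, float_field_names)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'timestamp'
```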

EliverQ commented 3 years ago

Hi, @deklanw! Actually, this isn't a bug. In our full-sort setting, you can't load any columns from the .inter file other than `user_id` and `item_id`. To solve this, use the `unused_col` parameter: a column listed there is loaded for data preprocessing but does not participate in training. For details, see our modified config file:

dataset: ml-100k
eval_setting: TO_LS,full
model: DCN
metrics: ["Recall", "MRR", "NDCG", "Hit", "MAP", "Precision"]
topk: [10, 20]
valid_metric: Precision@10
state: DEBUG
group_by_user: True
training_neg_sample_num: 1
threshold: null
load_col:
  inter: ["user_id", "item_id", "timestamp"]
  user: ["user_id", "age", "gender", "occupation"]
  item: ["item_id", "release_year", "class"]
unused_col:
  inter: ['timestamp']
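A rough sketch of the idea behind `unused_col`, assuming it behaves as described above: the column is loaded so that preprocessing (here, the temporal ordering implied by `TO_LS`) can use it, and it is dropped before the model decides which fields to embed. The function names are hypothetical; this is not RecBole's implementation.

```python
# Hypothetical sketch of what `unused_col` achieves: `timestamp` is used for
# temporal ordering, then removed so no downstream stage treats it as a feature.
from typing import Dict, List


def preprocess(inter_rows: List[Dict[str, float]], unused_cols: List[str]) -> List[Dict[str, float]]:
    # Temporal ordering needs `timestamp`, so sort while it is still present.
    ordered = sorted(inter_rows, key=lambda row: row["timestamp"])
    # Afterwards, strip the unused columns from every row.
    return [{k: v for k, v in row.items() if k not in unused_cols} for row in ordered]


def model_field_names(rows: List[Dict[str, float]]) -> List[str]:
    # The model only registers fields that survive preprocessing.
    return sorted(rows[0].keys()) if rows else []


rows = [
    {"user_id": 1, "item_id": 10, "timestamp": 300.0},
    {"user_id": 1, "item_id": 11, "timestamp": 100.0},
]
cleaned = preprocess(rows, unused_cols=["timestamp"])
print([r["item_id"] for r in cleaned])  # [11, 10]
print(model_field_names(cleaned))       # ['item_id', 'user_id']
```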

deklanw commented 3 years ago

@EliverQ Gotcha! That makes sense.

Is there a clean way to make this error message less opaque? This seems like a mistake many people could make.
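One possible shape for a clearer message, sketched in plain Python: a lookup helper that names the missing field and points at `unused_col`. The helper, the exception class, and the message wording are all hypothetical, not RecBole code.

```python
# Hypothetical sketch of a friendlier field lookup; names and wording are
# illustrative only and do not come from RecBole.
from typing import Dict


class MissingFieldError(KeyError):
    """Raised when the model expects a field that the current batch lacks."""


def get_field(batch: Dict[str, list], field_name: str) -> list:
    try:
        return batch[field_name]
    except KeyError:
        # Replace the bare KeyError with an actionable hint.
        raise MissingFieldError(
            f"Field '{field_name}' is expected by the model but missing from the batch. "
            "If it is an .inter column loaded only for preprocessing "
            "(e.g. temporal ordering), list it under `unused_col` in the config."
        ) from None


batch = {"user_id": [1], "item_id": [10]}
try:
    get_field(batch, "timestamp")
except MissingFieldError as err:
    print(err.args[0])
```

Subclassing `KeyError` keeps any existing `except KeyError` handlers working while making the message actionable.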

EliverQ commented 3 years ago

@deklanw Sorry for the ambiguous error message. We'll improve several error messages, including this one, in the near future.