[🐛BUG] In the configuration of context-aware models, is TIME_FIELD required?

sxhysj commented 2 years ago

Describe the bug: when I use the NFM and LightGBM models, I get the following error:

  File "/home/abc/anaconda3/envs/kaggle/lib/python3.9/site-packages/recbole/data/interaction.py", line 131, in __getitem__
    return self.interaction[index]
KeyError: 't_dat'

The corresponding configuration is:

field_separator: ","
seq_separator: " "

# Common Features
USER_ID_FIELD: customer_id
ITEM_ID_FIELD: article_id
RATING_FIELD: price
TIME_FIELD: t_dat
seq_len: ~

# Selectively Loading
load_col:
    inter: [t_dat, customer_id, article_id, price, sales_channel_id]
    item: [article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,perceived_colour_value_id,perceived_colour_value_name,perceived_colour_master_id,perceived_colour_master_name,department_no,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc]
    user: [customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code]
unload_col: ~
unused_col: ~

eval_args:
    group_by: user
    order: RO
    split: {'RS': [0.8,0.1,0.1]}
    mode: full
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision', 'MAP']
topk: 10
valid_metric: MRR@10
metric_decimal_place: 4

threshold:
    price: 0.01

The data in the timestamp column looks like:

Environment:

- Operating system: [e.g. Linux, macOS or Windows]
- RecBole version [1.0.0]
- Python version [e.g. 3.9.7]
- PyTorch version [e.g. 1.10.2]
- cudatoolkit version [11.3.1]

My question: the timestamp here is a date, and that seems to be what causes the KeyError. Can I leave out the TIME_FIELD field in context-aware models? And can TIME_FIELD hold dates?

leoleojie commented 2 years ago

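@sxhysj Hi. First, the time_field field can hold dates. However, because your data looks like "2018-09-20", it can only be stored as a token, so it cannot be sorted. If you want the data to be sorted, first convert it to a number such as "20180920" and set the field type to float; it can then be sorted. For details, see the Evaluation Settings section of the API documentation.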
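A minimal sketch of the preprocessing suggested above, assuming the interactions live in a comma-separated RecBole atomic file whose header fields use the `name:type` form (for example `t_dat:token`). The helper names `date_to_number` and `convert_time_column` are hypothetical; the script rewrites each `YYYY-MM-DD` value as `YYYYMMDD` and retypes the column as `float`.

```python
import csv
import io
from datetime import date


def date_to_number(value: str) -> float:
    # Parse 'YYYY-MM-DD' (raises ValueError on malformed input),
    # then encode it as YYYYMMDD, e.g. '2018-09-20' -> 20180920.0.
    d = date.fromisoformat(value)
    return float(d.year * 10000 + d.month * 100 + d.day)


def convert_time_column(src: str, column: str = "t_dat", sep: str = ",") -> str:
    """Rewrite atomic-file text so that `column` holds numeric dates.

    Assumes (hypothetically) a header whose fields look like 'name:type'.
    """
    reader = csv.reader(io.StringIO(src), delimiter=sep)
    header = next(reader)
    names = [h.split(":")[0] for h in header]
    idx = names.index(column)
    header[idx] = f"{column}:float"  # declare the field as float so it can be sorted
    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        row[idx] = f"{date_to_number(row[idx]):.0f}"  # write 20180920, not 20180920.0
        writer.writerow(row)
    return out.getvalue()
```

Because `YYYYMMDD` numbers increase with the calendar date, sorting on the float field gives chronological order, which a token field (mapped to arbitrary IDs) does not guarantee.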

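However, since `order` is set to 'RO' in your configuration, the data will not be sorted anyway. The time can then only serve as an ordinary feature, which is probably meaningless.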
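To see why `order` matters, here is a simplified illustration of a ratio split under random order versus temporal order. It is not RecBole's actual splitting code and it ignores `group_by: user` (it splits globally); the function `split_interactions` is hypothetical. With `TO` the split depends on a sortable time value, whereas with `RO` the time field plays no role.

```python
import random


def split_interactions(rows, ratios=(0.8, 0.1, 0.1), order="RO", time_key="t_dat", seed=0):
    """Illustrative only: order the rows, then cut them into train/valid/test by ratio."""
    rows = list(rows)  # copy so the caller's list is not reordered
    if order == "TO":
        # Temporal order: requires a sortable (numeric) time value.
        rows.sort(key=lambda r: r[time_key])
    elif order == "RO":
        # Random order: the time field is never consulted.
        random.Random(seed).shuffle(rows)
    else:
        raise ValueError(f"unknown order: {order}")
    n = len(rows)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]
```

With `TO`, the test portion always contains the most recent interactions; with `RO`, any interaction can land in any portion regardless of its date.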

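Besides, context-aware recommendation models are generally used for CTR prediction, so your evaluation mode should be changed to 'labeled', and the metrics should be changed to ones such as 'AUC' and 'LogLoss' instead of ranking metrics. Here is a reference configuration: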

eval_args:
    group_by: None
    mode: labeled
metrics: ['AUC', 'LogLoss']
valid_metric: AUC
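For reference, the two suggested metrics can be computed by hand as below. This is a plain-Python sketch of the standard definitions (AUC as the fraction of positive/negative pairs ranked correctly, ties counting one half; LogLoss as mean binary cross-entropy with clipped probabilities), not RecBole's implementation; `auc` and `log_loss` are hypothetical helper names.

```python
import math


def auc(labels, scores):
    # Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def log_loss(labels, probs, eps=1e-15):
    # Mean binary cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

Both metrics score each labeled interaction's predicted click probability, which is why they fit CTR prediction better than top-k ranking metrics such as Recall or NDCG.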

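RecBole also supports using these models for LTR (learning-to-rank) tasks, but in that case you need to make sure the inter file contains no context information. For details, see the section of the documentation on parameter settings for Context-aware Recommendation models.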
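One way to prepare an inter file without context features is sketched below. Which fields count as context is an assumption here: the hypothetical helper `drop_context_columns` keeps only the user and item ID fields by default and drops everything else (such as `sales_channel_id`). Listing only the wanted fields under the `inter` entry of `load_col` may achieve a similar effect without rewriting the file.

```python
import csv
import io


def drop_context_columns(src, keep=("customer_id", "article_id"), sep=","):
    """Return atomic-file text containing only the fields named in `keep`.

    Assumes (hypothetically) a header whose fields look like 'name:type'.
    """
    reader = csv.reader(io.StringIO(src), delimiter=sep)
    header = next(reader)
    names = [h.split(":")[0] for h in header]
    idx = [i for i, name in enumerate(names) if name in keep]  # columns to retain, in file order
    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    for row in [header, *reader]:
        writer.writerow([row[i] for i in idx])
    return out.getvalue()
```

Pass a longer `keep` tuple if some non-context fields (for example a rating field) should stay in the file.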

sxhysj commented 2 years ago

OK, it works now. Not a bug.

RUCAIBox / RecBole #1139