RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

Error when running CTR prediction on the criteo dataset #1279

Closed yztian1997 closed 1 year ago

yztian1997 commented 2 years ago

Traceback (most recent call last):
  File "D:/大数据与人工智能/NFM/main.py", line 27, in <module>
    train_data, valid_data, test_data = data_preparation(config, dataset)
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\utils.py", line 99, in data_preparation
    built_datasets = dataset.build()
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 194, in build
    return super().build()
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\dataset.py", line 1465, in build
    self._change_feat_format()
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 49, in _change_feat_format
    self.data_augmentation()
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 94, in data_augmentation
    self._check_field('uid_field', 'time_field')
  File "D:\Anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\dataset.py", line 1245, in _check_field
    raise ValueError(f'{field_name} isn\'t set.')
ValueError: uid_field isn't set.

Could you help me figure out what is going wrong?
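Reading the traceback, the exception is raised while `sequential_dataset.py` runs `data_augmentation`, which calls `_check_field('uid_field', 'time_field')` and fails because no user ID field is configured. The sketch below is a stand-alone imitation of that check, not RecBole's actual code: `MiniDataset` and `try_augmentation` are hypothetical names, and only the `_check_field` call and the error message are taken from the traceback.

```python
class MiniDataset:
    """Hypothetical stand-in used for illustration; not RecBole's Dataset class."""

    def __init__(self, uid_field=None, time_field=None):
        self.uid_field = uid_field
        self.time_field = time_field

    def _check_field(self, *field_names):
        # Imitates the check seen in the traceback: each named attribute must be set.
        for field_name in field_names:
            if getattr(self, field_name, None) is None:
                raise ValueError(f"{field_name} isn't set.")

    def data_augmentation(self):
        # The sequential code path asks for both a user ID and a time field.
        self._check_field("uid_field", "time_field")
        return "ok"


def try_augmentation(dataset):
    """Return 'ok' on success, or the error message if a field is missing."""
    try:
        return dataset.data_augmentation()
    except ValueError as err:
        return str(err)
```

Under these assumptions, a dataset without a user ID field (as is typical for criteo-style CTR data) would trip the first check, which matches the message `uid_field isn't set.` in the report.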

hyp1231 commented 2 years ago

Hi, could you share the model, the dataset, and the command you ran, so that we can try to reproduce the problem?

Sherry-XLL commented 1 year ago

Hi @yztian1997! A reference configuration file for the criteo dataset is given below. Please download the latest version of the RecBole code and run it again.

# dataset config
field_separator: "\t"
seq_separator: " "
USER_ID_FIELD: ~
ITEM_ID_FIELD: ~
LABEL_FIELD: label
fill_nan: True
numerical_features: ['I1','I2','I3','I4','I5','I6','I7','I8','I9','I10','I11','I12','I13']
discretization:
  I1:
    method: 'LD'
  I2:
    method: 'LD'
  I3:
    method: 'LD'
  I4:
    method: 'LD'
  I5:
    method: 'LD'
  I6:
    method: 'LD'
  I7:
    method: 'LD'
  I8:
    method: 'LD'
  I9:
    method: 'LD'
  I10:
    method: 'LD'
  I11:
    method: 'LD'
  I12:
    method: 'LD'
  I13:
    method: 'LD'
load_col: 
    inter: '*'

# training and evaluation
epochs: 500
train_batch_size: 4096
eval_batch_size: 40960000

eval_args:
    group_by: ~
    split: {'RS':[0.8, 0.1, 0.1]}
    mode: labeled
    order: RO
valid_metric: AUC
metrics: ['AUC', 'LogLoss']
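The thirteen near-identical `discretization` entries can also be generated in Python and handed to RecBole as a dictionary instead of a YAML file. This is a minimal sketch, not an official recipe: `build_criteo_config` and `run_nfm_on_criteo` are hypothetical helper names, the model name `NFM` is only inferred from the path in the traceback, and the dataset name `criteo` assumes the data is stored where RecBole can find it under that name.

```python
def build_criteo_config():
    """Return the reference settings as a plain dict (YAML `~` becomes None)."""
    numerical = [f"I{i}" for i in range(1, 14)]  # I1 ... I13
    return {
        "field_separator": "\t",
        "seq_separator": " ",
        "USER_ID_FIELD": None,
        "ITEM_ID_FIELD": None,
        "LABEL_FIELD": "label",
        "fill_nan": True,
        "numerical_features": numerical,
        # Same 'LD' method for every numerical feature, as in the reference config.
        "discretization": {name: {"method": "LD"} for name in numerical},
        "load_col": {"inter": "*"},
        "epochs": 500,
        "train_batch_size": 4096,
        "eval_batch_size": 40960000,
        "eval_args": {
            "group_by": None,
            "split": {"RS": [0.8, 0.1, 0.1]},
            "mode": "labeled",
            "order": "RO",
        },
        "valid_metric": "AUC",
        "metrics": ["AUC", "LogLoss"],
    }


def run_nfm_on_criteo():
    # Assumption: RecBole is installed and a dataset named "criteo" is available to it.
    from recbole.quick_start import run_recbole

    return run_recbole(model="NFM", dataset="criteo", config_dict=build_criteo_config())
```

Calling `run_nfm_on_criteo()` starts training and evaluation, so it is left uncalled here. Building the dictionary first makes it easy to inspect the settings before a long run.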

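Because this issue has seen little activity, we are closing it. If you have further questions, feel free to open a new issue. Thank you for your interest in RecBole!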