RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.27k stars 590 forks source link

[🐛BUG] Error when running the SASRecF model on the ml-100k dataset #1996

Open Gabrielle240125 opened 4 months ago

Gabrielle240125 commented 4 months ago

Describe the bug: Running the SASRecF model on the ml-100k dataset fails with KeyError: 'age'.

To reproduce: steps to reproduce this bug:

  1. The extra yaml file you used

dataset config

field_separator: "\t"  # field separator used in the dataset files
seq_separator: " "  # separator inside token_seq or float_seq fields
USER_ID_FIELD: user_id  # user id field
ITEM_ID_FIELD: item_id  # item id field
RATING_FIELD: rating  # rating field (binarized: purchased or not)
TIME_FIELD: timestamp  # time field
NEG_PREFIX: neg_  # prefix for negative samples
LABEL_FIELD: label  # label field
ITEM_LIST_LENGTH_FIELD: item_length  # sequence length field
LIST_SUFFIX: _list  # suffix for sequence fields
MAX_ITEM_LIST_LENGTH: 100  # maximum sequence length
POSITION_FIELD: position_id  # field holding the generated sequence position ids

# Specify which columns to read from which file. For example, reading the five columns user_id, item_id, type, timestamp, flag from .inter, and likewise for the other files.

load_col:
  inter: [user_id, item_id, timestamp, rating]
  user: [user_id, age]
selected features: [user_id, age]

training settings

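epochs: 10  # maximum number of training epochs
train_batch_size: 256  # training batch size
learner: adam  # built-in PyTorch optimizer to use
learning_rate: 0.001  # learning rate
training_neg_sample_args: ~  # number of negative samples
eval_step: 1  # number of evaluations after each training epoch
stopping_step: 5  # early-stopping patience: stop early if the chosen metric shows no real change within this many steps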

Parameters

n_layers: 8  # (int) The number of transformer layers in transformer encoder.
n_heads: 8  # (int) The number of attention heads for multi-head attention layer.
hidden_size: 512  # (int) The number of features in the hidden state.
inner_size: 1024  # (int) The inner hidden size in feed-forward layer.
hidden_dropout_prob: 0.2  # (float) The probability of an element to be zeroed.
attn_dropout_prob: 0.2  # (float) The probability of an attention score to be zeroed.
hidden_act: 'gelu'  # (str) The activation function in feed-forward layer.
pooling_mode: 'sum'  # (str) Intra-feature pooling mode. Range in ['max', 'mean', 'sum'].
layer_norm_eps: 1e-12  # (float) A value added to the denominator for numerical stability.
initializer_range: 0.02  # (float) The standard deviation for normal initialization.
loss_type: 'CE'  # (str) The type of loss function.

evaluation settings

eval_setting: TO_LS,full  # order the data by time, split with leave-one-out, and use full ranking
eval_args:
  split: {'LS': 'valid_and_test'}  # split ratio
  mode: full
  order: TO
metrics: ["Recall", "MRR","NDCG","Hit","Precision"]  # evaluation metrics
topk: [1,3,5,10]  # top-k values for the metrics; setting this to 10 gives ["Recall@10", "MRR@10", "NDCG@10", "Hit@10", "Precision@10"]
valid_metric: MRR@10  # metric used as the early-stopping criterion
eval_batch_size: 256  # evaluation batch size

  2. Your code: python run_recbole.py --model=SASRecF --dataset=ml-100k --config_files=sasrecf_test.yaml

  3. Your run script: python run_recbole.py --model=SASRecF --dataset=ml-100k --config_files=bert_test.yaml

train_neg_sample_args has turned to None!!!
21 Feb 21:54 INFO ['run_recbole.py', '--model=SASRecF', '--dataset=ml-100k', '--config_files=bert_test.yaml']
21 Feb 21:54 INFO
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = /root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/config/../dataset_example/ml-100k
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 10
train_batch_size = 256
learner = adam
learning_rate = 0.0001
train_neg_sample_args = {'distribution': 'none', 'sample_num': 'none', 'alpha': 'none', 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 5
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': {'valid': 'full', 'test': 'full'}}
repeatable = True
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [1, 3, 5, 10]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating', 'timestamp'], 'user': ['user_id', 'age', 'gender']}
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = None
filter_inter_by_user_or_item = True
user_inter_num_interval = None
item_inter_num_interval = None
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = True
ITEM_LIST_LENGTH_FIELD = product_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 100
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
kg_reverse_r = False
entity_kg_num_interval = None
relation_kg_num_interval = None
benchmark_filename = None

Other Hyper Parameters:
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = mask_itemseq
n_layers = 8
n_heads = 8
hidden_size = 512
inner_size = 1024
hidden_dropout_prob = 0.2
attn_dropout_prob = 0.2
hidden_act = gelu
layer_norm_eps = 1e-12
initializer_range = 0.02
selected_features = ['age', 'gender']
pooling_mode = sum
loss_type = CE
numerical_features = []
discretization = None
MODEL_TYPE = ModelType.SEQUENTIAL
training_neg_sample_args = None
mask_ratio = 0.2
ft_ratio = 0.5
eval_setting = TO_LS,full
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
single_spec = True
local_rank = 0
device = cuda
valid_neg_sample_args = {'distribution': 'uniform', 'sample_num': 'none'}
test_neg_sample_args = {'distribution': 'uniform', 'sample_num': 'none'}

21 Feb 21:54 INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating', 'timestamp', 'age', 'gender']
21 Feb 21:54 INFO [Training]: train_batch_size = [256] train_neg_sample_args: [{'distribution': 'none', 'sample_num': 'none', 'alpha': 'none', 'dynamic': False, 'candidate_num': 0}]
21 Feb 21:54 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': {'valid': 'full', 'test': 'full'}}]
21 Feb 21:54 INFO SASRecF(
  (item_embedding): Embedding(1683, 512, padding_idx=0)
  (position_embedding): Embedding(100, 512)
  (feature_embed_layer): FeatureSeqEmbLayer(
    (token_embedding_table): ModuleDict(
      (item): FMEmbedding(
        (embedding): Embedding(65, 512)
      )
    )
    (float_embedding_table): ModuleDict()
    (token_seq_embedding_table): ModuleDict(
      (item): ModuleList()
    )
    (float_seq_embedding_table): ModuleDict(
      (item): ModuleList()
    )
  )
  (trm_encoder): TransformerEncoder(
    (layer): ModuleList(
      (0-7): 8 x TransformerLayer(
        (multi_head_attention): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=True)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (softmax): Softmax(dim=-1)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (LayerNorm): LayerNorm((512,), eps=1e-12, elementwise_affine=True)
          (out_dropout): Dropout(p=0.2, inplace=False)
        )
        (feed_forward): FeedForward(
          (dense_1): Linear(in_features=512, out_features=1024, bias=True)
          (dense_2): Linear(in_features=1024, out_features=512, bias=True)
          (LayerNorm): LayerNorm((512,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
  )
  (concat_layer): Linear(in_features=1536, out_features=512, bias=True)
  (LayerNorm): LayerNorm((512,), eps=1e-12, elementwise_affine=True)
  (dropout): Dropout(p=0.2, inplace=False)
  (loss_fct): CrossEntropyLoss()
)
Trainable parameters: 18556416
Traceback (most recent call last):
  File "run_recbole.py", line 46, in <module>
    run(
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/quick_start/quick_start.py", line 52, in run
    res = run_recbole(
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/quick_start/quick_start.py", line 141, in run_recbole
    flops = get_flops(model, dataset, config["device"], logger, transform)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/utils/utils.py", line 347, in get_flops
    wrapper(inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/utils/utils.py", line 288, in forward
    return self.model.predict(interaction)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/model/sequential_recommender/sasrecf.py", line 171, in predict
    seq_output = self.forward(item_seq, item_seq_len)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/model/sequential_recommender/sasrecf.py", line 117, in forward
    sparse_embedding, dense_embedding = self.feature_embed_layer(None, item_seq)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/model/layers.py", line 1217, in forward
    return self.embed_input_fields(user_idx, item_idx)
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/model/layers.py", line 1175, in embed_input_fields
    feature = user_item_feat[type][field_name][user_item_idx[type]]
  File "/root/autodl-tmp/zzzzz/RecBole-1.2.0/recbole/data/interaction.py", line 135, in __getitem__
    return self.interaction[index]
KeyError: 'age'

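Expected behavior: When the selected features are of type token or token_seq, the error above is raised. When the selected features are of type float or float_seq, the error is RuntimeError: torch.cat(): expected a non-empty list of Tensors. How can I fix this so that the program runs successfully? Thank you for your attention and help!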

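Screenshots: Add screenshots to help explain your problem. (optional)

Links: Add a link to code that reproduces the bug, such as Colab or another online Jupyter platform. (optional)

Environment (please complete the following information):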

Gabrielle240125 commented 4 months ago

selected_features refers to item features, not user features.
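
A minimal sketch of what this means for the config in this issue, assuming that SASRecF looks up every name in selected_features among the columns loaded from the item file. The helper `missing_item_features` is hypothetical and not part of RecBole, and the item column names `release_year` and `class` are assumptions about ml-100k.item that should be checked against the actual file header.

```python
# Hypothetical helper (not a RecBole API): list the selected features that
# are not among the columns loaded from the item file.
def missing_item_features(load_col, selected_features):
    item_cols = set(load_col.get("item") or [])
    return [name for name in selected_features if name not in item_cols]


# load_col and selected_features exactly as printed in the log above:
# only inter and user columns are loaded, so no item feature is available.
logged_load_col = {
    "inter": ["user_id", "item_id", "rating", "timestamp"],
    "user": ["user_id", "age", "gender"],
}
logged_missing = missing_item_features(logged_load_col, ["age", "gender"])
print(logged_missing)  # prints ['age', 'gender']

# Assumed fix: also load item columns and select features from them.
# `release_year` and `class` are assumed column names, not confirmed here.
fixed_load_col = {
    "inter": ["user_id", "item_id", "rating", "timestamp"],
    "item": ["item_id", "release_year", "class"],
}
fixed_missing = missing_item_features(fixed_load_col, ["class"])
print(fixed_missing)  # prints []
```

Under that assumption, the yaml would need an `item:` entry in load_col, and selected_features would list only columns from that entry. User columns such as age are what the KeyError: 'age' in the traceback points at.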