MultiDAE dataloader has a problem with its data-loading arguments, RecBole version 0.2.1

dxjjhm commented 3 years ago

`Train multidae example 31 May 11:12 INFO
General Hyper Parameters: gpu_id = 0 use_gpu = False seed = 2020 state = INFO reproducibility = True data_path = D:\source\RecBole-0.2.1\recbole\config../dataset_example/ml-100k show_progress = True

Training Hyper Parameters: checkpoint_dir = D:/data/recbole/checkpoint/ml-100k/multidae1/ epochs = 300 train_batch_size = 2048 learner = adam learning_rate = 0.001 training_neg_sample_num = 1 training_neg_sample_distribution = uniform eval_step = 1 stopping_step = 10 clip_grad_norm = None weight_decay = 0.0 draw_loss_pic = False loss_decimal_place = 4

Evaluation Hyper Parameters: eval_setting = RO_RS,full group_by_user = True split_ratio = [0.8, 0.1, 0.1] leave_one_num = 2 real_time_process = False metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision'] topk = [10] valid_metric = MRR@10 eval_batch_size = 4096 metric_decimal_place = 4

Dataset Hyper Parameters: field_separator =
seq_separator =
USER_ID_FIELD = user_id ITEM_ID_FIELD = item_id RATING_FIELD = rating TIME_FIELD = timestamp seq_len = None LABEL_FIELD = label threshold = None NEGPREFIX = neg load_col = {'inter': ['user_id', 'item_id', 'rating', 'timestamp']} unload_col = None unused_col = None additional_feat_suffix = None rm_dup_inter = None lowest_val = None highest_val = None equal_val = None not_equal_val = None filter_inter_by_user_or_item = True max_user_inter_num = None min_user_inter_num = None max_item_inter_num = None min_item_inter_num = None fields_in_same_space = None preload_weight = None normalize_field = None normalize_all = True ITEM_LIST_LENGTH_FIELD = item_length LIST_SUFFIX = _list MAX_ITEM_LIST_LENGTH = 50 POSITION_FIELD = position_id HEAD_ENTITY_ID_FIELD = head_id TAIL_ENTITY_ID_FIELD = tail_id RELATION_ID_FIELD = relation_id ENTITY_ID_FIELD = entity_id

Other Hyper Parameters: mlp_hidden_size = [600] latent_dimension = 64 dropout_prob = 0.5 SOURCE_ID_FIELD = source_id TARGET_ID_FIELD = target_id benchmark_filename = None MODEL_TYPE = ModelType.GENERAL MODEL_INPUT_TYPE = InputType.PAIRWISE eval_type = EvaluatorType.RANKING valid_metric_bigger = True device = cpu train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform'}

31 May 11:12 INFO Saving filtered dataset into [D:/data/recbole/checkpoint/ml-100k/multidae1/ml-100k-dataset.pth]
31 May 11:12 INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating', 'timestamp']
31 May 11:12 INFO Build [UserDataLoader] for [train] with format [InputType.PAIRWISE]
31 May 11:12 INFO [train] Negative Sampling: {'strategy': 'by', 'by': 1, 'distribution': 'uniform'}
31 May 11:12 INFO [train] batch_size = [2048], shuffle = [True]

Traceback (most recent call last):
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 88, in <module>
    multidae_ex_train()
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 83, in multidae_ex_train
    run_recbole(model='MultiDAE', dataset='ml-100k', config_dict=param_dict)
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 37, in run_recbole
    train_data, valid_data, test_data = data_preparation(config, dataset)
  File "D:\source\RecBole-0.2.1\recbole\data\utils.py", line 126, in data_preparation
    train_data = dataloader(**train_kwargs)
TypeError: __init__() got an unexpected keyword argument 'sampler'`

dxjjhm commented 3 years ago

MultiVAE has the same problem.

dxjjhm commented 3 years ago

ENMF has the same problem too. Has anyone else run into this? This is on version 0.2.1.

The source code is as follows:

`import os

# run_recbole is the wrapper defined earlier in case_train.py (see the traceback above).
def enmf_ex_train():
    print("Efficient Neural Matrix Factorization without Sampling for Recommendation.")
    params = {
        "use_gpu": False,
        "checkpoint_dir": "D:/data/recbole/checkpoint/ml-100k/enmf_1/",
    }
    mkdir_if_no_exist(params['checkpoint_dir'])
    run_recbole(model='ENMF', dataset='ml-100k', config_dict=params)

def mkdir_if_no_exist(path: str):
    # Create the checkpoint directory, including missing parents, if it is absent.
    if os.path.isdir(path):
        print(path + " already exists.")
    else:
        print(path + " does not exist, creating " + path)
        os.makedirs(path)`

2017pxy commented 3 years ago

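@dxjjhm Hi, AE-type models (e.g. CDAE, MultiVAE) and models that do not use negative sampling (e.g. ENMF) need no negative sampling during training, so you have to set training_neg_sample_num (printed as `training_neg_sample_num = 1` in your log) to 0; otherwise an error is raised.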

This point is already emphasized in our documentation. Please read the documentation again carefully. Thanks!
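For readers who land here with the same `TypeError`, the advice above can be sketched roughly as follows. This is a minimal illustration rather than the maintainers' code: `build_config` and `train` are hypothetical helper names, the model list is simply the set of models named in this thread, and the key `training_neg_sample_num` is taken from the 0.2.1 log output above (other RecBole versions may use a different key).

```python
# Models mentioned in this thread as needing no training-time negative sampling.
NO_NEG_SAMPLING_MODELS = {"MultiDAE", "MultiVAE", "CDAE", "ENMF"}


def build_config(model: str, checkpoint_dir: str) -> dict:
    """Build a config_dict; hypothetical helper, not part of RecBole."""
    config = {"use_gpu": False, "checkpoint_dir": checkpoint_dir}
    if model in NO_NEG_SAMPLING_MODELS:
        # Key name as printed in the 0.2.1 log ("training_neg_sample_num = 1").
        config["training_neg_sample_num"] = 0
    return config


def train(model: str, dataset: str = "ml-100k", checkpoint_dir: str = "./checkpoint/"):
    """Run training through RecBole's quick-start entry point."""
    # Imported lazily so that build_config can be used without RecBole installed.
    from recbole.quick_start import run_recbole

    return run_recbole(
        model=model,
        dataset=dataset,
        config_dict=build_config(model, checkpoint_dir),
    )
```

Calling `train("MultiDAE")` would then pass `training_neg_sample_num = 0` in the `config_dict`, while a pairwise model such as BPR keeps the default.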

dxjjhm commented 3 years ago

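Found it. With that setting the models run. However, when running the RaCT model, the file name of the saved model does not match the name of the model that is loaded afterwards, and I get the error below. Is this a configuration problem?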

`01 Jun 17:09 INFO epoch 147 training [time: 0.17s, train loss: 525.6693]
Train 148: 100%|██████████| 1/1 [00:00<00:00, 6.08it/s]
01 Jun 17:09 INFO epoch 148 training [time: 0.16s, train loss: 525.4373]
Train 149: 100%|██████████| 1/1 [00:00<00:00, 6.23it/s]
01 Jun 17:09 INFO epoch 149 training [time: 0.16s, train loss: 525.1526]
01 Jun 17:09 INFO Saving current: D:/data/recbole/checkpoint/ml-100k/ract_1/RaCT-ml-100k-150.pth
Traceback (most recent call last):
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 218, in <module>
    ract_ex_train()
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 195, in ract_ex_train
    run_recbole(model='RaCT', dataset='ml-100k', config_dict=params)
  File "D:/source/RecBole-0.2.1/run_example/case_train.py", line 56, in run_recbole
    test_result = trainer.evaluate(test_data, load_best_model=saved, show_progress=config['show_progress'])
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "D:\source\RecBole-0.2.1\recbole\trainer\trainer.py", line 375, in evaluate
    checkpoint = torch.load(checkpoint_file)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:/data/recbole/checkpoint/ml-100k/ract_1/RaCT-Jun-01-2021_17-09-29.pth'`

dxjjhm commented 3 years ago

The RaCT model is trained in three steps: actor pretrain, critic pretrain, and finetune train. I have now checked the documentation; the problem was that I had not read it. Closed.
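The three steps could be laid out as three separate runs, roughly as sketched below. Everything here is an assumption to be checked against the RecBole documentation for your version: the keys `train_stage`, `pretrain_epochs` and `pre_model_path` and the stage names are recalled from the RaCT model docs, the epoch counts are placeholders, and `ract_stage_configs` is a hypothetical helper. The checkpoint file name pattern follows the `RaCT-ml-100k-150.pth` line in the log above.

```python
from typing import Dict, List


def ract_stage_configs(
    checkpoint_dir: str,
    dataset: str = "ml-100k",
    actor_epochs: int = 150,   # placeholder; matches the "-150.pth" seen in the log
    critic_epochs: int = 50,   # placeholder value
) -> List[Dict]:
    """Return one config_dict per RaCT stage (hypothetical helper, assumed keys)."""
    root = checkpoint_dir.rstrip("/") + "/"

    def saved(epochs: int) -> str:
        # File name pattern inferred from the log: RaCT-<dataset>-<epochs>.pth
        return f"{root}RaCT-{dataset}-{epochs}.pth"

    base = {
        "use_gpu": False,
        "checkpoint_dir": root,
        # Assumed to apply to RaCT as well, following the advice for AE-type models.
        "training_neg_sample_num": 0,
    }
    return [
        # Step 1: pretrain the actor.
        {**base, "train_stage": "actor_pretrain", "pretrain_epochs": actor_epochs},
        # Step 2: pretrain the critic, starting from the saved actor.
        {**base, "train_stage": "critic_pretrain", "pretrain_epochs": critic_epochs,
         "pre_model_path": saved(actor_epochs)},
        # Step 3: finetune, starting from the saved critic.
        {**base, "train_stage": "finetune", "pre_model_path": saved(critic_epochs)},
    ]
```

Each dict would be passed as `config_dict` to its own training run, in order. If the assumptions hold, the `FileNotFoundError` above is consistent with evaluating right after a pretrain run, which saved `RaCT-ml-100k-150.pth` instead of the timestamped best-model file that `evaluate` tried to load.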

RUCAIBox / RecBole

MultiDAE dataloader has a problem with its data-loading arguments, RecBole version 0.2.1 #838