RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

Fixed splitting dataset and saved recommendation #1273

Closed: giuspillo closed this issue 2 years ago

giuspillo commented 2 years ago

Hi! I just wanted to know whether it is possible to: (1) provide the model with an already-split dataset (train/validation/test) instead of a random split; (2) read and save the recommendation lists computed by the algorithm.

Thank you!

hyp1231 commented 2 years ago

Hi, certainly! :)

(1) You can do this by adding an argument named benchmark_filename; here is the corresponding API doc [link]. You can also refer to #1114. For sequential/session-based models, see #1069 and an example of loading a pre-split session-based benchmark [link].

(2) Please refer to this example [link] and #1032.
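For (2), the linked example is the place to look for RecBole's own helpers. As a library-independent illustration of what saving recommendation lists amounts to, here is a minimal sketch in plain Python. It assumes you already have a score for each (user, item) pair in a nested dict; `top_k_items`, `save_recommendations`, and the TSV column names are hypothetical and not part of RecBole.

```python
import csv
import heapq


def top_k_items(scores, k, seen=()):
    """Return the k highest-scoring (item, score) pairs, skipping items in `seen`."""
    candidates = ((item, s) for item, s in scores.items() if item not in seen)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


def save_recommendations(user_scores, k, path, seen_by_user=None):
    """Write one tab-separated row per (user, rank, item, score)."""
    seen_by_user = seen_by_user or {}
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["user_id", "rank", "item_id", "score"])
        for user, scores in user_scores.items():
            ranked = top_k_items(scores, k, seen_by_user.get(user, ()))
            for rank, (item, score) in enumerate(ranked, start=1):
                writer.writerow([user, rank, item, score])
```

Passing the items each user interacted with during training as `seen_by_user` keeps them out of the saved lists, which is the usual choice when evaluating on held-out interactions.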

giuspillo commented 2 years ago

Thank you very much!

giuspillo commented 2 years ago

Hi, I've successfully trained a KGAT model (thanks again!), and now I'm trying to get the recommendation list. However, when I follow the code examples you posted, I get this error (full output attached): output.txt

Can you please help me?

hyp1231 commented 2 years ago

It seems that a shape mismatch error occurs. The model you would like to load has 6037 users and 20823 items, while the network you are currently building has 5661 users and 11896 items. Could you please check the differences between the configs of these two runs?
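To see exactly which parameters disagree, one option is to compare parameter shapes from the checkpoint with those of the freshly built model. The sketch below is plain Python and only assumes you have collected the shapes into two dicts, for example with `{name: tuple(t.shape) for name, t in state_dict.items()}` in PyTorch. Where the state dict sits inside the .pth file is something to verify for your RecBole version. `diff_shapes` is a hypothetical helper, and the shapes are illustrative: only the user counts are taken from the error above.

```python
def diff_shapes(saved, current):
    """List parameters whose shapes differ or that exist on only one side."""
    problems = []
    for name in sorted(set(saved) | set(current)):
        saved_shape, current_shape = saved.get(name), current.get(name)
        if saved_shape != current_shape:
            problems.append((name, saved_shape, current_shape))
    return problems


# Illustrative shapes: the user counts (6037 vs. 5661) come from the error;
# the parameter names and the embedding width of 64 are assumptions.
saved_shapes = {
    "user_embedding.weight": (6037, 64),
    "relation_embedding.weight": (26, 64),
}
current_shapes = {
    "user_embedding.weight": (5661, 64),
    "relation_embedding.weight": (26, 64),
}

for name, saved_shape, current_shape in diff_shapes(saved_shapes, current_shapes):
    print(f"{name}: checkpoint {saved_shape} vs. current model {current_shape}")
```

A different first dimension on an embedding table usually means the two runs loaded or filtered the data differently, which is why comparing the configs is the next step.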

giuspillo commented 2 years ago

The config file I'm currently using is the following:

gpu_id: 1
training_batch_size: 1024
use_gpu: True
data_path: datasets\
dataset: ml-100k
benchmark_filename: ['part1', 'part2', 'part3']

eval_setting: RO_RS,full
group_by_user: True
leave_one_num: 2
real_time_process: False
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk: [10]
valid_metric: MRR@10
eval_batch_size: 4096
metric_decimal_place: 4

load_col:
    inter: ['user_id', 'item_id', 'rating']
    kg: ['head_id', 'relation_id', 'tail_id']
    link: ['item_id', 'entity_id']

hyp1231 commented 2 years ago

Hi, sorry for the late reply.

I tried to reproduce this issue as follows:

python run_recbole.py -m KGAT -d ml-100k --config_files=kgat.yaml

Here kgat.yaml is the config file you just provided. The dataset is located in recbole/dataset_example/ml-100k/. I split ml-100k.inter into three parts of 79999, 10000, and 10001 interactions, respectively.
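A split like this can be produced with a few lines of plain Python. The sketch below makes several assumptions: the .inter file has a single header line, the parts are taken as consecutive blocks without shuffling, and the output files are named `<dataset>.<suffix>.inter`, which is how I understand RecBole looks up the entries of benchmark_filename (please verify against the API doc). `split_inter_file` is a hypothetical helper, not part of RecBole.

```python
from pathlib import Path


def split_inter_file(src, suffixes, sizes):
    """Split an .inter file into consecutive parts, repeating the header in each."""
    src = Path(src)
    with src.open(encoding="utf-8") as f:
        header = f.readline()
        rows = f.readlines()
    if sum(sizes) != len(rows):
        raise ValueError(f"sizes sum to {sum(sizes)}, but the file has {len(rows)} rows")

    out_paths = []
    start = 0
    for suffix, size in zip(suffixes, sizes):
        # e.g. ml-100k.inter -> ml-100k.part1.inter (assumed naming convention)
        out = src.with_name(f"{src.stem}.{suffix}.inter")
        with out.open("w", encoding="utf-8") as f:
            f.write(header)
            f.writelines(rows[start:start + size])
        out_paths.append(out)
        start += size
    return out_paths
```

For the sizes mentioned above, the call would be `split_inter_file("ml-100k.inter", ["part1", "part2", "part3"], [79999, 10000, 10001])`.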

Then I got:

command line args [-m KGAT -d ml-100k] will not be used in RecBole
07 May 20:01    INFO
General Hyper Parameters:
gpu_id = 1
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = /home/houyupeng/RecBole/recbole/config/../dataset_example/ml-100k
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 300
train_batch_size = 2048
learner = adam
learning_rate = 0.001
neg_sampling = {'uniform': 1}
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = None
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = None
filter_inter_by_user_or_item = True
user_inter_num_interval = None
item_inter_num_interval = None
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = ['part1', 'part2', 'part3']

Other Hyper Parameters:
wandb_project = recbole
require_pow = False
embedding_size = 64
kg_embedding_size = 64
layers = [64]
mess_dropout = 0.1
reg_weight = 1e-05
aggregator_type = bi
MODEL_TYPE = ModelType.KNOWLEDGE
training_batch_size = 1024
eval_setting = RO_RS,full
group_by_user = True
leave_one_num = 2
real_time_process = False
inter = ['user_id', 'item_id', 'rating']
kg = ['head_id', 'relation_id', 'tail_id']
link = ['item_id', 'entity_id']
MODEL_INPUT_TYPE = InputType.PAIRWISE
eval_type = EvaluatorType.RANKING
device = cuda
train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform', 'dynamic': 'none'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}

07 May 20:01    INFO  ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['entity_id', 'user_id', 'item_id', 'rating', 'timestamp', 'age', 'gender', 'occupation', 'zip_code', 'movie_title', 'release_year', 'class', 'head_id', 'relation_id', 'tail_id']
The number of entities: 34713
The number of relations: 26
The number of triples: 91631
The number of items that have been linked to KG: 1598
07 May 20:01    INFO  [Training]: train_batch_size = [2048] negative sampling: [{'uniform': 1}]
07 May 20:01    INFO  [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}]
/home/houyupeng/miniconda3/envs/dl/lib/python3.7/site-packages/dgl/subgraph.py:289: DGLWarning: Key word argument preserve_nodes is deprecated. Use relabel_nodes instead.
  "Key word argument preserve_nodes is deprecated. Use relabel_nodes instead.")
/home/houyupeng/RecBole/recbole/model/knowledge_aware_recommender/kgat.py:134: RuntimeWarning: divide by zero encountered in power
  d_inv = np.power(rowsum, -1).flatten()
/home/houyupeng/RecBole/recbole/model/knowledge_aware_recommender/kgat.py:141: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  /opt/conda/conda-bld/pytorch_1646755953518/work/torch/csrc/utils/tensor_new.cpp:210.)
  indices = torch.LongTensor([final_adj_matrix.row, final_adj_matrix.col])
07 May 20:01    INFO  KGAT(
  (user_embedding): Embedding(944, 64)
  (entity_embedding): Embedding(34713, 64)
  (relation_embedding): Embedding(26, 64)
  (trans_w): Embedding(26, 4096)
  (aggregator_layers): ModuleList(
    (0): Aggregator(
      (message_dropout): Dropout(p=0.1, inplace=False)
      (W1): Linear(in_features=64, out_features=64, bias=True)
      (W2): Linear(in_features=64, out_features=64, bias=True)
      (activation): LeakyReLU(negative_slope=0.01)
    )
  )
  (tanh): Tanh()
  (mf_loss): BPRLoss()
  (reg_loss): EmbLoss()
)
Trainable parameters: 2398528
Train     0: 100%|█████████████████████████| 40/40 [00:02<00:00, 14.12it/s, GPU RAM: 0.20 G/31.75 G]
Train     0: 100%|█████████████████████████| 45/45 [00:00<00:00, 86.65it/s, GPU RAM: 0.32 G/31.75 G]
07 May 20:01    INFO  epoch 0 training [time: 3.42s, train_loss1: 24.5636, train_loss2: 29.7454]
Evaluate   : 100%|██████████████████████| 440/440 [00:01<00:00, 342.68it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:01    INFO  epoch 0 evaluating [time: 1.29s, valid_score: 0.089200]
07 May 20:01    INFO  valid result:
recall@10 : 0.0401    mrr@10 : 0.0892    ndcg@10 : 0.043    hit@10 : 0.267    precision@10 : 0.0355
07 May 20:01    INFO  Saving current: saved/KGAT-May-07-2022_20-01-11.pth
Train     1: 100%|█████████████████████████| 40/40 [00:01<00:00, 27.04it/s, GPU RAM: 0.48 G/31.75 G]
Train     1: 100%|█████████████████████████| 45/45 [00:01<00:00, 41.07it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:01    INFO  epoch 1 training [time: 2.79s, train_loss1: 21.2354, train_loss2: 19.7378]
Evaluate   : 100%|██████████████████████| 440/440 [00:00<00:00, 747.95it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:01    INFO  epoch 1 evaluating [time: 0.59s, valid_score: 0.135300]
07 May 20:01    INFO  valid result:
recall@10 : 0.0789    mrr@10 : 0.1353    ndcg@10 : 0.0778    hit@10 : 0.4057    precision@10 : 0.0628
07 May 20:01    INFO  Saving current: saved/KGAT-May-07-2022_20-01-11.pth
......
Train    84: 100%|█████████████████████████| 40/40 [00:01<00:00, 32.92it/s, GPU RAM: 0.48 G/31.75 G]
Train    84: 100%|█████████████████████████| 45/45 [00:01<00:00, 37.76it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:06    INFO  epoch 84 training [time: 2.48s, train_loss1: 5.8008, train_loss2: 1.0560]
Evaluate   : 100%|██████████████████████| 440/440 [00:00<00:00, 791.95it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:06    INFO  epoch 84 evaluating [time: 0.56s, valid_score: 0.371600]
07 May 20:06    INFO  valid result:
recall@10 : 0.2087    mrr@10 : 0.3716    ndcg@10 : 0.2288    hit@10 : 0.7239    precision@10 : 0.1605
07 May 20:06    INFO  Finished training, best eval result in epoch 73
07 May 20:06    INFO  Loading model structure and parameters from saved/KGAT-May-07-2022_20-01-11.pth
Evaluate   : 100%|██████████████████████| 439/439 [00:01<00:00, 355.43it/s, GPU RAM: 0.48 G/31.75 G]
07 May 20:06    INFO  best valid : OrderedDict([('recall@10', 0.2123), ('mrr@10', 0.3865), ('ndcg@10', 0.2332), ('hit@10', 0.7216), ('precision@10', 0.1616)])
07 May 20:06    INFO  test result: OrderedDict([('recall@10', 0.2308), ('mrr@10', 0.4564), ('ndcg@10', 0.2822), ('hit@10', 0.7244), ('precision@10', 0.196)])

Then I ran the following script:

from recbole.quick_start import load_data_and_model

config, model, dataset, train_data, valid_data, test_data = load_data_and_model(
    model_file='saved/KGAT-May-07-2022_20-01-11.pth',
)

It ran successfully.

So could you please try again and check whether the configs of your two runs (training and loading) are inconsistent?
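One way to check is to compare the hyperparameter dumps that both runs print at startup (the `key = value` lines in the log above). A minimal sketch, assuming both logs use that format; `parse_params` and `diff_params` are hypothetical helpers, not part of RecBole, and the two log snippets are made up for illustration.

```python
import re

# Matches dump lines such as "train_batch_size = 2048" or "field_separator =".
PARAM_LINE = re.compile(r"^(\w+) =\s?(.*)$")


def parse_params(log_text):
    """Collect `key = value` lines from a log into a dict of strings."""
    params = {}
    for line in log_text.splitlines():
        match = PARAM_LINE.match(line.strip())
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_params(first, second):
    """Return {key: (value_in_first, value_in_second)} for keys that differ."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


# Made-up snippets standing in for the logs of the training run and the loading run.
train_log = "Training Hyper Parameters:\ntrain_batch_size = 2048\nbenchmark_filename = ['part1', 'part2', 'part3']\n"
load_log = "Training Hyper Parameters:\ntrain_batch_size = 2048\nbenchmark_filename = None\n"

for key, (a, b) in diff_params(parse_params(train_log), parse_params(load_log)).items():
    print(f"{key}: {a!r} vs. {b!r}")
```

Any key that shows up in the diff and affects data loading or filtering (dataset, benchmark_filename, load_col, the interaction-count filters) is a likely cause of differing user and item counts.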