RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.48k stars, 616 forks

Config yaml file and Eval Batch Size #1392

Closed shahjaidev closed 2 years ago

shahjaidev commented 2 years ago

image

Hello!

I'm training models on my custom data and have created a config file (picture attached above) which I use as:

python run_recbole.py --model=BPR --dataset=coclick_1d --config_file_list=['/home/core/shahjaidev/DirectAU/jaidev_configs/coclick_1d_config.yaml']

However, I notice that many of the config parameters I set in the yaml are not applied during training. For instance, min_user_inter_num remains set to 5 in the config printed in the terminal, and the split remains [0.8, 0.1, 0.1] even though I set it differently, as in the screenshot.

Am I not formatting the config yaml correctly? Could you please suggest the fix?

It would be quite helpful if you could share a couple of entire config files used to train different models. (I'm sure many others could also benefit from this)


On a related note, I set the eval_batch_size to 10000 and despite this, evaluation is very slow (over 3 hours). I'm curious why evaluation is this slow and what could be done to speed it up. What is the denominator in the progress bar for eval, is it the number of users (seems unlikely)?

image

Thanks:) Great work with this library!

Ethan-TZ commented 2 years ago

@shahjaidev Hello, thanks for your attention to RecBole!

For the first question, the current version of RecBole no longer uses config fields such as max_user_inter_num or min_user_inter_num; it uses user_inter_num_interval and item_inter_num_interval instead. An example file for the MovieLens dataset is as follows:

# dataset config
field_separator: "\t"
seq_separator: " "
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
NEG_PREFIX: neg_
LABEL_FIELD: label
load_col:
    inter: [user_id, item_id, rating]
val_interval:
    rating: "[3,inf)"
unused_col:
    inter: [rating]
user_inter_num_interval: "[10,inf)"
item_inter_num_interval: "[10,inf)"

# training and evaluation
epochs: 500
train_batch_size: 4096
valid_metric: MRR@10

# model
embedding_size: 64
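
For anyone migrating an older config, the renaming described above can be sketched in Python. This is only an illustration, not part of RecBole: the helper names to_interval and migrate_filter_keys are hypothetical, the old item-side keys (min_item_inter_num, max_item_inter_num) are assumed by analogy with the user-side ones, and the closed upper bound "[a,b]" is an assumption about the interval syntax, so check it against the documentation for your version.

```python
def to_interval(min_num=None, max_num=None):
    """Build an interval string such as "[10,inf)" from optional inclusive bounds."""
    lower = 0 if min_num is None else min_num
    if max_num is None:
        return f"[{lower},inf)"
    if lower > max_num:
        raise ValueError("min_num must not exceed max_num")
    # Assumption: a closed upper bound is written with "]".
    return f"[{lower},{max_num}]"


def migrate_filter_keys(config):
    """Return a copy of config with old min/max keys replaced by interval keys."""
    migrated = dict(config)
    for entity in ("user", "item"):
        min_key = f"min_{entity}_inter_num"
        max_key = f"max_{entity}_inter_num"
        if min_key in migrated or max_key in migrated:
            low = migrated.pop(min_key, None)
            high = migrated.pop(max_key, None)
            migrated[f"{entity}_inter_num_interval"] = to_interval(low, high)
    return migrated


# Hypothetical old-style settings, loaded from a yaml file elsewhere.
old_config = {"min_user_inter_num": 10, "epochs": 500}
print(migrate_filter_keys(old_config))
```

The function leaves unrelated keys untouched, so it can be run over a whole config dictionary before writing it back to yaml.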

For the second question, the slowdown may be caused by a large number of items, which makes the full evaluation mode slow. You can try the pop100 mode to speed up evaluation, i.e., just set:

eval_args:
    mode: pop100

shahjaidev commented 2 years ago

Thanks for the answer!

How about the split in eval_args? I don't know why the split I set in the config file doesn't take effect.

Ethan-TZ commented 2 years ago

@shahjaidev The split takes the form of:

eval_args:
  split: {'RS':[0.95, 0.01, 0.04]}

Note that metrics is not a key of eval_args.
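
A quick way to catch both mistakes mentioned here before launching a run is a small sanity check on the parsed eval_args mapping. The sketch below is hypothetical (check_eval_args is not a RecBole function) and assumes the yaml has already been loaded into a Python dict. The rule that the three RS ratios sum to 1 is a sanity check of my own, not something stated in this thread.

```python
import math


def check_eval_args(eval_args):
    """Return a list of human-readable problems found in an eval_args mapping."""
    problems = []
    if "metrics" in eval_args:
        problems.append("metrics is not a key of eval_args; move it out of eval_args")
    split = eval_args.get("split")
    if split is not None:
        if not isinstance(split, dict) or len(split) != 1:
            problems.append("split should map one strategy name to its setting")
        else:
            strategy, ratios = next(iter(split.items()))
            if strategy == "RS":
                if not (isinstance(ratios, (list, tuple)) and len(ratios) == 3):
                    problems.append("RS expects three ratios, e.g. [0.95, 0.01, 0.04]")
                elif not math.isclose(sum(ratios), 1.0):
                    # Own sanity rule, see the note above.
                    problems.append("RS ratios do not sum to 1")
    return problems


# The form suggested in the reply above passes the check.
print(check_eval_args({"split": {"RS": [0.95, 0.01, 0.04]}}))
# A config with metrics nested under eval_args is flagged.
print(check_eval_args({"metrics": ["Recall"], "split": {"RS": [0.8, 0.1, 0.1]}}))
```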

shahjaidev commented 2 years ago

Thanks for the reply! Actually, even after setting eval_args: split: {'RS':[0.95, 0.01, 0.04]} and train_batch_size: 500 in the yaml file, these are not reflected in the model training and eval.

This is the command: python run_recbole.py --model=BPR --dataset=coclick_1d --config_file_list=['/home/core/shahjaidev/DirectAU/jaidev_configs/coclick_1d_config.yaml']

image

Terminal Output: image

shahjaidev commented 2 years ago

image

Also, for my understanding, what is the denominator in the evaluation progress bar? (For reference, the number of users in the data is ~2 million.)

Note: I'm using pop100 as the eval mode, and despite this, evaluation is really slow.

Ethan-TZ commented 2 years ago

@shahjaidev Hello, there is an error in your run command: the command-line parameter is config_files, not config_file_list. The correct command is therefore: python run_recbole.py --model=BPR --dataset=coclick_1d --config_files=/home/core/shahjaidev/DirectAU/jaidev_configs/coclick_1d_config.yaml
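
If the command is assembled from a script, building it programmatically avoids mistyping the flag or pasting a Python-style list with brackets and quotes. The sketch below uses only the standard library. build_command is a hypothetical helper, and joining several config paths with spaces is an assumption about how the script parses --config_files, so verify it against run_recbole.py for your version.

```python
import shlex


def build_command(model, dataset, config_paths):
    """Return a shell-safe command line for run_recbole.py."""
    # Assumption: multiple config files are passed as one space-separated value.
    files = " ".join(config_paths)
    argv = [
        "python",
        "run_recbole.py",
        f"--model={model}",
        f"--dataset={dataset}",
        f"--config_files={files}",
    ]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(argv)


print(build_command(
    "BPR",
    "coclick_1d",
    ["/home/core/shahjaidev/DirectAU/jaidev_configs/coclick_1d_config.yaml"],
))
```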

shahjaidev commented 2 years ago

moved to new issue