RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[🐛BUG] Migration errors in SASRec #2049

Closed · thigazholi-official closed this issue 4 weeks ago

thigazholi-official commented 4 months ago

1. Bug Description

First, thank you for aggregating all of these recommendation algorithms in one place. While SASRec in RecBole 0.2.1 provides the desired ranking of items, I am not able to reproduce the same results after migrating to RecBole 1.2.0 with the same dataset.

2. Training Code

1. Version 0.2.1

1.1. config:

old_config = {
    'data_path': '/content/data',
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'item_id',
    'RATING_FIELD': 'rating',
    'TIME_FIELD': 'timestamp',
    'load_col': {'inter': ['user_id', 'item_id', 'timestamp']},
    'epochs': 6,
    'learning_rate': 0.001,
    'train_batch_size': 4096,
    'eval_batch_size': 4096,
    'learner': 'adam',
    'eval_setting': 'TO_LS,uni100',
    'hidden_size': 128,
    'reproducibility': False,
    'checkpoint_dir': '/content/model',
}

1.2. code for training

run_recbole(model='SASRec', dataset='set', config_dict=old_config)

2. Version 1.2.0

2.1. config:

new_config = {
    'data_path': '/content/data',
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'item_id',
    'RATING_FIELD': 'rating',
    'TIME_FIELD': 'timestamp',
    'load_col': {'inter': ['user_id', 'item_id', 'timestamp']},
    'epochs': 6,
    'learning_rate': 0.001,
    'train_batch_size': 4096,
    'eval_batch_size': 4096,
    'learner': 'adam',
    'train_neg_sample_args': None,
    'eval_args': {
        'split': {'LS': 'valid_and_test'},
        'group_by': 'user',
        'order': 'TO',
        'mode': {'valid': 'uni100', 'test': 'uni100'},
    },
    'hidden_size': 128,
    'reproducibility': False,
    'checkpoint_dir': '/content/model',
    'loss_type': 'CE',
}

Note: this configuration has been tried with train_neg_sample_args set to both ~ and None (the two equivalent spellings are sketched just below).
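For clarity, the two spellings refer to the same thing: None in a Python config_dict and ~ in a YAML config file both disable negative sampling. A minimal sketch, using the key name from new_config above:

# In a Python config_dict, disable negative sampling with None:
new_config['train_neg_sample_args'] = None

# The equivalent line in a YAML config file:
# train_neg_sample_args: ~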

2.2. code for training

run_recbole(model='SASRec', dataset='set', config_dict=new_config)

3. Expected Behavior

I would like to reproduce in RecBole 1.2.0 the same output obtained from RecBole 0.2.1, using SASRec and the same dataset.

4. Observation

While both versions would be expected to raise an error when loss_type is CE and negative sampling is enabled (train_neg_sample_args is not None in 1.2.0, or training_neg_sample_num = 1 in 0.2.1), the error is observed only in 1.2.0; training completes successfully in 0.2.1.
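For context, a minimal sketch of the two combinations that RecBole 1.2.0 is expected to accept for SASRec. Treating BPR as the alternative pairwise loss is an assumption based on the model's documented loss types, not something stated in this issue:

# CE loss scores the full item set, so negative sampling must be disabled:
ce_config = {
    'loss_type': 'CE',
    'train_neg_sample_args': None,
}

# A pairwise loss such as BPR needs sampled negatives instead:
bpr_config = {
    'loss_type': 'BPR',
    'train_neg_sample_args': {'distribution': 'uniform', 'sample_num': 1},
}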

1. Version 1.2.0

1.1. Configurator file

[Screenshot 2024-05-16 131928: configurator file]

1.2. Produce configuration

config = Config(model='SASRec', dataset='set', config_dict=new_config)

which raises:

ValueError: train_neg_sample_args [{'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}] should be None when the loss_type is CE.
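One way to check whether the None override actually reached the final configuration is to inspect the resolved value on a Config object that constructs successfully; if construction itself raises the ValueError above, the override was replaced somewhere along the way (for example by a model or dataset default). A minimal sketch, assuming the same 'set' dataset and the new_config above:

from recbole.config import Config

# Build the configuration; this line raises the ValueError if the
# resolved train_neg_sample_args is still a sampling dict.
config = Config(model='SASRec', dataset='set', config_dict=new_config)

# If construction succeeds, the resolved value should be None for CE loss.
print(config['loss_type'])
print(config['train_neg_sample_args'])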

2. Version 0.2.1

2.1. Configurator file

[Screenshot 2024-05-16 133441: configurator file]

2.2. Produce configuration

config = Config(model='SASRec', dataset='/content/data', config_dict=old_config)

which resolves to the following configuration:
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = False
data_path = /content/data
show_progress = True

Training Hyper Parameters:
checkpoint_dir = /content/model
epochs = 6
train_batch_size = 4096
learner = adam
learning_rate = 0.001
training_neg_sample_num = 1
training_neg_sample_distribution = uniform
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
draw_loss_pic = False
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_setting = TO_LS,uni100
group_by_user = True
split_ratio = [0.8, 0.1, 0.1]
leave_one_num = 2
real_time_process = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = MRR@10
eval_batch_size = 4096
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator =   
seq_separator =  
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'timestamp']}
unload_col = None
unused_col = None
additional_feat_suffix = None
lowest_val = None
highest_val = None
equal_val = None
not_equal_val = None
max_user_inter_num = None
min_user_inter_num = 0
max_item_inter_num = None
min_item_inter_num = 0
fields_in_same_space = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id

Other Hyper Parameters: 
n_layers = 2
n_heads = 2
hidden_size = 128
inner_size = 256
hidden_dropout_prob = 0.5
attn_dropout_prob = 0.5
hidden_act = gelu
layer_norm_eps = 1e-12
initializer_range = 0.02
loss_type = CE
rm_dup_inter = None
filter_inter_by_user_or_item = True
SOURCE_ID_FIELD = source_id
TARGET_ID_FIELD = target_id
benchmark_filename = None
MODEL_TYPE = ModelType.SEQUENTIAL
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
valid_metric_bigger = True
device = cuda
train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform'}

Yilu114 commented 4 weeks ago

Based on your description, the issue you are encountering when migrating from RecBole version 0.2.1 to 1.2.0 with the SASRec model is due to changes in how negative sampling is handled. In RecBole 1.2.0, when the loss_type is set to CE (Cross-Entropy), the train_neg_sample_args must be set to None. If this setting is not None, a ValueError will occur.

To resolve this, ensure that your configuration file for version 1.2.0 sets train_neg_sample_args to None or ~ (indicating None in YAML). Also, review other parameters, like eval_args, to match the format and requirements of the new version.

For further guidance, refer to the RecBole documentation and changelogs to understand the new parameters and changes in version 1.2.0. By aligning your configuration with the new requirements, you should be able to reproduce the same results as in version 0.2.1. If you continue to have issues, please provide additional details so we can assist further.
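Putting that suggestion together with the configuration from section 2.1, a minimal sketch of a 1.2.0 run that mirrors the 0.2.1 setup (time-ordered leave-one-out split, uni100 evaluation, CE loss, no negative sampling). The keys are taken from new_config above; whether this reproduces the 0.2.1 ranking exactly still depends on implementation differences between the two versions:

from recbole.quick_start import run_recbole

fixed_config = {
    'data_path': '/content/data',
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'item_id',
    'RATING_FIELD': 'rating',
    'TIME_FIELD': 'timestamp',
    'load_col': {'inter': ['user_id', 'item_id', 'timestamp']},
    'epochs': 6,
    'learning_rate': 0.001,
    'train_batch_size': 4096,
    'eval_batch_size': 4096,
    'learner': 'adam',
    'loss_type': 'CE',
    'train_neg_sample_args': None,  # must be None when loss_type is CE
    'eval_args': {
        'split': {'LS': 'valid_and_test'},  # leave-one-out, as TO_LS did in 0.2.1
        'group_by': 'user',
        'order': 'TO',                      # time-ordered
        'mode': {'valid': 'uni100', 'test': 'uni100'},
    },
    'hidden_size': 128,
    'reproducibility': False,
    'checkpoint_dir': '/content/model',
}

# Note the config_dict: the 1.2.0 configuration, not old_config.
run_recbole(model='SASRec', dataset='set', config_dict=fixed_config)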