RUCAIBox / RecBole-CDR

This is a library built upon RecBole for cross-domain recommendation algorithms
MIT License

Hyperparameter tuning question #52

Closed: ajaykv1 closed this issue 1 year ago

ajaykv1 commented 1 year ago

Hi, I am a bit new to hyperparameter tuning in RecBole-CDR. I ran the CoNet algorithm on my dataset, and the results were very poor. On the same test set, a model I coded up myself reaches NDCG@10 values of around 0.9, so I would expect CoNet to land in a similar range, since CoNet is a strong baseline in cross-domain recommendation. Below are the results I am getting when running CoNet.

INFO test result: OrderedDict([('recall@10', 0.0179), ('mrr@10', 0.0063), ('ndcg@10', 0.0087), ('hit@10', 0.0183), ('precision@10', 0.0018)])

I believe I need to tune the parameters for the model for the numbers to be a lot better. I want to tune the batch size, embedding size, the number of dense layers, the learning rate, and any other parameter that can be tuned. After tuning the hyperparameters, I want to use the best model to make recommendations on the test set.

Please let me know how I can tune the different hyperparameters, and then use the best model on the test set to collect the metric values.
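For context, the workflow you describe (search over configurations, pick the best by validation metric, then evaluate once on test) can be sketched as a plain grid search. This is a generic illustration, not RecBole-CDR's API: `train_and_validate` is a hypothetical placeholder for training CoNet with a given config and returning the validation metric (e.g. MRR@10, the `valid_metric` in your log).

```python
import itertools

# Hypothetical search space over the parameters mentioned above.
# Values are illustrative, not recommendations.
search_space = {
    "learning_rate": [0.0005, 0.001, 0.005],
    "embedding_size": [32, 64, 128],
    "train_batch_size": [1024, 2048, 4096],
    "mlp_hidden_size": [[64, 32, 16, 8], [128, 64, 32]],
}

def train_and_validate(config):
    # Placeholder objective: in a real run this would train the model
    # with `config` and return the validation metric. Here it returns a
    # dummy score so the sketch is self-contained.
    return -config["learning_rate"]

best_score, best_config = float("-inf"), None
keys = list(search_space)
for values in itertools.product(*(search_space[k] for k in keys)):
    config = dict(zip(keys, values))
    score = train_and_validate(config)
    if score > best_score:  # keep the config with the best validation score
        best_score, best_config = score, config

print(best_config)
# After the search: retrain with best_config, then evaluate exactly once
# on the held-out test set to report final metrics.
```

In practice you would not hand-roll this loop; RecBole ships tooling for it (see the answer below), but the select-on-validation / report-on-test split is the important part.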

Right now, I am using the default values that come with the run_recbole_cdr.py file. Below are the default values that I am running CoNet with:

Evaluation Hyper Parameters:
eval_args = {'group_by': 'user', 'order': 'TO', 'split': {'RS': [0.7, 0.2, 0.1]}, 'mode': 'full'}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4
Other Hyper Parameters:
wandb_project = recbole_cdr
train_epochs = ['BOTH:300']
require_pow = False
embedding_size = 64
reg_weight = 0.01
mlp_hidden_size = [64, 32, 16, 8]
MODEL_TYPE = ModelType.CROSSDOMAIN
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
train_modes = ['BOTH']
epoch_num = ['300']
source_split = False
device = cuda
train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform', 'dynamic': 'none'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}
Source domain: ./comedy_data/comedy
The number of users: 2217
Average actions of users: 16.08528880866426
The number of items: 4977
Average actions of items: 7.16338424437299
The number of inters: 35645
The sparsity of the dataset: 99.6769533176926%
Remain Fields: ['source_user_id', 'source_item_id', 'source_rating', 'source_timestamp']
Target domain: ./action_data/action
The number of users: 2217
Average actions of users: 19.935469314079423
The number of items: 2927
Average actions of items: 15.098086124401913
The number of inters: 44177
The sparsity of the dataset: 99.31921840719268%
Remain Fields: ['target_user_id', 'target_item_id', 'target_rating', 'target_timestamp']
Num of overlapped user: 2217
Num of overlapped item: 1
INFO  [Training]: train_batch_size = [2048] negative sampling: [{'uniform': 1}]
INFO  [Evaluation]: eval_batch_size = [4096] eval_args: [{'group_by': 'user', 'order': 'TO', 'split': {'RS': [0.7, 0.2, 0.1]}, 'mode': 'full'}] 
WYH-han commented 1 year ago

You can refer to the "Parameter Tuning" section in the RecBole documentation to tune the parameters.
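For anyone landing here later: that doc describes a hyperopt-based tuner driven by a plain-text parameter file, where each line is `parameter_name distribution values`. A sketch of such a file for the parameters asked about above (ranges are illustrative, and whether RecBole-CDR's runner accepts it exactly as RecBole's `run_hyper.py` does should be verified against the current docs):

```
learning_rate loguniform -8,0
embedding_size choice [32,64,128]
train_batch_size choice [1024,2048,4096]
reg_weight choice [0.001,0.01,0.1]
mlp_hidden_size choice ['[64,32,16,8]','[128,64,32]']
```

In RecBole itself this file is passed to the tuning entry point (e.g. `python run_hyper.py --model=CoNet --params_file=conet.hyper --config_files=...`), which searches the space, logs the validation result per trial, and reports the best configuration; check whether RecBole-CDR ships an analogous entry point or whether you need to wire RecBole's `HyperTuning` to the cross-domain objective yourself.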