JimLiu96 opened 1 year ago
@JimLiu96 Hello, thanks for your attention to RecBole! This is normal, since the sizes of ITEM_ID and NEG_ITEM_ID in the interaction are always the same, so comparing them cannot reflect the negative-sampling ratio. For example, suppose we have only two positive items, i.e., 1 and 2, and we set the number of training negative items to 3. The training set could be denoted as <1, 3 4 5> and <2, 7 8 9>. The interaction is then formulated as:

| ITEM_ID | NEG_ITEM_ID |
|---|---|
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 7 |
| 2 | 8 |
| 2 | 9 |
Therefore, you can determine the number of negative items by checking the occurrence of each positive item.
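A minimal, self-contained sketch of that check (the IDs are the toy values from the example above, not real dataset IDs):

```python
from collections import Counter

# Simulated flattened interaction for the pairs <1, 3 4 5> and <2, 7 8 9>.
item_id = [1, 1, 1, 2, 2, 2]      # positive item per row
neg_item_id = [3, 4, 5, 7, 8, 9]  # one sampled negative per row

# Both columns always have the same length in the pairwise format...
assert len(item_id) == len(neg_item_id)

# ...so the sampling ratio is recovered by counting how often each
# positive item repeats, not by comparing the two column lengths.
counts = Counter(item_id)
print(counts)  # each positive item appears once per sampled negative
```

Here both positives occur three times, matching the three negatives sampled per positive interaction.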
Describe the bug When I changed the number of training negative items to 5 and ran the BPR model, printing the number of negative items retrieved for training, I found that the number of sampled negative items is still the same as the number of positive items. I am not sure whether this is a bug or my misuse of the API. Could you please help me resolve this issue?
To Reproduce Steps to reproduce the behavior:
extra yaml file :
I just run `python run_recbole.py --model=BPR --dataset=Beauty --config_files "bpr_config.yaml"`. Within the `calculate_loss` function of the BPR model, I print the shape of both the positive items and the negative items. The output, however, shows that they have the same shape, which indicates that the number of positive and negative items is the same.
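This observation is consistent with the flattened pairwise layout: each positive item is repeated once per negative, so the two tensors always have equal length regardless of the sampling ratio. A toy illustration in plain Python (no RecBole; `n_pos` and the dummy IDs are made up for the example):

```python
# With n_pos positive interactions and k negatives per positive, the
# flattened pairwise batch repeats each positive item k times.
n_pos, k = 4, 5  # k mirrors training_neg_sample_num = 5 from the config

pos_items = [p for p in range(1, n_pos + 1) for _ in range(k)]
neg_items = [100 + i for i in range(n_pos * k)]  # dummy sampled negatives

print(len(pos_items), len(neg_items))  # equal lengths, as observed
```

So equal shapes inside `calculate_loss` do not mean only one negative was sampled; the ratio shows up as repetition of the positives.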
Training Hyper Parameters:
checkpoint_dir = saved
epochs = 500
train_batch_size = 1024
learner = adam
learning_rate = 0.001
training_neg_sample_num = 5
training_neg_sample_distribution = uniform
eval_step = 20
stopping_step = 10
clip_grad_norm = None
weight_decay = 1e-06
draw_loss_pic = False
loss_decimal_place = 4
Evaluation Hyper Parameters:
eval_setting = RO_RS,full
group_by_user = True
split_ratio = [0.8, 0.1, 0.1]
leave_one_num = 2
real_time_process = False
metrics = ['Recall', 'NDCG']
topk = [10, 20, 50]
valid_metric = NDCG@20
eval_batch_size = 4096
metric_decimal_place = 4
Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'timestamp']}
unload_col = None
unused_col = None
additional_feat_suffix = None
lowest_val = None
highest_val = None
equal_val = None
not_equal_val = None
max_user_inter_num = None
min_user_inter_num = 5
max_item_inter_num = None
min_item_inter_num = 5
fields_in_same_space = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
Other Hyper Parameters:
valid_metric_bigger = True
rm_dup_inter = None
filter_inter_by_user_or_item = True
SOURCE_ID_FIELD = source_id
TARGET_ID_FIELD = target_id
benchmark_filename = None
MODEL_TYPE = ModelType.GENERAL
embedding_size = 64
dropout_prob = 0.2
train_neg_sample_args = {'strategy': 'by', 'by': 5, 'distribution': 'uniform'}
MODEL_INPUT_TYPE = InputType.PAIRWISE
eval_type = EvaluatorType.RANKING
device = cuda
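The log above shows `train_neg_sample_args = {'strategy': 'by', 'by': 5, 'distribution': 'uniform'}`, so the 5-negative setting was picked up. For reference, a hedged sketch of a config file that would produce these settings (this is not the reporter's actual `bpr_config.yaml`, whose contents were not included; the parameter name is taken from the log):

```yaml
# Illustrative RecBole config fragment: five uniform negatives per
# positive interaction during training.
training_neg_sample_num: 5
training_neg_sample_distribution: uniform
```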