Describe the bug
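Training DiffRec or LDiffRec on MovieLens-100K crashes at the first validation step with "RuntimeError: shape mismatch: value tensor of shape [4040, 4040] cannot be broadcast to indexing result of shape [4040]". The exception is raised in `_neg_sample_batch_eval` (`recbole/trainer/trainer.py`), the evaluation path used here because validation and test run in `uni100` mode (100 uniformly sampled negatives per positive item).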
CUDA available: True
command line args [--data_set_name MovieLens-100K --model_name LDiffRec] will not be used in RecBole
24 Jan 15:52 INFO
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 42
state = INFO
reproducibility = True
data_path = ./data_sets/MovieLens-100K
checkpoint_dir = ./data_sets/MovieLens-100K/recbole_checkpoints/
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False
Training Hyper Parameters:
epochs = 50
train_batch_size = 2048
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 5
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4
Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'GAUC', 'ItemCoverage', 'AveragePopularity', 'GiniIndex', 'ShannonEntropy', 'TailPercentage']
topk = [1, 3, 5, 10, 20]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4
Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = {}
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating']}
unload_col = {}
unused_col = {}
additional_feat_suffix = []
rm_dup_inter = None
val_interval = {}
filter_inter_by_user_or_item = True
user_inter_num_interval = [0, inf)
item_inter_num_interval = [0, inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = {}
normalize_field = []
normalize_all = False
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None
Other Hyper Parameters:
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = None
n_cate = 1
reparam = True
in_dims = [300]
out_dims = []
ae_act_func = tanh
lamda = 0.03
anneal_cap = 0.005
anneal_steps = 1000
vae_anneal_cap = 0.3
vae_anneal_steps = 200
noise_schedule = linear
noise_scale = 0.1
noise_min = 0.001
noise_max = 0.005
sampling_noise = False
sampling_steps = 0
reweight = True
mean_type = x0
steps = 5
history_num_per_term = 10
beta_fixed = True
dims_dnn = [300]
embedding_size = 10
mlp_act_func = tanh
time-aware = False
w_max = 1
w_min = 0.1
numerical_features = []
discretization = None
kg_reverse_r = False
entity_kg_num_interval = [0, inf)
relation_kg_num_interval = [0, inf)
MODEL_TYPE = ModelType.GENERAL
encoding = utf-8
training_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'dynamic': False, 'candidate_num': 0}
MODEL_INPUT_TYPE = InputType.LISTWISE
eval_type = EvaluatorType.RANKING
single_spec = True
local_rank = 0
device = cuda
valid_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}
test_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}
24 Jan 15:52 INFO MovieLens-100K
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating']
24 Jan 15:52 INFO [Training]: train_batch_size = [2048] train_neg_sample_args: [{'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}]
24 Jan 15:52 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}]
24 Jan 15:52 WARNING Max value of users history interaction records has reached 43.672014260249554% of the total.
24 Jan 15:52 INFO LDiffRec(
  (mlp): DNN(
    (emb_layer): Linear(in_features=10, out_features=10, bias=True)
    (mlp_layers): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0, inplace=False)
        (1): Linear(in_features=310, out_features=300, bias=True)
        (2): Tanh()
        (3): Dropout(p=0, inplace=False)
        (4): Linear(in_features=300, out_features=300, bias=True)
      )
    )
    (drop): Dropout(p=0.5, inplace=False)
  )
  (autoencoder): AutoEncoder(
    (dropout): Dropout(p=0.1, inplace=False)
    (encoder): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0.0, inplace=False)
        (1): Linear(in_features=1683, out_features=600, bias=True)
        (2): Tanh()
      )
    )
    (decoder): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0.0, inplace=False)
        (1): Linear(in_features=300, out_features=1683, bias=True)
      )
    )
  )
)
Trainable parameters: 1700693
24 Jan 15:52 INFO epoch 0 training [time: 2.65s, train loss: 1853.4353]
24 Jan 15:52 INFO epoch 1 training [time: 0.18s, train loss: 1684.0792]
24 Jan 15:52 INFO epoch 2 training [time: 0.14s, train loss: 1610.4366]
24 Jan 15:52 INFO epoch 3 training [time: 0.13s, train loss: 1545.5997]
24 Jan 15:52 INFO epoch 4 training [time: 0.14s, train loss: 1487.6795]
Traceback (most recent call last):
  File "/mnt/./run_recbole_test.py", line 158, in <module>
    best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 464, in fit
    valid_score, valid_result = self._valid_epoch(
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 283, in _valid_epoch
    valid_result = self.evaluate(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 616, in evaluate
    interaction, scores, positive_u, positive_i = eval_func(batched_data)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 558, in _neg_sample_batch_eval
    scores[row_idx, col_idx] = origin_scores
RuntimeError: shape mismatch: value tensor of shape [4040, 4040] cannot be broadcast to indexing result of shape [4040]
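The sizes in the error message are consistent with the evaluation settings logged above. A minimal sketch of the arithmetic, assuming the negative-sampling evaluation loader packs whole users into each batch and each user contributes one held-out item (leave-one-out split, `'LS': 'valid_and_test'`) plus 100 sampled negatives (`uni100`); the helper name `neg_sample_eval_rows` is hypothetical:

```python
def neg_sample_eval_rows(eval_batch_size: int, sample_num: int, items_per_user: int = 1) -> int:
    """Hypothetical helper: rows per evaluation batch under negative-sampling eval.

    Assumes each held-out item is paired with `sample_num` sampled negatives and
    that only whole users are packed into a batch of at most `eval_batch_size` rows.
    """
    # One row for each held-out item plus one row for each sampled negative.
    rows_per_user = items_per_user * (1 + sample_num)
    # Fit as many whole users as possible, but always at least one.
    users_per_batch = max(eval_batch_size // rows_per_user, 1)
    return users_per_batch * rows_per_user


# eval_batch_size = 4096, uni100 -> 40 users of 101 rows each.
print(neg_sample_eval_rows(4096, 100))  # 4040
```

Under these assumptions each batch holds 4040 (user, item) rows, so the trainer expects one score per row (shape [4040]), while the model returned a [4040, 4040] matrix.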
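To Reproduce
Running `run_recbole_test.py` with the arguments `--data_set_name MovieLens-100K --model_name LDiffRec` and the hyperparameters listed in the log above reproduces the crash (DiffRec fails the same way). Epochs 0-4 train normally; the exception is raised at the first validation pass, which `eval_step = 5` schedules after epoch 4.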
Expected behavior
DiffRec and LDiffRec should train and evaluate on MovieLens-100K without crashing.
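One hypothesis consistent with the [4040, 4040] shape, not verified against the RecBole source: the model's `predict` builds a full score matrix for the batch rows and then indexes it with the whole item column, as in `x_t[:, item]`, which yields an [N, N] matrix instead of one score per row. The PyTorch sketch below reproduces the same error on a toy batch and shows per-row indexing as a candidate fix. `predict_suspected`, `predict_per_row` and `scatter_scores` are hypothetical names; `scatter_scores` only mirrors the failing assignment `scores[row_idx, col_idx] = origin_scores` from the traceback.

```python
import torch


def predict_suspected(x_t: torch.Tensor, item: torch.Tensor) -> torch.Tensor:
    # Suspected pattern (hypothesis): every row is indexed with every item id,
    # so an [N, n_items] score matrix becomes [N, N].
    return x_t[:, item]


def predict_per_row(x_t: torch.Tensor, item: torch.Tensor) -> torch.Tensor:
    # Candidate fix: row i keeps only x_t[i, item[i]], giving shape [N].
    rows = torch.arange(item.shape[0], device=item.device)
    return x_t[rows, item]


def scatter_scores(origin_scores, row_idx, col_idx, n_users, n_items):
    # Simplified mirror of the failing line in _neg_sample_batch_eval:
    # unscored (user, item) cells stay at -inf.
    scores = torch.full((n_users, n_items), float("-inf"))
    scores[row_idx, col_idx] = origin_scores
    return scores


# Toy batch: 2 users x 3 candidate rows (1 positive + 2 negatives), 5 items.
n_users, per_user, n_items = 2, 3, 5
x_t = torch.arange(n_users * per_user * n_items, dtype=torch.float32).reshape(-1, n_items)
row_idx = torch.arange(n_users).repeat_interleave(per_user)  # [0, 0, 0, 1, 1, 1]
item = torch.tensor([0, 1, 2, 3, 4, 0])

print(predict_suspected(x_t, item).shape)  # torch.Size([6, 6])
print(predict_per_row(x_t, item).shape)    # torch.Size([6])

# Per-row scores scatter cleanly into the [n_users, n_items] matrix.
scores = scatter_scores(predict_per_row(x_t, item), row_idx, item, n_users, n_items)

try:
    scatter_scores(predict_suspected(x_t, item), row_idx, item, n_users, n_items)
except (RuntimeError, IndexError) as err:  # the report shows RuntimeError
    print("suspected predict fails:", err)
```

If the hypothesis holds, gathering one score per row in the model's `predict` (`x_t[rows, item]`, or equivalently `x_t.gather(1, item.unsqueeze(1)).squeeze(1)`) should remove the crash. As an untested workaround, setting `'mode': {'valid': 'full', 'test': 'full'}` in `eval_args` should avoid the `_neg_sample_batch_eval` path in the traceback, at the cost of reporting full-ranking rather than `uni100` metrics.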