RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.27k stars, 590 forks

[🐛BUG] mat1 and mat2 shapes cannot be multiplied in SASRecF #1983

Closed arseny239 closed 4 months ago

arseny239 commented 5 months ago

Hi,

First of all let me thank you for such a great library. I really like it! But I still have some problem(s):

Describe the bug: I tried to train and use the SASRecF model. Before that, I tried SASRec on the same data and it worked well, but I also want to take the items' features into account.

But when I try to train it, I get an error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)

If I remove one feature from the 'selected_features' list, the shapes in the message change slightly: RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x576 and 704x64)

But if I leave only one feature in the 'selected_features' list, training starts normally: 'selected_features': ['name'],

My item's features are:

'item': ['vacancy_id', 'name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ]}, 
'selected_features': ['name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ],

and their types are:

"vacancy_id:token", 
"name:token_seq", 
"company_id:token", 
"description:token_seq", 
"keySkills:token_seq",
"compensation_from:float", 
"compensation_to:float", 
"currencyRate:float",
"area_id:token", 
"area_regionId:token", 
"employment:token", 
"workSchedule:token", 
"workExperience:token"

I use run_recbole to train the model: run_recbole(model='SASRecF', dataset=DATASET_NAME, config_dict=parameter_dict). At the same time, other models, such as SASRec, work well.

Am I doing something wrong or is it a bug in the model? What reasons could cause this behavior?

I work with my own dataset. I created the .item and .inter "atomic files" (there is no .user file because I have no information about users other than their IDs).

I use RecBole 1.2.0 on Linux (Debian) without a GPU (I train on a 16-core CPU).

The full text of the error:

{
    "name": "RuntimeError",
    "message": "mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)",
    "stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 1
----> 1 run_recbole(model='SASRecF', dataset=DATASET_NAME, config_dict=parameter_dict)

File ~/.local/lib/python3.9/site-packages/recbole/quick_start/quick_start.py:141, in run_recbole(model, dataset, config_file_list, config_dict, saved, queue)
    138 logger.info(model)
    140 transform = construct_transform(config)
--> 141 flops = get_flops(model, dataset, config[\"device\"], logger, transform)
    142 logger.info(set_color(\"FLOPs\", \"blue\") + f\": {flops}\")
    144 # trainer loading and initialization

File ~/.local/lib/python3.9/site-packages/recbole/utils/utils.py:347, in get_flops(model, dataset, device, logger, transform, verbose)
    344 wrapper.apply(add_hooks)
    346 with torch.no_grad():
--> 347     wrapper(*inputs)
    349 def dfs_count(module: nn.Module, prefix=\"\\t\"):
    350     total_ops, total_params = module.total_ops.item(), 0

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.local/lib/python3.9/site-packages/recbole/utils/utils.py:288, in get_flops.<locals>.TracingAdapter.forward(self, interaction)
    287 def forward(self, interaction):
--> 288     return self.model.predict(interaction)

File ~/.local/lib/python3.9/site-packages/recbole/model/sequential_recommender/sasrecf.py:171, in SASRecF.predict(self, interaction)
    169 item_seq_len = interaction[self.ITEM_SEQ_LEN]
    170 test_item = interaction[self.ITEM_ID]
--> 171 seq_output = self.forward(item_seq, item_seq_len)
    172 test_item_emb = self.item_embedding(test_item)
    173 scores = torch.mul(seq_output, test_item_emb).sum(dim=1)

File ~/.local/lib/python3.9/site-packages/recbole/model/sequential_recommender/sasrecf.py:135, in SASRecF.forward(self, item_seq, item_seq_len)
    130 feature_emb = feature_table.view(
    131     table_shape[:-2] + (feat_num * embedding_size,)
    132 )
    133 input_concat = torch.cat((item_emb, feature_emb), -1)  # [B 1+field_num*H]
--> 135 input_emb = self.concat_layer(input_concat)
    136 input_emb = input_emb + position_embedding
    137 input_emb = self.LayerNorm(input_emb)

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1538, in Module._call_impl(self, *args, **kwargs)
   1535     bw_hook = hooks.BackwardHook(self, full_backward_hooks, backward_pre_hooks)
   1536     args = bw_hook.setup_input_hook(args)
-> 1538 result = forward_call(*args, **kwargs)
   1539 if _global_forward_hooks or self._forward_hooks:
   1540     for hook_id, hook in (
   1541         *_global_forward_hooks.items(),
   1542         *self._forward_hooks.items(),
   1543     ):

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)"
}

and also my parameter_dict:

parameter_dict = {
    'data_path': './',
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'vacancy_id',
    'RATING_FIELD': 'action_type',
    'TIME_FIELD': 'timestamp',
    'user_inter_num_interval': "[10,inf)",
    'item_inter_num_interval': "[15,inf)",
    'seq_len': {'vacancy_id': 10}, 
    'MAX_ITEM_LIST_LENGTH': 10,    
    'load_col': {'inter': ['user_id', 'vacancy_id', 'action_type', 'timestamp'],
                 'user': ['user_id'],
                 'item': ['vacancy_id', 'name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ]},  # not all of them, but there is not enough memory for more
    'selected_features': ['name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ],
#    'selected_features': ['name'], also tried only 1 feature
    'neg_sampling': None,
    'train_neg_sample_args': None,
    'worker': 16,   # on vm69
    'train_batch_size': 2048,
    'eval_batch_size': 2048,
    'epochs': 3,
    'use_gpu': False,
    'metrics': ['Recall', 'MRR'],
    'loss_type': 'CE',
    'topk': 100,
    'valid_metric': 'MRR@100',
    "stopping_step": 2, 
    'hidden_size': 64,  
    'inner_size': 256, 
    'hidden_dropout_prob': 0.3,
    'attn_dropout_prob': 0.3,  
    'seed': 42,
    'eval_args': {
        'split': {'RS': [0.95, 0.03, 0.02]},  
#        'split': {'RS': [10, 0, 0]},  # also tried 10,0,0 and 9,0,1
        'group_by': 'user',
        'order': 'TO',
        'mode': 'full'}  #  labeled ??
}

Thank you for the answer(s). Sincerely yours, Arseny

arseny239 commented 5 months ago

I did some additional research and I noticed that this error appears only if I have some float-type features in the 'selected_features' list. If I put only 'token' and/or 'token_seq' features there, the model starts to train ok. If I put only 'float' feature(s) in the 'selected_features' list, I get another error: RuntimeError: torch.cat(): expected a non-empty list of Tensors
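The observation above suggests a stopgap: keep only the 'token' and 'token_seq' fields in 'selected_features'. The sketch below shows one way to derive that list from the column headers quoted earlier in this issue. The helper names `parse_header` and `split_by_type` are hypothetical and not part of RecBole. This only sidesteps the error by leaving the float features out; it does not make SASRecF use them.

```python
# Column headers as quoted in this issue, in "field:type" form.
ITEM_HEADER = [
    "vacancy_id:token", "name:token_seq", "company_id:token",
    "description:token_seq", "keySkills:token_seq",
    "compensation_from:float", "compensation_to:float", "currencyRate:float",
    "area_id:token", "area_regionId:token", "employment:token",
    "workSchedule:token", "workExperience:token",
]

# The 'selected_features' list from the failing parameter_dict.
SELECTED_FEATURES = [
    "name", "company_id", "keySkills", "compensation_from", "compensation_to",
    "area_id", "area_regionId", "employment", "workSchedule", "workExperience",
]


def parse_header(header):
    """Map each field name to its declared type (hypothetical helper)."""
    return dict(column.split(":", 1) for column in header)


def split_by_type(selected, field_types, allowed=("token", "token_seq")):
    """Split the selected features into (kept, dropped) by declared type.

    Hypothetical helper: 'allowed' encodes the types that trained without
    errors according to the comment above, not a documented RecBole rule.
    """
    kept = [f for f in selected if field_types[f] in allowed]
    dropped = [f for f in selected if field_types[f] not in allowed]
    return kept, dropped


field_types = parse_header(ITEM_HEADER)
kept, dropped = split_by_type(SELECTED_FEATURES, field_types)
print("selected_features:", kept)
print("left out:", dropped)
```

The `kept` list can then be passed as 'selected_features' in parameter_dict, with everything else unchanged.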

Yilu114 commented 4 months ago

The errors you're encountering with the SASRecF model in RecBole seem to stem from the way item features are being processed and integrated into the model. SASRecF is designed to concatenate item representations with item attribute representations as inputs to the model. This process involves several hyperparameters and configurations that must align with the structure and content of your dataset.

The RuntimeError: mat1 and mat2 shapes cannot be multiplied suggests a mismatch in dimensions between the data provided to the model and the model's expected input sizes. When you change the selected_features list, the dimensions of the inputs change, hence the variation in error messages.
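To make the mismatch concrete, the widths in the two error messages can be expressed as multiples of hidden_size (64 in the posted parameter_dict). The sketch below does only that arithmetic; `in_blocks` is a hypothetical helper, and the labels are taken from the issue text. In both reported cases the input to the linear layer is two hidden-size blocks narrower than the layer expects. That gap happens to equal the number of float features selected, which fits the follow-up observation about float features, but this is an inference from the numbers rather than a confirmed diagnosis.

```python
def in_blocks(width, hidden_size=64):
    """Express a tensor width as a whole number of hidden_size blocks."""
    blocks, remainder = divmod(width, hidden_size)
    if remainder:
        raise ValueError(f"{width} is not a multiple of {hidden_size}")
    return blocks


# (input width, width the linear layer expects), read from the two messages:
#   (10x640 and 768x64) and (10x576 and 704x64)
reported = {
    "10 selected features": (640, 768),
    "9 selected features": (576, 704),
}

gaps = {}
for label, (got, expected) in reported.items():
    gaps[label] = in_blocks(expected) - in_blocks(got)
    print(f"{label}: input has {in_blocks(got)} blocks, "
          f"layer expects {in_blocks(expected)}, gap {gaps[label]}")
```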

Key Points from the Documentation:

Addressing Your Issue:

Suggested Steps:

  1. Review Feature Selection: Ensure that all selected features are correctly defined in your dataset and correspond to the correct feature types (e.g., token, float).
  2. Adjust Hyperparameters: Consider adjusting the hidden_size and inner_size parameters based on the dimensions of your input data and the number of features you're including.
  3. Check Feature Processing: Ensure that float features are correctly processed and represented before being fed into the model. This might involve looking at how these features are embedded and concatenated with other feature embeddings.

For more detailed insights into configuring and running SASRecF, including hyperparameter settings and model usage, please refer to the official documentation provided by RecBole: https://www.recbole.io/docs/user_guide/model/sequential/sasrecf.html