[🐛BUG] mat1 and mat2 shapes cannot be multiplied in SASRecF

Hi,

First of all let me thank you for such a great library. I really like it! But I still have some problem(s):

Describe the bug: I tried to train and use the SASRecF model. Before that, I tried SASRec on the same data and it worked well, but I also want to take the items' features into account.

But when I try to train it, I get an error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)

If I remove one feature from the 'selected_features' list, the shapes change a little: RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x576 and 704x64)

i.e. both dimensions shrink by 64 (640 to 576 and 768 to 704), the gap between them stays at 128, and the error itself is the same.

But if I leave only 1 feature in the 'selected_features' list, it starts to train ok: 'selected_features': ['name'],

My item's features are:

'item': ['vacancy_id', 'name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ],
'selected_features': ['name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ],

and their types are:

"vacancy_id:token", 
"name:token_seq", 
"company_id:token", 
"description:token_seq", 
"keySkills:token_seq",
"compensation_from:float", 
"compensation_to:float", 
"currencyRate:float",
"area_id:token", 
"area_regionId:token", 
"employment:token", 
"workSchedule:token", 
"workExperience:token"

I use run_recbole to train the model: run_recbole(model='SASRecF', dataset=DATASET_NAME, config_dict=parameter_dict). At the same time, other models, such as SASRec, work well.

Am I doing something wrong or is it a bug in the model? What reasons could cause this behavior?

I work with my own dataset. I created the .item and .inter "atomic files" (there is no .user file because I do not have any information about users, only their IDs).

I use recbole version 1.2.0 and Linux (Debian), without GPU (I train it on the CPU with 16 cores)

The full text of the error:

{
    "name": "RuntimeError",
    "message": "mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)",
    "stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 1
----> 1 run_recbole(model='SASRecF', dataset=DATASET_NAME, config_dict=parameter_dict)

File ~/.local/lib/python3.9/site-packages/recbole/quick_start/quick_start.py:141, in run_recbole(model, dataset, config_file_list, config_dict, saved, queue)
    138 logger.info(model)
    140 transform = construct_transform(config)
--> 141 flops = get_flops(model, dataset, config[\"device\"], logger, transform)
    142 logger.info(set_color(\"FLOPs\", \"blue\") + f\": {flops}\")
    144 # trainer loading and initialization

File ~/.local/lib/python3.9/site-packages/recbole/utils/utils.py:347, in get_flops(model, dataset, device, logger, transform, verbose)
    344 wrapper.apply(add_hooks)
    346 with torch.no_grad():
--> 347     wrapper(*inputs)
    349 def dfs_count(module: nn.Module, prefix=\"\\t\"):
    350     total_ops, total_params = module.total_ops.item(), 0

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.local/lib/python3.9/site-packages/recbole/utils/utils.py:288, in get_flops.<locals>.TracingAdapter.forward(self, interaction)
    287 def forward(self, interaction):
--> 288     return self.model.predict(interaction)

File ~/.local/lib/python3.9/site-packages/recbole/model/sequential_recommender/sasrecf.py:171, in SASRecF.predict(self, interaction)
    169 item_seq_len = interaction[self.ITEM_SEQ_LEN]
    170 test_item = interaction[self.ITEM_ID]
--> 171 seq_output = self.forward(item_seq, item_seq_len)
    172 test_item_emb = self.item_embedding(test_item)
    173 scores = torch.mul(seq_output, test_item_emb).sum(dim=1)

File ~/.local/lib/python3.9/site-packages/recbole/model/sequential_recommender/sasrecf.py:135, in SASRecF.forward(self, item_seq, item_seq_len)
    130 feature_emb = feature_table.view(
    131     table_shape[:-2] + (feat_num * embedding_size,)
    132 )
    133 input_concat = torch.cat((item_emb, feature_emb), -1)  # [B 1+field_num*H]
--> 135 input_emb = self.concat_layer(input_concat)
    136 input_emb = input_emb + position_embedding
    137 input_emb = self.LayerNorm(input_emb)

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1538, in Module._call_impl(self, *args, **kwargs)
   1535     bw_hook = hooks.BackwardHook(self, full_backward_hooks, backward_pre_hooks)
   1536     args = bw_hook.setup_input_hook(args)
-> 1538 result = forward_call(*args, **kwargs)
   1539 if _global_forward_hooks or self._forward_hooks:
   1540     for hook_id, hook in (
   1541         *_global_forward_hooks.items(),
   1542         *self._forward_hooks.items(),
   1543     ):

File ~/.local/lib/python3.9/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x640 and 768x64)"
}

and also my parameter_dict:

parameter_dict = {
    'data_path': './',
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'vacancy_id',
    'RATING_FIELD': 'action_type',
    'TIME_FIELD': 'timestamp',
    'user_inter_num_interval': "[10,inf)",
    'item_inter_num_interval': "[15,inf)",
    'seq_len': {'vacancy_id': 10}, 
    'MAX_ITEM_LIST_LENGTH': 10,    
    'load_col': {'inter': ['user_id', 'vacancy_id', 'action_type', 'timestamp'],
                 'user': ['user_id'],
                 'item': ['vacancy_id', 'name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ]},  # not all fields, but there is not enough memory for more
    'selected_features': ['name', 'company_id', 'keySkills', 'compensation_from', 'compensation_to', 'area_id', 'area_regionId', 'employment', 'workSchedule', 'workExperience' ],
#    'selected_features': ['name'], also tried only 1 feature
    'neg_sampling': None,
    'train_neg_sample_args': None,
    'worker': 16,   # on vm69
    'train_batch_size': 2048,
    'eval_batch_size': 2048,
    'epochs': 3,
    'use_gpu': False,
    'metrics': ['Recall', 'MRR'],
    'loss_type': 'CE',
    'topk': 100,
    'valid_metric': 'MRR@100',
    "stopping_step": 2, 
    'hidden_size': 64,  
    'inner_size': 256, 
    'hidden_dropout_prob': 0.3,
    'attn_dropout_prob': 0.3,  
    'seed': 42,
    'eval_args': {
        'split': {'RS': [0.95, 0.03, 0.02]},  
#        'split': {'RS': [10, 0, 0]},  # also tried 10,0,0 and 9,0,1
        'group_by': 'user',
        'order': 'TO',
        'mode': 'full'}  #  labeled ??
}

Thank you for the answer(s). Sincerely yours, Arseny

The errors you're encountering with the SASRecF model in RecBole seem to stem from the way item features are being processed and integrated into the model. SASRecF is designed to concatenate item representations with item attribute representations as inputs to the model. This process involves several hyperparameters and configurations that must align with the structure and content of your dataset.

The RuntimeError: mat1 and mat2 shapes cannot be multiplied suggests a mismatch in dimensions between the data provided to the model and the model's expected input sizes. When you change the selected_features list, the dimensions of the inputs change, hence the variation in error messages.
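To make the mismatch concrete, the shapes from the two reported errors can be decomposed into hidden_size-wide blocks. The following is a minimal sketch that uses only numbers from the report (hidden_size of 64 and the two error messages). The helper name blocks is made up for illustration, and the reading of the result is a hypothesis, not a confirmed diagnosis.

```python
HIDDEN_SIZE = 64  # 'hidden_size' from the reported parameter_dict

# Number of selected features -> (input width, width expected by concat_layer),
# copied from the two reported error messages.
REPORTED = {10: (640, 768), 9: (576, 704)}


def blocks(width, hidden_size=HIDDEN_SIZE):
    """Return how many hidden_size-wide embeddings fit exactly into width."""
    count, remainder = divmod(width, hidden_size)
    if remainder:
        raise ValueError(f"{width} is not a multiple of {hidden_size}")
    return count


summary = {}
for n_selected, (actual, expected) in REPORTED.items():
    summary[n_selected] = (blocks(actual), blocks(expected))
    print(n_selected, "features:", blocks(actual), "blocks fed in,",
          blocks(expected), "blocks expected")
```

In both runs the linear layer expects two more 64-wide blocks than it receives. The full selected_features list happens to contain exactly two float fields (compensation_from and compensation_to), which may or may not be a coincidence, so it seems worth testing the float fields separately.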

Key Points from the Documentation:

Hyperparameters: The model has several important hyperparameters like hidden_size, n_layers, n_heads, inner_size, and selected_features. The hidden_size hyperparameter, for example, defines the number of features in the hidden state and also serves as the initial embedding size of items, which by default is 64.
Feature Handling: The selected_features parameter controls which item context information is used. It must contain names that match fields in your dataset, and the documentation stresses that these fields have to be present in the dataset and loaded by RecBole's data module (via load_col).
Model Implementation: The source code shows the model's handling of item and item feature embeddings. The forward function in the model's source code outlines how item embeddings and feature embeddings (both sparse and dense) are concatenated and processed. This includes applying a linear transformation (concat_layer) to match the expected dimensionality for further processing within the model.

Addressing Your Issue:

The dimensions of the embeddings (hidden_size) and the number of selected features (selected_features) directly influence the input dimensionality to the model. Adjusting these parameters affects the model's ability to process your data correctly.
For float-type features (compensation_from and compensation_to in your list), it's crucial to ensure that they are correctly represented and included in the feature embeddings. The error you reported is the mat1 and mat2 shape mismatch at concat_layer, so the thing to check is whether every selected field contributes the number of embedding columns that the layer was built to expect.
Given that the shapes in the error change when you adjust the selected_features list, and that training works when only the token_seq feature 'name' is selected, it's possible that the handling or representation of float-type features in the feature embeddings is the root cause of the issue.
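One way to test the float-feature hypothesis is to train with one feature type at a time. The sketch below only groups the selected fields by the types listed in the report; group_by_type is a hypothetical helper, and the run_recbole call is shown as a comment because it depends on the reporter's own dataset and parameter_dict.

```python
from collections import defaultdict

# Field types as listed in the report ("name:type" entries).
FIELD_TYPES = {
    "name": "token_seq", "company_id": "token", "keySkills": "token_seq",
    "compensation_from": "float", "compensation_to": "float",
    "area_id": "token", "area_regionId": "token", "employment": "token",
    "workSchedule": "token", "workExperience": "token",
}

# 'selected_features' from the reported parameter_dict.
SELECTED = [
    "name", "company_id", "keySkills", "compensation_from",
    "compensation_to", "area_id", "area_regionId", "employment",
    "workSchedule", "workExperience",
]


def group_by_type(selected, field_types):
    """Group selected feature names by their declared field type."""
    groups = defaultdict(list)
    for field in selected:
        groups[field_types[field]].append(field)
    return dict(groups)


groups = group_by_type(SELECTED, FIELD_TYPES)
for ftype, fields in groups.items():
    print(ftype, fields)
    # Possible follow-up, reusing the call from the report:
    # run_recbole(model='SASRecF', dataset=DATASET_NAME,
    #             config_dict={**parameter_dict, 'selected_features': fields})
```

If the token-only and token_seq-only runs train while the float-only run fails with the same shape error, that would point at the float fields; if all three fail, the cause is likely elsewhere.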

Suggested Steps:

Review Feature Selection: Ensure that all selected features are correctly defined in your dataset and correspond to the correct feature types (e.g., token, float).
Adjust Hyperparameters: Consider adjusting the hidden_size and inner_size parameters based on the dimensions of your input data and the number of features you're including.
Check Feature Processing: Ensure that float features are correctly processed and represented before being fed into the model. This might involve looking at how these features are embedded and concatenated with other feature embeddings.
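The first step above can be partly automated by comparing the header of the .item atomic file with load_col and selected_features. This is a sketch under assumptions: it takes the header to be a single tab-separated line of name:type columns, the example header is rebuilt from the field types in the report (the real column order is unknown), and parse_header and check_selection are hypothetical helpers, not RecBole functions.

```python
def parse_header(header_line):
    """Split a tab-separated atomic-file header into {field: type}."""
    fields = {}
    for column in header_line.rstrip("\n").split("\t"):
        name, _, ftype = column.partition(":")
        fields[name] = ftype
    return fields


def check_selection(header_line, load_col_item, selected_features):
    """Return (loaded fields missing from the file, selected fields not loaded)."""
    header = parse_header(header_line)
    not_in_file = [f for f in load_col_item if f not in header]
    not_loaded = [f for f in selected_features if f not in load_col_item]
    return not_in_file, not_loaded


# Example header rebuilt from the report; the real file may order columns differently.
HEADER = "\t".join([
    "vacancy_id:token", "name:token_seq", "company_id:token",
    "description:token_seq", "keySkills:token_seq",
    "compensation_from:float", "compensation_to:float", "currencyRate:float",
    "area_id:token", "area_regionId:token", "employment:token",
    "workSchedule:token", "workExperience:token",
])
LOAD_COL_ITEM = [
    "vacancy_id", "name", "company_id", "keySkills", "compensation_from",
    "compensation_to", "area_id", "area_regionId", "employment",
    "workSchedule", "workExperience",
]
SELECTED = LOAD_COL_ITEM[1:]  # everything except the item ID field

result = check_selection(HEADER, LOAD_COL_ITEM, SELECTED)
print(result)
```

For the configuration in the report both lists come back empty, so the selection itself looks consistent; a selected field that is not in load_col (for example description) would show up in the second list.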

For more detailed insights into configuring and running SASRecF, including hyperparameter settings and model usage, please refer to the official RecBole documentation: https://www.recbole.io/docs/user_guide/model/sequential/sasrecf.html

RUCAIBox / RecBole

[🐛BUG] mat1 and mat2 shapes cannot be multiplied in SASRecF #1983