NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[BUG] CausalLanguageModeling masking error on last item only condition #762

Closed: sungho-ham closed this issue 7 months ago

sungho-ham commented 7 months ago

Description

When using CLM masking, it generates a wrong masking schema, which can be checked with the following simple code. For an input of length 2, there should be no difference between the last-item-only and all-items conditions. However, the results differ.

import torch
from transformers4rec.torch import masking

def get_masking_info(train_on_last: bool):
    item_ids = torch.tensor([[1, 2, 0]])  # length-2 session plus one padding position
    mask = masking.CausalLanguageModeling(hidden_size=10, train_on_last_item_seq_only=train_on_last)
    masking_info = mask.compute_masked_targets(item_ids, training=True)
    return masking_info

print(get_masking_info(False))
print(get_masking_info(True))
Output:

MaskingInfo(schema=tensor([[ True, False, False]]), targets=tensor([[2, 0, 0]]))
MaskingInfo(schema=tensor([[ True,  True, False]]), targets=tensor([[2, 0, 0]]))  -> the two schemas should be identical
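To see why the two settings should coincide here, note that causal LM training pairs each position with the next item, so a padded sequence [1, 2, 0] contains exactly one trainable (input, target) pair: predict item 2 from item 1. The following standalone sketch illustrates this; it is not the library's implementation, and clm_targets and PADDING_IDX are made-up names for illustration only:

```python
import torch

PADDING_IDX = 0  # illustrative padding id, matching the example input


def clm_targets(item_ids: torch.Tensor, train_on_last_item_seq_only: bool):
    """Toy causal-LM target/mask computation (illustrative, not library code).

    Position i predicts the item at position i + 1, so targets are the
    inputs shifted left by one and padded at the end.
    """
    labels = torch.cat(
        [item_ids[:, 1:], torch.full_like(item_ids[:, :1], PADDING_IDX)], dim=1
    )
    trainable = labels != PADDING_IDX  # positions with a real next item
    if train_on_last_item_seq_only:
        # keep only the last trainable position of each row
        last = trainable.cumsum(dim=1).argmax(dim=1)
        keep = torch.zeros_like(trainable)
        keep[torch.arange(item_ids.size(0)), last] = True
        trainable = trainable & keep
    return trainable, labels


item_ids = torch.tensor([[1, 2, 0]])
all_items, labels = clm_targets(item_ids, train_on_last_item_seq_only=False)
last_only, _ = clm_targets(item_ids, train_on_last_item_seq_only=True)
print(all_items)   # tensor([[ True, False, False]])
print(last_only)   # identical: the only trainable target is also the last one
```

For longer sessions the two masks diverge as expected; only for a single trainable pair must they agree, which is the inconsistency reported above.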

Related Code

https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/348c9636399535c566d20e8ebff2b7aa0775f136/transformers4rec/torch/masking.py#L298

I think the following code would be correct: mask_labels = labels != self.padding_idx

sungho-ham commented 7 months ago

I've found that my suggestion was wrong and was already addressed in https://github.com/NVIDIA-Merlin/Transformers4Rec/pull/723#pullrequestreview-1490574571 However, from what I understand, the result is still not correct, so I reported it again following the bug reporting format.