NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

Evaluation is incorrect because it can see the label. [BUG] #738

Closed: korchi closed this issue 10 months ago

korchi commented 12 months ago

Bug description

When trainer.evaluate() is called, the model can see all of the inputs, including the targets, whose embeddings influence all the latent embeddings. I believe that the targets should be truncated to simulate the production environment.

Steps/Code to reproduce bug

  1. Take any model and any sequence from a dataset.
  2. To evaluate the model in a production-like environment, split the sequence into input, target = sequence[:-1], sequence[-1], run pred = trainer.evaluate(input_dataset).predictions[0] (on an input_dataset created from the input sequence), and compute recall_simulated = recall(target, pred).
  3. Evaluate the model: recall_eval = trainer.evaluate(sequence).
  4. The resulting recall_eval.recall differs from recall_simulated, which it shouldn't (see the sketch below).
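A minimal sketch of these steps, mirroring the calls in the report. Here `make_dataset` and `recall_at` are hypothetical helpers (not part of Transformers4Rec), and `trainer` is assumed to be an already-configured transformers4rec.torch.Trainer:

```python
# Any item-id sequence taken from the dataset (illustrative values).
sequence = [45, 12, 78, 33, 9]

# Step 2: production-like evaluation -- the label is removed before
# the model ever sees the sequence.
inputs, target = sequence[:-1], sequence[-1]
pred = trainer.evaluate(make_dataset(inputs)).predictions[0]  # make_dataset: hypothetical helper
recall_simulated = recall_at(target, pred, k=10)              # recall_at: hypothetical metric helper

# Step 3: library evaluation on the full sequence, where evaluate()
# is expected to hold out the last item internally.
recall_eval = trainer.evaluate(make_dataset(sequence))

# Step 4: per the report, these two numbers disagree, although they
# should be identical if no label leakage occurs.
print(recall_simulated, recall_eval)
```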

Expected behavior

recall_eval.recall should return the same recall as recall_simulated.

Environment details

Additional context

Attached is the masking.patch file, which fixed the result discrepancy for me.

rnyak commented 11 months ago

@korchi if you are truncating the target, you should use trainer.predict(), which uses the first n-1 inputs to predict the nth item. We do not mask anything when you use .predict().

However, if you use trainer.evaluate(), we automatically mask the last item under the hood, so that we generate the prediction for the last item in the given input. So you don't need to truncate the input sequence if you use .evaluate().
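In other words, the two intended usage patterns look roughly like this (a hedged sketch; `full_dataset` and `truncated_dataset` are hypothetical stand-ins for your own datasets):

```python
# 1) evaluate(): pass the *full* sequences; the trainer masks the last
#    item internally and uses it as the evaluation label.
metrics = trainer.evaluate(eval_dataset=full_dataset)

# 2) predict(): pass sequences you truncated yourself (the first n-1
#    items); nothing is masked, and the model predicts the next item.
predictions = trainer.predict(truncated_dataset)
```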