NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

Evaluation is incorrect because it can see the label. [BUG] #738

Closed: korchi closed this issue 10 months ago

korchi commented 12 months ago

Bug description

When trainer.evaluate() is called, the model can see all of the inputs, including the targets, whose embeddings influence all the latent embeddings. I believe that the targets should be truncated to simulate the production environment.

Steps/Code to reproduce bug

  1. Take any model and any sequence from a dataset.
  2. To evaluate the model in a production-like environment, split the sequence into input, target = sequence[:-1], sequence[-1], run pred = trainer.evaluate(input_dataset).predictions[0] (on an input_dataset created from the input sequence), and compute recall_simulated = recall(target, pred).
  3. Evaluate the model: recall_eval = trainer.evaluate(sequence).
  4. The resulting recall_eval.recall differs from recall_simulated, which it shouldn't (see the sketch below).
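A minimal sketch of these steps, mirroring the calls in the report. Here `make_dataset` and `recall_at` are hypothetical helpers (not part of Transformers4Rec), and `trainer` is assumed to be an already-configured transformers4rec.torch.Trainer:

```python
# Any item-id sequence taken from the dataset (illustrative values).
sequence = [45, 12, 78, 33, 9]

# Step 2: production-like evaluation -- the label is removed before
# the model ever sees the sequence.
inputs, target = sequence[:-1], sequence[-1]
pred = trainer.evaluate(make_dataset(inputs)).predictions[0]  # make_dataset: hypothetical helper
recall_simulated = recall_at(target, pred, k=10)              # recall_at: hypothetical metric helper

# Step 3: library evaluation on the full sequence, where evaluate()
# is expected to hold out the last item internally.
recall_eval = trainer.evaluate(make_dataset(sequence))

# Step 4: per the report, these two numbers disagree, although they
# should be identical if no label leakage occurs.
print(recall_simulated, recall_eval)
```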

Expected behavior

recall_eval.recall should return the same recall as recall_simulated.

Environment details

Additional context

Attached is the masking.patch file, which fixed the result discrepancy for me.

rnyak commented 11 months ago

@korchi if you are truncating the target, you should use trainer.predict(), which uses the first n-1 inputs to predict the nth item. We do not mask anything when you use .predict().

However, if you use trainer.evaluate(), we automatically mask the last item under the hood, so that we generate the prediction for the last item in the given input. So you don't need to truncate the input sequence if you use .evaluate().
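In other words, the two intended usage patterns look roughly like this (a hedged sketch; `full_dataset` and `truncated_dataset` are hypothetical stand-ins for your own datasets):

```python
# 1) evaluate(): pass the *full* sequences; the trainer masks the last
#    item internally and uses it as the evaluation label.
metrics = trainer.evaluate(eval_dataset=full_dataset)

# 2) predict(): pass sequences you truncated yourself (the first n-1
#    items); nothing is masked, and the model predicts the next item.
predictions = trainer.predict(truncated_dataset)
```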