gabrielspmoreira closed this pull request 1 year ago
https://nvidia-merlin.github.io/Transformers4Rec/review/pr-721
@gabrielspmoreira the `test_stochastic_swap_noise_with_tabular_features` test is failing. This test is flaky. Can we increase the absolute delta in the test's assert to avoid spurious failures?
Fixes #693
Goals :soccer:

- Avoid OOM (out-of-memory) errors when calling `Trainer.evaluate()` or `Trainer.predict()` on large datasets

Implementation Details :construction:
- The `Trainer.evaluate()` and `Trainer.predict()` methods both call the `Trainer.evaluation_loop()` method. Besides computing metrics, it also keeps accumulating the `predictions` (batch size, item cardinality) and `labels` (batch size) tensors as batches are processed. The `predictions` tensor becomes very large very quickly and leads to OOM on CUDA. If `T4RecTrainingArguments.eval_accumulation_steps` is set, every N steps the accumulated predictions/labels are moved from GPU memory to the larger CPU memory, but then you might hit OOM on the CPU too after some steps. For example, for a batch size of 1024 and an item cardinality of 300,000, each batch requires 1.14 GB of memory, and batches keep accumulating until OOM occurs.
- The `T4RecTrainingArguments.predict_top_k`
argument has existed since earlier versions of the library. It limits accumulation to only the top-k predictions, both in memory and in the resulting tensor when you use `trainer.predict()`. But it was set to `None` by default, which means the user had to be aware of it and set it manually to avoid these issues when evaluating or predicting on large datasets. Note: `predict_top_k` does not affect the metrics calculation, but it does affect the memory consumption of `trainer.evaluate()`.
- Sets `T4RecTrainingArguments.predict_top_k = 100` as a default value, which avoids OOM issues in most cases and still returns a reasonable number of top-k predictions from `model.predict()`.
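The memory figures above follow from simple arithmetic on float32 scores. A minimal sketch (the helper name `prediction_bytes` is illustrative, not part of the T4Rec API):

```python
# Back-of-the-envelope memory estimate for the accumulated `predictions`
# tensor, assuming float32 scores (4 bytes each).
def prediction_bytes(batch_size, num_scores, bytes_per_score=4):
    return batch_size * num_scores * bytes_per_score

GiB = 1024 ** 3

# Full score matrix: batch size 1024 x item cardinality 300,000
full = prediction_bytes(1024, 300_000)
# With predict_top_k=100, only the top-100 scores per row are kept
topk = prediction_bytes(1024, 100)

print(f"full scores per batch:    {full / GiB:.2f} GiB")   # ~1.14 GiB
print(f"top-100 scores per batch: {topk / 1024:.0f} KiB")  # 400 KiB
```

This is why accumulating full score matrices across evaluation steps exhausts GPU (and eventually CPU) memory, while the top-100 default keeps the footprint negligible.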
- IMPORTANT: This changes the default output of the `trainer.predict()` API, which returns a `PredictionOutput` object with a `predictions` property. Before this change, when the `predict_top_k` option was not set (the default), the `predictions` property was a 2D tensor (batch size, item cardinality) with the scores for all items. As we now set `T4RecTrainingArguments.predict_top_k` by default, the `predictions` property returns a tuple of (top-100 predicted item ids, top-100 prediction scores).
- Refactored the `Trainer.evaluation_loop()` method
to fix and clarify the interplay between the `T4RecTrainingArguments.predict_top_k` option and the `model.top_k` property. `model.top_k` was introduced recently to let the model return only the top-k prediction scores/item ids instead of scores for all items, in order to serve T4Rec models more efficiently in Triton. `model.top_k` only limits the returned items in inference mode (i.e., when not training or evaluating), which is the case both for Triton inference and for `trainer.predict()`. So setting `model.top_k` caps the number of predictions we can get from `trainer.predict()`. For that reason, we now raise an exception if `T4RecTrainingArguments.predict_top_k > model.top_k`.
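The new tuple-shaped `predictions` output described above can be sketched as follows. `top_k_from_scores` is a hypothetical stand-in for the internal top-k selection, shown with plain Python lists instead of tensors:

```python
# Sketch: reduce a (batch size, item cardinality) score matrix to the
# (top-k item ids, top-k scores) pair that `trainer.predict()` now
# returns by default (with k=100). Not the actual T4Rec implementation.
def top_k_from_scores(scores, k=100):
    """Return (top-k item ids, top-k scores) per row, scores descending."""
    ids, vals = [], []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        ids.append(ranked)
        vals.append([row[i] for i in ranked])
    return ids, vals

scores = [[0.1, 0.9, 0.3, 0.7],
          [0.5, 0.2, 0.8, 0.4]]
item_ids, item_scores = top_k_from_scores(scores, k=2)
print(item_ids)     # [[1, 3], [2, 0]]
print(item_scores)  # [[0.9, 0.7], [0.8, 0.5]]
```

Code that previously indexed `PredictionOutput.predictions` as a single 2D score tensor needs to unpack this tuple instead.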
Testing Details :mag:

- Added the `test_trainer_predict_top_k_x_top_k` test to check all possible combinations of `T4RecTrainingArguments.predict_top_k` and `model.top_k` values
- Updated the `test_trainer_predict_topk` test to check that the number of top-k predictions matches `T4RecTrainingArguments.predict_top_k`, and to ensure the test breaks if the default value changes from `100`
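The rule the combination test exercises can be sketched as below; `validate_top_k` is a hypothetical helper illustrating the check described in the implementation details, not the actual T4Rec code:

```python
# Illustrative guard: predict_top_k must not exceed model.top_k, because
# model.top_k caps how many predictions inference can return.
def validate_top_k(predict_top_k, model_top_k):
    if (model_top_k is not None and predict_top_k is not None
            and predict_top_k > model_top_k):
        raise ValueError(
            "T4RecTrainingArguments.predict_top_k must be <= model.top_k"
        )

# Valid combinations: model.top_k unset, equal, or larger
validate_top_k(100, None)
validate_top_k(100, 100)
validate_top_k(50, 100)

# Invalid: predict_top_k larger than model.top_k raises
try:
    validate_top_k(100, 20)
except ValueError as e:
    print(e)
```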