NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

[BUG] Ranking model predict constant #1228

Open PaulSteffen-betclic opened 7 months ago

PaulSteffen-betclic commented 7 months ago

Bug description

After a training run that appears to complete normally, the ranking model predicts the same constant value for every input.

Steps/Code to reproduce bug

import nvtabular as nvt
import tensorflow as tf  # needed for tf.keras.optimizers and tf.keras.metrics below
import merlin.models.tf as mm
import merlin.io
from merlin.models.tf.transforms.negative_sampling import InBatchNegatives

output_path = "data/processed"
processed_train = nvt.Dataset(f"{output_path}/interactions/train/*.parquet")
processed_valid = nvt.Dataset(f"{output_path}/interactions/valid/*.parquet")

n_per_positive = 12
add_negatives = InBatchNegatives(processed_train.schema, n_per_positive, seed=42, prep_features=True, run_when_testing=True)

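# Note: Loader, schema, batch_size and learning_rate (used below) are not defined in the snippet as posted.
train_ranking_loader = Loader(processed_train, schema=schema, batch_size=batch_size, shuffle=True)
valid_ranking_loader = Loader(processed_valid, schema=schema, batch_size=batch_size, shuffle=True)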

model = mm.DLRMModel(
    processed_train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([64, 128, 512]),
    prediction_tasks=mm.BinaryClassificationTask("Click"),
)

compile_args = {
    "optimizer": tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate),
    "run_eagerly": False,
    "metrics": [mm.RecallAt(10), mm.NDCGAt(10)],
    "weighted_metrics": [tf.keras.metrics.BinaryAccuracy(), tf.keras.metrics.AUC()],
}

model.compile(**compile_args)
model.fit(train_ranking_loader.map(add_negatives),
          validation_data=valid_ranking_loader.map(add_negatives),
          class_weights={0: 1, 1: n_per_positive},
          epochs=5)

This code produces the following output:

image
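For context on the class weights passed to fit above, here is a small arithmetic sketch of what {0: 1, 1: n_per_positive} is meant to achieve. It assumes that exactly n_per_positive negatives are added for each positive, which in-batch sampling may not always deliver, and the helper name weighted_class_share is hypothetical, not part of Merlin or Keras. Note also that Keras' own Model.fit spells this argument class_weight (singular); whether the plural spelling used in the snippet is honored by Merlin's fit wrapper is not confirmed here.

```python
def weighted_class_share(n_per_positive, class_weight):
    """Share of the total loss weight carried by each class, for one
    positive row and its n_per_positive sampled negatives.

    Assumption: every positive gets exactly n_per_positive negatives.
    """
    pos_weight = 1 * class_weight[1]
    neg_weight = n_per_positive * class_weight[0]
    total = pos_weight + neg_weight
    return {0: neg_weight / total, 1: pos_weight / total}


n_per_positive = 12

# Without reweighting, negatives carry 12/13 of the loss weight.
print(weighted_class_share(n_per_positive, {0: 1, 1: 1}))

# With the weights from the snippet, both classes carry half of it.
print(weighted_class_share(n_per_positive, {0: 1, 1: n_per_positive}))
```

If the weights are silently ignored, the loss is dominated by negatives, which is one situation in which a classifier can drift toward a near-constant low score. This is a possibility to check, not a diagnosis.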

However, when I predict with this model using ranking_scores = model.batch_predict(potential_interactions_loader, batch_size=1024), I get the following warning message:

image

and the predictions are constant:

image

Could this be caused by the second warning message shown during prediction?

N.B.: the issue is not caused by potential_interactions_loader, because I get the same behavior when predicting with valid_ranking_loader.
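To make "constant" concrete when comparing loaders, the spread of the scores can be checked numerically. This is a minimal sketch that assumes the predictions have already been converted to a flat sequence of Python floats; the helper names is_effectively_constant and summarize, and the tolerance, are hypothetical and not part of the Merlin API.

```python
import math


def is_effectively_constant(scores, tol=1e-6):
    """Return True when every score lies within `tol` of the first one."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores is empty")
    first = scores[0]
    return all(math.isclose(s, first, rel_tol=0.0, abs_tol=tol) for s in scores)


def summarize(scores, decimals=6):
    """Min, max and number of distinct values after rounding."""
    scores = list(scores)
    return {
        "min": min(scores),
        "max": max(scores),
        "n_unique": len({round(s, decimals) for s in scores}),
    }


# Toy values only, not output from the model in this issue.
toy_scores = [0.0712, 0.0712, 0.0712, 0.0712]
print(is_effectively_constant(toy_scores))
print(summarize(toy_scores))
```

Running the same check on the scores from both loaders would show whether the outputs are exactly identical or only vary in the last few decimals.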

Expected behavior

The model should return a click probability for each row. I obtained such probabilities in the past, but I can no longer reproduce that result and have not identified why.

Environment details

The notebook runs in a container started from the nightly image nvcr.io/nvidia/merlin/merlin-tensorflow:nightly,

in which the latest version of Merlin Models is pulled.

Thanks.