NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

[TASK] Metrics that require predicted item ids (e.g. ItemCoverageAt, PopularityBiasAt, NoveltyAt) are not supported in model fit() and evaluate() #372

Open gabrielspmoreira opened 2 years ago

gabrielspmoreira commented 2 years ago

Bug description

Metrics that require predicted item ids (e.g. ItemCoverageAt, PopularityBiasAt, NoveltyAt) are not supported in model fit() and evaluate().

Currently, they can only be used after model.fit() is called, as in the example below.

The metrics cannot currently be included in the PredictionTask because they require update_state() to receive the predicted item ids rather than the usual prediction scores (y_pred) and labels (y_true). I have tested the metrics locally, following the example from the unit tests, and they work properly when called after model.fit() and after exporting the model to a top-k recommender. I am doing something like this:

# Imports for the helpers below (Tags and Dataset come from Merlin core).
# NOTE: the import paths for ItemCoverageAt, PopularityBiasAt, NoveltyAt and
# for the dataloader used as tf_dataloader are omitted here; they depend on
# the Merlin Models version being used.
from merlin.io import Dataset
from merlin.schema import Tags


def get_items_topk_recommender_model(dataset, schema, model, k):
    # Build a unique item catalog from the item features and wrap the trained
    # model as a top-k recommender over that catalog.
    item_features = schema.select_by_tag(Tags.ITEM).column_names
    item_dataset = dataset.to_ddf()[item_features].drop_duplicates().compute()
    item_dataset = Dataset(item_dataset)
    recommender = model.to_top_k_recommender(item_dataset, k=k)
    return recommender


def compute_metrics_results(schema, model, items_frequencies, eval_dataset, eval_batch_size):
    # get_item_id_cardinality() is a user-defined helper returning the number
    # of unique item ids in the schema.
    item_id_cardinality = get_item_id_cardinality(schema)
    cutoffs = [100, 300, 500]

    additional_metrics = []
    for k in cutoffs:
        additional_metrics.extend(
            [
                # Item coverage
                ItemCoverageAt(num_unique_items=item_id_cardinality, k=k),
                # Popularity-bias
                PopularityBiasAt(
                    item_freq_probs=items_frequencies, is_prob_distribution=False, k=k
                ),
                # Novelty
                NoveltyAt(
                    item_freq_probs=items_frequencies, is_prob_distribution=False, k=k
                ),
            ]
        )

    # Retrieve enough candidates to cover the largest cutoff.
    max_cutoff = max(cutoffs)

    topk_rec_model = get_items_topk_recommender_model(
        eval_dataset, schema, model, k=max_cutoff
    )

    batched_dataset = tf_dataloader.BatchedDataset(
        eval_dataset, batch_size=eval_batch_size, shuffle=False,
    )

    # Accumulate the metric state batch by batch from the recommended item ids.
    additional_metrics_results = dict()
    for inputs, _ in batched_dataset:
        _, top_indices = topk_rec_model(inputs)
        for metric in additional_metrics:
            metric.update_state(predicted_ids=top_indices)

    for metric in additional_metrics:
        additional_metrics_results[metric.name] = float(metric.result().numpy())

    return additional_metrics_results
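
For completeness, here is a rough sketch of how these helpers might be driven after the model has been fit. The variable names (train_ds, valid_ds, the trained model) and the way item frequencies are computed are assumptions for illustration, not part of the snippet above; the exact format PopularityBiasAt/NoveltyAt expect for item_freq_probs may also differ.

from merlin.schema import Tags

schema = train_ds.schema

# Item-id column name, using the same Tags API as above.
item_id_col = schema.select_by_tag(Tags.ITEM_ID).column_names[0]

# Absolute item frequencies from the training data, ordered by the encoded
# item id (hence is_prob_distribution=False in the metrics above).
items_frequencies = (
    train_ds.to_ddf()[item_id_col].value_counts().compute().sort_index()
)

results = compute_metrics_results(
    schema=schema,
    model=model,               # already trained with model.fit()
    items_frequencies=items_frequencies,
    eval_dataset=valid_ds,
    eval_batch_size=1024,
)
print(results)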

Steps/Code to reproduce bug

  1. Try to pass one of the following metrics, ItemCoverageAt(), PopularityBiasAt(), NoveltyAt(), as metrics for any PredictionTask. It will raise an exception, because typical metrics receive prediction scores (y_pred) and labels (y_true) in update_state(), whereas these metrics require the top recommended item ids (see the sketch below).
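
To make the mismatch concrete, here is a minimal sketch of the two update_state() signatures. Imports are omitted as in the snippet above, and labels, scores, top_indices, and item_id_cardinality are placeholder values:

# A standard top-k ranking metric (e.g. RecallAt) is driven by Keras with
# labels and prediction scores:
recall = RecallAt(k=10)
recall.update_state(y_true=labels, y_pred=scores)   # what fit()/evaluate() call

# The metrics in question need the ids of the recommended items instead,
# which fit()/evaluate() never pass to update_state():
coverage = ItemCoverageAt(num_unique_items=item_id_cardinality, k=10)
coverage.update_state(predicted_ids=top_indices)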

Expected behavior

We should be able to set these metrics together with the other ranking metrics for the PredictionTask. In the future, once #368 is merged, we should be able to provide different sets of metrics for model.fit() and model.evaluate(), including these ones.
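
As a purely illustrative sketch of what that could look like (this does not work today; RecallAt is an existing Merlin Models ranking metric, the compile/evaluate flow is the standard Keras one, and the other names reuse the snippet above):

# Desired behavior (NOT currently supported): item-id-based metrics passed
# alongside the usual ranking metrics.
model.compile(
    optimizer="adam",
    metrics=[
        RecallAt(k=100),
        # These currently raise, because update_state() would receive
        # (y_true, y_pred) rather than the predicted item ids:
        ItemCoverageAt(num_unique_items=item_id_cardinality, k=100),
        PopularityBiasAt(item_freq_probs=items_frequencies, is_prob_distribution=False, k=100),
        NoveltyAt(item_freq_probs=items_frequencies, is_prob_distribution=False, k=100),
    ],
)
metrics = model.evaluate(eval_dataset, batch_size=1024, return_dict=True)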

rnyak commented 2 years ago

Linked to https://github.com/NVIDIA-Merlin/models/issues/350

rnyak commented 2 years ago

This should be discussed further when we get closer to working on the evaluation framework.