NVIDIA-Merlin / models

Merlin Models is a collection of deep learning recommender system model reference implementations
https://nvidia-merlin.github.io/models/main/index.html
Apache License 2.0

[BUG] Can't reuse a loaded model saved after training #1224

Closed PaulSteffen-betclic closed 10 months ago

PaulSteffen-betclic commented 11 months ago

Bug description

A model saved after training with .fit() can't be used once it is re-loaded.

Steps/Code to reproduce bug

import pandas as pd
import tensorflow as tf

import nvtabular as nvt
import merlin.models.tf as mm
from merlin.schema.tags import Tags
from merlin.models.utils.dataset import unique_by_tag

interactions_df = pd.DataFrame({
   'CustomerIdCat': [42, 76],
   'ItemIdCat': [1, 2],
   'ItemFeature1': [3, 3],
   'ItemFeature2': [72, 15]
})

items_df = pd.DataFrame({
   'ItemIdCat': [2, 2],
   'ItemFeature1': [3, 3],
   'ItemFeature2': [15, 15]
})

train = nvt.Dataset(interactions_df)
# NOTE: in a full pipeline this schema would carry USER_ID/ITEM tags
# (e.g. applied by an NVTabular workflow)
schema = train.schema
item_candidates = nvt.Dataset(items_df, schema=schema.select_by_tag(Tags.ITEM))

train_retrieval_loader = mm.Loader(train, schema=train.schema, batch_size=1024)

tower_dim = 8

# create user schema using the USER_ID tag
user_schema = schema.select_by_tag(Tags.USER_ID)
# create user (query) tower input block
user_inputs = mm.InputBlockV2(user_schema)
# create user (query) encoder block
query_tower = mm.Encoder(user_inputs, mm.MLPBlock([16, tower_dim], no_activation_last_layer=True))

# create item schema using ITEM tag
item_schema = schema.select_by_tag(Tags.ITEM)
# create item (candidate) tower input block
item_inputs = mm.InputBlockV2(item_schema)
# create item (candidate) encoder block
candidate_tower = mm.Encoder(item_inputs, mm.MLPBlock([16, tower_dim], no_activation_last_layer=True))

retrieval_model = mm.TwoTowerModelV2(query_tower, candidate_tower)

with tf.device('/cpu:0'):
    retrieval_model.compile(optimizer="adam", run_eagerly=True, metrics=[mm.RecallAt(10), mm.NDCGAt(10)])
    retrieval_model.fit(train_retrieval_loader, epochs=1, batch_size=1024)

retrieval_model.save("dir_models/two_tower")

loaded_model = tf.keras.models.load_model("dir_models/two_tower")

candidate_features = unique_by_tag(item_candidates, Tags.ITEM, Tags.ITEM_ID)

topk_model = loaded_model.to_top_k_encoder(candidate_features, k=10, batch_size=128)

First, a warning is logged when saving the retrieval model:

[screenshot of the warning]

Then, an error occurs when using the loaded model:

[screenshot of the error]

Expected behavior

The loaded model can be used to create a top-k encoder.

Environment details

rnyak commented 11 months ago

@PaulSteffen-betclic please pull the latest branches; there was a recent fix. If you are using our merlin-tensorflow:23.08 image, you need to do:

cd /models
git pull origin main
pip install .
PaulSteffen-betclic commented 11 months ago

@PaulSteffen-betclic please pull the latest branches; there was a recent fix. If you are using our merlin-tensorflow:23.08 image, you need to do:

cd /models
git pull origin main
pip install .

I'm currently using version 23.8.0+5.g16d289a77, which includes this recent fix.
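For anyone checking which version they have installed, the standard-library `importlib.metadata` module can report it. This is a generic sketch; the distribution name `merlin-models` is an assumption and may differ depending on how the package was installed:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version string for pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

# assumed distribution name; adjust if your install uses a different one
print(installed_version("merlin-models"))
```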

PaulSteffen-betclic commented 11 months ago

@PaulSteffen-betclic please pull the latest branches; there was a recent fix. If you are using our merlin-tensorflow:23.08 image, you need to do:

cd /models
git pull origin main
pip install .

I also tried the latest branches, using the merlin-tensorflow:nightly image (23.08 does not work on macOS), and I still get the same error.

rnyak commented 11 months ago

@PaulSteffen-betclic after you load the model, can you please do this step before you convert it to a topk_encoder model?

loaded_model = tf.keras.models.load_model(path)
# this is necessary when re-loading the model, before building the top K
_ = loaded_model(mm.sample_batch(dataset, batch_size=128, include_targets=False))
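The reason this warm-up call helps is that Keras models build their variables lazily, on the first call with a concrete batch; until then, a re-loaded model may not be fully constructed. Below is a minimal, stdlib-only sketch of that deferred-build pattern (the `LazyLayer` class is hypothetical, not a Merlin or Keras API):

```python
# Illustration of deferred ("lazy") building: weights are created only on
# the first call, mirroring why a re-loaded model must be called on a
# sample batch before exporting a top-k encoder.
class LazyLayer:
    def __init__(self, units):
        self.units = units
        self.weights = None  # not created until the layer sees real input

    def __call__(self, inputs):
        if self.weights is None:
            # input shapes are only known once a concrete batch arrives
            self.weights = [[0.0] * self.units for _ in range(len(inputs[0]))]
        return [[sum(x)] * self.units for x in inputs]

layer = LazyLayer(units=4)
assert layer.weights is None          # freshly "loaded": nothing built yet
out = layer([[1.0, 2.0, 3.0]])        # warm-up call with a sample batch
assert layer.weights is not None      # now the layer is fully built
```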
PaulSteffen-betclic commented 10 months ago

@PaulSteffen-betclic after you load the model, can you please do this step before you convert it to a topk_encoder model?

loaded_model = tf.keras.models.load_model(path)
# this is necessary when re-loading the model, before building the top K
_ = loaded_model(mm.sample_batch(dataset, batch_size=128, include_targets=False))

It fixes this issue! Thanks!