NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX? #1070

Open · tuanavu opened this issue 9 months ago

tuanavu commented 9 months ago

❓ Questions & Help

Details

Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image, which has allowed me to leverage the SOK (SparseOperationKit) toolkit for the sparse embedding layer. After training the new TF2 model with SOK, I find that I need to export the sok_model and the TF2 model separately, which produces two artifacts: a dumped embedding table for the sok_model and a SavedModel for the TF2 model.
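For reference, a minimal sketch of how the two artifacts could be written out; the SOK-side method name below is an assumption (mirroring the load_pretrained_embedding_table call used later), not a confirmed SOK API:

```python
import tensorflow as tf

# Hypothetical export of the two artifacts. dump_pretrained_embedding_table
# is a placeholder name; substitute whatever your SOK model class exposes.
sok_model.dump_pretrained_embedding_table("sok_embeddings")  # SOK weights
tf.saved_model.save(tf_model, "tf2_savedmodel")              # dense TF2 graph
```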

When I need to execute a local test prediction request, I have to load both models independently and then call the inference_step function as follows:

import tensorflow as tf

# Load both halves of the model: the SOK embedding table and the dense TF2 graph
sok_model.load_pretrained_embedding_table()
tf_model = tf.saved_model.load(save_dir)

# Inference step: SOK embedding lookup first, then the dense TF2 model.
# reduce_retracing supersedes the deprecated experimental_relax_shapes,
# so only one of the two should be passed.
@tf.function(reduce_retracing=True)
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)
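
One way to make this two-stage pipeline servable would be to fold both stages into a single SavedModel with an explicit serving signature, which Triton's TensorFlow backend can then load. The following is a minimal sketch under assumptions, not a confirmed Merlin workflow: the single int64 input of NUM_FEATURES categorical columns is hypothetical, and SOK's custom ops must be loadable inside the Triton build, which is worth verifying in the merlin-tensorflow container:

```python
import tensorflow as tf

NUM_FEATURES = 26  # hypothetical number of categorical input columns

class EnsembleModule(tf.Module):
    """Wraps the SOK embedding stage and the dense TF2 stage as one graph."""

    def __init__(self, sok_model, tf_model):
        super().__init__()
        self.sok_model = sok_model
        self.tf_model = tf_model

    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None, NUM_FEATURES], dtype=tf.int64, name="inputs")
    ])
    def serve(self, inputs):
        # Same composition as inference_step: embedding lookup, then dense model.
        embeddings = self.sok_model(inputs, training=False)
        return self.tf_model(embeddings, training=False)

ensemble = EnsembleModule(sok_model, tf_model)
# Triton's tensorflow_savedmodel backend expects the layout
# model_repository/<name>/<version>/model.savedmodel
tf.saved_model.save(
    ensemble,
    "model_repository/sok_ensemble/1/model.savedmodel",
    signatures={"serving_default": ensemble.serve},
)
```

A config.pbtxt describing the input and output tensors would still need to sit next to the version directory in the model repository.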

Questions

1. How can I serve this combined sok_model + TF2 model in Triton Inference Server?
2. Is it possible to convert the model to ONNX, and if so, how?
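
For the ONNX part, a possible approach (an assumption to verify, not a confirmed path) is to convert only the dense TF2 stage with tf2onnx, since SOK's custom embedding ops have no ONNX counterparts and would fail to convert; the embedding width below is a placeholder:

```python
import tensorflow as tf
import tf2onnx

EMBEDDING_DIM = 128  # hypothetical width of the concatenated SOK embeddings

tf_model = tf.saved_model.load("tf2_savedmodel")  # dense stage only

# Wrap the loaded model in a tf.function so tf2onnx can trace it.
fn = tf.function(lambda x: tf_model(x, training=False))
spec = (tf.TensorSpec([None, EMBEDDING_DIM], tf.float32, name="embeddings"),)

onnx_model, _ = tf2onnx.convert.from_function(
    fn, input_signature=spec, opset=13, output_path="model.onnx"
)
```

The SOK lookup stage would then have to stay in TensorFlow, or be re-implemented (for example as a gather over an exported embedding matrix) in front of the ONNX model.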

Environment details

rnyak commented 9 months ago

@FDecaYed fyi.