NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0 · 715 stars · 111 forks
[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX? #1070
Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image, which has allowed me to leverage the SOK toolkit for the SparseEmbedding layer. After training the new TF2 model with SOK, I find that I need to export the sok_model and the TF2 model separately. The resulting outputs are as follows:
- sok_model: This results in a collection of files named `EmbeddingVariable_keys.file` and `EmbeddingVariable_values.file`.
- tf2 model: This exports `saved_model.pb` and the `variables` files.
When I need to run a local test prediction request, I have to load both models independently and then call inference_step as follows:
```python
import tensorflow as tf

# Load the models: restore the SOK embedding table, then the dense SavedModel
sok_model.load_pretrained_embedding_table()
tf_model = tf.saved_model.load(save_dir)

# Inference step: look up embeddings with SOK, then run the dense model
@tf.function(reduce_retracing=True)
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)
```
Questions
1. Serving the model: I'm interested in how to serve this model on AWS EKS using the Triton Inference Server. What would the required model repository structure be? Should I treat it as an ensemble model that combines the SOK and TensorFlow 2 stages? Which backend would be most suitable: HugeCTR, TensorFlow 2, or something else? Do you have any guides or resources that could help me with this?
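For context, the structure I have been considering (not taken from any official guide) is a single Triton Python-backend model that reproduces the local inference_step, since the SOK lookup is not available in the stock TensorFlow backend. The model name sok_tf2, the tensor names ids and output, and the build_sok_model() helper below are all hypothetical placeholders:

```python
# model.py for a hypothetical Triton Python-backend model, assuming a layout:
#   models/sok_tf2/config.pbtxt
#   models/sok_tf2/1/model.py
#   models/sok_tf2/1/tf_model/   <- the dense SavedModel
import tensorflow as tf
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # build_sok_model() is a hypothetical helper that rebuilds the SOK
        # model exactly as it was defined for training, before restoring the
        # dumped embedding table.
        self.sok_model = build_sok_model()
        self.sok_model.load_pretrained_embedding_table()
        self.tf_model = tf.saved_model.load(
            f"{args['model_repository']}/{args['model_version']}/tf_model"
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            ids = pb_utils.get_input_tensor_by_name(request, "ids").as_numpy()
            # Same two-stage call as the local inference_step
            emb = self.sok_model(tf.constant(ids), training=False)
            out = self.tf_model(emb, training=False)
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("output", out.numpy())]))
        return responses
```

An ensemble variant would also be possible (a Python-backend lookup step feeding the SavedModel served by the TensorFlow backend), but keeping both stages in one Python-backend model avoids serializing the embeddings between backends. Is one of these the recommended approach?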
2. Converting the model to ONNX: According to the Hierarchical Parameter Server Demo, HugeCTR can load both the sparse and dense models and convert them into a single ONNX model. I'm wondering how to perform a similar conversion for this merlin-tensorflow model, which uses the SOK toolkit and exports the sparse and dense models separately.
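For reference, the only route I have come up with so far (untested) is to fold the dumped SOK table back into a plain TF graph and let tf2onnx convert the combined function. The file names, dtypes, and emb_dim below are assumptions that must match the actual SOK dump:

```python
# Untested sketch: rebuild the dumped embedding table as a dense tensor so the
# combined sparse + dense model can be converted with tf2onnx.
import numpy as np
import tensorflow as tf
import tf2onnx

emb_dim = 16  # hypothetical embedding dimension
keys = np.fromfile("EmbeddingVariable_keys.file", dtype=np.int64)
values = np.fromfile("EmbeddingVariable_values.file", dtype=np.float32)

# Assumes a compact key space; hashed or very large key ranges would need a
# remap table instead of direct indexing.
table = np.zeros((int(keys.max()) + 1, emb_dim), dtype=np.float32)
table[keys] = values.reshape(-1, emb_dim)
embedding = tf.constant(table)

tf_model = tf.saved_model.load("save_dir")  # the dense part

spec = (tf.TensorSpec([None, None], tf.int64, name="ids"),)

@tf.function(input_signature=spec)
def full_model(ids):
    # Replace the SOK lookup with a plain gather, which tf2onnx understands.
    return tf_model(tf.gather(embedding, ids), training=False)

tf2onnx.convert.from_function(full_model, input_signature=spec,
                              opset=13, output_path="model.onnx")
```

This sidesteps SOK entirely at conversion time, so it only works if the dumped key space is small enough to materialize as a dense table. Is there a supported conversion path instead?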
❓ Questions & Help
Details
Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image. This has allowed me to leverage the SOK toolkit for the SparseEmbedding Layer. Post training of the new TF2 model with SOK, I find that I need to separately export the sok_model and the tf2 model. The resulting outputs are as follows:
and
EmbeddingVariable_values.file`.saved_model.pb
,variables
files.When I need to execute a local test prediction request, I have to load both models independently. I then call the inference_step as follows:
Questions
Environment details
Merlin container image: `nvcr.io/nvidia/merlin/merlin-tensorflow:23.02`