NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX? #1070

Open · tuanavu opened this issue 9 months ago

tuanavu commented 9 months ago

❓ Questions & Help

Details

Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image, which has allowed me to leverage the SOK (SparseOperationKit) toolkit for the sparse embedding layer. After training the new TF2 model with SOK, I find that I need to export the sok_model and the TF2 model separately, which produces two artifacts: a dumped embedding table for the sok_model and a SavedModel for the TF2 model.
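For reference, a minimal sketch of how the two artifacts could be written out; the SOK-side method name below is an assumption (mirroring the load_pretrained_embedding_table call used later), not a confirmed SOK API:

```python
import tensorflow as tf

# Hypothetical export of the two artifacts. dump_pretrained_embedding_table
# is a placeholder name; substitute whatever your SOK model class exposes.
sok_model.dump_pretrained_embedding_table("sok_embeddings")  # SOK weights
tf.saved_model.save(tf_model, "tf2_savedmodel")              # dense TF2 graph
```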

When I need to execute a local test prediction request, I have to load both models independently and then call the inference_step function as follows:

import tensorflow as tf

# Load both halves of the model: the SOK embedding table and the dense TF2 graph
sok_model.load_pretrained_embedding_table()
tf_model = tf.saved_model.load(save_dir)

# Inference step: SOK embedding lookup first, then the dense TF2 model.
# reduce_retracing supersedes the deprecated experimental_relax_shapes,
# so only one of the two should be passed.
@tf.function(reduce_retracing=True)
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)
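
One way to make this two-stage pipeline servable would be to fold both stages into a single SavedModel with an explicit serving signature, which Triton's TensorFlow backend can then load. The following is a minimal sketch under assumptions, not a confirmed Merlin workflow: the single int64 input of NUM_FEATURES categorical columns is hypothetical, and SOK's custom ops must be loadable inside the Triton build, which is worth verifying in the merlin-tensorflow container:

```python
import tensorflow as tf

NUM_FEATURES = 26  # hypothetical number of categorical input columns

class EnsembleModule(tf.Module):
    """Wraps the SOK embedding stage and the dense TF2 stage as one graph."""

    def __init__(self, sok_model, tf_model):
        super().__init__()
        self.sok_model = sok_model
        self.tf_model = tf_model

    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None, NUM_FEATURES], dtype=tf.int64, name="inputs")
    ])
    def serve(self, inputs):
        # Same composition as inference_step: embedding lookup, then dense model.
        embeddings = self.sok_model(inputs, training=False)
        return self.tf_model(embeddings, training=False)

ensemble = EnsembleModule(sok_model, tf_model)
# Triton's tensorflow_savedmodel backend expects the layout
# model_repository/<name>/<version>/model.savedmodel
tf.saved_model.save(
    ensemble,
    "model_repository/sok_ensemble/1/model.savedmodel",
    signatures={"serving_default": ensemble.serve},
)
```

A config.pbtxt describing the input and output tensors would still need to sit next to the version directory in the model repository.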

Questions

1. How can I serve this combined sok_model + TF2 model in Triton Inference Server?
2. Is it possible to convert the model to ONNX, and if so, how?
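
For the ONNX part, a possible approach (an assumption to verify, not a confirmed path) is to convert only the dense TF2 stage with tf2onnx, since SOK's custom embedding ops have no ONNX counterparts and would fail to convert; the embedding width below is a placeholder:

```python
import tensorflow as tf
import tf2onnx

EMBEDDING_DIM = 128  # hypothetical width of the concatenated SOK embeddings

tf_model = tf.saved_model.load("tf2_savedmodel")  # dense stage only

# Wrap the loaded model in a tf.function so tf2onnx can trace it.
fn = tf.function(lambda x: tf_model(x, training=False))
spec = (tf.TensorSpec([None, EMBEDDING_DIM], tf.float32, name="embeddings"),)

onnx_model, _ = tf2onnx.convert.from_function(
    fn, input_signature=spec, opset=13, output_path="model.onnx"
)
```

The SOK lookup stage would then have to stay in TensorFlow, or be re-implemented (for example as a gather over an exported embedding matrix) in front of the ONNX model.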

Environment details

rnyak commented 9 months ago

@FDecaYed fyi.