NVIDIA-Merlin / HugeCTR

HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training
Apache License 2.0

[Question] How to serve TF2 SOK model in Triton Inference and convert it to ONNX? #422

Closed tuanavu closed 8 months ago

tuanavu commented 9 months ago

Details

I'm currently working with an existing TensorFlow 2 (TF2) model and the SparseOperationKit (SOK). This setup lets me use the SparseEmbedding layer from the SOK toolkit. However, I've found that I have to define the sok_model and the tf_model separately for training.

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Embedding half of the model, built on SOK's SparseEmbedding layer
    sok_model = SOKModel(
        dense_feature_stats=dense_feature_stats,
        trainable_sparse_feature_vocab_dict=trainable_sparse_feature_vocab_dict,
        pretrained_sparse_feature_info_map=pretrained_sparse_feature_info_map,
        dense_dim=dense_dim,
        sparse_dim=sparse_dim,
    )
    # Dense half of the model, a plain Keras model
    tf_model = TFModel(
        all_feature_names=all_feature_names,
        pretrained_sparse_feature_info_map=pretrained_sparse_feature_info_map,
        dense_dim=dense_dim,
        sparse_dim=sparse_dim,
    )

    # Each half needs its own optimizer: SOK embedding variables are
    # updated by sok.optimizers.Adam, the remaining variables by Keras Adam
    sok_opt = sok.optimizers.Adam()
    tf_opt = tf.keras.optimizers.Adam()
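The reason for the two optimizers is that each one updates only the variables owned by its half of the model. Here is a minimal plain-Python sketch of that split; the dicts, `train_step`, `sgd_update`, and the learning rates are all illustrative stand-ins, not the SOK or Keras API:

```python
# Schematic illustration of the two-optimizer split: sparse (embedding)
# variables go to one optimizer, dense variables to another.
# Plain dicts and SGD updates stand in for SOK/Keras objects.

def sgd_update(variables, grads, lr):
    """Apply a plain SGD step: v <- v - lr * g."""
    return {name: variables[name] - lr * grads[name] for name in variables}

def train_step(sparse_vars, dense_vars, grads, sparse_lr=0.1, dense_lr=0.01):
    # Split the gradients by which half of the model owns each variable,
    # mirroring how sok_opt and tf_opt each see only their own variables.
    sparse_grads = {k: g for k, g in grads.items() if k in sparse_vars}
    dense_grads = {k: g for k, g in grads.items() if k in dense_vars}
    new_sparse = sgd_update(sparse_vars, sparse_grads, sparse_lr)
    new_dense = sgd_update(dense_vars, dense_grads, dense_lr)
    return new_sparse, new_dense

sparse_vars = {"embedding": 1.0}
dense_vars = {"kernel": 2.0}
grads = {"embedding": 0.5, "kernel": 0.5}
new_sparse, new_dense = train_step(sparse_vars, dense_vars, grads)
print(new_sparse["embedding"])  # 1.0 - 0.1 * 0.5 = 0.95
print(new_dense["kernel"])      # 2.0 - 0.01 * 0.5 = 1.995
```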

After training the new TF2 model with SOK, I found that I need to export both the sok_model and the tf_model separately.

# Save the dense part as a regular TF SavedModel
tf_model.save(save_dir)
# Dump the SOK embedding tables separately
saver = sok.Saver()
saver.dump_to_file()

The resulting outputs are as follows:

When I need to run a local test prediction, I have to load both models independently and then call inference_step as follows:

# Load the model
sok_model.load_pretrained_embedding_table()

tf_model = tf.saved_model.load(save_dir)

# Inference steps
@tf.function(reduce_retracing=True)  # reduce_retracing supersedes the deprecated experimental_relax_shapes
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)
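The inference path is a straight composition: the SOK model turns sparse feature ids into embedding vectors, and the TF model consumes those vectors. A schematic plain-Python stand-in (the toy table, `sok_model`, and `tf_model` below are illustrative mocks, not the SOK/TF API):

```python
# Schematic stand-in for inference_step above: the "sok model" does an
# embedding lookup, the "tf model" consumes the looked-up vectors.

EMBEDDING_TABLE = {  # stand-in for the dumped SOK embedding tables
    0: [0.1, 0.2],
    1: [0.3, 0.4],
}

def sok_model(ids):
    """Embedding half: map sparse feature ids to dense vectors."""
    return [EMBEDDING_TABLE[i] for i in ids]

def tf_model(vectors):
    """Dense half: here it just sums every looked-up component."""
    return sum(v for vec in vectors for v in vec)

def inference_step(ids):
    # Same composition as tf_model(sok_model(inputs)) in the snippet above
    return tf_model(sok_model(ids))

res = inference_step([0, 1])
print(res)  # ~1.0
```

Serving this pipeline in Triton (or converting it to ONNX) therefore has to account for both stages, which is what the question below is about.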

Questions

Environment details

KingsleyLiu-NV commented 9 months ago

Hi @tuanavu, there are two solutions for deploying models trained with TF2 SOK: