MahmoudWahdan / dialog-nlu

TensorFlow and Keras implementation of state-of-the-art research in Dialog System NLU
Apache License 2.0

Support TensorRT conversion and serving feature #32

Open · redrussianarmy opened this issue 3 years ago

redrussianarmy commented 3 years ago

I realized that TensorFlow Lite does not support inference on the Nvidia GPU. My device is an Nvidia Jetson Xavier. My current inference runs the unoptimized transformers model on the GPU, and it is faster than inference with the TF Lite model on the CPU.

After some research, I found two model optimization options: TensorRT and TF-TRT. I made several attempts to convert the fine-tuned transformers model to TensorRT, but I could not get it to work. It would be great if dialog-nlu supported a TensorRT conversion and serving feature.
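
For context, the conversion I was attempting followed the standard TF-TRT path, roughly like the sketch below (assuming the fine-tuned model is first exported as a TensorFlow SavedModel; the paths and precision mode are placeholders):

# Rough TF-TRT conversion sketch; paths are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",   # exported fine-tuned model
    conversion_params=params)
converter.convert()
converter.save("trt_saved_model_dir")          # TensorRT-optimized output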

MahmoudWahdan commented 3 years ago

Hi @redrussianarmy Thank you for sharing your experience. I'll give it a try and let you know.

TFLite doesn't support serving on PC GPUs, but it does support mobile GPUs. I don't know whether it supports the GPUs of all edge devices.

One question that came to mind: did you try combining transformers with the layer_pruning feature and TFLite conversion with hybrid_quantization? For example:

from dialognlu import TransformerNLU

k_layers_to_prune = 4  # try different values
config = {
    # ... the rest of your config as before ...
    "layer_pruning": {
        "strategy": "top",
        "k": k_layers_to_prune
    }
}

nlu = TransformerNLU.from_config(config)
nlu.train(train_dataset, val_dataset, epochs, batch_size)

# save with TFLite conversion using hybrid quantization
nlu.save(save_path, save_tflite=True, conversion_mode="hybrid_quantization")

# load the quantized TFLite model for inference
nlu = TransformerNLU.load(model_path, quantized=True, num_process=4)

utterance = "add sabrina salerno to the grime instrumentals playlist"
result = nlu.predict(utterance)
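
The idea is that pruning the top k transformer layers shrinks the model, while hybrid quantization stores the weights in 8-bit and keeps activations in float, so TFLite inference on the CPU should get noticeably faster. There is a potential accuracy trade-off, so it is worth validating on your data.
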
redrussianarmy commented 3 years ago

Hi @MahmoudWahdan Thank you for your quick reply.

I tried combining transformers with the layer_pruning feature and TFLite conversion with hybrid_quantization, as you suggested. Unfortunately, the result is the same: prediction still does not run on the GPU of the Nvidia Jetson Xavier.

I am looking forward to seeing the new TensorRT conversion feature :)

MahmoudWahdan commented 3 years ago

Hi @redrussianarmy Sure, this is something new that I'll try, and it will certainly be useful. I'll keep you updated.
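
If it helps, serving the TF-TRT converted SavedModel from plain TensorFlow could look roughly like the sketch below (the path and the signature input names are assumptions and depend on how the model is exported):

import numpy as np
import tensorflow as tf

# Sketch only: load the TF-TRT optimized SavedModel and run inference.
trt_model = tf.saved_model.load("trt_saved_model_dir")   # placeholder path
infer = trt_model.signatures["serving_default"]

# Dummy token ids just to exercise the signature; real ids would come from
# the tokenizer. Input names are assumptions; check the exported signature.
dummy_ids = tf.constant(np.zeros((1, 64), dtype=np.int32))
dummy_mask = tf.constant(np.ones((1, 64), dtype=np.int32))
outputs = infer(input_ids=dummy_ids, attention_mask=dummy_mask)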