MahmoudWahdan / dialog-nlu

TensorFlow and Keras implementation of state-of-the-art research in Dialog System NLU
Apache License 2.0

Support TensorRT conversion and serving feature #32

Open · redrussianarmy opened this issue 3 years ago

redrussianarmy commented 3 years ago

I realized that TensorFlow Lite does not support inference on the Nvidia GPU. My device is an Nvidia Jetson Xavier. My current inference runs the unoptimized transformers model on the GPU, and it is faster than inference with the TF Lite model on the CPU.

After some research, I found two model optimization options: TensorRT and TF-TRT. I made several attempts to convert the fine-tuned transformers model to TensorRT, but I could not get it to work. It would be great if dialog-nlu supported a TensorRT conversion and serving feature.
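
For context, the conversion I was attempting followed the standard TF-TRT path, roughly like the sketch below (assuming the fine-tuned model is first exported as a TensorFlow SavedModel; the paths and precision mode are placeholders):

# Rough TF-TRT conversion sketch; paths are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",   # exported fine-tuned model
    conversion_params=params)
converter.convert()
converter.save("trt_saved_model_dir")          # TensorRT-optimized output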

MahmoudWahdan commented 3 years ago

Hi @redrussianarmy Thank you for sharing your experience. I'll give it a try and let you know.

TFLite doesn't support serving on PC GPUs, but it does support mobile GPUs. I don't know whether it supports the GPUs of all edge devices.

One question that came to mind: did you try combining transformers with the layer_pruning feature and TFLite conversion with hybrid_quantization? For example:

from dialognlu import TransformerNLU

k_layers_to_prune = 4  # try different values
config = {
    # ... the rest of your config as before ...
    "layer_pruning": {
        "strategy": "top",
        "k": k_layers_to_prune
    }
}

nlu = TransformerNLU.from_config(config)
nlu.train(train_dataset, val_dataset, epochs, batch_size)

# save with TFLite conversion using hybrid quantization
nlu.save(save_path, save_tflite=True, conversion_mode="hybrid_quantization")

# load the quantized TFLite model for inference
nlu = TransformerNLU.load(model_path, quantized=True, num_process=4)

utterance = "add sabrina salerno to the grime instrumentals playlist"
result = nlu.predict(utterance)
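
The idea is that pruning the top k transformer layers shrinks the model, while hybrid quantization stores the weights in 8-bit and keeps activations in float, so TFLite inference on the CPU should get noticeably faster. There is a potential accuracy trade-off, so it is worth validating on your data.
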
redrussianarmy commented 3 years ago

Hi @MahmoudWahdan Thank you for your quick reply.

I tried combining transformers with the layer_pruning feature and TFLite conversion with hybrid_quantization, as you suggested. Unfortunately, the result is the same: prediction still does not run on the GPU of the Nvidia Jetson Xavier.

I am looking forward to seeing the new TensorRT conversion feature :)

MahmoudWahdan commented 3 years ago

Hi @redrussianarmy Sure, this is something new that I'll try, and it will certainly be useful. I'll keep you updated.
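
If it helps, serving the TF-TRT converted SavedModel from plain TensorFlow could look roughly like the sketch below (the path and the signature input names are assumptions and depend on how the model is exported):

import numpy as np
import tensorflow as tf

# Sketch only: load the TF-TRT optimized SavedModel and run inference.
trt_model = tf.saved_model.load("trt_saved_model_dir")   # placeholder path
infer = trt_model.signatures["serving_default"]

# Dummy token ids just to exercise the signature; real ids would come from
# the tokenizer. Input names are assumptions; check the exported signature.
dummy_ids = tf.constant(np.zeros((1, 64), dtype=np.int32))
dummy_mask = tf.constant(np.ones((1, 64), dtype=np.int32))
outputs = infer(input_ids=dummy_ids, attention_mask=dummy_mask)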