[Question] LSTM via Loop vs RNNv2

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

https://developer.nvidia.com/tensorrt

Apache License 2.0

[Question] LSTM via Loop vs RNNv2 #1572

Closed anxietymonger closed 2 years ago

anxietymonger commented 2 years ago

From the documentation I found that RNNv2 is deprecated. Since the documentation does not say much about replacing the RNNv2 layer with the loop API, may I ask a few questions?

What is the motivation behind this change, and how will I benefit from it?
Is better performance (inference speed) expected?
Is INT8 quantization support expected?

ttyio commented 2 years ago

Hello @o0fatigue ,

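We switched to a new backend for the loop-based API. It is more generic and more efficient.
Yes, better inference speed is expected.
Only INT8 QAT (quantization-aware training) is supported now, but we will also support INT8 PTQ (post-training quantization) in the future.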

Thanks!
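
For readers wondering what "LSTM via loop" means in practice, here is a minimal NumPy sketch of an LSTM written as an explicit per-time-step loop with recurrent state. It is not TensorRT code and not taken from this thread: the function names (`lstm_step`, `lstm_loop`), the gate order (i, f, g, o), and the weight layout are assumptions made for illustration. The comments point out which parts roughly correspond to the loop constructs TensorRT documents (trip limit, iterator, recurrence, loop output).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One LSTM time step (the loop body).

    Assumed layout for this sketch: W is (input, 4*hidden), R is
    (hidden, 4*hidden), b is (4*hidden,), with gates ordered i, f, g, o.
    """
    hidden = h_prev.shape[-1]
    z = x_t @ W + h_prev @ R + b              # (batch, 4*hidden)
    i = sigmoid(z[:, 0 * hidden:1 * hidden])  # input gate
    f = sigmoid(z[:, 1 * hidden:2 * hidden])  # forget gate
    g = np.tanh(z[:, 2 * hidden:3 * hidden])  # candidate cell value
    o = sigmoid(z[:, 3 * hidden:4 * hidden])  # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_loop(x, h0, c0, W, R, b):
    """Run the LSTM over x of shape (seq_len, batch, input)."""
    h, c = h0, c0          # recurrent state, analogous to recurrence layers
    outputs = []
    for x_t in x:          # trip count = seq_len; x_t plays the iterator role
        h, c = lstm_step(x_t, h, c, W, R, b)
        outputs.append(h)  # per-step output, later concatenated
    # Stacked outputs resemble a concatenating loop output; the final h and c
    # resemble last-value loop outputs.
    return np.stack(outputs), h, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, batch, n_in, n_hidden = 3, 2, 4, 5
    x = rng.standard_normal((seq_len, batch, n_in))
    W = rng.standard_normal((n_in, 4 * n_hidden))
    R = rng.standard_normal((n_hidden, 4 * n_hidden))
    b = np.zeros(4 * n_hidden)
    h0 = np.zeros((batch, n_hidden))
    c0 = np.zeros((batch, n_hidden))
    ys, h, c = lstm_loop(x, h0, c0, W, R, b)
    print(ys.shape)  # (3, 2, 5)
```

Writing the cell as an ordinary loop body is what makes the approach more generic: any variation of the cell (different gates, extra projections, custom activations) is a change to the loop body rather than a new fixed-function layer.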