NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Failure of TensorRT 8.5.1.7 when running Transformer on GPU #3007

Closed ekaterinatretyak closed 1 year ago

ekaterinatretyak commented 1 year ago

I tried to run the Transformer model Helsinki-NLP/opus-mt-en-de, converted to ONNX (without using the Optimum library), with the TensorrtExecutionProvider. According to the logs, the setup succeeds and assert ort_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"] passes.

However running inference:

inp = tokenizer.encode(text, return_tensors="pt", padding=True)
result = ort_model.generate(inp)

fails with the error below:

2023-05-24 10:45:45.822110678 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-24 10:45:45 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.

Could you kindly help with this issue?

Environment

TensorRT Version: 8.5.1.7
NVIDIA GPU: Tesla T4
CUDA Version and NVIDIA Driver Version (from nvcc):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

CUDNN Version: 8.9.1.23

Python Version: 3.7

zerollzeng commented 1 year ago

2023-05-24 10:45:45.822110678 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-24 10:45:45 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.

This is a warning, so it should not cause the engine build to fail. Could you please provide a full verbose log? Thanks!
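
For reference, a minimal sketch of how verbose ONNX Runtime logging can be turned on from Python (0 is the most verbose severity level); when the session is created internally, as with Optimum's ORTModel, the global call is the simplest way to capture this:

import onnxruntime as ort

# 0 = VERBOSE, 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = FATAL
ort.set_default_logger_severity(0)

# Per-session verbosity can also be raised through SessionOptions
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0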

zerollzeng commented 1 year ago

FYI, kFASTER_DYNAMIC_SHAPES_0805 is a preview feature in TRT 8.5. You can search for it in our API docs for more details.
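
As an illustration, a rough sketch of opting in to that preview feature when building an engine directly with the TensorRT 8.5 Python API (builder/network setup elided; this does not change engines built internally by the onnxruntime TensorRT EP):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Opt in to the TRT 8.5 preview feature referenced by the warning
config.set_preview_feature(trt.PreviewFeature.FASTER_DYNAMIC_SHAPES_0805, True)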

ekaterinatretyak commented 1 year ago

2023-05-24 10:45:45.822110678 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-24 10:45:45 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.

This is a warning, so it should not cause the engine build to fail. Could you please provide a full verbose log? Thanks!

This is indeed a warning, but after it, the inference result is not displayed even 15 minutes after the script is run. Here's the full verbose log:

2023-05-26 07:53:42.767100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-26 07:53:42.929198: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-05-26 07:53:50.614469616 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:50 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2023-05-26 07:53:56.805284602 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:56 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
2023-05-26 07:53:59.117760047 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:59 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1313: UserWarning: Using max_length's default (512) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.
  warnings.warn(
2023-05-26 07:57:12.497281161 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:57:12 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
2023-05-26 07:57:12.497339000 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:57:12 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
(the same kFASTER_DYNAMIC_SHAPES_0805 warning repeats with later timestamps for the rest of the log, through 2023-05-26 07:57:12.954165533)

Did I do something wrong when trying to generate text? The result of generation is never displayed. Could you kindly explain how to generate text using the Hugging Face model and methods on the TensorRT execution provider? Below is my command for exporting the Helsinki-NLP/opus-mt-en-de model to ONNX:

python3 -m transformers.onnx --model=Helsinki-NLP/opus-mt-en-de ./onnx_models/

and here's my script for running inference:

import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_id = "./onnx_models/"   # path to the onnx model
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, provider="TensorrtExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(ort_model.providers)
assert ort_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
text = "Most search engines can detect and significantly lower the ranking of plagiarism, resulting in lower positions in search results."

inp = tokenizer(text, return_tensors="pt", padding=True)
result = ort_model.generate(**inp.to(device))
decoded = tokenizer.decode(result[0], skip_special_tokens=True)  # generate returns a batch; decode the first sequence
print(decoded)

Thanks in advance for your help.

zerollzeng commented 1 year ago

I think you can try enabling the kFASTER_DYNAMIC_SHAPES_0805 feature, or it would be good if you could try our latest 8.6.1; we have many optimizations in that version.
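
One more thing worth trying, as a sketch (assuming a recent Optimum release where ORTModelForSeq2SeqLM.from_pretrained accepts provider_options; the option names are standard ONNX Runtime TensorRT EP options): enable TensorRT engine caching. The first run with the TensorRT EP has to build engines, so a long silent wait may simply be engine building.

from optimum.onnxruntime import ORTModelForSeq2SeqLM

provider_options = {
    "trt_engine_cache_enable": True,         # reuse built TensorRT engines across runs
    "trt_engine_cache_path": "./trt_cache",  # hypothetical cache directory
}
ort_model = ORTModelForSeq2SeqLM.from_pretrained(
    "./onnx_models/",
    provider="TensorrtExecutionProvider",
    provider_options=provider_options,
)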

ekaterinatretyak commented 1 year ago

I've updated TensorRT to 8.6.1 and also installed the other dependencies, and you're right, it helped: the warning about kFASTER_DYNAMIC_SHAPES_0805 is no longer printed. But I still don't get any inference output from the model, while assert ort_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"] still passes. Here are the logs:

2023-05-30 19:47:01.249679766 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 log] [2023-05-30 19:47:01 WARNING] external/onnx-tensorrt/onnx2trt_utils.cpp:367: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2023-05-30 19:47:09.993024329 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 log] [2023-05-30 19:47:09 WARNING] external/onnx-tensorrt/onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
./lib/python3.7/site-packages/transformers/generation/utils.py:1350: UserWarning: Using max_length's default (512) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.

By the way, there are no problems with inference on the CUDAExecutionProvider with the same settings: the result is displayed. But when using the TensorrtExecutionProvider, the result isn't displayed. Please tell me what I'm doing wrong.

zerollzeng commented 1 year ago

external/onnx-tensorrt/onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped

Perhaps it's caused by this: your weights exceed the INT32 range, so they get clamped.
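
A small sketch for checking which initializers actually fall outside the INT32 range (assumes the onnx and numpy packages; the model path is hypothetical and should point to the exported file):

import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("./onnx_models/model.onnx")  # hypothetical path to the exported model
i32 = np.iinfo(np.int32)
for init in model.graph.initializer:
    arr = numpy_helper.to_array(init)
    # Report INT64 initializers whose values cannot be represented in INT32
    if arr.dtype == np.int64 and arr.size and ((arr > i32.max).any() or (arr < i32.min).any()):
        print(init.name, arr.min(), arr.max())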

ttyio commented 1 year ago

@ekaterinatretyak, is this an onnxruntime issue? Have you tried running it with the TensorRT Python API? Thanks!
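
For example, a rough sketch of parsing and building the exported ONNX file with the TensorRT Python API alone, to see whether TensorRT itself can handle the graph (the file name is an assumption; a model with dynamic input shapes will also need an optimization profile before the build succeeds):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("./onnx_models/model.onnx", "rb") as f:  # hypothetical path to the exported model
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))

config = builder.create_builder_config()
# NOTE: dynamic-shape inputs still need an optimization profile added to config here
serialized_engine = builder.build_serialized_network(network, config)
print("engine built" if serialized_engine is not None else "engine build failed")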

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks, thanks!