Closed ekaterinatretyak closed 1 year ago
2023-05-24 10:45:45.822110678 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-24 10:45:45 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
This is only a warning, so it should not cause the engine build to fail. Could you please provide a full verbose log? Thanks!
FYI, kFASTER_DYNAMIC_SHAPES_0805 is a preview feature in TRT 8.5. You can search for it in our API docs for more details.
This is indeed only a warning, but after it appears, no inference result is displayed, even 15 minutes after the script was started. Here's the full verbose log:
2023-05-26 07:53:42.767100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-26 07:53:42.929198: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-05-26 07:53:50.614469616 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:50 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2023-05-26 07:53:56.805284602 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:56 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
2023-05-26 07:53:59.117760047 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:53:59 WARNING] nx_tensorrt-src/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1313: UserWarning: Using max_length's default (512) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.
  warnings.warn(
2023-05-26 07:57:12.497281161 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:57:12 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[... the same warning repeated 34 more times, with timestamps between 07:57:12.497339000 and 07:57:12.954152394 ...]
2023-05-26 07:57:12.954165533 [W:onnxruntime:Default, tensorrt_execution_provider.h:63 log] [2023-05-26 07:57:12 WARNING] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
Did I do something wrong when trying to generate text? The generation result is never displayed. Could you tell me how to generate text with a Hugging Face model and its methods on the TensorRT execution provider? Here is the command I used to export the Helsinki-NLP/opus-mt-en-de model to ONNX:
python3 -m transformers.onnx --model=Helsinki-NLP/opus-mt-en-de ./onnx_models/
and here's my script for generating inference:
import torch
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_id = "./onnx_models/"  # path to the ONNX model
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, provider="TensorrtExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(ort_model.providers)
assert ort_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
text = "Most search engines can detect and significantly lower the ranking of plagiarism, resulting in lower positions in search results."
inp = tokenizer(text, return_tensors="pt", padding=True)
result = ort_model.generate(**inp.to(device))
# generate() returns a batch of token-id sequences, so decode the first (and only) one
decoded = tokenizer.decode(result[0], skip_special_tokens=True)
print(decoded)
Thanks in advance for help.
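One thing that may be worth trying here, offered as an assumption rather than a confirmed fix for this issue: during generation the decoder input shapes change from step to step, and the TensorRT execution provider may rebuild its engine when it sees shapes it has not handled before, which can look like a hang. ONNX Runtime's TensorRT provider has the options trt_engine_cache_enable and trt_engine_cache_path for reusing built engines, and Optimum passes a provider_options dictionary through to it. The sketch below reuses the model path from the script above; the cache directory name ./trt_cache, the helper names, and max_new_tokens=64 are illustrative choices, not values from this thread.

```python
from pathlib import Path


def build_trt_provider_options(cache_dir: str) -> dict:
    """Return TensorRT execution provider options that enable engine caching."""
    # Create the cache directory up front so the provider has somewhere to write.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    return {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
    }


def translate(text: str, model_dir: str = "./onnx_models/", cache_dir: str = "./trt_cache") -> str:
    """Translate one sentence on the TensorRT execution provider (needs a GPU stack)."""
    # Imported lazily so build_trt_provider_options() works without these packages.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    from transformers import AutoTokenizer

    model = ORTModelForSeq2SeqLM.from_pretrained(
        model_dir,
        provider="TensorrtExecutionProvider",
        provider_options=build_trt_provider_options(cache_dir),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    # Bound the output length explicitly instead of relying on the default max_length.
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

With caching enabled, the first run can still take a long time while engines are built; later runs with shapes that were already seen should be able to load them from the cache directory.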
I think you can try enabling the kFASTER_DYNAMIC_SHAPES_0805 feature, or, better, try our latest release, 8.6.1. We have made many optimizations in that version.
I've updated TensorRT to 8.6.1 and installed the other dependencies. You're right, it helped: the warning about kFASTER_DYNAMIC_SHAPES_0805 is no longer printed. But I still don't get any inference output from the model, even though assert onnx_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"] passes. Here are the logs:
2023-05-30 19:47:01.249679766 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 log] [2023-05-30 19:47:01 WARNING] external/onnx-tensorrt/onnx2trt_utils.cpp:367: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2023-05-30 19:47:09.993024329 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 log] [2023-05-30 19:47:09 WARNING] external/onnx-tensorrt/onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
./lib/python3.7/site-packages/transformers/generation/utils.py:1350: UserWarning: Using max_length's default (512) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.
By the way, inference works without problems on the CUDAExecutionProvider with the same settings, and the result is displayed. But with the TensorrtExecutionProvider, no result is displayed. Please tell me what I'm doing wrong.
external/onnx-tensorrt/onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
This may be the cause: your weights exceed the INT32 range, so they are clamped.
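To make the clamping in that warning concrete, here is a small arithmetic sketch of what saturating an INT64 value to the INT32 range does. It is not TensorRT's or the ONNX parser's code, and the example values are made up rather than taken from this model; it only shows which values survive unchanged and which are altered.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def clamp_to_int32(value: int) -> int:
    """Saturate an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def clamped_entries(weights):
    """Return (original, clamped) pairs for every value that does not fit in INT32."""
    return [(w, clamp_to_int32(w)) for w in weights if w != clamp_to_int32(w)]


# Hypothetical INT64 values: small ones fit in INT32, the last two do not.
example_weights = [0, 512, -1, INT64_MAX, -(2**40)]
for original, clamped in clamped_entries(example_weights):
    print(f"{original} -> {clamped}")
```

Whether a clamped value changes the model's behavior depends on how the graph uses it, so the warning on its own does not show that the results are wrong.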
@ekaterinatretyak, is this an onnxruntime issue? Have you tried running the model with the TensorRT Python API? Thanks!
Closing since there has been no activity for more than 3 weeks. Thanks!
I tried to run the Transformer model Helsinki-NLP/opus-mt-en-de converted to ONNX (without using the Optimum library) on TensorrtExecutionProvider. According to the logs, installation is successful and
assert onnx_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
works. However, running inference:
fails with the error below:
Could you kindly help with this issue?
Environment
TensorRT Version: 8.5.1.7
NVIDIA GPU: Tesla T4
CUDA Version and NVIDIA Driver Version:
CUDNN Version: 8.9.1.23
Python Version: 3.7