NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[BERT] Build Engine Failure on Nvidia Jetson Ampere GPUs #4049

Open JoAnn0812 opened 3 months ago

JoAnn0812 commented 3 months ago

I tried to run the BERT model on a Jetson (Ampere GPU) to evaluate post-training quantization (PTQ) INT8 accuracy on the SQuAD dataset, but it fails with the error below while building the engine:

```
WARNING:tensorflow:From /home/ecnd/TensorRT/demo/BERT/bert_test_env/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating: non-resource variables are not supported in the long term
[08/05/2024-00:36:17] [TRT] [I] Using configuration file: models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/bert_config.json
[08/05/2024-00:36:17] [TRT] [I] Found 394 entries in weight map
[08/05/2024-00:36:22] [TRT] [E] Could not convert non-contiguous NumPy array to Weights. Please use numpy.ascontiguousarray() to fix this.
[08/05/2024-00:36:23] [TRT] [I] [MemUsageChange] Init CUDA: CPU +215, GPU +0, now: CPU 2813, GPU 10432 (MiB)
[08/05/2024-00:36:26] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +303, GPU +285, now: CPU 3138, GPU 10740 (MiB)
builder.py:401: DeprecationWarning: Use set_memory_pool_limit instead.
  builder_config.max_workspace_size = workspace_size * (1024 * 1024)
builder.py:109: DeprecationWarning: Use add_matrix_multiply instead.
  mult_all = network.add_fully_connected(input_tensor, 3 * hidden_size, Wall, Ball)
builder.py:232: DeprecationWarning: Use add_matrix_multiply instead.
  attention_out_fc = network.add_fully_connected(attention_heads, hidden_size, W_aout, B_aout)
builder.py:247: DeprecationWarning: Use add_matrix_multiply instead.
  mid_dense = network.add_fully_connected(attention_ln, config.intermediate_size, W_mid, B_mid)
builder.py:292: DeprecationWarning: Use add_matrix_multiply instead.
  out_dense = network.add_fully_connected(intermediate_act, hidden_size, W_lout, B_lout)
Traceback (most recent call last):
  File "builder.py", line 553, in <module>
    main()
  File "builder.py", line 544, in main
    with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num) as engine:
  File "builder.py", line 441, in build_engine
    bert_out = bert_model(config, weights_dict, network, embeddings, mask_idx)
  File "builder.py", line 312, in bert_model
    out_layer = transformer_layer_opt(ss, config, init_dict, network, prev_input, input_mask)
  File "builder.py", line 211, in transformer_layer_opt
    context_transposed = attention_layer_opt(prefix + "attention_", config, init_dict, network, input_tensor, imask)
  File "builder.py", line 102, in attention_layer_opt
    Wall = init_dict[prefix + WQKV]
KeyError: 'l1_attention_self_qkv_kernel'
```
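Note the earlier `[TRT] [E] Could not convert non-contiguous NumPy array to Weights` message in the log: TensorRT's `Weights` wrapper requires C-contiguous arrays, and operations such as transposing a NumPy array return non-contiguous views. A minimal sketch (plain NumPy, independent of the BERT demo) of the fix the error message suggests:

```python
import numpy as np

# A transposed array is a view over the original buffer and is
# usually not C-contiguous, which TensorRT's Weights constructor rejects.
w = np.arange(12, dtype=np.float32).reshape(3, 4).T
print(w.flags["C_CONTIGUOUS"])        # False

# numpy.ascontiguousarray() returns a C-contiguous copy that can be
# passed to TensorRT without triggering the conversion error.
w_fixed = np.ascontiguousarray(w)
print(w_fixed.flags["C_CONTIGUOUS"])  # True
```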

JoAnn0812 commented 3 months ago

How can I resolve this issue? Thanks.

lix19937 commented 3 months ago

You should look carefully at the last frames of the traceback:

```
File "builder.py", line 102, in attention_layer_opt
    Wall = init_dict[prefix + WQKV]
KeyError: 'l1_attention_self_qkv_kernel'
```

JoAnn0812 commented 3 months ago

Thanks for the reply. I am new to TensorRT. Can you guide me on how to modify the builder.py script?

lix19937 commented 3 months ago

You can single-step through the code with a debugger.
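One way to debug a `KeyError` like this without a full debugger session is to wrap the failing lookup and print the keys that actually exist in the weight map. This is a hypothetical diagnostic helper (the names `debug_lookup`, `weights`, and the example keys are illustrative, not part of the demo's code):

```python
# Hypothetical helper: wrap the failing dictionary lookup so a missing
# key reports which nearby keys the weight map actually contains.
def debug_lookup(init_dict, key):
    if key not in init_dict:
        stem = key.split("_")[0]  # e.g. "l1" for per-layer keys
        nearby = sorted(k for k in init_dict if k.startswith(stem))
        raise KeyError(f"{key!r} not found; keys with the same layer "
                       f"prefix: {nearby[:10]}")
    return init_dict[key]

# Example: a weight map whose QKV kernel was stored under a different name.
weights = {"l1_attention_self_query_kernel": "W_q"}
try:
    debug_lookup(weights, "l1_attention_self_qkv_kernel")
except KeyError as e:
    print(e)  # the listed keys reveal the naming mismatch
```

Comparing the missing key against the listed keys usually shows whether the checkpoint uses a different naming scheme than the builder expects.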

ttyio commented 3 months ago

@JoAnn0812 Where did you download the checkpoint? If you are using a custom checkpoint instead of the official one referenced in this repo's README, you need to change the weights-mapping function `load_xxx` in https://github.com/NVIDIA/TensorRT/blob/release/10.2/demo/BERT/builder_utils.py
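To illustrate what such a weights-mapping function does: TensorFlow BERT checkpoints scope their variables like `bert/encoder/layer_N/attention/self/query/kernel`, while the traceback shows the demo looking up flat keys like `l1_attention_self_qkv_kernel`. A hypothetical sketch of the kind of name translation involved (the regex and output scheme here are illustrative, not the actual `builder_utils.py` logic, which also fuses Q/K/V into one tensor):

```python
import re

# Hypothetical name-remapping sketch: translate TF-style scoped variable
# names into flat per-layer keys of the form "l{N}_...".
def remap_name(tf_name):
    m = re.match(r"bert/encoder/layer_(\d+)/(.*)", tf_name)
    if not m:
        return tf_name  # non-encoder variables pass through unchanged
    layer, rest = m.groups()
    return f"l{layer}_{rest.replace('/', '_')}"

print(remap_name("bert/encoder/layer_1/attention/self/query/kernel"))
# l1_attention_self_query_kernel
```

A checkpoint whose variables are scoped differently (e.g. an extra model prefix) would fall through this mapping and leave the expected keys missing, producing exactly the `KeyError` above.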

JoAnn0812 commented 3 months ago

> @JoAnn0812 where did you download the checkpoint? If you use are using custom checkpoint instead of the official one in this repo README, you need change the weights mapping function load_xxx in https://github.com/NVIDIA/TensorRT/blob/release/10.2/demo/BERT/builder_utils.py

I am using the official one from the README (`bash scripts/download_model.sh`). I even downloaded it with `ngc registry model download-version "nvidia/bert_tf_ckpt_large_qa_squad2_amp_128:19.03.1"`, but it still hits the same issue. I just want to reproduce the BERT benchmark without any changes to the script. Can you share another way to download a working checkpoint? Thank you.

krishnarajk commented 1 month ago

@JoAnn0812 Hi, were you able to run the demo on your Jetson device?