NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[Question] Can a Hugging Face transformer be converted directly with builder.py in the BERT demo? #1048

Closed: Slyne closed this issue 2 years ago

Slyne commented 3 years ago

Description

Currently I'm having an issue: the engine output I get by following the BERT demo differs from the TensorFlow 2 model's output (https://huggingface.co/bert-base-chinese). The weight names are different, so I renamed the weights to make sure they could be fit into the BERT demo.

However, when I checked the output layer by layer, it turned out that only the embedding layer's result (including the mask, input, and segment embeddings) is the same; the outputs of all the transformer layers are different.

Another thing I found is that even though the output of each transformer layer is different, the final results (I'm working on a named entity recognition task) are similar after argmax, but not the same.

I just want to ask whether anyone has done a similar experiment and can share some experience or point out the mistakes I made.
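
For reference, the renaming I did looks roughly like this (a sketch only; the exact patterns are assumptions about the bert-base-chinese TF2 variable names, not the demo's official mapping):

import re

def hf_to_demo_name(hf_name):
    # Strip the Keras suffix, e.g. ".../kernel:0" -> ".../kernel"
    name = hf_name.split(":")[0]
    # Drop the top-level model scope that transformers adds
    name = name.replace("tf_bert_model/", "")
    # TF2 transformers uses "layer_._0"; the demo-style names use "layer_0"
    name = re.sub(r"layer_\._(\d+)", r"layer_\1", name)
    # The demo's weight map uses underscores instead of slashes
    return name.replace("/", "_")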

Environment

TensorRT Version: TensorRT-7.2.1.6
GPU Type: V100
Nvidia Driver Version: 450.51
CUDA Version: cuda-11
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): model trained in TF 2.4; inside the BERT demo Docker image, TensorFlow is 1.15
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): container built as the demo suggests

Relevant Files

Steps To Reproduce

You may follow the BERT demo. The weights can be downloaded from https://huggingface.co/bert-base-chinese.
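
To inspect the checkpoint before renaming, something like this works (the transformers calls are an assumption of this sketch, not part of the demo):

from transformers import TFBertModel

# Download the TF2 weights and print every variable name and shape,
# as a starting point for mapping them onto the names builder.py expects.
model = TFBertModel.from_pretrained("bert-base-chinese")
for w in model.weights:
    print(w.name, w.shape)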

pommedeterresautee commented 3 years ago

Hi, I have the same issue. I tried to import both a TF model and an ONNX model (PyTorch-based) into TensorRT using the master branch, following the instructions from TensorRT/demo/BERT/ and the main README, but both attempts failed at the layer-name step.

With the ONNX model from https://github.com/onnx/models/blob/master/text/machine_comprehension/bert-squad/model/bertsquad-10.tar.gz:

trtuser@506f98718a5b:/workspace/TensorRT/demo/BERT$ mkdir -p engines && python3 builder.py -x models/fine-tuned/bertsquad10.onnx -o engines/bert_large_128.engine -b 1 -s 128 --fp16 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1
2021-03-15 09:28:35.777835: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
[TensorRT] INFO: Using configuration file: models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/bert_config.json
Encountered unknown case: mul/y:0
Traceback (most recent call last):
  File "builder.py", line 707, in <module>
    main()
  File "builder.py", line 691, in main
    weights_dict = load_onnx_weights_and_quant(args.onnx, config)
  File "builder.py", line 490, in load_onnx_weights_and_quant
    tensor_dict = dict([(onnx_to_trt_name(w.name), np.frombuffer(w.raw_data, np.float32).reshape(w.dims)) for w in weights])
  File "builder.py", line 490, in <listcomp>
    tensor_dict = dict([(onnx_to_trt_name(w.name), np.frombuffer(w.raw_data, np.float32).reshape(w.dims)) for w in weights])
  File "builder.py", line 476, in onnx_to_trt_name
    assert(False)
AssertionError
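
A possible local workaround (just a sketch, not an official fix) is to skip initializers that onnx_to_trt_name cannot map instead of asserting, so that graph constants like mul/y:0 do not abort the build:

import numpy as np
import onnx

from builder import onnx_to_trt_name  # assumes builder.py guards its main()

def load_onnx_weights_tolerant(path):
    # Variant of load_onnx_weights_and_quant that skips initializers the
    # name mapper does not recognize, instead of raising AssertionError.
    model = onnx.load(path)
    tensor_dict = {}
    for w in model.graph.initializer:
        try:
            trt_name = onnx_to_trt_name(w.name)
        except AssertionError:
            print("Skipping non-weight initializer:", w.name)
            continue
        tensor_dict[trt_name] = np.frombuffer(w.raw_data, np.float32).reshape(w.dims)
    return tensor_dict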

I also tried with the TF mBERT model from the Google repo:

trtuser@506f98718a5b:/workspace/TensorRT/demo/BERT$ mkdir -p engines && python3 builder.py -m models/multilingual/bert_model.ckpt -o engines/bert-multilingual.engine -b 1 -s 128 --fp16 -c models/multilingual
2021-03-15 09:15:33.604962: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
[TensorRT] INFO: Using configuration file: models/multilingual/bert_config.json
[TensorRT] INFO: Found 204 entries in weight map
Traceback (most recent call last):
  File "builder.py", line 707, in <module>
    main()
  File "builder.py", line 698, in main
    with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num) as engine:
  File "builder.py", line 612, in build_engine
    squad_logits = squad_output("cls_", config, weights_dict, network, bert_out)
  File "builder.py", line 349, in squad_output
    W_out = init_dict[prefix + SQD_W]
KeyError: 'cls_squad_output_weights'

Slyne commented 3 years ago

@pommedeterresautee I don't think my issue is the same as yours. You could first try converting the relevant weight names. For example, 'cls_squad_output_weights' is only available in this demo, not in the TensorFlow 1 BERT checkpoint, so you need to convert that weight or remove the weight name. I can successfully convert models trained on TensorFlow 1 BERT (TF2 does not work) by adding or deleting some weights in the checkpoints; see the sketch below. I'm just wondering whether there is any example of converting transformer models trained with Hugging Face to this BERT demo.
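
Roughly, the checkpoint surgery looks like this (a sketch using TF1 checkpoint APIs; the rename map itself depends on your model and is only a placeholder here):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def rewrite_checkpoint(src_ckpt, dst_ckpt, name_map):
    # name_map: old variable name -> new name, or None to drop the weight
    reader = tf.train.load_checkpoint(src_ckpt)
    with tf.Session() as sess:
        new_vars = []
        for old_name in reader.get_variable_to_shape_map():
            new_name = name_map.get(old_name, old_name)
            if new_name is None:
                continue  # drop weights the demo does not expect
            new_vars.append(tf.Variable(reader.get_tensor(old_name), name=new_name))
        sess.run(tf.global_variables_initializer())
        tf.train.Saver(new_vars).save(sess, dst_ckpt)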

pommedeterresautee commented 3 years ago

Thanks, I have not found any resources linking what is available through Hugging Face with TensorRT. Did you manually map each layer name for the Chinese BERT?

nvpohanh commented 2 years ago

@Slyne Could you try TRT 8.2/8.4 and see if the issue still exists? If it does, we will debug it. Thanks

Slyne commented 2 years ago

@pommedeterresautee Yes, I manually mapped each layer name for the Chinese BERT.

Slyne commented 2 years ago

Hi @nvpohanh, I no longer work on this and have moved my whole pipeline to PyTorch checkpoints. Things seem pretty good with the Hugging Face PyTorch checkpoints.
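
For what it's worth, the PyTorch route is basically an ONNX export followed by the usual TensorRT conversion. Roughly (a sketch; the export arguments here are my assumptions, not something verified in this thread):

import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-chinese")
model.config.return_dict = False  # return plain tuples so tracing works
model.eval()

tok = BertTokenizer.from_pretrained("bert-base-chinese")
dummy = tok("你好", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "bert-base-chinese.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={name: {0: "batch", 1: "seq"}
                  for name in ["input_ids", "attention_mask", "token_type_ids"]},
    opset_version=13,
)

The resulting ONNX file can then be consumed by trtexec or the TensorRT ONNX parser, rather than going through the demo's checkpoint-based weight loading.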

nvpohanh commented 2 years ago

Sounds good. I will close this. Thanks