IIRC the accuracy is bad when you use INT8 PTQ for BERT, so TRT will use FP16/FP32 internally even if you specify INT8 (you can check the verbose output, "Engine Layer Information" section).
@nvpohanh should know more about this ^ ^
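For anyone who wants to check this concretely, here is a minimal sketch of dumping the per-layer information from an already-built engine. The engine path is hypothetical, and full per-layer detail is only available if the engine was built with builder_config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)

# Deserialize an already-built engine (the path here is hypothetical).
with open("/tinybert_4l/tinybert_4l_int8.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# The inspector reports the tactic and precision/format TensorRT chose for each layer,
# i.e. the same information as the "Engine Layer Information" section of a verbose build log.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))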
Thank you very much for your reply. Maybe I should try QAT.
I used https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py to build an INT8 PTQ TinyBERT, but the skipLayerNormPlugin always fails.
Could you share your full command? That demoBERT script is supposed to work even with INT8 PTQ. cc @ttyio
python builder.py -pt /tinybert_4l/pytorch_model.bin -o /tinybert_4l/tinybert_4l_int8.engine -b 256 -s 256 -c /tinybert_4l/ --int8 --fp16 --strict --squad-json calib/test_dataset.txt -v /tinybert_4l/vocab.txt --calib-num 1000 -iln -imh
I rewrote the Calibrator class to use my own dataset. The command works when building the FP16 model; however, when I build the engine with INT8 PTQ, it fails. @nvpohanh
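For reference, a minimal sketch of the kind of calibrator this refers to, using IInt8EntropyCalibrator2 and a hypothetical list of pre-tokenized batches; the demo's own calibrator class may use a different base class and data pipeline:

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class MyBertCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds pre-tokenized batches (name -> int32 ndarray) to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_file="bert_calibration.cache"):
        super().__init__()
        self._batches = iter(batches)
        self._cache_file = cache_file
        self._device_buffers = {}

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self._batches)
        except StopIteration:
            return None  # no more data: calibration is finished
        pointers = []
        for name in names:
            host = np.ascontiguousarray(batch[name])
            if name not in self._device_buffers:
                self._device_buffers[name] = cuda.mem_alloc(host.nbytes)
            cuda.memcpy_htod(self._device_buffers[name], host)
            pointers.append(int(self._device_buffers[name]))
        return pointers

    def read_calibration_cache(self):
        try:
            with open(self._cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self._cache_file, "wb") as f:
            f.write(cache)

The instance is then attached to the build with builder_config.int8_calibrator = MyBertCalibrator(batches).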
Could you add the --verbose flag and share the stdout/stderr logs?
The following is the log:
2023-03-27 07:32:24.330427: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
[03/27/2023-07:32:25] [TRT] [I] Using configuration file: /tinybert_4l/bert_config.json
[03/27/2023-07:32:26] [TRT] [I] Found 74 entries in weight map
[03/27/2023-07:32:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +316, GPU +0, now: CPU 931, GPU 3708 (MiB)
[03/27/2023-07:32:26] [TRT] [V] Trying to load shared library libnvinfer_builder_resource.so.8.5.1
[03/27/2023-07:32:26] [TRT] [V] Loaded shared library libnvinfer_builder_resource.so.8.5.1
[03/27/2023-07:32:28] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +551, GPU +152, now: CPU 1537, GPU 3860 (MiB)
[03/27/2023-07:32:28] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
builder.py:448: DeprecationWarning: Use set_memory_pool_limit instead.
builder_config.max_workspace_size = workspace_size * (1024 * 1024)
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l0_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l1_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l2_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l3_gelu to [-10,10]
builder.py:515: DeprecationWarning: Use build_serialized_network instead.
engine = builder.build_engine(network, builder_config)
[03/27/2023-07:32:31] [TRT] [V] Original: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After Myelin optimization: 84 layers
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 18) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 18) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 38) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 38) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 58) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 58) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 78) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 78) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] After final dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After vertical fusions: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After final dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After slice removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After concat removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After tensor merging: 84 layers
[03/27/2023-07:32:31] [TRT] [V] Trying to split Reshape and strided tensor
[03/27/2023-07:32:31] [TRT] [V] Trying to load shared library libcublas.so.11
[03/27/2023-07:32:31] [TRT] [V] Loaded shared library libcublas.so.11
[03/27/2023-07:32:31] [TRT] [V] Using cublas as plugin tactic source
[03/27/2023-07:32:31] [TRT] [V] Trying to load shared library libcublasLt.so.11
[03/27/2023-07:32:31] [TRT] [V] Loaded shared library libcublasLt.so.11
[03/27/2023-07:32:31] [TRT] [V] Using cublasLt as core library tactic source
[03/27/2023-07:32:31] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +850, GPU +358, now: CPU 2803, GPU 4604 (MiB)
[03/27/2023-07:32:31] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/27/2023-07:32:31] [TRT] [W] Calibration Profile is not defined. Calibrating with Profile 0
[03/27/2023-07:32:31] [TRT] [V] Constructing calibration profile.
[03/27/2023-07:32:31] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error ((Unnamed Layer* 3) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types)
[03/27/2023-07:32:31] [TRT] [I] build engine in 1.015 Sec
Traceback (most recent call last):
File "builder.py", line 620, in
Looks like it failed to build the engine for calibration. @zerollzeng Could you try to repro and create an internal tracker? Thanks
++ @ttyio @rajeevsrao Is anything wrong with the command used to run demoBERT INT8-PTQ?
Hi @shhn1, the command line looks good to me, but the failing line number shows that you are running a modified script: it failed at line 620, yet there are only 565 lines in https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py.
Also, if there are 3 shuffles at the beginning of the network to transpose the batch and sequence dimensions, then the failing (Unnamed Layer* 3) [PluginV2DynamicExt] is embLayernorm rather than skipLayerNormPlugin.
Could you provide a repro for @zerollzeng? Thanks!
In the modified builder.py script, I defined a pooler layer that is not used, and I modified part of the emb_layernorm function, because I don't quite understand why the length of the batch-size list is used to decide whether to add an optimization profile. My modified code is as follows:
def emb_layernorm(builder, network, config, weights_dict, builder_config, sequence_lengths, batch_sizes):
    # int8 only support some of the sequence length, we dynamic on sequence length is not allowed.
    # input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    # segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    # input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))

    # Specify profiles for the batch sizes we're interested in.
    # Make sure the profile also works for all sizes not covered by the previous profile.
    # if len(sequence_lengths) > 1 or len(batch_sizes) > 1:
    # if batch_sizes[0] > 1:
    for batch_size in sorted(batch_sizes):
        if len(sequence_lengths) == 1:
            profile = builder.create_optimization_profile()
            min_shape = (1, sequence_lengths[0])
            shape = (batch_size, sequence_lengths[0])
            profile.set_shape("input_ids", min=min_shape, opt=shape, max=shape)
            profile.set_shape("segment_ids", min=min_shape, opt=shape, max=shape)
            profile.set_shape("input_mask", min=min_shape, opt=shape, max=shape)
            builder_config.add_optimization_profile(profile)
            print("set profile shape: ", shape)
        else:
            for sequence_length in sorted(sequence_lengths):
                profile = builder.create_optimization_profile()
                min_shape = (1, sequence_length)
                shape = (batch_size, sequence_length)
                profile.set_shape("input_ids", min=min_shape, opt=shape, max=shape)
                profile.set_shape("segment_ids", min=min_shape, opt=shape, max=shape)
                profile.set_shape("input_mask", min=min_shape, opt=shape, max=shape)
                builder_config.add_optimization_profile(profile)
I don't know whether this part of the modification affects it, but when building the FP16 engine there is no problem.
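For what it's worth, the length check in the original script only controls whether the corresponding input dimension becomes dynamic (-1), and only dynamic dimensions require an optimization profile; a fully static shape can be built without one. A minimal, self-contained sketch of that rule (the toy network below is hypothetical and not part of builder.py):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Dynamic batch dimension (-1): every input with a dynamic dimension must be
# covered by an optimization profile, otherwise the build fails.
input_ids = network.add_input("input_ids", trt.int32, (-1, 256))
identity = network.add_identity(input_ids)
network.mark_output(identity.get_output(0))

profile = builder.create_optimization_profile()
profile.set_shape("input_ids", min=(1, 256), opt=(256, 256), max=(256, 256))
config.add_optimization_profile(profile)

# With a fully static input shape such as (256, 256), no profile would be needed.
serialized_engine = builder.build_serialized_network(network, config)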
I would also need the /tinybert_4l/pytorch_model.bin and the other files. Can you provide the full reproduction? A screenshot of the diff would also help.
Sorry for the late reply; I have been too busy with work. I'm afraid I can't provide you with my model, but I think a 4-layer TinyBERT from Hugging Face should be able to reproduce this problem.
I met the same problem. Has this problem been solved?
@lg19990925 could you provide a repro for @zerollzeng? Thanks!
I will close this, please reopen when you have prepared a repro, thanks!
Description
I used https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py to build an INT8 PTQ TinyBERT, but the skipLayerNormPlugin always fails:
[03/25/2023-08:47:30] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error ((Unnamed Layer* 3) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types)
I have no idea how to solve this. Can anybody help me?
PS: I have also tried building the TinyBERT INT8 PTQ engine with the trtexec command. However, I found that the perf of the INT8 PTQ engine is the same as FP32, which is much slower than FP16. It seems the INT8 PTQ engine didn't use the INT8 Tensor Cores. Why does this happen? I would be very grateful if anyone can answer my doubts.
Environment
TensorRT Version: 8.5.1
NVIDIA GPU: A100
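As a side note on the trtexec experiment: whether INT8 kernels were actually selected can be checked from the per-layer report rather than inferred from latency, for example with a command along these lines (the ONNX file name is hypothetical, and --dumpLayerInfo requires TensorRT 8.4 or newer):

trtexec --onnx=tinybert_4l.onnx --int8 --fp16 --profilingVerbosity=detailed --dumpLayerInfo --verbose

Layers that fell back will be listed with FP16/FP32 formats in that report, which matches the fallback behavior described at the top of this thread.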