NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types) #2812

Closed shhn1 closed 1 year ago

shhn1 commented 1 year ago

Description

Environment

TensorRT Version: 8.5.1
NVIDIA GPU: A100

I used https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py to build an INT8 PTQ TinyBERT engine, but the skipLayerNormPlugin always goes wrong.

[03/25/2023-08:47:30] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error ((Unnamed Layer* 3) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types)

I have no idea how to solve this. Can anybody help me?

P.S. I have also tried building the TinyBERT INT8 PTQ engine with the trtexec command. However, I found that the performance of the INT8 PTQ engine is the same as FP32, which is much slower than FP16. It seems that the INT8 PTQ engine didn't use the INT8 Tensor Cores. Why does this happen? I would be very grateful if anyone could answer my doubts.

zerollzeng commented 1 year ago

IIRC the accuracy is bad when you use INT8 PTQ for BERT, so TRT will use FP16/FP32 internally even if you specify INT8 (you can check the verbose output, in the "Engine Layer Information" section).

@nvpohanh should know more about this ^ ^
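
For reference, one way to see which precision each layer actually ended up with is TensorRT's engine inspector. A minimal sketch, assuming TensorRT >= 8.4 and an engine built with detailed profiling verbosity (this is not code from this thread):

import tensorrt as trt

def dump_layer_info(engine):
    # Prints per-layer details (chosen tactics/precisions) of a built engine.
    # Assumes the engine was built with
    #   builder_config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
    # so the report includes precision/format information.
    inspector = engine.create_engine_inspector()
    print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))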

shhn1 commented 1 year ago

Thank you very much for your reply. Maybe I should try QAT.

nvpohanh commented 1 year ago

I used https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py to build an INT8 PTQ TinyBERT engine, but the skipLayerNormPlugin always goes wrong.

Could you share your full command? That demoBERT script is supposed to work even with INT8 PTQ. cc @ttyio

shhn1 commented 1 year ago

python builder.py -pt /tinybert_4l/pytorch_model.bin -o /tinybert_4l/tinybert_4l_int8.engine -b 256 -s 256 -c /tinybert_4l/ --int8 --fp16 --strict --squad-json calib/test_dataset.txt -v /tinybert_4l/vocab.txt --calib-num 1000 -iln -imh

I rewrote the Calibrator class to use my own dataset. The command works when building the FP16 model. However, when I build the engine with INT8 PTQ, it fails. @nvpohanh
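
For context, a custom calibrator for demoBERT subclasses one of TensorRT's INT8 calibrator interfaces. A minimal sketch of what such a rewrite might look like (the class name, the MinMax calibrator choice, and the batch-feeding logic are illustrative assumptions, not the actual modified code from this issue):

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class MyBertCalibrator(trt.IInt8MinMaxCalibrator):
    """Hypothetical calibrator feeding input_ids/segment_ids/input_mask batches."""

    def __init__(self, batches, cache_file):
        trt.IInt8MinMaxCalibrator.__init__(self)
        self.batches = iter(batches)      # iterable of dicts of int32 numpy arrays
        self.cache_file = cache_file
        self.device_buffers = {}

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                   # signals the end of the calibration data
        ptrs = []
        for name in names:
            arr = np.ascontiguousarray(batch[name].astype(np.int32))
            if name not in self.device_buffers:
                self.device_buffers[name] = cuda.mem_alloc(arr.nbytes)
            cuda.memcpy_htod(self.device_buffers[name], arr)
            ptrs.append(int(self.device_buffers[name]))
        return ptrs

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)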

nvpohanh commented 1 year ago

Could you add --verbose flag and share the stdout/stderr logs?

shhn1 commented 1 year ago

The following is the log:

2023-03-27 07:32:24.330427: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
[03/27/2023-07:32:25] [TRT] [I] Using configuration file: /tinybert_4l/bert_config.json
[03/27/2023-07:32:26] [TRT] [I] Found 74 entries in weight map
[03/27/2023-07:32:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +316, GPU +0, now: CPU 931, GPU 3708 (MiB)
[03/27/2023-07:32:26] [TRT] [V] Trying to load shared library libnvinfer_builder_resource.so.8.5.1
[03/27/2023-07:32:26] [TRT] [V] Loaded shared library libnvinfer_builder_resource.so.8.5.1
[03/27/2023-07:32:28] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +551, GPU +152, now: CPU 1537, GPU 3860 (MiB)
[03/27/2023-07:32:28] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
builder.py:448: DeprecationWarning: Use set_memory_pool_limit instead.
  builder_config.max_workspace_size = workspace_size * (1024 * 1024)
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l0_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l1_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l2_gelu to [-10,10]
[03/27/2023-07:32:30] [TRT] [V] Setting dynamic range for l3_gelu to [-10,10]
builder.py:515: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, builder_config)
[03/27/2023-07:32:31] [TRT] [V] Original: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After Myelin optimization: 84 layers
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 18) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 18) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 38) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 38) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 58) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 58) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] Running: ActivationToPointwiseConversion on (Unnamed Layer* 78) [Activation]
[03/27/2023-07:32:31] [TRT] [V] Swap the layer type of (Unnamed Layer* 78) [Activation] from ACTIVATION to POINTWISE
[03/27/2023-07:32:31] [TRT] [V] After final dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After vertical fusions: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After final dead-layer removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After slice removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After concat removal: 84 layers
[03/27/2023-07:32:31] [TRT] [V] After tensor merging: 84 layers
[03/27/2023-07:32:31] [TRT] [V] Trying to split Reshape and strided tensor
[03/27/2023-07:32:31] [TRT] [V] Trying to load shared library libcublas.so.11
[03/27/2023-07:32:31] [TRT] [V] Loaded shared library libcublas.so.11
[03/27/2023-07:32:31] [TRT] [V] Using cublas as plugin tactic source
[03/27/2023-07:32:31] [TRT] [V] Trying to load shared library libcublasLt.so.11
[03/27/2023-07:32:31] [TRT] [V] Loaded shared library libcublasLt.so.11
[03/27/2023-07:32:31] [TRT] [V] Using cublasLt as core library tactic source
[03/27/2023-07:32:31] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +850, GPU +358, now: CPU 2803, GPU 4604 (MiB)
[03/27/2023-07:32:31] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/27/2023-07:32:31] [TRT] [W] Calibration Profile is not defined. Calibrating with Profile 0
[03/27/2023-07:32:31] [TRT] [V] Constructing calibration profile.
[03/27/2023-07:32:31] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error ((Unnamed Layer* 3) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types)
[03/27/2023-07:32:31] [TRT] [I] build engine in 1.015 Sec
Traceback (most recent call last):
  File "builder.py", line 620, in <module>
    main()
  File "builder.py", line 611, in main
    with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num, args.verbose) as engine:
AttributeError: __enter__

nvpohanh commented 1 year ago

Looks like it failed to build the engine for calibration. @zerollzeng Could you try to repro and create an internal tracker? Thanks

++ @ttyio @rajeevsrao Is anything wrong with the command used to run demoBERT INT8 PTQ?

ttyio commented 1 year ago

Hi @shhn1, the command line looks good to me. The failing line number shows that you are running a modified script: it fails at line 620, but there are only 565 lines in total in https://github.com/NVIDIA/TensorRT/blob/main/demo/BERT/builder.py.

Also, if there are 3 shuffles at the beginning of the network to transpose the batch and sequence dimensions, then the failing (Unnamed Layer* 3) [PluginV2DynamicExt] is the embLayerNormPlugin rather than the skipLayerNormPlugin.

Could you provide a repro for @zerollzeng? Thanks!
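
One quick way to confirm which plugin "(Unnamed Layer* 3)" corresponds to is to print the network's layers before building. A minimal sketch, assuming `network` is the trt.INetworkDefinition constructed in builder.py:

import tensorrt as trt

def print_network_layers(network):
    # Maps "(Unnamed Layer* N)" in builder errors to the layer added at index N.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        print(i, layer.name, layer.type)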

shhn1 commented 1 year ago

In the modified builder.py script, I defined a pooler layer, which is not used, and I modified part of the emb_layernorm function, because I don't quite understand why the length of the batch-size list is used to decide whether to add an optimization profile. My modified code is as follows:

def emb_layernorm(builder, network, config, weights_dict, builder_config, sequence_lengths, batch_sizes):
    # int8 only support some of the sequence length, we dynamic on sequence length is not allowed.
    # input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    # segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    # input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1 if len(batch_sizes) > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))
    input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1 if batch_sizes[0] > 1 else batch_sizes[0], -1 if len(sequence_lengths) > 1 else sequence_lengths[0]))

    # Specify profiles for the batch sizes we're interested in.
    # Make sure the profile also works for all sizes not covered by the previous profile.

    # if len(sequence_lengths) > 1 or len(batch_sizes) > 1:
    #if batch_sizes[0] > 1:
    for batch_size in sorted(batch_sizes):
        if len(sequence_lengths) == 1:
            profile = builder.create_optimization_profile()
            min_shape = (1, sequence_lengths[0])
            shape = (batch_size, sequence_lengths[0])
            profile.set_shape("input_ids", min=min_shape, opt=shape, max=shape)
            profile.set_shape("segment_ids", min=min_shape, opt=shape, max=shape)
            profile.set_shape("input_mask", min=min_shape, opt=shape, max=shape)
            builder_config.add_optimization_profile(profile)
            print("set profile shape: ", shape)
        else:
            for sequence_length in sorted(sequence_lengths):
                profile = builder.create_optimization_profile()
                min_shape = (1, sequence_length)
                shape = (batch_size, sequence_length)
                profile.set_shape("input_ids", min=min_shape, opt=shape, max=shape)
                profile.set_shape("segment_ids", min=min_shape, opt=shape, max=shape)
                profile.set_shape("input_mask", min=min_shape, opt=shape, max=shape)
                builder_config.add_optimization_profile(profile)

I don't know whether this part of the modification affects it, but when building the FP16 engine there is no problem.
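
Separately, the verbose log above warns "Calibration Profile is not defined. Calibrating with Profile 0". A minimal sketch of attaching an explicit calibration profile via IBuilderConfig.set_calibration_profile (whether this has any bearing on the plugin-format error is an assumption, not something established in this thread):

# Assumes `builder`, `builder_config`, and fixed batch_size / sequence_length
# values matching the command above; shapes are static here because, as the
# comment in emb_layernorm notes, INT8 does not allow a dynamic sequence length.
calib_profile = builder.create_optimization_profile()
calib_shape = (batch_size, sequence_length)
for name in ("input_ids", "segment_ids", "input_mask"):
    calib_profile.set_shape(name, min=calib_shape, opt=calib_shape, max=calib_shape)
builder_config.set_calibration_profile(calib_profile)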

zerollzeng commented 1 year ago

I would also need /tinybert_4l/pytorch_model.bin and the other files. Can you provide a full reproduction?

[screenshot of the diff]

shhn1 commented 1 year ago

I would also need /tinybert_4l/pytorch_model.bin and the other files. Can you provide a full reproduction?

[screenshot of the diff]

Sorry for the late reply; I've been too busy with work. I'm afraid I can't provide you with my model, but I think a 4-layer TinyBERT from Hugging Face should be able to reproduce this problem.
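
As a sketch of a possible public repro input (the Hugging Face model ID and output layout are assumptions, not the issue author's actual model):

from transformers import AutoModel, AutoTokenizer

model_id = "huawei-noah/TinyBERT_General_4L_312D"  # assumed 4-layer stand-in
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Older transformers versions write pytorch_model.bin + config.json here;
# demoBERT's builder.py expects the config to be named bert_config.json,
# so a rename may be needed.
model.save_pretrained("/tinybert_4l")
tokenizer.save_vocabulary("/tinybert_4l")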

lg19990925 commented 1 year ago

I met the same problem. Has it been solved?

ttyio commented 1 year ago

@lg19990925 Could you provide a repro to @zerollzeng? Thanks!

ttyio commented 1 year ago

I will close this. Please reopen when you have prepared a repro, thanks!