NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats) #2966

Open zhouzx17 opened 1 year ago

zhouzx17 commented 1 year ago

Description

I am compiling a UNet model (from demo/Diffusion) from ONNX to TensorRT with plugins and int8 calibration (code from here). During compilation, I get the following error:

[E] 2: [optimizer.cpp::getFormatRequirements::3015] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[E] 2: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

I use the demo from release/8.5, since release/8.6 does not work for me (possibly because of my older CUDA version).

I can run the demo normally without int8 calibration, but when I use the following code in demo/Diffusion/utilities.py:56 to build the engine with int8 calibration, the above error occurs.

int8_calibrator = Calibrator(data_loader=self._random_data_generator())
engine = engine_from_network(
        network_from_onnx_path(onnx_path),
        config=CreateConfig(
            fp16=fp16, int8=False, profiles=[p],
            calibrator=int8_calibrator,
            preview_features=preview_features
        )
)
save_engine(engine, path=self.engine_path)

def _random_data_generator(self):
    for _ in range(1000):
        sample = np.random.random(size=[2,4,64,64]).astype("float16")
        timestep = np.random.random(size=[1]).astype("float16")
        encoder_hidden_states = np.random.random(size=[2,77,768]).astype("float16")
        data = {
            "sample": sample,
            "timestep": timestep,
            "encoder_hidden_states": encoder_hidden_states
         }
        yield data

Besides the above part, I didn't modify other code in this demo.
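
For reference, a quick way to double-check that the keys and dtypes yielded by the generator actually match the network inputs (a minimal sketch with the onnx package; the path is the one from my build log below):

import onnx

# Print every graph input with its ONNX dtype and shape, so the calibration
# dict keys ("sample", "timestep", "encoder_hidden_states") can be compared
# against the real input names and dtypes.
model = onnx.load("test_onnx/unet_fp16.opt.onnx")
for inp in model.graph.input:
    elem_type = onnx.TensorProto.DataType.Name(inp.type.tensor_type.elem_type)
    dims = [d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]
    print(inp.name, elem_type, dims)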

Some other information that might be helpful:

  1. I can successfully use the trtexec tool to convert the ONNX model to a TRT engine in int8 mode, but trtexec does not support using my own calibrator (see the sketch after this list).
  2. I suspected the above error might come from plugins that do not support int8 during calibration for the Stable Diffusion model. However, after disabling the plugins and converting only the original ONNX model (before optimization), I get exactly the same error messages as above.
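
On point 1, one thing I might try is having the Python calibrator persist its INT8 calibration cache so that trtexec can reuse it via --calib (a sketch only; whether the failing build gets far enough to write the cache is something I have not verified, and the cache filename is a placeholder):

from polygraphy.backend.trt import Calibrator

# Same generator as above; the only change is asking Polygraphy to write the
# calibration cache to disk, which a later trtexec run could consume with
# --calib=unet_int8.cache instead of running its own calibrator.
int8_calibrator = Calibrator(
    data_loader=self._random_data_generator(),
    cache="unet_int8.cache",
)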

Environment

TensorRT Version: 8.5.3.1

NVIDIA GPU: RTX2080/RTX3090

NVIDIA Driver Version: 510.54

CUDA Version: 11.6

CUDNN Version: 8.6.0

Operating System:

Python Version (if applicable): 3.8.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 1.12.0+cu116

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:22.10-py3

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

zhouzx17 commented 1 year ago

A small correction to the code above from demo/Diffusion/utilities.py:56: it should be int8=True:

engine = engine_from_network(
        network_from_onnx_path(onnx_path),
        config=CreateConfig(
            fp16=fp16, int8=True, profiles=[p],
            calibrator=int8_calibrator,
            preview_features=preview_features
        )
)
zerollzeng commented 1 year ago

@nvpohanh @rajeevsrao ^ ^

nvpohanh commented 1 year ago

Could you provide full verbose logs? This error is usually caused by plugins not supporting INT8

zhouzx17 commented 1 year ago

@nvpohanh @zerollzeng Hi, thanks for the reply~

The following is the build log with --verbose:

[I] Initializing StableDiffusion demo with TensorRT Plugins
Building TensorRT engine for test_onnx/unet_fp16.opt.onnx: test_engine_quant/unet_fp16.plan
[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(2, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16, INT8]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7982.44 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Calibrator             | Calibrator(<generator object Engine._random_data_generator at 0x7f1d98275e40>, BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[E] 2: [optimizer.cpp::getFormatRequirements::3103] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. No supported formats for /conv_in/Cast)
[E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Do you have any idea why /conv_in/Cast might not support the int8 format?
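
For what it's worth, here is how I can dump what that node looks like in the exported graph (a small sketch with the onnx package; the node name comes from the error message, the path from the log above, and I am assuming the TensorRT layer name matches the ONNX node name):

import onnx

# Locate the Cast node named in the error and print its op type, inputs,
# outputs, and attributes (e.g. the "to" dtype it casts to).
model = onnx.load("test_onnx/unet_fp16.opt.onnx")
for node in model.graph.node:
    if node.name == "/conv_in/Cast":
        print(node.op_type, list(node.input), list(node.output))
        for attr in node.attribute:
            print(attr.name, onnx.helper.get_attribute_value(attr))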

Besides, I also tried using TRT 8.6 to do the int8 quantization based on the 8.5 demo, and I got a different error message:

[I] Initializing StableDiffusion demo with TensorRT Plugins
Building TensorRT engine for test_onnx/unet_fp16.opt.onnx: test_engine_quant/unet_fp16.plan
[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(2, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16, INT8]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7982.44 MiB, TACTIC_DRAM: 7982.44 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
    Calibrator             | Calibrator(<generator object Engine._random_data_generator at 0x7f042c782f20>, BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.6.0
[W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[E] 9: GroupNormN-0: could not find any supported formats consistent with input/output data types
[E] 9: [pluginV2Builder.cpp::reportPluginError::24] Error Code 9: Internal Error (GroupNormN-0: could not find any supported formats consistent with input/output data types)

It seems that the SD plugin used in the 8.5 demo does not support the int8 format. Can this error be fixed by modifying the plugin kernel?

nvpohanh commented 1 year ago

Does adding --minimal-optimizations help?

I don't think the plugins in 8.5/demo work well with INT8.

zhouzx17 commented 1 year ago

@nvpohanh When using TRT 8.5, even with --onnx-minimal-optimization, I get exactly the same error message:

[E] 2: [optimizer.cpp::getFormatRequirements::3103] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. No supported formats for /conv_in/Cast)
[E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly

But when using TRT 8.6 with --onnx-minimal-optimization, it works.

Even though the plugins in 8.5/demo do not work well with INT8, I'm wondering whether I can add int8 I/O support to the plugins (as here) so that the compilation passes. At the very least, this might make it possible to do int8 quantization for the other supported operations while running the plugin operations in fp16 (which could simulate the int8 computation).
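
For instance, something along these lines is what I have in mind (a rough sketch with the TensorRT Python API; the helper name is mine, and whether the builder can actually satisfy these constraints for the demo plugins is exactly what I am unsure about):

import tensorrt as trt

def pin_plugins_to_fp16(network, config):
    # Keep INT8 enabled for the ordinary layers, but ask the builder to honor
    # per-layer precision requests and force every plugin layer to FP16 I/O.
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.PLUGIN_V2:
            layer.precision = trt.float16
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float16)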