NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

undefined symbol in SD FP8 workflow #34

Closed chrjxj closed 2 months ago

chrjxj commented 3 months ago

Env

Steps

1. Build the plugins with make and copy the plugins folder to /workspace/examples/plugins.

2. Adjust the script:

# Assume this script is launched with the docker built from "docker" according to README.md
fp8_plugin="/workspace/examples/plugins/bin/FP8Conv2DPlugin.so"
groupNorm_plugin="/workspace/examples/plugins/bin/groupNormPlugin.so"

echo "=====>Assume the script is launched with the docker built from "docker" according to README.md. If not, please include the path to the plugins manually."
export LD_LIBRARY_PATH=/workspace/examples/plugins/prebuilt:$LD_LIBRARY_PATH

3. Run the FP8 workflow:

./build_sdxl_8bit_engine.sh --format fp8

Issue - undefined symbol: getPluginRegistry


[07/02/2024-03:21:47] [I] Plugins: /workspace/examples/plugins/bin/groupNormPlugin.so /workspace/examples/plugins/bin/FP8Conv2DPlugin.so
[07/02/2024-03:21:47] [I] setPluginsToSerialize:
(....)
[07/02/2024-03:21:47] [I]
[07/02/2024-03:21:47] [I] TensorRT version: 10.1.0
[07/02/2024-03:21:47] [I] Loading standard plugins
[07/02/2024-03:21:47] [I] Loading supplied plugin library: /workspace/examples/plugins/bin/groupNormPlugin.so
trtexec: symbol lookup error: /workspace/examples/plugins/bin/groupNormPlugin.so: undefined symbol: getPluginRegistry

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /workspace/examples/plugins/bin/groupNormPlugin.so
/workspace/examples/plugins/bin/groupNormPlugin.so

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /workspace/examples/plugins/bin/
grep: /workspace/examples/plugins/bin/groupNormPlugin.so: binary file matches
grep: /workspace/examples/plugins/bin/FP8Conv2DPlugin.so: binary file matches
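
For what it's worth, grep on a binary only proves the byte string occurs somewhere in it. A check along these lines (a sketch; assumes binutils' nm and ldd are available in the container) distinguishes an undefined reference from an exported definition:

# "U" entries are symbols the plugin expects to be resolved at load
# time from another library (here: libnvinfer).
nm -D --undefined-only /workspace/examples/plugins/bin/groupNormPlugin.so | grep -i pluginregistry

# Show which shared libraries the plugin was linked against.
ldd /workspace/examples/plugins/bin/groupNormPlugin.so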
Edwardf0t1 commented 3 months ago

Thanks for reporting the issue.

If not using the default docker, please check that groupNormPlugin.so is built correctly and all dependent libraries are on the correct path, e.g.,

  • Follow readme: "Before building, update the addresses of "TRT" and "CUDA" according to your environment in file plugins/Makefile.config" (example values below)

  • Check TRT_LIBPATH, e.g., in our dockerfile, TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
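
For illustration (a hypothetical sketch; the actual variable names in plugins/Makefile.config may differ), the two paths the README asks for would look like this in the thread's docker image:

# Hypothetical values for the "TRT" and "CUDA" addresses in plugins/Makefile.config.
TRT=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
CUDA=/usr/local/cuda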

chrjxj commented 3 months ago

Thanks for the note, it's useful.

The path /usr/local/lib/python3.10/dist-packages/tensorrt_libs and its libs came from the installation of TRT-LLM (pip install tensorrt-llm~=0.10 -U), but the SD workflow doesn't depend on tensorrt-llm ...
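
As a quick sanity check (a minimal sketch; assumes the tensorrt Python package is importable inside the container), one can confirm which TensorRT runtime the environment actually provides:

# Print the TensorRT version as seen by Python.
python3 -c "import tensorrt; print(tensorrt.__version__)"

# Show which package installed the tensorrt_libs directory.
pip show tensorrt-llm | grep -i -E "^(name|version|location)"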

chrjxj commented 3 months ago

Used https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/docker/build.sh to build the docker image; inside the container docker.io/library/modelopt_examples:latest it is still the same:

[07/04/2024-01:11:32] [I] TensorRT version: 10.0.1
[07/04/2024-01:11:32] [I] Loading standard plugins
[07/04/2024-01:11:32] [I] Loading supplied plugin library: /workspace/examples/plugins/bin/groupNormPlugin.so
trtexec: symbol lookup error: /workspace/examples/plugins/bin/groupNormPlugin.so: undefined symbol: getPluginRegistry

Some env info:


xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# echo $TRT_LIBPATH
/usr/local/lib/python3.10/dist-packages/tensorrt_libs

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# echo $LD_LIBRARY_PATH
/usr/local/lib/python3.10/dist-packages/tensorrt_libs:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /usr/local/lib/python3.10/dist-packages/tensorrt_libs
__init__.py  __pycache__  libfp8convkernel.so  libnvinfer.so  libnvinfer.so.10  libnvinfer_builder_resource.so.10.1.0  libnvinfer_plugin.so.10  libnvonnxparser.so.10

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /workspace/examples/plugins/bin/groupNormPlugin.so
/workspace/examples/plugins/bin/groupNormPlugin.so

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /workspace/examples/plugins/bin/
grep: /workspace/examples/plugins/bin/groupNormPlugin.so: binary file matches
grep: /workspace/examples/plugins/bin/FP8Conv2DPlugin.so: binary file matches

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /usr/local/lib/python3.10/dist-packages/tensorrt_libs
grep: /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so: binary file matches
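
Again, the grep match only shows the string is present in libnvinfer.so. To confirm the library actually exports the symbol (a sketch; assumes binutils' nm is available):

# "T"/"D" entries are defined, exported symbols; getPluginRegistry
# should appear here if libnvinfer provides it to plugins.
nm -D --defined-only /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so | grep getPluginRegistry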
Edwardf0t1 commented 2 months ago

Unfortunately we cannot reproduce this error. We will have a new release next Wednesday. Please stay tuned for the new version.

songh11 commented 2 months ago

Thanks for reporting the issue.

If not using the default docker, please check that groupNormPlugin.so is built correctly and all dependent libraries are on the correct path, e.g.,

  • Follow readme:

Before building, update the addresses of "TRT" and "CUDA" according to your environment in file plugins/Makefile.config

  • Check TRT_LIBPATH, e.g., in our dockerfile, TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs

I had the same problem, and setting TRT_LIBPATH solved it.

chrjxj commented 2 months ago

Please close.

I guess the root cause is: in the prebuilt docker image, FP8Conv2DPlugin.so and groupNormPlugin.so in the /workspace/examples/plugins/bin/ folder were missing the symbol (they weren't linked against the library) at build time. After I set TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs and rebuilt those plugins, it works fine.
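
A minimal sketch of that fix (assumes the plugin Makefile reads TRT_LIBPATH and supports the usual clean/all targets; paths are the ones from this thread):

# Point the build at the TensorRT libs shipped in the container.
export TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs

# Rebuild the plugins ("make clean && make" is an assumption about the
# Makefile's targets), then check the result links against libnvinfer.
cd /workspace/examples/plugins
make clean && make
ldd bin/groupNormPlugin.so | grep nvinfer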