NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

undefined symbol in SD FP8 workflow #34

Closed chrjxj closed 2 months ago

chrjxj commented 3 months ago

Env

Steps

1. Build the plugins with make and copy the plugins folder to /workspace/examples/plugins.

2. Adjust the script:

# Assume this script is launched with the docker built from "docker" according to README.md
fp8_plugin="/workspace/examples/plugins/bin/FP8Conv2DPlugin.so"
groupNorm_plugin="/workspace/examples/plugins/bin/groupNormPlugin.so"

echo "=====>Assume the script is launched with the docker built from "docker" according to README.md. If not, please include the path to the plugins manually."
export LD_LIBRARY_PATH=/workspace/examples/plugins/prebuilt:$LD_LIBRARY_PATH

3. Run the FP8 workflow:

./build_sdxl_8bit_engine.sh --format fp8

Issue - undefined symbol: getPluginRegistry


[07/02/2024-03:21:47] [I] Plugins: /workspace/examples/plugins/bin/groupNormPlugin.so /workspace/examples/plugins/bin/FP8Conv2DPlugin.so
[07/02/2024-03:21:47] [I] setPluginsToSerialize:
(....)
[07/02/2024-03:21:47] [I]
[07/02/2024-03:21:47] [I] TensorRT version: 10.1.0
[07/02/2024-03:21:47] [I] Loading standard plugins
[07/02/2024-03:21:47] [I] Loading supplied plugin library: /workspace/examples/plugins/bin/groupNormPlugin.so
trtexec: symbol lookup error: /workspace/examples/plugins/bin/groupNormPlugin.so: undefined symbol: getPluginRegistry

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /workspace/examples/plugins/bin/groupNormPlugin.so
/workspace/examples/plugins/bin/groupNormPlugin.so

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /workspace/examples/plugins/bin/
grep: /workspace/examples/plugins/bin/groupNormPlugin.so: binary file matches
grep: /workspace/examples/plugins/bin/FP8Conv2DPlugin.so: binary file matches
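
For what it's worth, grep on a binary only proves the byte string occurs somewhere in it. A check along these lines (a sketch; assumes binutils' nm and ldd are available in the container) distinguishes an undefined reference from an exported definition:

# "U" entries are symbols the plugin expects to be resolved at load
# time from another library (here: libnvinfer).
nm -D --undefined-only /workspace/examples/plugins/bin/groupNormPlugin.so | grep -i pluginregistry

# Show which shared libraries the plugin was linked against.
ldd /workspace/examples/plugins/bin/groupNormPlugin.so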
Edwardf0t1 commented 3 months ago

Thanks for reporting the issue.

If not using the default docker, please check that groupNormPlugin.so is built correctly and all dependent libraries are on the correct path, e.g.,

  • Follow readme: "Before building, update the addresses of "TRT" and "CUDA" according to your environment in file plugins/Makefile.config" (example values below)

  • Check TRT_LIBPATH, e.g., in our dockerfile, TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
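
For illustration (a hypothetical sketch; the actual variable names in plugins/Makefile.config may differ), the two paths the README asks for would look like this in the thread's docker image:

# Hypothetical values for the "TRT" and "CUDA" addresses in plugins/Makefile.config.
TRT=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
CUDA=/usr/local/cuda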

chrjxj commented 3 months ago

Thanks for the note, it's useful.

The path /usr/local/lib/python3.10/dist-packages/tensorrt_libs and its libs came from the installation of TRT-LLM (pip install tensorrt-llm~=0.10 -U), but the SD workflow doesn't depend on tensorrt-llm ...
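
As a quick sanity check (a minimal sketch; assumes the tensorrt Python package is importable inside the container), one can confirm which TensorRT runtime the environment actually provides:

# Print the TensorRT version as seen by Python.
python3 -c "import tensorrt; print(tensorrt.__version__)"

# Show which package installed the tensorrt_libs directory.
pip show tensorrt-llm | grep -i -E "^(name|version|location)"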

chrjxj commented 3 months ago

Used https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/docker/build.sh to build the docker image; inside the container docker.io/library/modelopt_examples:latest it is still the same:

[07/04/2024-01:11:32] [I] TensorRT version: 10.0.1
[07/04/2024-01:11:32] [I] Loading standard plugins
[07/04/2024-01:11:32] [I] Loading supplied plugin library: /workspace/examples/plugins/bin/groupNormPlugin.so
trtexec: symbol lookup error: /workspace/examples/plugins/bin/groupNormPlugin.so: undefined symbol: getPluginRegistry

Some env info:


xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# echo $TRT_LIBPATH
/usr/local/lib/python3.10/dist-packages/tensorrt_libs

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# echo $LD_LIBRARY_PATH
/usr/local/lib/python3.10/dist-packages/tensorrt_libs:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /usr/local/lib/python3.10/dist-packages/tensorrt_libs
__init__.py  __pycache__  libfp8convkernel.so  libnvinfer.so  libnvinfer.so.10  libnvinfer_builder_resource.so.10.1.0  libnvinfer_plugin.so.10  libnvonnxparser.so.10

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# ls /workspace/examples/plugins/bin/groupNormPlugin.so
/workspace/examples/plugins/bin/groupNormPlugin.so

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /workspace/examples/plugins/bin/
grep: /workspace/examples/plugins/bin/groupNormPlugin.so: binary file matches
grep: /workspace/examples/plugins/bin/FP8Conv2DPlugin.so: binary file matches

xxx:/local/TensorRT-Model-Optimizer/diffusers/quantization# grep -n -R --include="*.so"  "getPluginRegistry" /usr/local/lib/python3.10/dist-packages/tensorrt_libs
grep: /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so: binary file matches
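
Again, the grep match only shows the string is present in libnvinfer.so. To confirm the library actually exports the symbol (a sketch; assumes binutils' nm is available):

# "T"/"D" entries are defined, exported symbols; getPluginRegistry
# should appear here if libnvinfer provides it to plugins.
nm -D --defined-only /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so | grep getPluginRegistry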
Edwardf0t1 commented 2 months ago

Unfortunately we cannot reproduce this error. We will have a new release next Wednesday. Please stay tuned for the new version.

songh11 commented 2 months ago

Thanks for reporting the issue.

If not using the default docker, please check that groupNormPlugin.so is built correctly and all dependent libraries are on the correct path, e.g.,

  • Follow readme:

Before building, update the addresses of "TRT" and "CUDA" according to your environment in file plugins/Makefile.config

  • Check TRT_LIBPATH, e.g., in our dockerfile, TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs

I had the same problem, and setting TRT_LIBPATH solved it.

chrjxj commented 2 months ago

Please close.

I guess the root cause is: in the prebuilt docker image, FP8Conv2DPlugin.so and groupNormPlugin.so in the /workspace/examples/plugins/bin/ folder were missing the symbol (they weren't linked against the library) at build time. After I set TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs and rebuilt those plugins, it works fine.
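
A minimal sketch of that fix (assumes the plugin Makefile reads TRT_LIBPATH and supports the usual clean/all targets; paths are the ones from this thread):

# Point the build at the TensorRT libs shipped in the container.
export TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs

# Rebuild the plugins ("make clean && make" is an assumption about the
# Makefile's targets), then check the result links against libnvinfer.
cd /workspace/examples/plugins
make clean && make
ldd bin/groupNormPlugin.so | grep nvinfer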