PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge. Covers 20+ mainstream image, video, text, and audio scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

fastdeploy编译成功,example中的yolov5编译失败 #1875

Closed hch-baobei closed 1 year ago

hch-baobei commented 1 year ago

Environment

Problem log and steps that triggered the issue

-- ***** FastDeploy Building Summary *****
-- CMake version : 3.20.0
-- CMake command : /usr/local/src/cmake-3.20.0-linux-x86_64/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 9.4.0
-- CXX flags : -Wno-format
-- EXE linker flags :
-- Shared linker flags :
-- Compile definitions : _GLIBCXX_USE_CXX11_ABI=1
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : /usr/local
-- CMAKE_MODULE_PATH :
--
-- WITH_GPU : ON
-- ENABLE_ORT_BACKEND : OFF
-- ENABLE_RKNPU2_BACKEND : OFF
-- ENABLE_SOPHGO_BACKEND : OFF
-- ENABLE_PADDLE_BACKEND : OFF
-- ENABLE_POROS_BACKEND : OFF
-- ENABLE_OPENVINO_BACKEND : OFF
-- ENABLE_TRT_BACKEND : ON
-- ENABLE_LITE_BACKEND : OFF
-- ENABLE_VISION : OFF
-- ENABLE_CVCUDA : OFF
-- ENABLE_TEXT : OFF
-- ENABLE_ENCRYPTION : OFF
-- CUDA_DIRECTORY : /usr/local/cuda
-- OPENCV_DIRECTORY : /usr/include/opencv4
-- DEPENDENCY_LIBS : FDLIB-NOTFOUND;/usr/lib/x86_64-linux-gnu/libcudart.so;TRT_INFER_LIB-NOTFOUND;TRT_ONNX_LIB-NOTFOUND;TRT_PLUGIN_LIB-NOTFOUND;PADDLE2ONNX_LIB-NOTFOUND
-- Configuring done
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.

Some variables could not be found. When I went to fix this, I found that all the others resolved; only the lib folder was missing. I built following the docs/cn/build_and_install/gpu.md document, and the successful build did not produce this lib folder.

hch-baobei commented 1 year ago

I downloaded a prebuilt package and found that the compiled files (libfast*.so etc.) must be placed in a lib folder under the repository root, and three_lib must also be placed under the repository root. Alternatively, build directly in the repository root instead of creating a separate build directory.
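The workaround above can be sketched as a quick layout check before configuring the examples. This is a minimal sketch based only on this thread: the glob pattern follows the `libfast*.so` wording above, and the `third_libs` directory name is an assumption (the comment above calls it three_lib); adjust both to whatever your prebuilt package actually contains.

```python
import glob
import os

def check_fastdeploy_layout(repo_root):
    """Return a list of problems with the library layout that the
    in-tree example build expects, per this thread: libfast*.so under
    <repo_root>/lib, plus the bundled third-party libraries (assumed
    here to live in a 'third_libs' directory) under the repo root."""
    problems = []
    lib_dir = os.path.join(repo_root, "lib")
    if not glob.glob(os.path.join(lib_dir, "libfast*.so*")):
        problems.append("no libfast*.so found in " + lib_dir)
    if not os.path.isdir(os.path.join(repo_root, "third_libs")):
        problems.append("third_libs directory missing from repo root")
    return problems
```

Running this against the directory you point the example's CMake at should come back empty before you configure.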

mahesh11T commented 11 months ago

@hch-baobei, I am getting the same error. How did you solve this? Can you share more details?

> I downloaded a prebuilt package and found that the compiled files (libfast*.so etc.) must be placed in a lib folder under the repository root, and three_lib must also be placed under the repository root. Alternatively, build directly in the repository root instead of creating a separate build directory.

danny-zhu commented 11 months ago

I hit the following error when building:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files:
TRT_INFER_LIB
    linked by target "fastdeploy" in directory /home/top/LLM/FastDeploy-release-1.0.0
TRT_ONNX_LIB
    linked by target "fastdeploy" in directory /home/top/LLM/FastDeploy-release-1.0.0
TRT_PLUGIN_LIB
    linked by target "fastdeploy" in directory /home/top/LLM/FastDeploy-release-1.0.0

-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.

I later found it was caused by a wrong TRT_DIRECTORY path. After correcting it, the build succeeded.
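A small sanity check along these lines can catch the wrong-path case before CMake reports the NOTFOUND variables. This is a sketch, not FastDeploy code; it assumes the layout of an official TensorRT tar install (a lib/ subdirectory containing the nvinfer libraries), which matches the three libraries named in the CMake error above.

```python
import os

def check_trt_directory(trt_dir):
    """Return a list of problems with a candidate TRT_DIRECTORY.
    The build looks for the nvinfer, nvonnxparser and nvinfer_plugin
    libraries under <trt_dir>/lib, so a directory without a lib/
    subfolder (e.g. the pip-installed tensorrt package) produces the
    TRT_INFER_LIB / TRT_ONNX_LIB / TRT_PLUGIN_LIB NOTFOUND errors."""
    lib_dir = os.path.join(trt_dir, "lib")
    if not os.path.isdir(lib_dir):
        return ["missing lib/ under " + trt_dir]
    problems = []
    entries = os.listdir(lib_dir)
    for name in ("libnvinfer.so", "libnvonnxparser.so", "libnvinfer_plugin.so"):
        if not any(f.startswith(name) for f in entries):
            problems.append(name + " not found in " + lib_dir)
    return problems
```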

mahesh11T commented 11 months ago
rm -rf dist
rm -rf .setuptools-cmake-build/ 
rm -rf build

pip3 install tensorrt
export TRT_DIRECTORY=/usr/local/lib/python3.7/dist-packages/tensorrt

--   TRT_DIRECTORY : /usr/local/lib/python3.7/dist-packages/tensorrt

The issue is that /usr/local/lib/python3.7/dist-packages/tensorrt does not contain the lib folder that CMake is trying to copy from. Where did you get your tensorrt package?

mahesh11T commented 11 months ago

@danny-zhu, thanks. I solved it by installing TensorRT via the tar method from the official website.

Do you know what I should modify in setup.py to cross-compile for ARM?

AriannST commented 4 months ago

@hch-baobei We adapted the loadModel code in trt_common.py from object_detection for TensorRT 10.0. Can you check whether this allocate_buffers function is okay?

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer to device bindings
        bindings.append(int(device_mem))

        # Append to the appropriate list
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Because with that I get the following error:

File "/content/drive/MyDrive/SSLTests/ssl-detector-master/src/trt_common.py", line 143, in allocate_buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

Please give me some advice :(
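One possible cause of cuMemHostAlloc failures with this allocate_buffers pattern is a dynamic-shape engine: get_tensor_shape returns -1 for dynamic dimensions, so trt.volume yields a negative element count, and the resulting allocation request can be nonsensical or enormous. Below is an illustrative helper (not part of trt_common.py or any TensorRT API) that resolves dynamic dims against an explicit maximum shape before sizing the buffers; the shapes are plain tuples here so it runs standalone.

```python
def safe_volume(shape, max_shape=None):
    """Compute the element count for a host/device buffer, refusing
    dynamic (-1) dimensions unless a concrete max_shape is supplied.
    In real code, `shape` would come from engine.get_tensor_shape(name)
    and max_shape from the optimization profile, e.g. the max entry of
    engine.get_tensor_profile_shape(name, 0)."""
    resolved = []
    for i, dim in enumerate(shape):
        if dim < 0:
            if max_shape is None:
                raise ValueError(
                    "dynamic dimension at axis %d; pass the profile's "
                    "max shape to size the buffers" % i)
            resolved.append(max_shape[i])
        else:
            resolved.append(dim)
    count = 1
    for dim in resolved:
        count *= dim
    return count
```

With this, `size = safe_volume(engine.get_tensor_shape(tensor_name), max_shape)` would fail loudly (or allocate for the profile maximum) instead of asking pycuda for a bogus amount of pinned memory. If your shapes are already static, then the out-of-memory error more likely means the host genuinely ran out of pinnable memory for the buffer sizes requested.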