enazoe / yolo-tensorrt

TensorRT 8 support. Supports Yolov5n, s, m, l, x (darknet -> tensorrt). Yolov4 and Yolov3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
MIT License
1.18k stars 313 forks

Using libdetector.so from Python #83

Open IsraelLencina opened 3 years ago

IsraelLencina commented 3 years ago

Hi, I'm trying to write a wrapper to use the library from Python with ctypes. The issue comes when I load the library from Python with:

from ctypes import *
lib = CDLL("./build/libdetector.so", RTLD_GLOBAL)

When the line above is executed, it crashes because there is an undefined symbol:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pathtopython/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./build/libdetector.so: undefined symbol: _ZN2cv3MatD1Ev
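An "undefined symbol" error like the one above usually means the shared library was built without being linked against the library that provides the symbol (here, OpenCV's `cv::Mat` destructor). One workaround, until the link is fixed, is to preload the providing libraries with `RTLD_GLOBAL` so their symbols are visible when the target library is loaded. A minimal sketch (the OpenCV library names in the usage comment are assumptions; adjust them to your installation):

```python
import ctypes


def load_with_deps(dep_paths, target_path):
    """Load each dependency with RTLD_GLOBAL so its symbols become
    visible process-wide, then load the target library itself."""
    for dep in dep_paths:
        ctypes.CDLL(dep, mode=ctypes.RTLD_GLOBAL)
    return ctypes.CDLL(target_path, mode=ctypes.RTLD_GLOBAL)


# Usage (library names are assumptions -- adjust to your OpenCV build):
# opencv_deps = ["libopencv_core.so", "libopencv_imgproc.so",
#                "libopencv_highgui.so", "libopencv_dnn.so"]
# lib = load_with_deps(opencv_deps, "./build/libdetector.so")
```

This only papers over the missing link dependency; the proper fix is linking OpenCV into libdetector.so at build time, as discussed further down in the thread.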

enazoe commented 3 years ago

Sorry, I haven't tested this, but you could try the TensorRT Python API to load the *.engine file.

IsraelLencina commented 3 years ago

I'm working on that, since I'm interested in using the library from Python.

Running `ld -ldetector` (after copying libdetector.so to /usr/lib) gives the following output:

ld: warning: cannot find entry symbol _start; not setting start address
ld: /lib/libdetector.so: undefined reference to `cv::Mat::~Mat()'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::operator=(cv::Mat&&)'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::operator=(cv::Mat const&)'
ld: /lib/libdetector.so: undefined reference to `cv::copyMakeBorder(cv::_InputArray const&, cv::_OutputArray const&, int, int, int, int, int, cv::Scalar const&)'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::Mat(cv::Mat const&)'
ld: /lib/libdetector.so: undefined reference to `cv::dnn::dnn4_v20200908::blobFromImages(cv::_InputArray const&, double, cv::Size, cv::Scalar_<double> const&, bool, bool, int)'
ld: /lib/libdetector.so: undefined reference to `cv::putText(cv::_InputOutputArray const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::Point_<int>, int, double, cv::Scalar_<double>, int, int, bool)'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::Mat(cv::Size_<int>, int)'
ld: /lib/libdetector.so: undefined reference to `cv::namedWindow(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
ld: /lib/libdetector.so: undefined reference to `cv::resize(cv::_InputArray const&, cv::_OutputArray const&, cv::Size, double, double, int)'
ld: /lib/libdetector.so: undefined reference to `cv::imread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
ld: /lib/libdetector.so: undefined reference to `cv::imshow(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::_InputArray const&)'
ld: /lib/libdetector.so: undefined reference to `cv::waitKey(int)'
ld: /lib/libdetector.so: undefined reference to `cv::cvtColor(cv::_InputArray const&, cv::_OutputArray const&, int, int)'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::Mat()'
ld: /lib/libdetector.so: undefined reference to `cv::imwrite(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)'
ld: /lib/libdetector.so: undefined reference to `cv::rectangle(cv::_InputOutputArray const&, cv::Rect_<int>, cv::Scalar_<double> const&, int, int, int)'
ld: /lib/libdetector.so: undefined reference to `cv::getTextSize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, double, int, int*)'
ld: /lib/libdetector.so: undefined reference to `cv::Mat::copyTo(cv::_OutputArray const&) const'

It seems like find_package(OpenCV REQUIRED) is not correctly picking up the OpenCV libraries.

Edit: Sorry, as far as I can tell (I'm not a CMake ninja), the problem is that make can't link the OpenCV functions into libdetector.so. Can you try `ld -ldetector` on your machine, to see whether this is a repository problem or an issue with my OpenCV installation?

Edit 2: I've solved the OpenCV part. When target_link_libraries was called to generate libdetector.so, ${OpenCV_LIBS} was not present, so the OpenCV symbols could not be resolved. Now I'm fighting with "ld: warning: cannot find entry symbol _start; not setting start address".
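The fix described in Edit 2 can be sketched in CMake like this (a sketch only; the target name `detector` and source variable are assumptions, and the repository's actual CMakeLists.txt may be organized differently):

```cmake
# Sketch: link OpenCV into the detector shared library.
find_package(OpenCV REQUIRED)

add_library(detector SHARED ${DETECTOR_SOURCES})
target_include_directories(detector PRIVATE ${OpenCV_INCLUDE_DIRS})

# Without ${OpenCV_LIBS} here, libdetector.so is produced with
# undefined references to cv::* symbols, as in the ld output above.
target_link_libraries(detector PRIVATE ${OpenCV_LIBS})
```

The remaining `cannot find entry symbol _start` warning is harmless in this context: it only appears because `ld` is being invoked directly on a shared library, which has no program entry point.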

Edit 3: Actually, the last edit also fixed importing with ctypes from Python, so it's solved. If you want, I can share the CMakeLists.txt, but I've changed the structure of the repository a little bit -.-' splitting the modules folder into modules (where the source code now lives) and include (where the header files now live).

IsraelLencina commented 3 years ago

Have you tested loading the engine from Python with the TensorRT Python API?

I'm trying that, and it results in:

# Imports
import tensorrt as trt
import ctypes

lib = ctypes.CDLL("build/libdetector.so")

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("configs/yolov4-kHALF-batch4.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

Resulting in:

[TensorRT] ERROR: deserializationUtils.cpp (635) - Serialization Error in load: 0 (Serialized engine contains plugin, but no plugin factory was provided. To deserialize an engine without a factory, please use IPluginV2 instead.)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

I've tried linking the cpp files into the target includes, but the error is still there. I thought the problem was in the line engine = runtime.deserialize_cuda_engine(f.read()), but I ran the test with trtexec, loading the engine and the plugin library, and the error still persists.
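For context, the usual recipe for deserializing an engine with custom plugins from the Python API is: load the plugin library with RTLD_GLOBAL (so its static plugin registrars run), call `trt.init_libnvinfer_plugins`, then deserialize. A hedged sketch; note this only helps for plugins registered through the IPluginV2/IPluginCreator registry, whereas the "no plugin factory was provided" error in this thread indicates the engine was serialized with the legacy IPluginFactory interface, which this recipe cannot fix:

```python
import ctypes


def deserialize_engine(engine_path, plugin_lib=None):
    """Load a plugin library (if given), register plugin creators with
    TensorRT's registry, then deserialize a serialized engine from disk."""
    import tensorrt as trt  # lazy import: helper is importable without TensorRT
    logger = trt.Logger(trt.Logger.WARNING)
    if plugin_lib is not None:
        # RTLD_GLOBAL lets the library's static plugin registrars run and
        # makes their symbols visible to TensorRT.
        ctypes.CDLL(plugin_lib, mode=ctypes.RTLD_GLOBAL)
    # Register the built-in ::*_TRT plugin creators with the global registry.
    trt.init_libnvinfer_plugins(logger, "")
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


# Usage (paths from this thread):
# engine = deserialize_engine("configs/yolov4-kHALF-batch4.engine",
#                             plugin_lib="build/libdetector.so")
```

An engine serialized with IPluginFactory-style plugins has to be rebuilt with IPluginV2-style plugins before this (or trtexec, or Triton) can load it.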

trtexec --loadEngine=configs/yolov4-kHALF-batch12.engine --plugins=build/libdetector.so --verbose resulting in:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=configs/yolov4-kHALF-batch12.engine --plugins=build/libdetector.so --verbose
[12/04/2020-12:14:42] [I] === Model Options ===
[12/04/2020-12:14:42] [I] Format: *
[12/04/2020-12:14:42] [I] Model: 
[12/04/2020-12:14:42] [I] Output:
[12/04/2020-12:14:42] [I] === Build Options ===
[12/04/2020-12:14:42] [I] Max batch: 1
[12/04/2020-12:14:42] [I] Workspace: 16 MiB
[12/04/2020-12:14:42] [I] minTiming: 1
[12/04/2020-12:14:42] [I] avgTiming: 8
[12/04/2020-12:14:42] [I] Precision: FP32
[12/04/2020-12:14:42] [I] Calibration: 
[12/04/2020-12:14:42] [I] Refit: Disabled
[12/04/2020-12:14:42] [I] Safe mode: Disabled
[12/04/2020-12:14:42] [I] Save engine: 
[12/04/2020-12:14:42] [I] Load engine: configs/yolov4-kHALF-batch12.engine
[12/04/2020-12:14:42] [I] Builder Cache: Enabled
[12/04/2020-12:14:42] [I] NVTX verbosity: 0
[12/04/2020-12:14:42] [I] Tactic sources: Using default tactic sources
[12/04/2020-12:14:42] [I] Input(s)s format: fp32:CHW
[12/04/2020-12:14:42] [I] Output(s)s format: fp32:CHW
[12/04/2020-12:14:42] [I] Input build shapes: model
[12/04/2020-12:14:42] [I] Input calibration shapes: model
[12/04/2020-12:14:42] [I] === System Options ===
[12/04/2020-12:14:42] [I] Device: 0
[12/04/2020-12:14:42] [I] DLACore: 
[12/04/2020-12:14:42] [I] Plugins: build/libdetector.so
[12/04/2020-12:14:42] [I] === Inference Options ===
[12/04/2020-12:14:42] [I] Batch: 1
[12/04/2020-12:14:42] [I] Input inference shapes: model
[12/04/2020-12:14:42] [I] Iterations: 10
[12/04/2020-12:14:42] [I] Duration: 3s (+ 200ms warm up)
[12/04/2020-12:14:42] [I] Sleep time: 0ms
[12/04/2020-12:14:42] [I] Streams: 1
[12/04/2020-12:14:42] [I] ExposeDMA: Disabled
[12/04/2020-12:14:42] [I] Data transfers: Enabled
[12/04/2020-12:14:42] [I] Spin-wait: Disabled
[12/04/2020-12:14:42] [I] Multithreading: Disabled
[12/04/2020-12:14:42] [I] CUDA Graph: Disabled
[12/04/2020-12:14:42] [I] Separate profiling: Disabled
[12/04/2020-12:14:42] [I] Skip inference: Disabled
[12/04/2020-12:14:42] [I] Inputs:
[12/04/2020-12:14:42] [I] === Reporting Options ===
[12/04/2020-12:14:42] [I] Verbose: Enabled
[12/04/2020-12:14:42] [I] Averages: 10 inferences
[12/04/2020-12:14:42] [I] Percentile: 99
[12/04/2020-12:14:42] [I] Dump refittable layers:Disabled
[12/04/2020-12:14:42] [I] Dump output: Disabled
[12/04/2020-12:14:42] [I] Profile: Disabled
[12/04/2020-12:14:42] [I] Export timing to JSON file: 
[12/04/2020-12:14:42] [I] Export output to JSON file: 
[12/04/2020-12:14:42] [I] Export profile to JSON file: 
[12/04/2020-12:14:42] [I] 
[12/04/2020-12:14:42] [I] === Device Information ===
[12/04/2020-12:14:42] [I] Selected Device: GeForce RTX 2060
[12/04/2020-12:14:42] [I] Compute Capability: 7.5
[12/04/2020-12:14:42] [I] SMs: 30
[12/04/2020-12:14:42] [I] Compute Clock Rate: 1.2 GHz
[12/04/2020-12:14:42] [I] Device Global Memory: 5934 MiB
[12/04/2020-12:14:42] [I] Shared Memory per SM: 64 KiB
[12/04/2020-12:14:42] [I] Memory Bus Width: 192 bits (ECC disabled)
[12/04/2020-12:14:42] [I] Memory Clock Rate: 5.501 GHz
[12/04/2020-12:14:42] [I] 
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Proposal version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::Split version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[12/04/2020-12:14:42] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[12/04/2020-12:14:42] [I] Loading supplied plugin library: build/libdetector.so
[12/04/2020-12:14:43] [E] [TRT] deserializationUtils.cpp (635) - Serialization Error in load: 0 (Serialized engine contains plugin, but no plugin factory was provided. To deserialize an engine without a factory, please use IPluginV2 instead.)
[12/04/2020-12:14:43] [E] [TRT] INVALID_STATE: std::exception
[12/04/2020-12:14:43] [E] [TRT] INVALID_CONFIG: Deserialize the cuda engine failed.
[12/04/2020-12:14:43] [E] Engine creation failed
[12/04/2020-12:14:43] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=configs/yolov4-kHALF-batch12.engine --plugins=build/libdetector.so --verbose

IsraelLencina commented 3 years ago

Hi, I've tried again with the engine generated on the "yolov4-scaled" branch, but it's still stuck in the same situation. I think it's because libdetector.so is not finding the PluginCreator in "trt_utils.*".

IsraelLencina commented 3 years ago

Also, I've seen that all the plugins needed to run YoloV4 implement IPluginV2IOExt (MishPlugin) and IPluginCreator (MishPluginCreator), but, if I've understood correctly, you would also need to implement IPluginFactory in order for the engine to be deserializable by trtexec (which is a condition for making an engine work in Triton Inference Server).
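One way to check whether a creator such as MishPluginCreator actually reaches TensorRT's plugin registry after loading libdetector.so is to enumerate the registered creators, the same list trtexec prints in its verbose log above. A sketch (requires the tensorrt Python package; the "mish" name filter in the usage comment is an assumption about how the plugin registers itself):

```python
import ctypes


def registered_creator_names(plugin_lib=None):
    """Return the names of all plugin creators known to TensorRT,
    optionally after loading an external plugin library first."""
    import tensorrt as trt  # lazy import: helper is importable without TensorRT
    logger = trt.Logger(trt.Logger.WARNING)
    if plugin_lib is not None:
        ctypes.CDLL(plugin_lib, mode=ctypes.RTLD_GLOBAL)
    trt.init_libnvinfer_plugins(logger, "")
    registry = trt.get_plugin_registry()
    return [creator.name for creator in registry.plugin_creator_list]


# Usage:
# names = registered_creator_names("build/libdetector.so")
# print("Mish creator registered:", any("mish" in n.lower() for n in names))
```

If the creator never shows up in this list, the plugin is not registering itself through the IPluginCreator registry, which would explain why trtexec cannot resolve it.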

stolyarchuk commented 3 years ago

> Hi, i'm trying to make a wrapper to use the library from python with ctypes, the issue comes when i load the library from python with:
>
> `from ctypes import *; lib = CDLL("./build/libdetector.so", RTLD_GLOBAL)`
>
> When the line above is executed it crashes because there is an undefined symbol:
>
> `OSError: ./build/libdetector.so: undefined symbol: _ZN2cv3MatD1Ev`

Hi. You could check the example (wrapper and Python script) here: https://github.com/stolyarchuk/yolo-tensorrt

IsraelLencina commented 3 years ago

> Hi. You could check the example (wrapper and Python script) here: https://github.com/stolyarchuk/yolo-tensorrt

The thing is that you are doing the same as me; I've done this work in my branch of enazoe's repo, but with ctypes. What I want is an engine compatible with trtexec, which is a requirement for Triton Inference Server.

421psh commented 3 years ago

@IsraelLencina You've done a great job. I followed your advice and managed to run a Yolov5 engine in Python. But I still have the same problem with Yolov4:

[TensorRT] ERROR: deserializationUtils.cpp (578) - Serialization Error in load: 0 (Serialized engine contains plugin, but no plugin factory was provided. To deserialize an engine without a factory, please use IPluginV2 instead.)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

Do you have any progress in solving this issue?

421psh commented 3 years ago

@IsraelLencina It seems to me that the Modules directory should have something like yoloplugin.cu for correct engine deserialization, but it is missing. Or maybe I'm thinking about it the wrong way.

IsraelLencina commented 3 years ago

I have no experience with .cu files. I've finally moved to another engine that is loadable from trtexec and the Python API; once loaded, the inference time is great. If you are interested, I've done the bindings for this repo from Python with ctypes. It's possible, and it works!
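For anyone attempting the same, ctypes bindings for a C++ library like this require a small extern "C" shim compiled into the .so, since C++ names are mangled. The sketch below is entirely hypothetical: the exported names (detector_create, detector_detect, detector_destroy) and the Box layout are assumptions, not the actual exports of libdetector.so.

```python
import ctypes


class Box(ctypes.Structure):
    """Mirror of a hypothetical C detection-result struct."""
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int),
                ("w", ctypes.c_int), ("h", ctypes.c_int),
                ("score", ctypes.c_float), ("class_id", ctypes.c_int)]


def bind_detector(lib):
    """Declare argument/return types on an already-loaded library handle
    so ctypes marshals pointers and structs correctly."""
    lib.detector_create.restype = ctypes.c_void_p
    lib.detector_create.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    lib.detector_detect.restype = ctypes.c_int  # number of boxes written
    lib.detector_detect.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                    ctypes.POINTER(Box), ctypes.c_int]
    lib.detector_destroy.restype = None
    lib.detector_destroy.argtypes = [ctypes.c_void_p]
    return lib


# Usage (hypothetical -- needs the matching extern "C" shim in the .so):
# lib = bind_detector(ctypes.CDLL("./build/libdetector.so"))
# handle = lib.detector_create(b"configs/yolov4.cfg", b"configs/yolov4.weights")
```

Declaring argtypes/restype up front is important on 64-bit systems: without it, ctypes defaults pointers to c_int and silently truncates handles.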

421psh commented 3 years ago

@IsraelLencina It's great that you finally succeeded. May I ask you to provide these bindings?