Emgu CUDA Memory consumption and performance

emgucv / emgucv

Emgu CV is a cross platform .Net wrapper to the OpenCV image processing library.

https://www.emgu.com/

Emgu CUDA Memory consumption and performance #756

Open kleberbueno opened 2 years ago

kleberbueno commented 2 years ago

Hello folks

I apologize for any mistakes. I started using Emgu recently, coming from Python OpenCV, and I have a couple of questions about issues I am facing. I am using Emgu 4.5.5.4823 and manually installed the CUDA runtime for Windows from the nupkg packages (Emgu.CV.runtime.windows.cuda.4.5.5.4823.nupkg and Emgu.runtime.windows.cuda.dnn.cnn.infer.8.2.0.nupkg).

When I check Emgu.CV.Cuda.CudaInvoke.HasCuda, it returns True.

Using Visual Studio 2022, as soon as my Yolo v4 model is loaded, memory consumption jumps from 50 MB to 1.5 GB. Then, after the 1st detection, memory goes up to 6 GB. Is that expected behavior?
Using .SetPreferableBackend(Emgu.CV.Dnn.Backend.Cuda) and .SetPreferableTarget(Target.Cuda), I get the same execution time as on the CPU. A detection takes 50 ms (0.05 seconds). Running the same detection in Python with the same Yolo model, I get 0.001 seconds. It seems that using CUDA as the backend does not really have any impact. My GPU is an RTX 3070.

Is anyone else facing the same issue?

Thanks

emgucv commented 2 years ago

Using Visual Studio 2022, as soon as my Yolo v4 model is loaded, memory consumption jumps from 50 MB to 1.5 GB. Then, after the 1st detection, memory goes up to 6 GB. Is that expected behavior?

Is this the CPU backend or the GPU backend? Is that the memory consumption for CPU or GPU?

I used the demo from the "libemgucv-windesktop_x64-cuda-4.5.5.4823.zip" release package (same binary as the NuGet package), running Yolo v4 with the CPU backend from the "XamarinForms.WPF" project. Once the project is loaded, the memory consumption is ~1.7 GB. After the Yolo v4 model is used, it is ~2.4 GB. After the 1st detection it is still ~2.4 GB, and it remains at the same level for the rest of the detection runs.

The demo with the specific image (resolution 416x416) using Yolo v4 on CPU took an average of 356 milliseconds (on an HP Z6 G4 with dual Xeon(R) Silver 4210 CPUs) after the 1st detection (the 1st detection took about 730 milliseconds).
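
For anyone comparing timings across setups, here is a minimal Python sketch of how the two figures above (1st detection vs. average of the later runs) can be measured separately. It is not the code used in the demo: `benchmark` and `make_fake_detect` are hypothetical names, and the fake detector just sleeps in place of a real model call.

```python
import statistics
import time


def benchmark(detect, runs=10):
    """Call `detect` `runs` times; return (first_ms, later_mean_ms)."""
    if runs < 2:
        raise ValueError("need at least 2 runs: the first one plus one measured")
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        detect()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # Report the first call on its own and average only the later calls,
    # mirroring how the 1st detection and the average are quoted above.
    return timings_ms[0], statistics.mean(timings_ms[1:])


def make_fake_detect(first_delay=0.05, later_delay=0.005):
    """Hypothetical stand-in for a real detection call (assumption:
    the first call is slower than the following ones)."""
    state = {"calls": 0}

    def fake_detect():
        delay = first_delay if state["calls"] == 0 else later_delay
        state["calls"] += 1
        time.sleep(delay)

    return fake_detect


if __name__ == "__main__":
    first_ms, later_ms = benchmark(make_fake_detect(), runs=5)
    print(f"1st run: {first_ms:.1f} ms, later runs: {later_ms:.1f} ms on average")
```

To time a real model, pass a zero-argument callable that runs one detection on a fixed image in place of the fake detector.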

kleberbueno commented 2 years ago

I am using the GPU as the backend, and all of the consumption is related to the GPU backend.

Please take a look at the screenshots I have attached. They reproduce the issue step by step.

Just checking HasCuda increases memory to 1.7 GB.
After loading the Yolo model and setting the backend to CUDA, it goes up to 1.9 GB.
Here I have stopped execution at the line that calls the Detect method. At this point memory consumption is 2.1 GB.
After the Detect method of the DetectionModel class runs, memory goes up to 7.0 GB.

Honestly, my hands are tied now and I don't know what to do. I have removed and re-added the NuGet packages, but that did not help.

Regarding the detection time: it is OK, performance is quite good. But memory consumption is a big issue. Even after I dispose of all objects, it is still using 7 GB.

If I take a snapshot of memory usage, there seems to be a byte matrix (byte[,,,]) that is causing the problem.
