High CPU usage with C++/Python sample, even though discrete GPU + TensorFlow is present

mikkac commented 1 year ago

Hi, I noticed that CPU usage is very high, even though I have discrete GPU and TensorFlow installed.

Here is all relevant information about my hardware:

OS: Ubuntu 22.10 x86_64 
Host: 82JU Legion 5 15ACH6H 
Kernel: 5.19.0-26-generic 
DE: GNOME 
CPU: AMD Ryzen 5 5600H with Radeon Graphics (12) @ 4.280GHz 
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q 
GPU: AMD ATI 05:00.0 Cezanne 
Memory: 9164MiB / 13834MiB

With TensorFlow 1.14

I tried to use sample provided in the repository (both Python and C++), modified to see how the SDK works with video file (running UltAlprSdkEngine::process on each frame). Example code is available here. Basically, it's sample recognizer.cxx from this repository, but simplified (for readability) and modified to enable recognizing license plates from video. I explained how to run this code at the end of this description.

Here are logs from first ~20 seconds of run.

nvidia-smi output when running the demo:

$ nivida-smi
Sun Dec 11 22:24:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P0    39W /  N/A |    928MiB /  8192MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2617      G   /usr/lib/xorg/Xorg                408MiB |
|    0   N/A  N/A      3411      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A      6979      G   ...4/usr/lib/firefox/firefox      166MiB |
|    0   N/A  N/A      9541      G   ...mviewer/tv_bin/TeamViewer       21MiB |
|    0   N/A  N/A    242880      G   /usr/bin/nautilus                  22MiB |
|    0   N/A  N/A    254828      G   ...AAAAAAAA== --shared-files       48MiB |
|    0   N/A  N/A    260313      C   ./recognizer_video                134MiB |
+-----------------------------------------------------------------------------+

As you can see, recognizer_video is visible among GPU-associated processes. Nevertheless, CPU usage is still high:

$ top -d 10
top - 22:25:39 up 11:10,  1 user,  load average: 3,44, 1,61, 1,04
Tasks: 429 total,   1 running, 428 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29,3 us,  2,2 sy,  0,0 ni, 68,4 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :  13834,3 total,    226,0 free,   9673,8 used,   3934,6 buff/cache
MiB Swap:   2048,0 total,      0,0 free,   2048,0 used.   3643,4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                 
 260313 miki      20   0   14,4g 858012 210736 S 342,6   6,1   4:33.85 recognizer_video                                                                                                                              
   2617 miki      20   0   26,5g 131724  79232 S   6,6   0,9   6:42.18 Xorg                                                                                                                                    
 252850 miki      20   0  109088  53120   6908 S   5,8   0,4   2:03.95 WD-TabNine                                                                                                                              
   6979 miki      20   0   20,3g 635300 186056 S   5,1   4,5  40:06.60 firefox                                                                                                                                 
   3411 miki      20   0 4540780 399920  84556 S   4,0   2,8   8:09.32 gnome-shell                                                                                                                             
  14372 miki      20   0  812532  71232  35152 S   3,2   0,5   0:56.14 terminator

Of course I checked what is the resource usage caused only by reading and displaying frame with OpenCV and it's around ~80% CPU, so there is still a lot usage caused by the SDK.

With TensorFlow 2.11

Based on information in #265 I installed TensorFlow 2 (2.11 is the latest version). I did the "trick" with satisfying ldd libultimate_alpr-sdk.so described here, but unfortunately I encountered runtime crash caused by:

file: "/home/ultimate/ultimateALPR/SDK_dev/lib/../../../ultimateBase/lib/include/ultimate_base_debug.h" 
line: "51" 
message: [UltAlprSdkEngine]Failed to match tensorflow fingerprint
recognizer: recognizer.cxx:74: int main(int, char**): Assertion `__ULTALPR_SDK_b_ret' failed.
Aborted (core dumped)

Full log is available here

⚠️ Is there anything else I can check/tweak to decrease CPU usage?

=====================================================

How to run example code

Install OpenCV. I built master from the official repository.

$ git clone https://github.com/opencv/opencv
$ cd opencv
$ mkdir build && cd build
$ cmake -GNinja -D BUILD_TIFF=ON  -DOPENCV_GENERATE_PKGCONFIG=ON  ..
$ ninja
$ sudo ninja install

Download example and modify path to video inside .cxx file

$ cd ultimateALPR-SDK/samples/c++/recognizer
$ wget https://gist.githubusercontent.com/mikkac/c4985af1a3d955dc8423140785614f62
$ # modify the path to video inside file - "cv::VideoCapture cap("/tmp/lp01_720p.mp4");"

Build & run the example

$ g++ recognizer_video.cxx -O3 -I../../../c++ -L../../../binaries/linux/x86_64 `pkg-config --cflags --libs opencv4` -lultimate_alpr-sdk -o recognizer_video
$ ./recognizer_video

Video file I used is available here.

DoubangoTelecom commented 1 year ago

Hi, Your CPU usage is high because of OpenCV. We have seen reports about high CPU usage or slow processing and it's always because of OpenCV. That's why we use our own Computer Vision lib instead of OpenCV. We use OpenCV for prototyping but never in any commercial app. It's CPU and memory hungry. On my PC with an RTX3060 and #16 cores, the benchmark app is at 150% (out of 1600%), it means 9% CPU usage. If you think the high CPU usage is because of our SDK, then you have to write a sample code reproducing the issue WITHOUT any other 3rd-party lib. You should try the benchmark app. This said, we do not support Tensorflow 2.11. The latest version supported is 2.6 (https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/README.md#migration-to-tensorflow-2x-and-cuda-11x). We'll re-open the ticket if you can provide a sample code WITHOUT OpenCV producing high CPU usage. Please also note that the CPU will be used if you enable OpenVINO.

mikkac commented 1 year ago

Hi, thanks for the response. I used OpenCV in my example code, based on official docs. However, as I pointed out, I checked CPU usage caused by OpenCV and it was ~80% CPU, so of course it should be subtracted from ~340% reported by top.

I installed TensorFlow 2.6 and it indeed helped. CPU usage dropped significantly (most of it is now used by OpenCV) and GPU memory usage increased (as expected, ~2GB vRAM allocated by the process). So in case of RTX 3070 performance issue is no longer the case.

However, I also have PC with GTX 1050 Ti (nvidia-smi output and hardware info below) and in this case, neither TF 1.14 nor TF 2.6 work well. CPU usage is still high and only ~40 MB of GPU's vRAM is allocated by the process. Are there any additional tips regarding "older" hardware? Unfortunately all of our HW on production has those GPUs...

Here is the output from nvidia-smi:

$ nvidia-smi
Tue Dec 13 12:47:46 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   36C    P8    N/A /  75W |    116MiB /  4096MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1041      G   /usr/lib/xorg/Xorg                 79MiB |
|    0   N/A  N/A      1328      G   /usr/bin/gnome-shell               21MiB |
|    0   N/A  N/A      1848      G   ...mviewer/tv_bin/TeamViewer       10MiB |
+-----------------------------------------------------------------------------+

Hardware info:

OS: Ubuntu 18.04.6 LTS x86_64 
Host: H470M DS3H -CF 
Kernel: 5.4.0-122-generic 
Shell: bash 4.4.20 
Resolution: 1280x1024 
DE: GNOME 3.28.4 
WM: GNOME Shell 
CPU: Intel i5-10500 (12) @ 4.500GHz 
GPU: NVIDIA GeForce GTX 1050 Ti Memory: 11502MiB / 15924MiB

DoubangoTelecom commented 1 year ago

Could you please share the full logs?

mikkac commented 1 year ago

Sure: gtx_1050ti_tf_1_14.txt gtx_1050ti_tf_2_6.txt

Thanks

mikkac commented 1 year ago

Hi, any updates on the issue?

DoubangoTelecom commented 1 year ago

"so of course it should be subtracted from ~340% reported by top" -> You cannot subtract CPU usage percentages like that. As already explained, you must run the benchmark app without OpenCV.
"CPU usage is still high" -> high like what number?
"Are there any additional tips regarding "older" hardware? Unfortunately all of our HW on production has those GPUs..." -> there is no known issue with old hardware it's the contrary. We only added support for new GPUs a year ago. The software is tested and developed on an "old" GTX 1070 with TF14 (the one at https://github.com/DoubangoTelecom/ultimateALPR-SDK/tree/master/samples/c%2B%2B/benchmark#peformance-numbers). CPU usage on that GPU is 180% out of 800%. We have tested the SDK on almost all GTX GPUs with no known issues.

Your logs show:

*[COMPV INFO]: [UltAlprSdkEnginePrivate]recogn_tf_num_threads: 1, acceleration backend: null
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] gpu_memory_alloc_max_percent = 0.100000
*[COMPV INFO]: [UltOcrTensorflowSessionOptions] Alloc session with gpu_memory_alloc_max_percent = 10%

... but you have a 4GB GPU, maybe that's too small. The 10% config was chosen for a 8GB GPU. Try adding to your JSON config:

{
"detect_tf_gpu_memory_alloc_max_percent": 0.4,
"pyramidal_search_tf_gpu_memory_alloc_max_percent": 0.2,
"recogn_tf_gpu_memory_alloc_max_percent": 0.4
}

DoubangoTelecom / ultimateALPR-SDK