ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

FPS drop when using cyberRT! #11467

Open Allenhe123 opened 4 years ago

Allenhe123 commented 4 years ago

System information

issue scenario:

In a simple pipeline scenario, FPS drops by 3 frames or more after switching to CyberRT. Pipeline programming paradigm: three threads notify each other through condition variables, each triggering the next to work.

CyberRT programming paradigm: the stages communicate via intra-process pub/sub. Of course, within the process we pass a pointer to the data instead of copying it.

In addition, we bind all three threads to an ARM A76 CPU core.

Could anyone give any comments on this issue? Thanks.

daohu527 commented 4 years ago

CyberRT uses coroutines to handle tasks, and it already uses data pointers instead of copying memory to pass data.
In your scenario, are all three threads running on one CPU core? Can you describe the process in detail?

Allenhe123 commented 4 years ago

> CyberRT uses coroutines to handle tasks, and it already uses data pointers instead of copying memory to pass data. In your scenario, are all three threads running on one CPU core? Can you describe the process in detail?

Thanks a lot for your reply. In fact, the number of threads is 2, and both threads were assigned to the same CPU core.

Let me put some pseudocode below. The work process is: decode --> inference --> parse result --> render the result to the screen.

1. The simple pipeline application, main thread logic:

```
auto fut = std::async(post_process);
read the video file from disk;
while (1) {
    send frame data to the decoder;
    sleep 5 ms to wait for the decoder to finish;
    push the decoded frame to the DNN for inference; this is a blocking call taking about 25 ms;
    sem_wait(&g_sem_render);
    clone some data;
    push the data to the task queue;
    sem_post(&g_sem);
}
```

Post-process thread logic:

```
while (1) {
    sem_wait(&g_sem);
    fetch a task from the task queue;
    do the parse operation; this is a blocking call taking about 30 ms;
    render the frame and inference results;
    sem_post(&g_sem_render);
}
```

Although we use a task queue, there is only ever one task in it at a time. We always pass data pointers between the threads.

2. The CyberRT solution, main thread logic:

Create the inference result reader, whose callback function does:

```
do the parse operation; this is a blocking call taking about 30 ms;
render the frame and inference results;
sem_post(&g_sem_render);
```

Create the inference result writer, then:

```
while (1) {
    send frame data to the decoder;
    sleep 5 ms to wait for the decoder to finish;
    push the decoded frame to the DNN for inference; this is a blocking call taking about 25 ms;
    sem_wait(&g_sem_render);
    clone some data;
    writer.Write(data pointer);
}
```

I have tried different values for the default_proc_num setting, but it has no impact. PS: I have disabled the async logger of CyberRT, and all other configurations are default.

As you can see in the pseudocode, I use semaphores to make the threads notify each other. In fact, the 3 FPS drop was measured after removing all semaphores from the CyberRT application (I still keep them in the pseudocode above; without the semaphores the program is not stable, because the decoder has to release some old frames in order to decode new ones, but it should wait for the renderer to finish...). If we keep the semaphores in the CyberRT application, the FPS drop is about 7~8.

PS: The decoder and renderer also create some threads of their own, but this is the same in both application scenarios.

Looking forward to your reply; thanks in advance for all your help :)

Allenhe123 commented 4 years ago

PS: We run both applications on ARMv8, with the threads assigned to an A76 ARM core.