ffvvc / FFmpeg

VVC Decoder for ffmpeg
Other
48 stars 12 forks source link

executor.c: divide tasks queue in un/ready queues #210

Closed danielsan901998 closed 2 months ago

danielsan901998 commented 2 months ago
Implement #50 Since #23 is already closed i have tried to use two queues and after using tests/tools/perf.py to check the performance this is the result with the changes: video before after
BQTerrace_1920x1080_60_10_420_22_RA.vvc 80.0 83.7
NovosobornayaSquare_1920x1080.bin 157.7 163.3
RitualDance_1920x1080_60_10_420_37_RA.266 125.7 135.3
Tango2_3840x2160_60_10_420_27_LD.266 30.0 31.0
RitualDance_1920x1080_60_10_420_32_LD.266 138.3 143.0
Chimera_8bit_1080P_1000_frames.vvc 136.3 145.3

In the current form when adding a task first it is checked if it is ready, and depending on the result it decide to which queue to add it, and only notify a worker when adding it to the ready queue, then in the worker, when calling run_one_task in the case there is no ready task it call update_tasks to try to find a ready task to execute.

I tried to only call the update_tasks function in av_executor_execute after adding the task to the unready queue, but it did not improve the performance unlike this method of only calling update_tasks when a worker do not find ready tasks, i think this is because in this case the unready queue is only checked when a worker does not find a task, but in the case of executing update_tasks after any task added it does unnecessary work checking the unready queue even when the ready queue is still not empty and all the workers are executing tasks.

nuomi2021 commented 2 months ago

@danielsan901998 thank you for the patch. Are you sure the performance introduced by the ready/unready queue? We refacted the code, and every task in the queue is a ready task. https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_thread.c#L199C9-L199C17

danielsan901998 commented 2 months ago
After running the benchmark this time i observed the reverse result, confirming that it was not a real improvement: video before after
BQTerrace_1920x1080_60_10_420_22_RA.vvc 81.7 80.0
NovosobornayaSquare_1920x1080.bin 151.3 148.0
RitualDance_1920x1080_60_10_420_37_RA.266 123.0 124.0
Tango2_3840x2160_60_10_420_27_LD.266 31.0 31.0
RitualDance_1920x1080_60_10_420_32_LD.266 142.3 135.7
Chimera_8bit_1080P_1000_frames.vvc 143.7 141.0

Running multiple times the benchmark makes me think that this must have been statistical noise, i use a laptop and maybe background task influence the result, that explain why i did not see a difference in performance when calling the update in av_executor_execute, when i did the last change i must have misinterpreted what might have been a random noise that increased the FPS in the result with an improvement in performance.

Since now every task is a ready task, does that mean that dividing the queue in ready and unready task is no longer desired? Should I close the pull request?

nuomi2021 commented 2 months ago

Laptop will reduce frequency when the workload is high. Still thank you for help on this. Please help check other issues. thank you