Closed · xshen053 closed this issue 1 year ago
Hi @xshen053, please share the details below so we can better understand the scenario.
http://ffmpeg.org/releases/ffmpeg-6.0.tar.xz
http://www.phoronix-test-suite.com/benchmark-files/x264-20221005.tar.xz
Ubuntu 20.04.6 LTS
git clone https://github.com/phoronix-test-suite/phoronix-test-suite.git
sudo ./install-sh
You might need to install other dependencies, such as:
sudo apt install php7.4-cli
sudo apt-get install php-xml
phoronix-test-suite benchmark ffmpeg
choose 1
choose 4
Then it will automatically execute the vbench benchmark and report results after finishing.
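Putting the steps above together, a minimal setup script might look like this (a sketch assuming Ubuntu 20.04; the DRY_RUN guard is my addition so the commands are printed for review instead of executed):

```shell
#!/bin/sh
# Sketch of the setup steps above (assumes Ubuntu 20.04).
# With DRY_RUN=1 (the default) commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run git clone https://github.com/phoronix-test-suite/phoronix-test-suite.git
run sudo apt-get install -y php7.4-cli php-xml
run sudo ./phoronix-test-suite/install-sh
run phoronix-test-suite benchmark ffmpeg
```

Set DRY_RUN=0 to actually run it; the interactive prompts (option 1, then option 4) still have to be answered by hand.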
Hey, can you run the test I did? Do you need any other information?
Hi @xshen053, we are trying to reproduce your scenario. Can you also please let us know the CPU utilization you observed during the runs? Is it possible to use your real application for benchmarking the performance instead?
I tried to run the test but I ran into a problem with the phoronix code, so I wasn't able to run it without investing some time to debug. My guess would be that the test is single threaded (or at least leaving many cores idle), which would explain the performance discrepancy. Graviton CPUs in general are optimized to sustain large workloads over many (or all) cores without reductions in performance. Other CPUs that have SMT can encounter resource constraints when fully loading across the whole instance. There is some data about that here: https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/system-load-and-compute-headroom.md
As mentioned in the blog post that you linked to, most video workloads utilize entire instances in order to transcode many video streams or files in parallel. This leads to lowest cost per unit time of video. When I ran the benchmarks for that post, I designed my benchmarks to fully load the instances. In that scenario, Graviton3 powered c7g instances achieved the lowest cost to encode of the instances I tested, which were C6i, C6a, and C7g.
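As a concrete illustration of "fully loading the instance": a common pattern is to run one ffmpeg process per input in parallel, up to the core count. A minimal sketch of that pattern (the ffmpeg arguments and the `encode_all` helper are illustrative assumptions, not the exact harness used for the blog post):

```python
# Sketch: saturate all cores by running one ffmpeg process per file in
# parallel. Threads suffice here because the heavy lifting happens in
# the ffmpeg child processes, not in Python.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def encode_one(src):
    """Transcode one file with libx264 (illustrative command line)."""
    dst = src.rsplit(".", 1)[0] + ".out.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-preset", "medium", dst],
        check=True,
    )
    return dst

def encode_all(sources, encode=encode_one, workers=None):
    """Run `encode` over all sources, one worker per core by default."""
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(encode, sources))
```

With something like this driving the run, every vCPU stays busy, which is the scenario where the blog's cost comparison was measured.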
I suspect that there is some step that you ran that is missing which is preventing me from running the phoronix test. (That would ultimately mean there's a bug in the test.) Perhaps libx264-devel
was already installed from the Ubuntu package manager that led to ffmpeg building with an older (and less optimized) version? (Just a guess...)
A rough approximation of the CPU usage with htop would be fine. If you want to explore a more rigorous method, use sysstat. Just make sure you get an idea of how the usage changes during the test. E.g. does it start out using one core and then use all of them in the middle? Is it steady state or periodic?
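If you'd rather script the measurement than eyeball htop, a tiny sampler over /proc/stat (Linux only; this is a sketch, not part of sysstat) reports the same whole-system busy percentage that `sar -u` would:

```python
# Rough whole-system CPU utilization sampler via /proc/stat (Linux).
import time

def read_cpu_times():
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    idle = fields[3] + fields[4]  # idle + iowait
    return idle, sum(fields)

def cpu_busy_percent(interval=1.0):
    """Percentage of non-idle CPU time over `interval` seconds."""
    idle0, total0 = read_cpu_times()
    time.sleep(interval)
    idle1, total1 = read_cpu_times()
    dtotal = (total1 - total0) or 1
    return 100.0 * (1 - (idle1 - idle0) / dtotal)
```

Sample this in a loop during the benchmark; on a 32-vCPU instance, a single-threaded test would hover around 1/32 ≈ 3% busy, which would confirm the idle-cores theory.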
Thanks for the reply!
Everything depends on the workload, and you would need to benchmark for what you are interested in, but it can be the case that a single thread will run faster on an M7i, M6i, or C6a than on a C7g, as you have seen with this ffmpeg benchmark.
Closing this issue as the question appears to have been answered.
Hi, I used phoronix-test-suite to benchmark performance on x86 and Graviton3 instances. This test suite uses vbench to benchmark the performance of ffmpeg. I used c6a.8xlarge and c7g.8xlarge instances. However, the results are not what I expected. Maybe I did something wrong?

Scenario
Encoder: libx264 - Scenario: Video on demand
c6a.8xlarge — FPS: 44.6, seconds: 170
c7g.8xlarge — FPS: 30.64, seconds: 247
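Raw FPS alone doesn't settle which instance is cheaper; the hourly price has to be folded in. A back-of-the-envelope using the run times above (the hourly prices are illustrative assumptions, not quoted AWS pricing; check current on-demand pricing for your region):

```python
# Back-of-the-envelope: cost per encode run from the reported times.
# HOURLY PRICES ARE ILLUSTRATIVE ASSUMPTIONS, not quoted AWS pricing.
results = {
    "c6a.8xlarge": {"seconds": 170, "price_per_hour": 1.224},  # assumed
    "c7g.8xlarge": {"seconds": 247, "price_per_hour": 1.16},   # assumed
}

for name, r in results.items():
    cost = r["seconds"] / 3600 * r["price_per_hour"]
    print(f"{name}: ${cost:.4f} per encode run")
```

Under these assumed prices the single-stream run is cheaper on c6a, which is consistent with the replies above: the fully loaded, many-streams-in-parallel scenario is where the blog measured c7g winning on cost.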
From this blog post, it seems Graviton has some optimizations:
https://aws.amazon.com/blogs/opensource/optimized-video-encoding-with-ffmpeg-on-aws-graviton-processors/