Over the past week I have worked on optimizing the kernel with the recently posted, updated AutoFDO patch.
The patch is generally expected to land in 6.13, so we can proceed with the integration into CachyOS.
A properly profiled AutoFDO kernel can provide a bump of around 10% in throughput and roughly 3% lower latency, according to the information in the patchset. This is quite a bump, and together with Propeller it should be possible to achieve this.
Integration with LLVM 18
We have already worked on an integration in https://github.com/CachyOS/linux-cachyos/pull/322
This seems to be mostly solved, so we can work on providing a separate "linux-cachyos-autofdo" kernel for now. Users can test it for a few weeks before we integrate it into the default "linux-cachyos" kernel, provided no issues are found.
LLVM 18 currently does not support merging multiple profiles, so we are limited to one workload per release.
According to the patchset, the profile can be reused:
One can collect profiles using AutoFDO build for the previous kernel.
AutoFDO employs relative line numbers to match the profiles, offering
some tolerance for source changes. This mode is commonly used in a
production environment for profile collection.
We will refresh the profiles every 1-3 kernel versions, depending on how much code has changed. If there are bigger stable merges with 300+ commits, we need to generate a new profile each time.
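The refresh rule can be written down as a small decision function. This is only a sketch: the function name and its parameters are hypothetical, and the limits (three kernel versions, 300 stable commits) are simply the numbers mentioned in this post, not a tested policy.

```python
def needs_new_profile(versions_since_profile: int, stable_commits_merged: int) -> bool:
    """Return True when the AutoFDO profile should be regenerated.

    Assumed thresholds, taken from the numbers in this post:
    - a stable merge of 300 or more commits always triggers a new profile
    - otherwise a profile is reused for fewer than 3 kernel versions
    """
    if stable_commits_merged >= 300:
        return True
    return versions_since_profile >= 3
```

In practice the decision would probably also depend on which subsystems changed, so treat the two counters as a rough first filter.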
Each profile gets its own name. For now we will implement this for the Zen4 and x86-64-v3 repositories, with the following profile names:
x86-64-v3: perf-v3.afdo
Zen4: perf-znver4.afdo
Workflow
Compile the kernel in a chroot with _autofdo=y
Download the kernel to the test machines and generate a profile
Put the profile into the source directory, named perf-v3.afdo or perf-znver4.afdo
Compile the kernel in the chroot again and distribute it via the repository
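The third step of the workflow above (placing the profile under the agreed name) could be scripted roughly as follows. This is a minimal sketch: the helper name `place_profile` and the dictionary keys are hypothetical, and only the two file names come from this post.

```python
import shutil
from pathlib import Path

# Profile file names per repository, as listed in this post.
# The keys are illustrative labels, not official repository identifiers.
PROFILE_NAMES = {
    "x86-64-v3": "perf-v3.afdo",
    "Zen4": "perf-znver4.afdo",
}


def place_profile(collected_profile: Path, source_dir: Path, repo: str) -> Path:
    """Copy a collected profile into the source directory under the agreed name."""
    if repo not in PROFILE_NAMES:
        raise ValueError(f"no profile name defined for repository {repo!r}")
    target = source_dir / PROFILE_NAMES[repo]
    shutil.copyfile(collected_profile, target)
    return target
```

Rejecting unknown repository names early avoids silently building a kernel without the intended profile.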
Integration with LLVM 19
As soon as LLVM 19 is released, we can merge multiple profiles/workloads as well as use Propeller.
Multiple Profiles
We offer the first chroot-compiled kernel in our archive, and some selected users can then install and profile it (maybe one Intel CPU and one AMD CPU).
These users send a pull request and we merge the profiles into one. This should generally improve coverage as well as performance.
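Merging the submitted profiles could look roughly like the sketch below. It assumes that `llvm-profdata merge` with the `--sample` flag is the right tool for these profiles on the LLVM version in use, which should be checked against the kernel's AutoFDO documentation. The helper names and the example file names in the test are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_merge_command(profiles: Sequence[Path], output: Path) -> List[str]:
    """Build an llvm-profdata command line that merges sample profiles."""
    if len(profiles) < 2:
        raise ValueError("merging needs at least two profiles")
    # --sample marks the inputs as sample-based (AutoFDO) profiles.
    return ["llvm-profdata", "merge", "--sample", "-o", str(output),
            *[str(p) for p in profiles]]


def merge_profiles(profiles: Sequence[Path], output: Path) -> None:
    """Run the merge; raises CalledProcessError if llvm-profdata fails."""
    subprocess.run(build_merge_command(profiles, output), check=True)
```

Keeping command construction separate from execution makes it possible to review or log the exact command before anything is run.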
Propeller
Propeller makes things more complicated. It needs one more profile, which is used after the compilation with the AutoFDO profile. We need to see how much extra work this is, but maybe we can do one more profiling run, add the Propeller profile to the repository, and add a toggle for it.