InBetweenNames / gentooLTO

A Gentoo Portage configuration for building with -O3, Graphite, and LTO optimizations
GNU General Public License v2.0

Integration of Clear Linux patches #164

Open · InBetweenNames opened this issue 5 years ago

InBetweenNames commented 5 years ago

Clear Linux maintains a number of performance-related patches for open source projects, and there are quite a few of them: https://github.com/clearlinux-pkgs

It would be interesting to integrate these into GentooLTO somehow, either synced into the overlay or through a user install mechanism.

aw1cks commented 5 years ago

@sjnewbury I don't use it on a laptop. I haven't got round to rebuilding my laptop with Gentoo yet. On my desktop I didn't use all the patches, only the ones relating to boot speed and performance, without much regard for the patches claiming to reduce wakelocks. As far as I can tell, in Clear Linux's use case the difference in power is more than offset by the other tweaks they have made. How much of the extra battery life comes from their kernel I couldn't say, since they also apply custom patches to many userland applications and even to gcc itself; you would have to benchmark it to know for sure.

If you want to test the kernel yourself, you can use any 4.19-series kernel and put the patches into /etc/portage/patches/sys-kernel/${KERNEL_SOURCE_PKG_NAME}-${KERNEL_VERSION}/. The patches are available at https://github.com/clearlinux-pkgs/linux (the repository also has their boot parameters in a text file, which may be worth trying).

As a note, I hope to eventually create ebuilds that DO include their userland patches for various programs, available behind a USE flag, so that we can make a fair comparison. I also wonder whether systemd vs OpenRC could play a role here: in my experience, I have had higher power consumption with init systems other than systemd, though maybe I'm doing something wrong. If you find anything out, please let me know, as I'm quite interested in this myself for my laptop.
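The placeholder directory above corresponds to Portage's ${P} (package name plus version). A minimal sketch listing the directory names Portage's user-patch support (eapply_user) checks, assuming the lookup order ${P}-${PR}, ${P}, ${PN}, each also tried with a :${SLOT} suffix. The package name and version in the usage line are hypothetical examples, and whether a particular kernel-sources ebuild applies user patches should be checked against its eclass.

```python
from pathlib import PurePosixPath


def user_patch_dirs(category, pn, pv, pr="r0", slot=None,
                    base="/etc/portage/patches"):
    """Return candidate eapply_user directories, most specific first.

    Assumes Portage's lookup order ${P}-${PR}, ${P}, ${PN}, where each
    name is tried with a ':${SLOT}' suffix (sub-slot stripped) before
    the bare name. Portage combines patches from every directory that
    exists; this helper only lists the names it would check.
    """
    p = f"{pn}-{pv}"
    dirs = []
    for name in (f"{p}-{pr}", p, pn):
        if slot:
            # Mirrors ${SLOT%/*}: drop any sub-slot after '/'.
            dirs.append(PurePosixPath(base, category,
                                      f"{name}:{slot.split('/')[0]}"))
        dirs.append(PurePosixPath(base, category, name))
    return [str(d) for d in dirs]


# Hypothetical example: a 4.19-series gentoo-sources package.
for d in user_patch_dirs("sys-kernel", "gentoo-sources", "4.19.8"):
    print(d)
```

Using the versioned ${P} directory, as suggested above, ties the patches to one kernel version, so they will not be silently applied to a newer kernel they may not fit.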

fenrus75 commented 5 years ago

For power, do look at the clr-power-tweaks package; that is where much of the tuning happens.


aw1cks commented 5 years ago

@fenrus75 Thanks for pointing me in the right direction. How can I build this package without the Clear Linux userspace tools? I can't find any binaries in the repository or in any of the releases, and the Makefile references a file that is not included in the repository.

fenrus75 commented 5 years ago

Uh, you might need to rpm2cpio our src.rpm.


aw1cks commented 5 years ago

Great, thanks. However, I'm having an issue with autoconf:

[alex@xps13 clr-power-tweaks-174]$ autoconf
configure.ac:6: error: possibly undefined macro: AM_INIT_AUTOMAKE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:7: error: possibly undefined macro: AM_SILENT_RULES

Some missing include maybe?

gcs-github commented 5 years ago

Leaving a link to this new Phoronix article here, which benchmarks some of the performance gains from Clear Linux, to give extra context to this issue and some idea of what we can hope for from following up: https://www.phoronix.com/scan.php?page=article&item=clear-faster-blas&num=1

javashin commented 5 years ago

> FWIW, -mtls-dialect=gnu2 also requires a glibc patch to make it work with a patched prelink... Yeah, I'm the last user of prelink! ;-)
>
> (Currently building gentooLTO+x32+auto-prelink+autopar+jemalloc)

I'm prelinking Gentoo too, on my new Gentoo install: no-multilib, LTO, no PIE, no SSP.

jelinekto commented 5 years ago

@InBetweenNames Did you by any chance look further into -fdata-sections -ffunction-sections -Wl,--gc-sections? I enabled those globally a couple of months ago, and while I'm not sure there's a clear benefit (I couldn't directly compare binary sizes with my previous build, as I changed some other things as well), my system does not appear to be broken.

Even if it's not worth enabling globally, perhaps packages that can't be built with full LTO could benefit from something like /"${FLTO}"/"${GCSECTIONS}"?
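To illustrate the suggestion, a minimal sketch of swapping the LTO flags for the section garbage-collection flags in a flag string. The FLTO and GCSECTIONS values are hypothetical placeholders, and the helper only models the substitution; it is not how gentooLTO or Portage actually apply per-package flag overrides.

```python
# Hypothetical placeholder values; gentooLTO's real FLTO definition may differ.
FLTO = "-flto=8"
GCSECTIONS = "-fdata-sections -ffunction-sections -Wl,--gc-sections"


def replace_flags(flags, old, new):
    """Replace each whole-token occurrence of the sequence `old` with `new`."""
    toks, old_t, new_t = flags.split(), old.split(), new.split()
    if not old_t:
        # Nothing to search for; just normalise whitespace.
        return " ".join(toks)
    out, i = [], 0
    while i < len(toks):
        if toks[i:i + len(old_t)] == old_t:
            out.extend(new_t)
            i += len(old_t)
        else:
            out.append(toks[i])
            i += 1
    return " ".join(out)


cflags = "-O3 -march=native " + FLTO
print(replace_flags(cflags, FLTO, GCSECTIONS))
```

Matching whole tokens avoids mangling a flag that merely shares a prefix (for example -flto=80). Note that -Wl,--gc-sections only takes effect when the compiler driver invokes the linker, so in a real build it also has to reach the flags used at link time (typically LDFLAGS).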

elsandosgrande commented 5 years ago

@jelinekto Umm, see https://stackoverflow.com/questions/4274804/query-on-ffunction-section-fdata-sections-options-of-gcc. This is not as straightforward as you might think.

eternal-sorrow commented 2 years ago

So, let's get this straight: is -falign-functions=32 beneficial on AMD CPUs, or only on Intel?

JustArchi commented 2 years ago

I'm late to the party, but I was having fun with your awesome LTO patches and various flags today, testing them with sysbench.

On my Intel i7-7700K, -falign-functions=32 degrades sysbench cpu run results (total number of events) from around 15.5k to barely 13.9-14k. For comparison, I also tried -falign-functions=8, which scored around 14.3k: again a heavy drop, though smaller than with 32. I made triple sure I was testing and interpreting things correctly, re-emerging sysbench (and only sysbench) after every change and running it several times while keeping background activity as quiet as possible. The flags I used were the current gentooLTO defaults as of today with only -march=native added, so implicitly -O3 and all the LTO/Graphite optimizations.

I know this is a single, very specific, maybe even slightly silly benchmark, but it shows that "newer Intel CPUs should use 32 globally" is definitely not a universal rule. There may well be benchmarks or other applications where it helps, I don't doubt that, but there is at least one case (and, from what I've read, more than one) where it degrades performance so heavily that the result falls all the way to plain -O2 (without -march) levels, which also scores around 13.9-14k.

Just my 3 cents; maybe it'll help somebody, maybe it won't. I suggest running benchmarks to verify whether the flag actually helps. I only dug into this because, after applying the LTO flags, the benchmark dropped by approximately 10% compared to plain -O2 -march=native, and while looking for the cause it turned out to be -falign-functions=32. There is a chance this benchmark is flawed and the result is the exception rather than the rule, but I very much doubt it. Once I have some time and motivation, I might test other benchmarks to compare the results.
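To make comparisons like this repeatable, a minimal sketch that extracts the "total number of events" figure from saved `sysbench cpu run` outputs, takes the median over several runs, and reports the percentage change against a baseline. It assumes the sysbench 1.x output contains a line like `total number of events: 15500`; the sample strings below reuse the approximate figures quoted above and are not real logs.

```python
import re
from statistics import median

# Assumed sysbench 1.x output line, e.g. "    total number of events:   15500"
EVENTS_RE = re.compile(r"total number of events:\s*(\d+)")


def events(output):
    """Extract the event count from one sysbench cpu run output."""
    m = EVENTS_RE.search(output)
    if m is None:
        raise ValueError("no 'total number of events' line found")
    return int(m.group(1))


def summarize(outputs):
    """Median event count over several runs, to damp run-to-run noise."""
    return median([events(o) for o in outputs])


def pct_change(baseline, result):
    """Relative change of `result` versus `baseline`, in percent."""
    return (result - baseline) / baseline * 100.0


# Approximate figures quoted above; real use would read saved sysbench logs.
runs = {
    "baseline": ["total number of events: 15500"],
    "-falign-functions=8": ["total number of events: 14300"],
    "-falign-functions=32": ["total number of events: 13900",
                             "total number of events: 14000"],
}
base = summarize(runs["baseline"])
for name, outs in runs.items():
    print(f"{name}: {pct_change(base, summarize(outs)):+.1f}%")
```

The median is used rather than the mean so that a single disturbed run does not skew the comparison; with only two runs it reduces to their average. With these inputs, -falign-functions=8 comes out at about -7.7% and -falign-functions=32 at about -10%, in line with the drop described above.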

RaphMad commented 1 year ago

@JustArchi Very interesting observation; I was contemplating today whether -falign-functions=32 is worth it for my i7-4790K.

It may just be a quirk of sysbench's testing patterns, but even with all the research @InBetweenNames has done, it seems this optimization really depends on a combination of workload and CPU-internal optimizations and alignment/cache assumptions.

firasuke commented 1 year ago

Is -falign-functions=32 actually profitable? Some research led me to: