Open RobRich999 opened 1 month ago
I have a working release build of the websdr.bin binary at dff48a7, built with LTO, graphite, -fipa-pta, and -fdevirtualize-at-ltrans optimizations.
I did not build HFDL. Also I turned off debug symbols for my build, and it dropped the binary size to under 7MB.
I have not done much testing, but basic features seem to be working as intended. YMMV, of course.
cmake .. -DCMAKE_C_FLAGS_RELEASE="-O3 -DNDEBUG -fgraphite-identity -floop-nest-optimize -flto=auto -fipa-pta -fdevirtualize-at-ltrans -pipe" -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -fgraphite-identity -floop-nest-optimize -flto=auto -fipa-pta -fdevirtualize-at-ltrans -pipe -pthread" -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=TRUE
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
set(CMAKE_CXX_FLAGS_RELEASE "-O3 -fgraphite-identity -floop-nest-optimize -fipa-pta -flto=auto -fdevirtualize-at-ltrans -pipe")
set(CMAKE_C_FLAGS_RELEASE " -O3 -fgraphite-identity -floop-nest-optimize -fipa-pta -flto=auto -fdevirtualize-at-ltrans -pipe")
set(PLATFORM_FLAGS -march=armv7-a -mtune=cortex-a9 -mfpu=neon -mfloat-abi=hard -ffast-math -fsingle-precision-constant -mvectorize-with-neon-quad -fgraphite-identity -floop-nest-optimize -fipa-pta -flto=auto -fdevirtualize-at-ltrans -pipe)
Welcome a PR with a report on the perf gain.
Suppose one could try measuring processor utilization, though as noted, I will have to leave that to someone else for now.
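For anyone who wants to try the processor utilization idea, here is a rough sketch that samples the CPU time of a running process over a fixed interval. It assumes Linux with /proc available. The helper names `cpu_ticks` and `measure_cpu` are made up for illustration and are not part of the project. This is a coarse comparison aid, not a designed benchmark.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the project); Linux-only, relies on /proc.

# Print utime + stime (in clock ticks) for the given PID.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Strip "pid (comm) " up to the last ')' so a process name
    # containing spaces cannot shift the field positions.
    rest=${stat##*\) }
    # After the strip, field 1 is the state; utime and stime
    # (fields 14 and 15 of the full line) are now fields 12 and 13.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Print average CPU use of PID over SECS whole seconds, as an integer
# percentage of one core (a multi-threaded process can exceed 100).
measure_cpu() {
    pid=$1
    secs=$2
    before=$(cpu_ticks "$pid") || return 1
    sleep "$secs"
    after=$(cpu_ticks "$pid") || return 1
    hz=$(getconf CLK_TCK)
    echo $(( (after - before) * 100 / (hz * secs) ))
}
```

Usage would be something like `measure_cpu "$(pidof websdr.bin)" 60`, run once against each build under the same receiver load, then compare the two numbers.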
Nonetheless, the best bet for a starting point (IMHO) is LTO as it rarely degrades performance except in corner cases.
I tried LTO, which works fine on my devbox. I didn't carefully design a benchmark to check the performance difference, but it didn't show a significant difference.
If interested in exploring possible performance tweaks, I can confirm the websdr.bin binary builds and works with GCC (v13.2.1) graphite and fipa-pta optimizations enabled. Note I have not tested all receiver options, so YMMV here. I will leave it up to someone else to figure out any actual performance difference(s).
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
https://github.com/InBetweenNames/gentooLTO/blob/master/sys-config/ltoize/files/make.conf.lto.defines
Ideally LTO would be used, especially for further improving the fipa-pta pass, plus potentially using -fdevirtualize-at-ltrans as well. Previously I have done an LTO build of websdr.bin, but I did not get around to actually testing it at the time. IIRC, there were several ODR and similar warnings. I might give it another go in the near future.
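To keep track of the ODR warnings across LTO build attempts, one option is to save the build output and count the lines GCC tags with `[-Wodr]`. A minimal sketch follows. The function name `count_odr_warnings` and the log file name are assumptions for illustration, and the build command in the comment is only an example of how the log might be captured.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a saved build log that GCC tagged
# with [-Wodr] (One Definition Rule warnings reported during LTO).
# The log could be captured with something like:
#   make 2>&1 | tee build.log
count_odr_warnings() {
    # -F treats the pattern as a literal string; grep -c prints 0 but
    # exits non-zero when nothing matches, hence the "|| true".
    grep -c -F -e '[-Wodr]' "$1" || true
}
```

For example, `count_odr_warnings build.log` prints the number of matching lines, so a change in the count between builds is easy to spot.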