Closed Over17 closed 3 years ago
assigning to yabinc as "mr simpleperf", but also adding the PGO folks since i don't know enough about this stuff to know who's best suited to look at this, and it might end up involving everyone anyway... :-)
Documentation https://source.android.com/devices/tech/perf/pgo#collecting-profiles says "At this time, Android does not support using sampling-based profile collection" but makes no sense since I can collect sample-based profiles using simpleperf, huh?
simpleperf can record samples, but not all samples can be used for PGO. To be useful for PGO, the samples need to carry branch information, so the compiler knows which branch directions are more likely to be taken and are worth optimizing. Intel x86 supports this via LBR (Last Branch Record), which can be recorded using the -b option of the record command. ARM supports this via CoreSight ETM, which can be recorded using the -e cs-etm option of the record command. simpleperf inject only supports perf.data generated with the -e cs-etm option.
For security reasons, ETM won't be available on user devices soon. Here is more info: https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md. So currently the only way to do PGO for apps is instrumentation-based PGO. Here is another doc for it: https://medium.com/androiddevelopers/pgo-for-native-android-applications-1a48a99e95d0.
Thank you @enh-google and @yabinc.
Do you know if Arm ETM will be supported in Armv9 devices - is the extension going to be mandatory then?
The article by androiddevelopers is super useful. I tried building an instrumented build but didn't seem to get any traces written, which may be explained by the lack of a __llvm_profile_write_file() call.
I need to add -fprofile-generate to my compiler and linker invocations, but will it work if I add it to only one of the .so's in the APK? The docs in https://source.android.com/devices/tech/perf/pgo#enabling-pgo-in-android-bp-files are a bit unclear, or at least I'm having a hard time deciphering this part:
Static libraries instrumented with PGO, all shared libraries, and any binary that directly depends on the static library must also be instrumented for PGO. However, such shared libraries or executables don't need to use PGO profiles, and their enable_profile_use property can be set to false. Outside of this restriction, you can apply PGO to any static library, shared library, or executable.
Or if I have multiple so's that I want to instrument, do I need to call __llvm_profile_write_file() from each of them? The function is likely defined in the static lib which is linked by the linker flag.
(closing since it seems there's no bug to fix, but we're still here to answer questions)
Do you know if Arm ETM will be supported in Armv9 devices - is the extension going to be mandatory then?
The hardware is actually there on almost all existing ARM devices, but is fused off for security reasons on production devices. We're hoping that future devices won't have such limitations, but don't have anything else to share about that right now.
The docs in https://source.android.com/devices/tech/perf/pgo#enabling-pgo-in-android-bp-files
These are docs for PGO on platform libraries and are not relevant for applications.
Or if I have multiple so's that I want to instrument, do I need to call __llvm_profile_write_file() from each of them? The function is likely defined in the static lib which is linked by the linker flag.
It should be called for each shared library. Each .so has a LOCAL/hidden copy of this function that writes profiles for that particular library. I think calling dlclose() on each library may also work.
I think calling dlclose() on each library may also work.
C++ effectively requires that dlclose does nothing for a lot of programs. If you use this trick, expect it to stop working in the future.
Calling __llvm_profile_write_file() returns -1, and there is no error printed anywhere. Do I maybe need to call __llvm_profile_set_filename() earlier or something like that? I double-checked that -fprofile-generate is passed to the compiler and the linker of one of the libraries (and its size increased by ~20 MB).
Strike that - calling __llvm_profile_set_filename() on a writeable path seems to have worked!
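For anyone landing here later, the fix above can be sketched roughly as follows. This is a minimal sketch, not the exact code used in this thread: `dump_pgo_profile` and `kNotInstrumented` are hypothetical names, and declaring the two runtime functions as weak is an assumption made so the same source still links when -fprofile-generate is not passed. Since each .so carries its own hidden copy of the profile runtime, the helper would need to be compiled into (and called from) every instrumented library.

```cpp
#include <cstdio>

// Provided by the Clang profile runtime when linking with -fprofile-generate.
// Declared weak (an assumption of this sketch) so that a non-instrumented
// build still links; the addresses are then null at run time.
extern "C" {
void __llvm_profile_set_filename(const char* name) __attribute__((weak));
int __llvm_profile_write_file(void) __attribute__((weak));
}

// Hypothetical status code meaning "this library was not built with PGO
// instrumentation"; it is not part of the LLVM runtime.
constexpr int kNotInstrumented = -2;

// Hypothetical helper: point the runtime at a writable location and flush
// the counters of the library this function is compiled into.
// Returns 0 on success, the runtime's non-zero code on failure, or
// kNotInstrumented if the profile runtime is not linked in.
int dump_pgo_profile(const char* profraw_path) {
  if (!__llvm_profile_set_filename || !__llvm_profile_write_file) {
    return kNotInstrumented;
  }
  // As a precaution, keep profraw_path alive until the write below is done.
  __llvm_profile_set_filename(profraw_path);
  const int rc = __llvm_profile_write_file();
  if (rc != 0) {
    // The runtime may not print anything itself, so log the failure here.
    std::fprintf(stderr, "PGO: writing %s failed (rc=%d)\n", profraw_path, rc);
  }
  return rc;
}
```

The path passed in has to be somewhere the app can actually write, for example a file under the app's own data directory, and each library should get a distinct file name so the profiles don't overwrite each other.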
In the end, I was able to make a POC with PGO using Android NDK and instrumented builds. Thank you everyone!
@Over17 Can you share if PGO was beneficial in this case?
@pirama-arumuga-nainar sorry for the late answer, I have been away. Yes, I was able to get 4-7% better CPU performance (plus, looking at LLVM remarks and the code, there is potential for more vectorization in some places). For the users, the workflow overhead by having to have an instrumented build => run the benchmark (or even manual testing) => optimized build is quite a significant drawback. It may be easier for server workflows or even end-user apps, but for our product it's a bit more problematic.
For the users, the workflow overhead by having to have an instrumented build => run the benchmark (or even manual testing) => optimized build is quite a significant drawback.
Agreed. It helps that Clang is tolerant of different/changing source code. For the Android platform, we don't create a profdata during the build. Instead, a job in our CI collects it ~once per day. Approximately once a week, we take this profdata, check it into source control, and use it for the optimized build.
Yes, that was one of the concerns, but the paper on AutoFDO and your experience prove the opposite.
Another workflow issue for us is that the engine is shipped to the gamedevs precompiled and optimized, so it's impossible to apply compile-time optimizations at that stage. Some of the code is compiled on the gamedevs' machines, so PGO is doable there, but not for the core of the engine.
Description
Using simpleperf to convert perf.data into autofdo format returns zero-sized output. (need it for PGO experiments)
I tried running simpleperf both on the host (windows) and on the device.
Command line used:
To collect the profile
python app_profiler.py -p com.unity3d.torture
To convert on the host:
c:\android-ndk-r23\simpleperf>bin\windows\x86_64\simpleperf.exe inject -i perf.data -o autofdo.txt --output autofdo
Command line on the device used:
Attaching perf.data just in case. perf.zip
Looks weird because the code is definitely there https://android.googlesource.com/platform/system/extras/+/master/simpleperf/cmd_inject.cpp#102
Documentation https://source.android.com/devices/tech/perf/pgo#collecting-profiles says
At this time, Android does not support using sampling-based profile collection
but makes no sense since I can collect sample-based profiles using simpleperf, huh?