Kobzol / cargo-pgo

Cargo subcommand for optimizing Rust binaries/libraries with PGO and BOLT.
MIT License

Evaluate PGO coverage #30

Closed: zamazan4ik closed this issue 1 year ago

zamazan4ik commented 1 year ago

Hi!

Is there a way to calculate the PGO coverage of a profiled program? I see that cargo-pgo currently shows warnings for functions without a profile, but I would like to see more "advanced" statistics about PGO coverage, something like gcov + lcov from the C++ world (with different reporting capabilities, etc.).

I guess it could somehow be done with `llvm-cov report profile.profdata`, but I am not sure.

Thanks in advance!

Kobzol commented 1 year ago

Hi, I tried using `llvm-cov`, but it doesn't work if the binary doesn't contain coverage mapping information, which rustc emits for `-C instrument-coverage` builds but not for PGO-instrumented builds.

I think the proper tool for this is `llvm-profdata`, which cargo-pgo already uses. It can display a lot of information, but I'm not sure what would be valuable for cargo-pgo to show. For example, it can output the list of functions with their corresponding counter values (e.g. sorted to show only the top N functions). Would this be useful for you? Or is there some other specific information that you would like to see? You can run `llvm-profdata show --help` to see what it can output:

llvm-profdata show --help output

```
OVERVIEW: LLVM profile data summary

USAGE: llvm-profdata show [options]

OPTIONS:

Color Options:

  --color - Use colors in output (default=autodetect)

General options:

  --aarch64-neon-syntax= - Choose style of NEON code to emit from AArch64 backend:
    =generic - Emit generic NEON assembly
    =apple - Emit Apple-style NEON assembly
  --aarch64-use-aa - Enable the use of AA during codegen.
  --abort-on-max-devirt-iterations-reached - Abort when the max iterations for devirtualization CGSCC repeat pass is reached
  --all-functions - Details for every function
  --allow-ginsert-as-artifact - Allow G_INSERT to be considered an artifact. Hack around AMDGPU test infinite loops.
  --arm-add-build-attributes -
  --arm-implicit-it= - Allow conditional instructions outdside of an IT block
    =always - Accept in both ISAs, emit implicit ITs in Thumb
    =never - Warn in ARM, reject in Thumb
    =arm - Accept in ARM, reject in Thumb
    =thumb - Warn in ARM, emit implicit ITs in Thumb
  --asm-show-inst - Emit internal instruction representation to assembly file
  --atomic-counter-update-promoted - Do counter update using atomic fetch add for promoted counters only
  --atomic-first-counter - Use atomic fetch add for first counter in a function (usually the entry counter)
  --binary-ids - Show binary ids in the profile.
  --bounds-checking-single-trap - Use one trap block per function
  --cfg-hide-cold-paths= - Hide blocks with relative frequency below the given value
  --cfg-hide-deoptimize-paths -
  --cfg-hide-unreachable-paths -
  --cost-kind= - Target cost kind
    =throughput - Reciprocal throughput
    =latency - Instruction latency
    =code-size - Code size
    =size-latency - Code size and latency
  --counts - Show counter values for shown functions
  --covered - Show only the functions that have been executed.
  --debug-info= - Read and extract profile metadata from debug info and show the functions it found.
  --debug-info-correlate - Use debug info to correlate profiles.
  --debugify-func-limit= - Set max number of processed functions per pass.
  --debugify-level= - Kind of debug info to add
    =locations - Locations only
    =location+variables - Locations and Variables
  --debugify-quiet - Suppress verbose debugify output
  --detailed-summary - Show detailed profile summary
  --detailed-summary-cutoffs=<800000,901000,999999> - Cutoff percentages (times 10000) for generating detailed summary
  --disable-i2p-p2i-opt - Disables inttoptr/ptrtoint roundtrip optimization
  --do-counter-promotion - Do counter register promotion
  --dot-cfg-mssa= - file name for generated dot file
  --dwarf-version= - Dwarf version
  --dwarf64 - Generate debugging info in the 64-bit DWARF format
  --emit-dwarf-unwind= - Whether to emit DWARF EH frame entries.
    =always - Always emit EH frame entries
    =no-compact-unwind - Only emit EH frame entries when compact unwind is not available
    =default - Use target platform default
  --emscripten-cxx-exceptions-allowed= - The list of function names in which Emscripten-style exception handling is enabled (see emscripten EMSCRIPTEN_CATCHING_ALLOWED options)
  --enable-cse-in-irtranslator - Should enable CSE in irtranslator
  --enable-cse-in-legalizer - Should enable CSE in Legalizer
  --enable-emscripten-cxx-exceptions - WebAssembly Emscripten-style exception handling
  --enable-emscripten-sjlj - WebAssembly Emscripten-style setjmp/longjmp handling
  --enable-gvn-hoist - Enable the GVN hoisting pass (default = off)
  --enable-gvn-memdep -
  --enable-gvn-sink - Enable the GVN sinking pass (default = off)
  --enable-load-in-loop-pre -
  --enable-load-pre -
  --enable-loop-simplifycfg-term-folding -
  --enable-name-compression - Enable name/filename string compression
  --enable-split-backedge-in-load-pre -
  --experimental-debug-variable-locations - Use experimental new value-tracking variable locations
  --fatal-warnings - Treat warnings as errors
  --fs-profile-debug-bw-threshold= - Only show debug message if the source branch weight is greater than this value.
  --fs-profile-debug-prob-diff-threshold= - Only show debug message if the branch probility is greater than this value (in percentage).
  --function= - Details for matching functions
  --generate-merged-base-profiles - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
  --gpsize= - Global Pointer Addressing Size. The default size is 8.
  --hash-based-counter-split - Rename counter variable of a comdat function based on cfg hash
  --hot-cold-split - Enable hot-cold splitting pass
  --hot-func-list - Show profile summary of a list of hot functions
  --ic-targets - Show indirect call site target values for shown functions
  --import-all-index - Import all external functions in index.
  --incremental-linker-compatible - When used with filetype=obj, emit an object file which can be used with an incremental linker
  --instcombine-code-sinking - Enable code sinking
  --instcombine-guard-widening-window= - How wide an instruction window to bypass looking for another guard
  --instcombine-max-iterations= - Limit the maximum number of instruction combining iterations
  --instcombine-max-num-phis= - Maximum number phis to handle in intptr/ptrint folding
  --instcombine-max-sink-users= - Maximum number of undroppable users for instruction sinking
  --instcombine-maxarray-size= - Maximum array size considered when doing a combine
  --instcombine-negator-enabled - Should we attempt to sink negations?
  --instcombine-negator-max-depth= - What is the maximal lookup depth when trying to check for viability of negation sinking.
  --instrprof-atomic-counter-update-all - Make all profile counter updates atomic (for testing only)
  --internalize-public-api-file= - A file containing list of symbol names to preserve
  --internalize-public-api-list= - A list of symbol names to preserve
  --iterative-counter-promotion - Allow counter promotion across the whole loop nest.
  --list-below-cutoff - Only output names of functions whose max count values are below the cutoff value
  --lto-embed-bitcode= - Embed LLVM bitcode in object files produced by LTO
    =none - Do not embed
    =optimized - Embed after all optimization passes
    =post-merge-pre-opt - Embed post merge, but before optimizations
  --lto-pass-remarks-filter= - Only record optimization remarks from passes whose names match the given regular expression
  --lto-pass-remarks-format= - The format used for serializing remarks (default: YAML)
  --lto-pass-remarks-output= - Output filename for pass remarks
  --matrix-default-layout= - Sets the default matrix layout
    =column-major - Use column-major layout
    =row-major - Use row-major layout
  --max-counter-promotions= - Max number of allowed counter promotions
  --max-counter-promotions-per-loop= - Max number counter promotions per loop to avoid increasing register pressure too much
  --mc-relax-all - When used with filetype=obj, relax all fixups in the emitted object file
  --mcabac - tbd
  --memop-sizes - Show the profiled sizes of the memory intrinsic calls for shown functions
  Profile kind:
      --instr - Instrumentation profile (default)
      --sample - Sample profile
      --memory - MemProf memory access profile
  --merror-missing-parenthesis - Error for missing parenthesis around predicate registers
  --merror-noncontigious-register - Error for register names that aren't contigious
  --mhvx - Enable Hexagon Vector eXtensions
  --mhvx= - Enable Hexagon Vector eXtensions
    =v60 - Build for HVX v60
    =v62 - Build for HVX v62
    =v65 - Build for HVX v65
    =v66 - Build for HVX v66
    =v67 - Build for HVX v67
    =v68 - Build for HVX v68
    =v69 - Build for HVX v69
  --mips-compact-branches= - MIPS Specific: Compact branch policy.
    =never - Do not use compact branches if possible.
    =optimal - Use compact branches where appropriate (default).
    =always - Always use compact branches if possible.
  --mips16-constant-islands - Enable mips16 constant islands.
  --mips16-hard-float - Enable mips16 hard float.
  --mir-strip-debugify-only - Should mir-strip-debug only strip debug info from debugified modules by default
  --misexpect-tolerance= - Prevents emiting diagnostics when profile counts are within N% of the threshold..
  --mno-compound - Disable looking for compound instructions for Hexagon
  --mno-fixup - Disable fixing up resolved relocations for Hexagon
  --mno-ldc1-sdc1 - Expand double precision loads and stores to their single precision counterparts
  --mno-pairing - Disable looking for duplex instructions for Hexagon
  --mwarn-missing-parenthesis - Warn for missing parenthesis around predicate registers
  --mwarn-noncontigious-register - Warn for register names that arent contigious
  --mwarn-sign-mismatch - Warn for mismatching a signed and unsigned value
  --no-deprecated-warn - Suppress all deprecated warnings
  --no-discriminators - Disable generation of discriminator information.
  --no-type-check - Suppress type errors (Wasm)
  --no-warn - Suppress all warnings
  --nvptx-sched4reg - NVPTX Specific: schedule for register pressue
  --opaque-pointers - Use opaque pointers
  --output= - Output file
  --poison-checking-function-local - Check that returns are non-poison (for testing)
  --print-pipeline-passes - Print a '-passes' compatible string describing the pipeline (best-effort only).
  --profiled-binary= - Path to binary from which the profile was collected.
  --rdf-dump -
  --rdf-limit= -
  --runtime-counter-relocation - Enable relocating counters at runtime.
  --safepoint-ir-verifier-print-only -
  --sample-profile-check-record-coverage= - Emit a warning if less than N% of records in the input profile are matched to the IR.
  --sample-profile-check-sample-coverage= - Emit a warning if less than N% of samples in the input profile are matched to the IR.
  --sample-profile-max-propagate-iterations= - Maximum number of iterations to go through when propagating sample block/edge weights through the CFG.
  --show-prof-sym-list - Show profile symbol list if it exists in the profile.
  --show-sec-info-only - Show the information of each section in the sample profile. The flag is only usable when the sample profile is in extbinary format
  --showcs - Show context sensitive counts
  --skip-ret-exit-block - Suppress counter promotion if exit blocks contain ret.
  --speculative-counter-promotion-max-exiting= - The max number of exiting blocks of a loop to allow speculative counter promotion
  --speculative-counter-promotion-to-loop - When the option is false, if the target block is in a loop, the promotion will be disallowed unless the promoted counter update can be further/iteratively promoted into an acyclic region.
  --summary-file= - The summary file to use for function importing.
  --sve-tail-folding= - Control the use of vectorisation using tail-folding for SVE:
                        disabled    No loop types will vectorize using tail-folding
                        default     Uses the default tail-folding settings for the target CPU
                        all         All legal loop types will vectorize using tail-folding
                        simple      Use tail-folding for simple loops (not reductions or recurrences)
                        reductions  Use tail-folding for loops containing reductions
                        recurrences Use tail-folding for loops containing first order recurrences
  --tail-predication= - MVE tail-predication pass options
    =disabled - Don't tail-predicate loops
    =enabled-no-reductions - Enable tail-predication, but not for reduction loops
    =enabled - Enable tail-predication, including reduction loops
    =force-enabled-no-reductions - Enable tail-predication, but not for reduction loops, and force this which might be unsafe
    =force-enabled - Enable tail-predication, including reduction loops, and force this which might be unsafe
  --text - Show instr profile data in text dump format
  --thinlto-assume-merged - Assume the input has already undergone ThinLTO function importing and the other pre-optimization pipeline changes.
  --threads= -
  --topn= - Show the list of functions with the largest internal counts
  --type-based-intrinsic-cost - Calculate intrinsics cost based only on argument types
  --value-cutoff= - Set the count value cutoff. Functions with the maximum count less than this value will not be printed out. (Default is 0)
  --verify-region-info - Verify region info (time consuming)
  --vp-counters-per-site= - The average number of profile counters allocated per value profiling site.
  --vp-static-alloc - Do static counter allocation for value profiler
  --wasm-enable-eh - WebAssembly exception handling
  --wasm-enable-sjlj - WebAssembly setjmp/longjmp handling
  --x86-align-branch= - Specify types of branches to align (plus separated list of types):
                        jcc      indicates conditional jumps
                        fused    indicates fused conditional jumps
                        jmp      indicates direct unconditional jumps
                        call     indicates direct and indirect calls
                        ret      indicates rets
                        indirect indicates indirect unconditional jumps
  --x86-align-branch-boundary= - Control how the assembler should align branches with NOP. If the boundary's size is not 0, it should be a power of 2 and no less than 32. Branches will be aligned to prevent from being across or against the boundary of specified size. The default value 0 does not align branches.
  --x86-branches-within-32B-boundaries - Align selected instructions to mitigate negative performance impact of Intel's micro code update for errata skx102. May break assumptions about labels corresponding to particular instructions, and should be used with caution.
  --x86-pad-max-prefix-size= - Maximum number of prefixes to use for padding

Generic Options:

  --help - Display available options (--help-hidden for more)
  --help-list - Display list of available options (--help-list-hidden for more)
  --version - Display the version of this program
```
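To illustrate the "functions and their counter values, top N" idea without changing cargo-pgo, here is a minimal Rust sketch (not part of cargo-pgo; `FunctionProfile`, `parse_show_output`, `coverage` and `top_n` are hypothetical names). It parses the text printed by `llvm-profdata show --all-functions --counts`, counts how many functions have at least one non-zero counter, and ranks functions by their largest counter. It assumes the output layout of recent LLVM versions: a line indented by exactly two spaces of the form `name:` per function, followed by deeper-indented `Hash:`, `Counters:`, `Block counts: [...]` and, depending on the instrumentation kind, `Function count:` lines. That text format is not a stable interface, so treat this as a starting point.

```rust
/// One function record parsed from `llvm-profdata show --all-functions --counts` output.
#[derive(Debug, Clone, PartialEq)]
struct FunctionProfile {
    name: String,
    /// Every counter value printed for the function (`Function count` and/or `Block counts`).
    counts: Vec<u64>,
}

impl FunctionProfile {
    /// Largest counter value; 0 means no counter fired in the profiled runs.
    fn max_count(&self) -> u64 {
        self.counts.iter().copied().max().unwrap_or(0)
    }
}

/// Parses the textual output of `llvm-profdata show --all-functions --counts`.
/// Assumption: a function starts with a line indented by exactly two spaces that
/// ends in ':'; its details follow on more deeply indented lines.
fn parse_show_output(text: &str) -> Vec<FunctionProfile> {
    let mut functions: Vec<FunctionProfile> = Vec::new();
    for line in text.lines() {
        let trimmed = line.trim();
        let is_header = line.starts_with("  ") && !line.starts_with("   ") && trimmed.ends_with(':');
        if is_header {
            let name = trimmed.strip_suffix(':').unwrap_or(trimmed).to_string();
            functions.push(FunctionProfile { name, counts: Vec::new() });
        } else if let Some(current) = functions.last_mut() {
            if let Some(value) = trimmed.strip_prefix("Function count:") {
                if let Ok(n) = value.trim().parse::<u64>() {
                    current.counts.push(n);
                }
            } else if let Some(list) = trimmed.strip_prefix("Block counts:") {
                // "[1, 0, 42]" -> 1, 0, 42 (an empty "[]" yields nothing)
                let inner = list.trim().trim_start_matches('[').trim_end_matches(']');
                current
                    .counts
                    .extend(inner.split(',').filter_map(|s| s.trim().parse::<u64>().ok()));
            }
        }
    }
    functions
}

/// Returns (executed, total): a function counts as executed if any counter is non-zero.
fn coverage(functions: &[FunctionProfile]) -> (usize, usize) {
    let executed = functions.iter().filter(|f| f.max_count() > 0).count();
    (executed, functions.len())
}

/// The `n` functions with the largest counter value, ties broken by name.
fn top_n(functions: &[FunctionProfile], n: usize) -> Vec<&FunctionProfile> {
    let mut sorted: Vec<&FunctionProfile> = functions.iter().collect();
    sorted.sort_by(|a, b| b.max_count().cmp(&a.max_count()).then_with(|| a.name.cmp(&b.name)));
    sorted.truncate(n);
    sorted
}

fn main() {
    // Hypothetical sample shaped like `llvm-profdata show --all-functions --counts` output.
    let sample = "Counters:\n  main:\n    Hash: 0x0000000000000001\n    Counters: 2\n    Block counts: [1, 42]\n  unused_fn:\n    Hash: 0x0000000000000002\n    Counters: 1\n    Block counts: [0]\nFunctions shown: 2\n";
    let functions = parse_show_output(sample);
    let (executed, total) = coverage(&functions);
    println!("{executed}/{total} functions executed");
    for f in top_n(&functions, 10) {
        println!("{:>12} {}", f.max_count(), f.name);
    }
}
```

In practice the input would come from something like `llvm-profdata show --all-functions --counts merged.profdata > profile.txt`. For a quick look without any code, the `--topn=` and `--covered` options listed above already cover part of this.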
zamazan4ik commented 1 year ago

It seems like `llvm-profdata show` is the right tool for me, thanks! I am using it via cargo-binutils, which handles the sysroot details. According to the `llvm-profdata show` options, it can show all instrumented functions, create various kinds of reports, show functions with some context, etc.

I agree that there is no reason to integrate `llvm-profdata` into cargo-pgo right now. However, I think it would be good to mention `llvm-profdata` somewhere in the cargo-pgo documentation, e.g. for users who want to get some insights from their instrumentation profiles.

I have found one strange issue: at least on my machine, `llvm-profdata` doesn't work with merged profiles. When I run `llvm-profdata show merged_data.profdata`, I get an `unsupported instrumentation profile format version` error, while on the `.profraw` files `llvm-profdata` works as expected. This could be a bit inconvenient for users, since to get the whole coverage picture from multiple runs they would need to aggregate the data on their own. But I understand that this should be filed as an issue with upstream LLVM, not here :)
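Aggregating several runs is what `llvm-profdata merge` is for, and cargo-pgo already relies on `llvm-profdata`. As a minimal Rust sketch (not cargo-pgo's code; `collect_profraw_files` and `merge_args` are hypothetical helpers), the snippet below gathers the `.profraw` files in one directory and builds a `llvm-profdata merge -o <output> <inputs...>` command line. It assumes the raw profiles from all runs were written to a single directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every `*.profraw` file directly inside `dir`, sorted for a stable order.
fn collect_profraw_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |ext| ext == "profraw") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

/// Builds the arguments for `llvm-profdata merge -o <output> <inputs...>`.
fn merge_args(output: &Path, inputs: &[PathBuf]) -> Vec<String> {
    let mut args = vec![
        "merge".to_string(),
        "-o".to_string(),
        output.display().to_string(),
    ];
    args.extend(inputs.iter().map(|p| p.display().to_string()));
    args
}

fn main() -> io::Result<()> {
    // Assumption: the .profraw files from several runs are in the current directory.
    let inputs = collect_profraw_files(Path::new("."))?;
    let args = merge_args(Path::new("merged.profdata"), &inputs);
    // Only print the command; run it with the llvm-profdata from the same toolchain.
    println!("llvm-profdata {}", args.join(" "));
    Ok(())
}
```

The merged `.profdata` can then be inspected with `llvm-profdata show`, provided the same `llvm-profdata` binary is used for both steps. Converting paths with `display()` is lossy for non-UTF-8 paths; passing the `Path` values straight to `std::process::Command::args` would avoid that.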

Thanks for the help! I think the issue could be closed for now.

Kobzol commented 1 year ago

It has worked for me with the merged profile. However, you have to use the exact `llvm-profdata` binary from the rustc toolchain that was used to compile the crate; an `llvm-profdata` from a different LLVM version might not be able to read the profile.
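To make the toolchain-matching point concrete, here is a minimal Rust sketch (`parse_host`, `llvm_profdata_path` and `rustc_stdout` are hypothetical helpers) that locates the `llvm-profdata` shipped with the active rustc toolchain. It assumes the `llvm-tools-preview` rustup component is installed (`rustup component add llvm-tools-preview`), which places the LLVM tools under `<sysroot>/lib/rustlib/<host triple>/bin/`; the sysroot comes from `rustc --print sysroot` and the host triple from the `host:` line of `rustc -vV`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Extracts the host triple from `rustc -vV` output (the line starting with `host: `).
fn parse_host(rustc_vv: &str) -> Option<&str> {
    rustc_vv.lines().find_map(|line| line.strip_prefix("host: ")).map(str::trim)
}

/// Where the llvm-tools-preview component is assumed to put llvm-profdata:
/// `<sysroot>/lib/rustlib/<host>/bin/llvm-profdata[.exe]`.
fn llvm_profdata_path(sysroot: &Path, host: &str) -> PathBuf {
    sysroot
        .join("lib")
        .join("rustlib")
        .join(host)
        .join("bin")
        .join(format!("llvm-profdata{}", std::env::consts::EXE_SUFFIX))
}

/// Runs `rustc` with `args` and returns its trimmed stdout, or None on any failure.
fn rustc_stdout(args: &[&str]) -> Option<String> {
    let output = Command::new("rustc").args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }
    Some(String::from_utf8(output.stdout).ok()?.trim().to_string())
}

fn main() {
    let sysroot = rustc_stdout(&["--print", "sysroot"]);
    let version = rustc_stdout(&["-vV"]);
    match (sysroot, version) {
        (Some(sysroot), Some(version)) => match parse_host(&version) {
            Some(host) => {
                let path = llvm_profdata_path(Path::new(&sysroot), host);
                // `exists` is false if the llvm-tools-preview component is not installed.
                println!("{} (exists: {})", path.display(), path.exists());
            }
            None => eprintln!("could not find the host triple in `rustc -vV` output"),
        },
        _ => eprintln!("failed to run rustc"),
    }
}
```

Running `rustc` from the same environment (e.g. the same `rustup` override or `+toolchain`) as the PGO build keeps the profile writer and reader on the same LLVM version.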

I'll add information about `llvm-profdata` to the README.

Kobzol commented 1 year ago

I added brief information to the README about using `llvm-profdata` to display PGO profile statistics. Let me know if I should add more information.