apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

Add -Ptests.asyncprofile option. [LUCENE-10099] #11137

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Allow async-profiler in addition to the JFR profiler.

async profiler description:

This project is a low overhead sampling profiler for Java that does not suffer from Safepoint bias problem. 
It features HotSpot-specific APIs to collect stack traces and to track memory allocations.
The profiler works with OpenJDK, Oracle JDK and other Java runtimes based on the HotSpot JVM.
async-profiler can trace the following kinds of events:

* CPU cycles
* Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
* Allocations in Java Heap
* Contented lock attempts, including both Java object monitors and ReentrantLocks

 

The "tests.asyncprofile" mode switch turns on the async-profiler (https://github.com/jvm-profiling-tools/async-profiler) during test execution, and prints a simple summary at the end.

For example, top-10 histogram of methods (cpu samples):

gradlew -p lucene/core test -Ptests.asyncprofile=true

Alternatively, you can profile heap allocations instead: TBD

Requirements

The following permissions need to be set to allow for the proper permissions for profiling:

# sysctl kernel.perf_event_paranoid=1 
# sysctl kernel.kptr_restrict=0

Migrated from LUCENE-10099 by Mark Robert Miller (@markrmiller), updated Sep 22 2021 Linked issues:

Pull requests: https://github.com/apache/lucene/pull/295

asfimport commented 3 years ago

Mark Robert Miller (@markrmiller) (migrated from JIRA)

Their wiki has some info on that:

“CPU profiling

In this mode profiler collects stack trace samples that include Java methods, native calls, JVM code and kernel functions.

The general approach is receiving call stacks generated by perf_events and matching them up with call stacks generated by AsyncGetCallTrace, in order to produce an accurate profile of both Java and native code. Additionally, async-profiler provides a workaround to recover stack traces in some corner cases where AsyncGetCallTrace fails.

This approach has the following advantages compared to using perf_events directly with a Java agent that translates addresses to Java method names:

Works on older Java versions because it doesn't require -XX:+PreserveFramePointer, which is only available in JDK 8u60 and later.

Does not introduce the performance overhead from -XX:+PreserveFramePointer, which can in rare cases be as high as 10%.

Does not require generating a map file to map Java code addresses to method names.

Works with interpreter frames.

Does not require writing out a perf.data file for further processing in user space scripts.

If you wish to resolve frames within libjvm, the debug symbols are required.”

asfimport commented 3 years ago

Mark Robert Miller (@markrmiller) (migrated from JIRA)

Looks like the other item I was going to add doesn’t so much fit. I also wanted to add support for jmhs perfasm, which pulls out code region hotspots from perf and lines it up with assembly. It would be some dancing around to get it fully nice and integrated compared to this issue though, and more importantly doesn’t really seem to be that useful with these tests - it’s much more oriented towards running a single benchmark than tests.

asfimport commented 3 years ago

Mark Robert Miller (@markrmiller) (migrated from JIRA)

Will it help this profiler?

I think the reason the output above is not listing proper java methods may be how I am plugging in a custom agent (trying to keep the ability to allow filtering by java thread properties). I may need to just use the agent built into the .so library.

asfimport commented 3 years ago

Mark Robert Miller (@markrmiller) (migrated from JIRA)

Okay, this actually seems to be working as intended, it's just those native methods for some of these quick tests are just what happens to be on top. I tried out linking to the .so lib built in agent and the result is not really any different, it's just a bit harder to control some things. You can see the correct Java methods showing up in runs though. Some of what comes out will depend on the sampling rate, currently set to 10ms:

PROFILE SUMMARY from 140 events (total: 140)
  tests.profile.mode=cpu
  tests.profile.count=10
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
5.00%         7             #SymbolTable::lookup_dynamic()
2.86%         4             #IntervalWalker::walk_to()
2.14%         3             #no_Java_frame()
2.14%         3             #ValueStack::values_do()
1.43%         2             org.apache.lucene.store.ByteBuffersDataOutput#<init>()
1.43%         2             #I2C/C2I adapters()
1.43%         2             java.util.Formatter$FormatSpecifier#print()
1.43%         2             #copy_user_generic_string()
1.43%         2             #vframe::sender()
0.71%         1             #ConstantPool::impl_klass_ref_index_at()