criblio / appscope

Gain observability into any Linux command or application with no code modification
https://appscope.dev
Apache License 2.0

Attach to Java process missing Byte Code Instrumentation #576

Open michalbiesek opened 3 years ago

michalbiesek commented 3 years ago

Currently, when scoping a Java process, we call initJavaAgent only when using the LD_PRELOAD mechanism. initJavaAgent starts the interaction with the JVM for Byte Code Instrumentation (see the Agent_OnLoad method). With Byte Code Instrumentation we are able to scope HTTPS events from Java.
Byte Code Instrumentation should also be done when using scope attach.

The starting point:

michalbiesek commented 3 years ago

The main idea is to make the existing JVM, the one we attach to, aware of the Agent_OnAttach interface.

[DRAFT] Java process Byte code instrumentation on attach

- the main goal is to support Byte Code Instrumentation for an existing/running
  Java process
- Agent_OnAttach, called by the JVM, can use the same logic as Agent_OnLoad
- what we need to do is to trigger events/inform the JVM that libscope.so is a
  java agent, like it is done for LD_PRELOAD:
  "set JAVA_TOOL_OPTIONS so that JVM can load libscope.so as a java agent"
- it can be done via the Java layer, as in AgentRunner.java - we need to figure
  out how to do this using JNI - for "loadAgentPath" to libscope.so
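
For reference, the LD_PRELOAD route quoted in the list above can be sketched roughly as follows. This is a minimal sketch, not AppScope's own code: the helper names build_java_tool_options and export_java_agent are hypothetical, and it assumes the standard JVM option -agentpath:<path> is used to name the agent library. The JVM reads JAVA_TOOL_OPTIONS at startup, so this only affects JVMs started afterwards, which is why attach needs a different route.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a JAVA_TOOL_OPTIONS value that asks the JVM to load `lib_path`
 * as a native agent. Options already present in `existing` are kept, and
 * the value is left unchanged if it already mentions `lib_path`.
 * Returns the number of characters written, or -1 if `out` is too small.
 * (Hypothetical helper for illustration only.)
 */
int
build_java_tool_options(const char *existing, const char *lib_path,
                        char *out, size_t out_len)
{
    int n;

    assert(lib_path != NULL && out != NULL);

    if (existing && strstr(existing, lib_path) != NULL) {
        n = snprintf(out, out_len, "%s", existing);
    } else if (existing && existing[0] != '\0') {
        n = snprintf(out, out_len, "%s -agentpath:%s", existing, lib_path);
    } else {
        n = snprintf(out, out_len, "-agentpath:%s", lib_path);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}

/* Export the value so that a JVM started from this process inherits it. */
int
export_java_agent(const char *lib_path)
{
    char buf[4096];

    if (build_java_tool_options(getenv("JAVA_TOOL_OPTIONS"), lib_path,
                                buf, sizeof(buf)) < 0) {
        return -1;
    }
    return setenv("JAVA_TOOL_OPTIONS", buf, 1);
}
```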

See current state of work on: https://github.com/criblio/appscope/tree/draft-Agent-Runner

michalbiesek commented 3 years ago

Current status:

michalbiesek commented 3 years ago

Current status: While working on this issue and checking different versions of Java, I discovered a crash when attaching to Java with SIGINT on Dockerfile.glibc; this will be addressed in #619. At first glance, the segfault observed in the current state of Pull Requests #619 and #617 is related to the doGotcha functionality and to incorrect restoring of write permissions.

michalbiesek commented 3 years ago

Current status:

#619 addresses a bug in a corner case of handling a shared object library and a GOT entry:

7f4bc9a37000-7f4bc9a3a000 ---p 00000000 00:00 0
7f4bc9a3a000-7f4bc9b38000 rw-p 00000000 00:00 0

In the current implementation of osGetPageProt we identify the address 7f4bc9a3a000 as belonging to the 7f4bc9a37000-7f4bc9a3a000 memory range. From this range we read the permissions (no read, no write, no execute). Here we detect that the permissions don't include write access: https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L199-L204 So we add write permission. In the end, we restore the permissions that we read at the beginning: https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L229-L234 This removes the rw permissions from the region starting at 7f4bc9a3a000. When the program then runs and tries to access the GOT entry whose permissions we revoked, it segfaults.
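
The boundary problem described above can be illustrated with a small lookup over a maps-style listing. This is only a sketch under assumptions: lookup_prot is a hypothetical function, not the real osGetPageProt, and the inclusive_end switch is one plausible way such a mismatch can arise (treating the end address of a range as part of it). Ranges in /proc/<pid>/maps are half-open, so the end address belongs to the next mapping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Return the PROT_* flags of the mapping that contains `addr` in a
 * /proc/<pid>/maps style listing, or -1 if no mapping matches.
 * Correct lookups use the half-open range [start, end). With
 * `inclusive_end` set, the end address is wrongly counted as part of the
 * range (assumed here as the cause of the mismatch).
 */
int
lookup_prot(const char *maps, unsigned long addr, int inclusive_end)
{
    const char *line = maps;

    assert(maps != NULL);

    while (line && *line) {
        unsigned long start, end;
        char perms[5];

        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3) {
            int hit = inclusive_end ? (addr >= start && addr <= end)
                                    : (addr >= start && addr < end);
            if (hit) {
                int prot = PROT_NONE;
                if (perms[0] == 'r') prot |= PROT_READ;
                if (perms[1] == 'w') prot |= PROT_WRITE;
                if (perms[2] == 'x') prot |= PROT_EXEC;
                return prot;
            }
        }
        /* advance to the next line, if any */
        line = strchr(line, '\n');
        if (line) line++;
    }
    return -1;
}
```

With the two mappings from this comment, the inclusive comparison reports PROT_NONE for 7f4bc9a3a000, while the half-open comparison reports read and write. Restoring the first result onto that page is what drops the rw permissions.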

The last commit in #619 addresses the previously described problem.

michalbiesek commented 3 years ago

Regarding https://github.com/criblio/appscope/pull/617

michalbiesek commented 3 years ago

With Agent_OnAttach we can use GetLoadedClasses to iterate over the expected classes and force the JVM to call ClassFileLoadHook. Unfortunately, RetransformClasses has the following limitation: "The retransformation must not add, remove or rename fields or methods", so we cannot use the current mechanism based on javaCopyMethod. When I tried to use DefineClass to copy the existing class under a new name, I received JVMTI_ERROR_NAMES_DONT_MATCH, see https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#RetransformClasses

michalbiesek commented 2 years ago

Current status: The Agent_OnAttach method works fine for classes that are loaded after attaching to the process:

/opt/tomcat/bin/catalina.sh run &
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443  # this will load the "sun/nio/ch/SocketChannelImpl" class

What doesn't work is the following scenario:

/opt/tomcat/bin/catalina.sh run &
curl -k https://localhost:8443  # this will load the "sun/nio/ch/SocketChannelImpl" class
/opt/appscope/bin/linux/scope attach java  # this will copy the "sun/nio/ch/SocketChannelImpl" class
curl -k https://localhost:8443

Calling the methods from the copy of the class fails.

Summary:

michalbiesek commented 2 years ago

Status:

I worked on verifying whether copying a class is possible. I focused on a simpler case: I created a Test class that contains only 2 methods (see the last commit in #617 for details) and added a mechanism to Scope that copies the class and changes the print method into a native one.

The behavior when we want to call the original print method (not the native one):

Other:

After copying the class, when we additionally call javaCopyMethod(classInfo, classInfo->methods[methodIndex], "__print"); and try to use the print method loaded from the copied class (which shouldn't be native), we end up calling the second_print implementation.

Next step: Verify the method indexes and the Java class structure to see whether an additional action is required while copying the class. Possibly we refer to the old class code, in which case the class-copying mechanism must be adjusted.

michalbiesek commented 2 years ago

Status:

I worked on verifying whether the following mechanism works in the case of Agent_OnLoad (before any Java libraries are loaded) and Agent_OnAttach (after some/all Java libraries are loaded):

With this instrumentation we can intercept barmethod with a native implementation that does the following:

#include <jni.h>

// Native implementation that replaces barmethod
JNIEXPORT void JNICALL
Java_class_name_foofoo_barmethod(JNIEnv *jni, jobject obj, jstring str)
{
  // perform scope logic
  // locate the class_name_foo__
  // locate the __barmethod in class_name_foo__
  // call the original method backed up in __barmethod   ## (1)
}

Results:

- With Agent_OnLoad, the code succeeds only when we add a copy of bar_method (__barmethod) with javaCopyMethod to the original class class_name_foofoo.
- With Agent_OnLoad, without adding the copy of bar_method (__barmethod) with javaCopyMethod to the original class class_name_foofoo, we hit a SEGV in (1).
- With Agent_OnAttach, we cannot add new methods to the original class class_name_foofoo; in (1) we call the native variant of the method again until we hit a stack overflow.
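
The Agent_OnAttach result can be pictured with a toy model in plain C. This is not JVM or AppScope code: the method table, find_method, instrument_and_call and the depth limit are all hypothetical, and it assumes that when no __barmethod backup exists, the lookup in (1) resolves to the native barmethod again. It does not model the SEGV case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 4
#define MAX_DEPTH   8   /* stand-in for the JVM stack limit */

typedef int (*method_fn)(int depth);

struct method { const char *name; method_fn fn; };
struct klass  { struct method methods[MAX_METHODS]; size_t count; };

static struct klass g_class;   /* toy stand-in for class_name_foofoo */

static method_fn
find_method(const struct klass *k, const char *name)
{
    for (size_t i = 0; i < k->count; i++) {
        if (strcmp(k->methods[i].name, name) == 0) return k->methods[i].fn;
    }
    return NULL;
}

/* Stand-in for the original (bytecode) implementation of barmethod. */
static int original_bar(int depth) { (void)depth; return 0; }

/* Stand-in for the native replacement: scope logic, then step (1). */
static int native_bar(int depth)
{
    if (depth >= MAX_DEPTH) return -1;   /* models the stack overflow */

    method_fn orig = find_method(&g_class, "__barmethod");
    if (!orig) {
        /* Assumption: with no backup, we resolve to the native method. */
        orig = find_method(&g_class, "barmethod");
    }
    assert(orig != NULL);
    return orig(depth + 1);
}

/*
 * with_backup = 1 models Agent_OnLoad, where __barmethod can be added;
 * with_backup = 0 models Agent_OnAttach, where methods cannot be added.
 * Returns 0 if the original ran, -1 if the call recursed to the limit.
 */
int
instrument_and_call(int with_backup)
{
    g_class.count = 0;
    g_class.methods[g_class.count++] = (struct method){ "barmethod", native_bar };
    if (with_backup) {
        g_class.methods[g_class.count++] = (struct method){ "__barmethod", original_bar };
    }
    return find_method(&g_class, "barmethod")(0);
}
```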

michalbiesek commented 2 years ago

Status:

The current implementation supports:

Next steps and limitations