awslabs / disco

A suite of tools including a framework for creating Java Agents, for aspect-oriented tooling for distributed systems.
Apache License 2.0

Slow application start with xray agent #15

Open dan-lind opened 3 years ago

dan-lind commented 3 years ago

Crossposting this issue here as discussed with @willarmiros (although admittedly I'm a bit late): https://github.com/aws/aws-xray-java-agent/issues/70

I recently added aws-xray-java-agent 2.7.1 to an app running Spring boot 2.3.5.

Before I added the agent, startup typically took around 3-4 seconds:

Started Application in 3.927 seconds (JVM running for 4.282)

With aws-xray-java-agent 2.7.1, startup of my application takes 5 to 10 times longer.

I basically put the disco jars in place and added this line to my build.gradle:

bootRun { jvmArgs = ["-javaagent:disco/disco-java-agent.jar=pluginPath=disco/disco-plugins"] }

Started Application in 21.268 seconds (JVM running for 23.917)

Does this slowdown seem reasonable, or is something wrong with my setup?

connellp commented 3 years ago

Hi - sorry for the uptick in application start times. It's hard to estimate a 'reasonable' time for application startup increase, because it depends on factors like the number of classes in your application. The agent performs a unit of work per class, each time a classloader loads a class. This can happen both at application start time, and also throughout the lifetime of the application as classes are lazily loaded, for example.
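To make the per-class cost concrete, here is a minimal sketch of an agent built on the standard java.lang.instrument API. It is not disco's code: the names CountingAgent and CountingTransformer are made up for illustration, and the transformer only counts the classes it is offered without rewriting any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of disco or the X-Ray agent.
public class CountingAgent {
    // Number of classes the JVM has offered to the transformer so far.
    static final AtomicLong CLASSES_SEEN = new AtomicLong();

    static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // This method runs once per class being loaded, so any work
            // done here is multiplied by the number of classes in the app.
            CLASSES_SEEN.incrementAndGet();
            return null; // null tells the JVM to leave the class unchanged
        }
    }

    // Entry point the JVM calls when the jar is passed via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.err.println("classes seen by agent: " + CLASSES_SEEN.get())));
    }
}
```

To load this with -javaagent, it would need to be packaged in a jar whose manifest has a Premain-Class entry pointing at CountingAgent. The count printed at shutdown gives a rough idea of how many times an agent's per-class work runs for a given application.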

5x or 10x startup overhead is probably beyond what I would anticipate, but it is highly application specific in terms of what work the agent has to perform, and when.

What I can say is that we have a number of optimizations that we are incubating/dogfooding locally before releasing them to the public GitHub repo, and they have been fairly impactful. There is an overall 'ignore matcher' which rejects most classes in an application without performing any bytecode transformation, but it still does some work per class to reach that decision. We have optimized this check, and we are also working on removing many of the transformations altogether, so that we intercept more selectively.
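As a rough illustration of why even a rejecting matcher costs something per class, here is a minimal sketch of a prefix-based ignore check. This is an assumption about the general shape of such a check, not disco's actual matcher: the class name IgnoreMatcherSketch, the method shouldIgnore, and the prefix list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; disco's real ignore matcher is not shown in this thread.
public class IgnoreMatcherSketch {
    // Assumed list of package prefixes (JVM internal form, '/' separated)
    // that an agent might never want to instrument.
    private static final List<String> IGNORED_PREFIXES =
            Arrays.asList("java/", "sun/", "jdk/");

    // Returns true when the class should be skipped without any bytecode work.
    static boolean shouldIgnore(String internalClassName) {
        if (internalClassName == null) {
            return true; // the transformer API allows a null class name
        }
        // Even a rejection costs one pass over the prefix list per class.
        for (String prefix : IGNORED_PREFIXES) {
            if (internalClassName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldIgnore("java/lang/String"));         // true
        System.out.println(shouldIgnore("com/example/OrderService")); // false
    }
}
```

The check is cheap for one class, but it runs for every class the JVM loads, so its cost scales with application size even when nothing is instrumented.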

I don't have a firm timeline for the release of these changes right now, but they will come to the public repo in due course. In the meantime, you could experiment with including/excluding the various disco plugins (web, aws, sql) to see if one is dominant over the others. You could also try with no plugins at all (pluginPath=/some/path/with/no/plugins/in/it) to reduce the test to just the base disco agent, and see how the numbers move around.

dan-lind commented 3 years ago

I have actually tried removing the plugins, but it didn't have any noticeable impact. I will patiently wait for the optimizations 😀