Closed: @sblumenthalAtBraze closed this issue 4 years ago.
Hi @sblumenthalAtBraze.
Can you provide the following info:
That will tell us where to look to investigate.
@realark can do!
Like I said, I am using spark-core 2.6.0. The corresponding Jetty and Jersey client versions pulled in by it are:
Gradle: org.eclipse.jetty.websocket:websocket-api:9.4.4.v20170414
Gradle: org.eclipse.jetty.websocket:websocket-client:9.4.4.v20170414
Gradle: org.eclipse.jetty.websocket:websocket-common:9.4.4.v20170414
Gradle: org.eclipse.jetty.websocket:websocket-server:9.4.4.v20170414
Gradle: org.eclipse.jetty.websocket:websocket-servlet:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-client:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-continuation:9.2.22.v20170606
Gradle: org.eclipse.jetty:jetty-http:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-io:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-jaas:9.2.22.v20170606
Gradle: org.eclipse.jetty:jetty-jmx:9.2.22.v20170606
Gradle: org.eclipse.jetty:jetty-security:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-server:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-servlet:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-servlets:9.2.22.v20170606
Gradle: org.eclipse.jetty:jetty-util:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-webapp:9.4.4.v20170414
Gradle: org.eclipse.jetty:jetty-xml:9.4.4.v20170414
Gradle: org.glassfish.hk2.external:aopalliance-repackaged:2.5.0-b30
Gradle: org.glassfish.hk2.external:javax.inject:2.5.0-b30
Gradle: org.glassfish.hk2:hk2-api:2.5.0-b30
Gradle: org.glassfish.hk2:hk2-locator:2.5.0-b30
Gradle: org.glassfish.hk2:hk2-utils:2.5.0-b30
Gradle: org.glassfish.hk2:osgi-resource-locator:1.0.1
Gradle: org.glassfish.jersey.bundles.repackaged:jersey-guava:2.25
Gradle: org.glassfish.jersey.containers:jersey-container-servlet:2.25
Gradle: org.glassfish.jersey.containers:jersey-container-servlet-core:2.25
Gradle: org.glassfish.jersey.core:jersey-client:2.25
Gradle: org.glassfish.jersey.core:jersey-common:2.25
Gradle: org.glassfish.jersey.core:jersey-server:2.25
Gradle: org.glassfish.jersey.ext:jersey-bean-validation:2.25
Gradle: org.glassfish.jersey.media:jersey-media-jaxb:2.25
Gradle: org.glassfish.web:javax.el:2.2.4
Aside from that, the other dependencies for the app are:
compile "org.apache.kafka:kafka-clients:1.0.0-cp1"
compile "org.apache.kafka:kafka-streams:1.0.0-cp1"
compile "io.confluent:kafka-avro-serializer:4.0.0"
compile "io.confluent:kafka-json-serializer:4.0.0"
compile "io.confluent:kafka-schema-registry-client:4.0.0"
compile "org.apache.avro:avro:1.7.7"
compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.8.4'
compile 'com.google.guava:guava:20.0'
compile 'com.google.code.gson:gson:2.8.0'
compile 'com.datadoghq:java-dogstatsd-client:2.3'
compile 'javax.ws.rs:javax.ws.rs-api:2.0.1'
compile 'com.sparkjava:spark-core:2.6.0'
compile 'org.glassfish.jersey.core:jersey-common:2.25'
compile 'org.glassfish.jersey.core:jersey-client:2.25'
compile 'org.jetbrains:annotations-java5:15.0'
compile 'io.sentry:sentry-log4j:1.7.5'
compile 'com.datadoghq:dd-trace-api:0.9.0'
@sblumenthalAtBraze Thanks for your response!
Could you please also try adding `-Ddd.integration.jax-rs.enabled=false` to the JVM arguments and see if that helps the problem at all?
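A minimal sketch of where that flag would go, assuming the agent is attached via `-javaagent` (the agent path and application jar name below are placeholders, not the reporter's actual setup):

```shell
# Illustrative only: agent path and application jar are placeholders.
java -javaagent:/path/to/dd-java-agent.jar \
     -Ddd.integration.jax-rs.enabled=false \
     -jar your-app.jar
```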
@mar-kolya That actually appears to have fixed the issue! Not quite sure why the out-of-the-box integration would cause the problem, but it doesn't appear that any additional data is being lost, so no harm done!
I will keep an eye on it for now to make sure nothing goes wrong, but it looks like I might be good.
@sblumenthalAtBraze Thanks!
I think this gives us some information to look into fixing this properly. I hope this is 'good enough' as a workaround for now.
Thanks for reporting the problem!
@sblumenthalAtBraze
Could you please provide us with details about how you create the client object (as in the `client.target(uri).request().get()` call in your application code)? Do you have any features enabled? Any special configuration? Anything would be helpful.
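For comparison, a minimal sketch of the call pattern being asked about, using the plain JAX-RS 2.0 client API with no features registered (the class name and URI below are illustrative, not taken from the reporter's code; running this requires a JAX-RS implementation such as Jersey on the classpath):

```java
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.Response;

public class ClientSketch {
    public static void main(String[] args) {
        // Default client, no features registered and no special configuration;
        // the question above is whether the real application deviates from this.
        Client client = ClientBuilder.newClient();
        try {
            Response response = client
                    .target("http://localhost:8080/status") // illustrative URI
                    .request()
                    .get();
            System.out.println(response.getStatus());
        } finally {
            client.close();
        }
    }
}
```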
Thanks!
Could you please also share the version of your JVM and any startup logs from the Datadog Java agent that you can (when `-Ddd.integration.jax-rs.enabled` is set to `true`)?
Unfortunately, so far we cannot reproduce the problem, even with the exact same library versions that you are using.
Thanks!
@sblumenthalAtBraze We are still unable to figure out what is going on here. Reviewing the code in the framework, it seems impossible to get into this state. Are you able to reproduce this problem locally? If so, and you have some time, one of us could jump on a call with you to go through it in a debugger.
Thanks!
@sblumenthalAtBraze We've made quite a few changes since 0.9.0. Would you be willing to try again with the latest release and see if the issue has been resolved?
You can also try out the build from #479, which I made to fix the problem reported in #437; it might help you as well.
So, I figured I would post an update here, because I tried enabling the jax-rs integration this week using agent version 0.25 and ran into the same problem.
What happens is that sometimes, not always, a redeploy of the code results in 100% of requests throwing the same stack trace as the one posted at the top.
Since I first opened this ticket, I noticed that we were not packaging all of the runtime dependencies our application needed into the Docker container we were building. Instead, we were cherry-picking a small handful of them and wget'ing them into the container. I fixed that and thought it might have been the cause of this issue. However, the NPE is still thrown sometimes.
Then, I noticed that the stack trace was indicating line numbers consistent with jersey-common version 2.27. My code lists 2.25 as a dependency, yet both versions were being loaded onto the classpath. So, I thought maybe it was a version mismatch and upgraded my dependency to 2.27 to be consistent. Yet the issue still occurred, though rarely.
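If the duplicate jersey-common versions are suspect, one way to rule them out is to force a single version across all configurations in Gradle. A sketch for an old-style `build.gradle` like the one quoted above (the chosen version is whatever you want to standardize on):

```groovy
// build.gradle: force every configuration to resolve one jersey-common version.
configurations.all {
    resolutionStrategy {
        force 'org.glassfish.jersey.core:jersey-common:2.27'
    }
}
```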
I was unable to figure out what during deployment and startup was causing the app to get into this broken state, and as a result I cannot reliably reproduce it. But it was happening daily in prod, so today I re-disabled the integration.
With all of that said, looking at the traces produced while the integration was enabled, the information contained in the spans for the jax-rs objects was identical to that in the http.request span, and both were reported with the exact same timestamp in each trace. Personally, I do not need both, so I would be leaving this integration disabled anyway, but I wanted to give a status update that the issue still persists for me.
Using the currently released Java agent version (0.9.0) WITHOUT enabling the (still in beta) jetty or spark instrumentation causes my Jetty application to error out when trying to create outgoing GET requests.
The stack trace I get:
This error DOES NOT occur when the Datadog Java agent is not present on the box. The javaagent string I am using is:
And the agent running on the box is set up using the chef-datadog recipe. I have this code deployed to a handful of servers and deploy several times a day. Usually only one of the servers errors out on all outgoing requests, and which server it is changes with each new deploy.
Edit: Some more details: the application is running in a Docker container and sends the trace data to a Datadog agent running on the host machine. The application uses spark-core 2.6.0.