DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0
561 stars 281 forks source link

HTTP 400 Bad Request being marked as a false positive server error in Keycloak #7288

Open froque opened 1 month ago

froque commented 1 month ago

Running keycloak:

wget -nc --quiet https://github.com/DataDog/dd-trace-java/releases/download/v1.37.0/dd-java-agent-1.37.0.jar
docker run \
    -d \
    --rm \
    --name keycloak  \
    --network host \
    -e KEYCLOAK_ADMIN=admin \
    -e KEYCLOAK_ADMIN_PASSWORD=admin \
    -e "KC_HTTP_RELATIVE_PATH=/auth" \
    -e DD_SERVICE=keycloak \
    -e DD_WRITER_TYPE=LoggingWriter \
    -e DD_HTTP_SERVER_ROUTE_BASED_NAMING="false" \
    -e JAVA_TOOL_OPTIONS="-javaagent:/opt/dd-java-agent.jar" \
    -v $(pwd)/dd-java-agent-1.37.0.jar:/opt/dd-java-agent.jar:ro \
    quay.io/keycloak/keycloak \
        start-dev

make a HTTP request to token endpoint with an expired or invalid refresh token

❯ curl -i -s \
  -d "client_id=account" \
  -d "refresh_token=some_invalid_refresh_token" \
  -d "grant_type=refresh_token" \
  "http://localhost:8080/auth/realms/master/protocol/openid-connect/token"
HTTP/1.1 400 Bad Request
Cache-Control: no-store
Pragma: no-cache
content-length: 69
Content-Type: application/json
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

{"error":"invalid_grant","error_description":"Invalid refresh token"}
docker logs keycloak >& dd-java-agent.log
docker stop keycloak
❯ grep "POST /auth/realms/master/protocol/openid-connect/token" dd-java-agent.log | cut -b 139-|jq ".[2]"  
{
  "service": "keycloak",
  "name": "jakarta-rs.request",
  "resource": "TokenEndpoint.processGrantRequest",
  "trace_id": 7190010773749954000,
  "span_id": 560244054904225900,
  "parent_id": 1901844586656103700,
  "start": 1720217563891473700,
  "duration": 87481836,
  "type": "web",
  "error": 1,
  "metrics": {
    "thread.id": 53
  },
  "meta": {
    "error.stack": "org.keycloak.services.CorsErrorResponseException: invalid_grant\n\tat org.keycloak.protocol.oidc.grants.RefreshTokenGrantType.process(RefreshTokenGrantType.java:101)\n\tat org.keycloak.protocol.oidc.endpoints.TokenEndpoint.processGrantRequest(TokenEndpoint.java:140)\n\tat org.keycloak.protocol.oidc.endpoints.TokenEndpoint$quarkusrestinvoker$processGrantRequest_6408e15340992839b66447750c221d9aaa837bd7.invoke(Unknown Source)\n\tat org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)\n\tat io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)\n\tat org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:582)\n\tat org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)\n\tat org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)\n\tat org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)\n\tat org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\n",
    "component": "jakarta-rs-controller",
    "error.type": "org.keycloak.services.CorsErrorResponseException",
    "thread.name": "executor-thread-1",
    "error.message": "invalid_grant"
  }
}

We initially though this was a problem in Keycloak https://github.com/keycloak/keycloak/issues/29451 fixed by https://github.com/keycloak/keycloak/pull/29649 but now it seems to be a problem in dd-java-agent.

image

vandonr commented 1 month ago

Hello, thank you for the report. This is marked as an error because it generates an Exception that is not handled within the scope of the span. You can see that the root span for that request (which is also the one bearing the http response code) is not marked as error. What exactly is the behavior you would like to see ?

I see from the screenshot that you are using error tracking, you can mark the issue as "Ignored" if you don't want to see it in the issues list.

froque commented 1 month ago

What exactly is the behavior you would like to see ?

Since from the HTTP client perspective the response is a HTTP 400 and not HTTP 5xx, I was expecting Datadog to not mark this as an error.

I see from the screenshot that you are using error tracking, you can mark the issue as "Ignored" if you don't want to see it in the issues list.

I missed that. I am going to try it now.

vandonr commented 1 month ago

The thing is that where the error is captured, we just see it as an exception being thrown by a Java service, which we record as an error. We don't have information on the HTTP response code at this point in time. When we handle the actual response code, we see it's a 400 and we do not mark that span as error.

I know it's not your code but an open source project, but maybe the issue is that keycloak is using an Exception to handle code flow in a not-so-exceptional situation, which is something regarded as bad practice when writing Java code. One fix would be to rework that code to use a meaningful return status rather than an exception.

froque commented 1 month ago

Ok. In that case, perhaps a new config similar to DD_TRACE_CLASSES_EXCLUDE and DD_HTTP_SERVER_ERROR_STATUSES to ignore specific exceptions (DD_TRACE_EXCEPTIONS_EXCLUDE)?

Xyaren commented 1 month ago

Or even better let us define some Java class that can decide if a specific exception should be regarded as an error or not i.e. by looking into the type, attributes or annotations.

This would also allow for a fix for #7141.