OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1
5k stars 1.25k forks source link

NullPointerException occurs in error reporting mechanism when reporting certain Exceptions thrown elsewhere #2189

Open twegener-embertec opened 3 years ago

twegener-embertec commented 3 years ago

Since trying out a relatively recent checkout of opentsdb master (post 2.4.0), the following error sometimes occurs:

2021-10-12 20:27:37,727 [OpenTSDB Old I/O server worker (parentId: 908074792, [id: 0x36201f28, /0.0.0.0:4242])] INFO  [ConnectionManager.handleUpstream] - [id: 0x41e7c6f2, /redacted:47796 => /redacted:4242] CONNECTED: /redacted:47796
2021-10-12 20:27:37,729 [OpenTSDB Old I/O server worker (parentId: 908074792, [id: 0x36201f28, /0.0.0.0:4242])] WARN  [HttpQuery.logWarn] - [id: 0x3474fb86, /redacted:47792 => /redacted:4242] Bad Request on /api/query/last?back_scan=96&timeseries=redacted%7Bredacted%7D: null
2021-10-12 20:27:37,730 [OpenTSDB Old I/O server worker (parentId: 908074792, [id: 0x36201f28, /0.0.0.0:4242])] ERROR [RpcHandler.logError] - [id: 0x3474fb86, /redacted:47792 => /redacted:4242] Unexpected exception caught while serving DefaultHttpRequest(chunked: false)
GET /api/query/last?back_scan=96&timeseries=redacted%7Bredacted%7D HTTP/1.1
Host: tsdb-01:4242
Accept-Encoding: gzip, deflate
Connection: close
java.lang.NullPointerException: null
    at net.opentsdb.tsd.HttpSerializer.formatErrorV1(HttpSerializer.java:855) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]
    at net.opentsdb.tsd.HttpQuery.badRequest(HttpQuery.java:411) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]
    at net.opentsdb.tsd.RpcHandler.handleHttpQuery(RpcHandler.java:294) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]
    at net.opentsdb.tsd.RpcHandler.messageReceived(RpcHandler.java:132) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.6.Final.jar:na]
...

I.e. it is hitting a NullPointerException while it is attempting to report another exception.

The relevant code is in the following file:

src/tsd/HttpSerializer.java

  public ChannelBuffer formatErrorV1(final BadRequestException exception) {
    StringBuilder output = 
      new StringBuilder(exception.getMessage().length() * 2);

exception.getMessage() returns null, and so a NullPointerException is hit when attempting to call the length() method on that.

As such, it means the nature of the other exception that this function was trying to report gets hidden.

A simple fix would be to coerce the result of exception.getMessage() to a string at the start of this function before calling length(). But arguably, this initial capacity argument to StringBuilder is pretty arbitrary anyway, so it might was well just be a fixed constant rather than being a function of exception.getMessage().length().

This can occur when a plain RuntimeException occurs somewhere else (without message) and then is converted directly into a BadRequestException (and so still has no message). There are several code paths where this can occur. (In this particular case, I strongly suspect that the other error triggering this particular case is #2186, e.g. possibly hitting an ArrayIndexOutOfBoundsException in src/core/Interval.java extractDataPoints.)

Note that this bug in the error handling appears to be really old (since 2013), but would only rarely get hit, since it would only get hit when a RuntimeException occurs elsewhere, which would only be when there is a bug of that nature elsewhere in the opentsdb code.

It would be good to fix this, so that other more significant exception error messages are not hidden from the logs.

twegener-embertec commented 3 years ago

Relevant code paths:

src/tsd/HttpSerializer.java

  public ChannelBuffer formatErrorV1(final BadRequestException exception) {
    StringBuilder output = 
      new StringBuilder(exception.getMessage().length() * 2);

src/tsd/HttpQuery.java

  public void badRequest(final BadRequestException exception) {
    logWarn("Bad Request on " + request().getUri() + ": " + exception.getMessage());
    if (this.api_version > 0) {
      // always default to the latest version of the error formatter since we
      // need to return something
      switch (this.api_version) {
        case 1:
        default:
          sendReply(exception.getStatus(), serializer.formatErrorV1(exception));
      }

That warning log message appeared, and exception.getMessage() was stringified as "null", because exception.getMessage() is null.

Here is the proposed code path that led to this getting hit in my particular case:

src/tsd/RpcHandler.java

  private void handleHttpQuery(final TSDB tsdb, final Channel chan, final HttpRequest req) {
    AbstractHttpQuery abstractQuery = null;
    try {
      abstractQuery = createQueryInstance(tsdb, req, chan);
...
    } catch (BadRequestException ex) {
      if (abstractQuery == null) {
        LOG.warn("{} Unable to create query for {}. Reason: {}", chan, req, ex);
        sendStatusAndClose(chan, HttpResponseStatus.BAD_REQUEST);
      } else {
        abstractQuery.badRequest(ex);
      }

src/tsd/QueryRpc.java

  public void execute(final TSDB tsdb, final HttpQuery query) 
...
    final String[] uri = query.explodeAPIPath();
    final String endpoint = uri.length > 1 ? uri[1] : ""; 

    if (endpoint.toLowerCase().equals("last")) {
      handleLastDataPointQuery(tsdb, query);
...
  private void handleLastDataPointQuery(final TSDB tsdb, final HttpQuery query) {
...
    final class ErrBack implements Callback<Object, Exception> {
      public Object call(final Exception e) throws Exception {
        Throwable ex = e;
        while (ex.getClass().equals(DeferredGroupException.class)) {
          if (ex.getCause() == null) {
            LOG.warn("Unable to get to the root cause of the DGE");
            break;
          }
          ex = ex.getCause();
        }
        if (ex instanceof RuntimeException) {
          throw new BadRequestException(ex);
...
          final TSUIDQuery tsuid_query = 
              new TSUIDQuery(tsdb, sub_query.getMetric(), 
                  sub_query.getTags() != null ? 
                      sub_query.getTags() : Collections.EMPTY_MAP);
          if (data_query.getBackScan() > 0) {
            deferreds.add(tsuid_query.getLastPoint(data_query.getResolveNames(),
...
    } catch (Exception e) {
      Throwable ex = e;
      while (ex.getClass().equals(DeferredGroupException.class)) {
        if (ex.getCause() == null) {
          LOG.warn("Unable to get to the root cause of the DGE");
          break;
        }
        ex = ex.getCause();
      }
      if (ex instanceof RuntimeException) {
        throw new BadRequestException(ex);
      } else {
        throw new RuntimeException("Shouldn't be here", e);
      }

src/meta/TSUIDQuery.java

  public Deferred<IncomingDataPoint> getLastPoint(final boolean resolve_names, 
      final int back_scan) {
...
    class TSUIDCB implements Callback<Deferred<IncomingDataPoint>, byte[]> {
...
        final GetRequest get = new GetRequest(tsdb.dataTable(), key);
        get.family(TSDB.FAMILY());
        return tsdb.getClient().get(get).addCallbackDeferring(new LastPointCB());
...
    if (tsuid == null) {
      return tsuidFromMetric(tsdb, metric, tags)
          .addCallbackDeferring(new TSUIDCB());
    }
    try {
      // damn typed exceptions....
      return new TSUIDCB().call(null);
    } catch (Exception e) {
      return Deferred.fromError(e);
    }
  }
...
  private class LastPointCB implements Callback<Deferred<IncomingDataPoint>, 
    ArrayList<KeyValue>> {
...
      final IncomingDataPoint dp = 
          new Internal.GetLastDataPointCB(tsdb).call(row);

src/core/Internal.java

  public static class GetLastDataPointCB implements Callback<IncomingDataPoint, 
    ArrayList<KeyValue>> {
 ...
       final ArrayList<Cell> cells = extractDataPoints(row, row.size());
...
  public static ArrayList<Cell> extractDataPoints(final ArrayList<KeyValue> row,
      final int estimated_nvalues) {
    final ArrayList<Cell> cells = new ArrayList<Cell>(estimated_nvalues);
    for (final KeyValue kv : row) {
      final byte[] qual = kv.qualifier();
      final int len = qual.length;
      final byte[] val = kv.value();

      // when enable_appends set to true, should get qualifier and value from the HBase Column Value
      if (kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {
        int idx = 0;
        int q_length = 0;
        int v_length = 0;
        while (idx < kv.value().length) {
          q_length = Internal.getQualifierLength(kv.value(), idx);
          v_length = Internal.getValueLengthFromQualifier(kv.value(), idx);
          final byte[] q = new byte[q_length];
          final byte[] v = new byte[v_length];
          System.arraycopy(kv.value(),idx,q,0,q_length);
          System.arraycopy(kv.value(),idx + q_length,v, 0, v_length);
          idx += q_length + v_length;

          final Cell cell = new Cell(q, v);
          cells.add(cell);
        }

E.g. it could be hitting an ArrayIndexOutOfBoundsException here intermittently due to #2186, for certain timestamps/values.

src/tsd/BadRequestException.java

  public BadRequestException(final Throwable cause) {
    this(cause.getMessage(), cause);
  }

This variant of the BadRequestException can result in an object that has a null message.

twegener-embertec commented 3 years ago

Arguably, the problem is that src/tsd/QueryRpc.java should not assume that it is okay to convert all RuntimeException instances into BadRequestException instances in handleLastDataPointQuery, but that aspect is a bit harder to untangle. In the meantime, the formatErrorV1(final BadRequestException exception) still needs to be made robust against exceptions that have no message, since that is possible to occur.

Here are some additional tests for test/tsd/TestHttpQuery.java that exercise that aspect:

  @Test
  public void badRequestDefaultSerializerRuntimeException() {
    HttpQuery query = NettyMocks.getQuery(tsdb, "/api/error");
    query.getQueryBaseRoute();
    try {
      throw new RuntimeException("Some runtime error");
    } catch (RuntimeException e) {
      try {
        throw new BadRequestException(e);
      } catch (BadRequestException bre) {
        query.badRequest(bre);
      }
    }
    assertEquals(HttpResponseStatus.BAD_REQUEST, query.response().getStatus());
    assertEquals(
        "{\"error\":{\"code\":400,\"message\":\"Some runtime error\"",
        query.response().getContent().toString(Charset.forName("UTF-8"))
        .substring(0, 51));
  }

  @Test
  public void badRequestDefaultSerializerRuntimeExceptionBlank() {
    HttpQuery query = NettyMocks.getQuery(tsdb, "/api/error");
    query.getQueryBaseRoute();
    try {
      throw new RuntimeException();
    } catch (RuntimeException e) {
      try {
        throw new BadRequestException(e);
      } catch (BadRequestException bre) {
        query.badRequest(bre);
      }
    }
    assertEquals(HttpResponseStatus.BAD_REQUEST, query.response().getStatus());
    assertEquals(
        "{\"error\":{\"code\":400,\"message\":\"An unknown exception occurred\",\"trace\":",
        query.response().getContent().toString(Charset.forName("UTF-8"))
        .substring(0, 71));
  }

  @Test
  public void badRequestDefaultSerializerNullPointerException() {
    HttpQuery query = NettyMocks.getQuery(tsdb, "/api/error");
    query.getQueryBaseRoute();
    try {
      throw new NullPointerException();
    } catch (NullPointerException e) {
      try {
        throw new BadRequestException(e);
      } catch (BadRequestException bre) {
        query.badRequest(bre);
      }
    }
    assertEquals(HttpResponseStatus.BAD_REQUEST, query.response().getStatus());
    assertEquals(
        "{\"error\":{\"code\":400,\"message\":\"An unknown exception occurred\",\"trace\":",
        query.response().getContent().toString(Charset.forName("UTF-8"))
        .substring(0, 71));
  }
manolama commented 3 years ago

Arguably, the problem is that src/tsd/QueryRpc.java should not assume that it is okay to convert all RuntimeException instances into BadRequestException instances in handleLastDataPointQuery, but that aspect is a bit harder to untangle.

Agreed, definitely messy and hopefully a bit cleaner in 3.x

Great catch on this one, I doubt very many folks ran into it. Would you mind throwing in a PR that properly initializes the StringBuilder and adds those UTs? A constant is fine by me. Or would you like me to apply these? Thanks!

twegener-embertec commented 3 years ago

I wasn't sure on the best way to address this. My interim patch just tries to follow the pattern laid down by formatErrorV1(final Exception exception):

diff --git a/src/tsd/HttpSerializer.java b/src/tsd/HttpSerializer.java
index f6108466..bf21d8e5 100644
--- a/src/tsd/HttpSerializer.java
+++ b/src/tsd/HttpSerializer.java
@@ -851,22 +851,28 @@ public abstract class HttpSerializer {
    * @return A standard JSON error
    */
   public ChannelBuffer formatErrorV1(final BadRequestException exception) {
+    // It is possible for mesage and details to be null.
+    String message = exception.getMessage();
+    String details = exception.getDetails();
+    if (message == null) {
+      message = "An unknown exception occurred";
+    }
     StringBuilder output = 
-      new StringBuilder(exception.getMessage().length() * 2);
+      new StringBuilder(message.length() * 2);
     final String jsonp = query.getQueryStringParam("jsonp");
     if (jsonp != null && !jsonp.isEmpty()) {
       output.append(query.getQueryStringParam("jsonp") + "(");
     }
     output.append("{\"error\":{\"code\":");
     output.append(exception.getStatus().getCode());
-    final StringBuilder msg = new StringBuilder(exception.getMessage().length());
-    HttpQuery.escapeJson(exception.getMessage(), msg);
+    final StringBuilder msg = new StringBuilder(message.length());
+    HttpQuery.escapeJson(message, msg);
     output.append(",\"message\":\"").append(msg.toString()).append("\"");
-    if (!exception.getDetails().isEmpty()) {
-      final StringBuilder details = new StringBuilder(
-          exception.getDetails().length());
-      HttpQuery.escapeJson(exception.getDetails(), details);
-      output.append(",\"details\":\"").append(details.toString()).append("\"");
+    if (details != null && !details.isEmpty()) {
+      final StringBuilder edetails = new StringBuilder(
+          details.length());
+      HttpQuery.escapeJson(details, edetails);
+      output.append(",\"details\":\"").append(edetails.toString()).append("\"");
     }
     if (query.showStackTrace()) {
       ThrowableProxy tp = new ThrowableProxy(exception);

I'll leave it to you to season to your taste.

But if it makes life easier for you I can post a PR. Do I need to fill out a contributor agreement before submitting PRs for small things like that?