OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1
4.98k stars 1.25k forks source link

Query to /api/exp fails without response #817

Open katameru opened 8 years ago

katameru commented 8 years ago

On the latest version of opentsdb (next/HEAD, currently 1409b550ae7f7fcf9c6cbe7e0dbf59c7363618ad), this query causes an exception during execution, and another exception during exception handling so no message is returned.

{

"time": {
    "start": "1h-ago",
    "aggregator": "sum"
},
"filters": [
    { "id": "f1", "tags": [] }
],
"metrics": [
    { "id": "m1", "metric": "metric", "filter": "f1" },
{ "id": "m2", "metric": "metric", "filter": "f1" },
{ "id": "m3", "metric": "metric", "filter": "f1" }
],
"expressions": [
{ "id": "e1", "expr": "m1" },
{ "id": "e2", "expr": "m2" },
{ "id": "e3", "expr": "m3" },
    { "id": "e12","expr": "e1+e2" }
]

}

johann8384 commented 8 years ago

I sent this query:

{
    "time": {
        "start": "1h-ago",
        "aggregator": "sum"
    },
    "filters": [
        { "id": "f1", "tags": [] }
    ],
    "metrics": [
        { "id": "m1", "metric": "interface.p2p0.if_packets.tx", "filter": "f1" },
        { "id": "m2", "metric": "interface.p2p0.if_packets.tx", "filter": "f1" },
        { "id": "m3", "metric": "interface.bridge0.if_octets.rx", "filter": "f1" }
    ],
    "expressions": [
        { "id": "e1", "expr": "m1" },
        { "id": "e2", "expr": "m2" },
        { "id": "e3", "expr": "m3" },
        { "id": "e12","expr": "e1+e2" }
    ]
}

output:

2016-07-06 17:49:33,980 INFO  [OpenTSDB I/O Boss #1] ConnectionManager: [id: 0xe287bfdf, /0:0:0:0:0:0:0:1:53604 => /0:0:0:0:0:0:0:1:4242] OPEN
2016-07-06 17:49:33,981 INFO  [OpenTSDB I/O Worker #1] ConnectionManager: [id: 0xe287bfdf, /0:0:0:0:0:0:0:1:53604 => /0:0:0:0:0:0:0:1:4242] BOUND: /0:0:0:0:0:0:0:1:4242
2016-07-06 17:49:33,981 INFO  [OpenTSDB I/O Worker #1] ConnectionManager: [id: 0xe287bfdf, /0:0:0:0:0:0:0:1:53604 => /0:0:0:0:0:0:0:1:4242] CONNECTED: /0:0:0:0:0:0:0:1:53604
2016-07-06 17:49:34,322 INFO  [OpenTSDB I/O Worker #1] QueryStats: Executing new query={"query":{"start":"1h-ago","end":null,"timezone":null,"options":null,"padding":false,"queries":[{"aggregator":"sum","metric":"interface.p2p0.if_packets.tx","tsuids":null,"downsample":null,"rate":false,"filters":[],"index":0,"tags":{},"explicitTags":false,"rateOptions":null,"filterTagKs":null},{"aggregator":"sum","metric":"interface.p2p0.if_packets.tx","tsuids":null,"downsample":null,"rate":false,"filters":[],"index":1,"tags":{},"explicitTags":false,"rateOptions":null,"filterTagKs":null},{"aggregator":"sum","metric":"interface.bridge0.if_octets.rx","tsuids":null,"downsample":null,"rate":false,"filters":[],"index":2,"tags":{},"explicitTags":false,"rateOptions":null,"filterTagKs":null}],"delete":false,"showTSUIDs":false,"msResolution":false,"showQuery":false,"showStats":false,"showSummary":false,"useCalendar":false,"globalAnnotations":false,"noAnnotations":false},"exception":"null","executed":1,"user":null,"stats":{},"requestHeaders":{"authorization":"accessKey:digest:epoch:nonce","Origin":"chrome-extension://fdmmgilgnpjigdojojpjoooidkmcomcm","Cache-Control":"no-cache","Accept":"*/*","Connection":"keep-alive","User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36","Host":"localhost:4242","DNT":"1","Accept-Encoding":"gzip, deflate","Accept-Language":"en-US,en;q=0.8","Content-Length":"562","Content-Type":"text/plain;charset=UTF-8"},"numRunningQueries":1,"httpResponse":null,"queryStartTimestamp":1467845374280,"queryCompletedTimestamp":0,"sentToClient":false}
2016-07-06 17:49:34,349 INFO  [AsyncHBase I/O Worker #1] TsdbQuery: TsdbQuery(start_time=1467841774280, end_time=1467845374280, metric=[0, 0, 56], filters=[], rate=false, aggregator=sum, group_bys=()) matched 1 rows in 1 spans in 14.212251ms
2016-07-06 17:49:34,351 INFO  [AsyncHBase I/O Worker #1] TsdbQuery: TsdbQuery(start_time=1467841774280, end_time=1467845374280, metric=[0, 0, 56], filters=[], rate=false, aggregator=sum, group_bys=()) matched 1 rows in 1 spans in 13.744573ms
2016-07-06 17:49:34,351 INFO  [AsyncHBase I/O Worker #1] TsdbQuery: TsdbQuery(start_time=1467841774280, end_time=1467845374280, metric=[0, 0, 42], filters=[], rate=false, aggregator=sum, group_bys=()) matched 1 rows in 1 spans in 13.740798ms

Got the first Exp:

2016-07-06 17:49:34,381 ERROR [AsyncHBase I/O Worker #1] QueryExecutor: Query exception:
java.lang.NullPointerException: null
    at net.opentsdb.tsd.QueryExecutor$1QueriesCB.call(QueryExecutor.java:327) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.QueryExecutor$1QueriesCB.call(QueryExecutor.java:251) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]

then:

2016-07-06 17:49:34,382 WARN  [AsyncHBase I/O Worker #1] HttpQuery: [id: 0xe287bfdf, /0:0:0:0:0:0:0:1:53604 => /0:0:0:0:0:0:0:1:4242] Bad Request on /api/query/exp: null
2016-07-06 17:49:34,383 ERROR [AsyncHBase I/O Worker #1] QueryExecutor: Exception thrown during exception handling
java.lang.NullPointerException: null
    at net.opentsdb.tsd.HttpSerializer.formatErrorV1(HttpSerializer.java:834) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.HttpQuery.badRequest(HttpQuery.java:413) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.QueryExecutor$ErrorCB.call(QueryExecutor.java:517) [tsdb-2.3.0-RC1.jar:85047ea]

then:

2016-07-06 17:49:34,384 ERROR [AsyncHBase I/O Worker #1] QueryExecutor: Query exception:
java.lang.NullPointerException: null
    at net.opentsdb.tsd.QueryExecutor$ErrorCB.call(QueryExecutor.java:524) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.QueryExecutor$ErrorCB.call(QueryExecutor.java:489) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) [async-1.4.0.jar:na]
    at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) [async-1.4.0.jar:na]
    at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) [async-1.4.0.jar:na]

then:

2016-07-06 17:49:34,385 WARN  [AsyncHBase I/O Worker #1] HttpQuery: [id: 0xe287bfdf, /0:0:0:0:0:0:0:1:53604 => /0:0:0:0:0:0:0:1:4242] Bad Request on /api/query/exp: null
2016-07-06 17:49:34,386 ERROR [AsyncHBase I/O Worker #1] QueryExecutor: Exception thrown during exception handling
java.lang.NullPointerException: null
    at net.opentsdb.tsd.HttpSerializer.formatErrorV1(HttpSerializer.java:834) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.HttpQuery.badRequest(HttpQuery.java:413) ~[tsdb-2.3.0-RC1.jar:85047ea]
    at net.opentsdb.tsd.QueryExecutor$ErrorCB.call(QueryExecutor.java:517) [tsdb-2.3.0-RC1.jar:85047ea]
johann8384 commented 8 years ago

Using

        { "id": "e12","expr": "m1+m2" }

Works.

johann8384 commented 8 years ago

The expressions length is 4, which it is, but when it build compile_stack, the last entry is null

2016-07-06 20:41:00,677 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: Expressions Size is 4
2016-07-06 20:41:00,677 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: Adding expression to compile_stack[0]
2016-07-06 20:41:00,678 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: Adding expression to compile_stack[1]
2016-07-06 20:41:00,678 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: Adding expression to compile_stack[2]
2016-07-06 20:41:00,693 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: compile stack length: 4
2016-07-06 20:41:00,693 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: this x is: 3
2016-07-06 20:41:00,693 ERROR [AsyncHBase I/O Worker #1] QueryExecutor: compile_stack[3] is null
2016-07-06 20:41:47,847 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: this x is: 2
2016-07-06 20:41:47,847 DEBUG [AsyncHBase I/O Worker #1] QueryExecutor: Looking for variable m3
johann8384 commented 8 years ago

So, the bug is right here, it's not properly iterating the expressions.

https://github.com/OpenTSDB/opentsdb/blob/next/src/tsd/QueryExecutor.java#L321

  // compile all of the expressions
        final long intersect_start = DateTime.currentTimeMillis();
        if (graph != null) {
          final ExpressionIterator[] compile_stack = 
              new ExpressionIterator[expressions.size()];
          final TopologicalOrderIterator<String, DefaultEdge> it = 
              new TopologicalOrderIterator<String, DefaultEdge>(graph);
          int i = 0;
          while (it.hasNext()) {
            compile_stack[i++] = expressions.get(it.next());
          }
          for (int x = compile_stack.length - 1; x >= 0; x--) {
            // look for and add expressions
            for (final String var : compile_stack[x].getVariableNames()) {
              ExpressionIterator source = expressions.get(var);
              if (source != null) {
                compile_stack[x].addResults(var, source.getCopy());
                LOG.debug("Adding expression " + source.getId() + " to " + 
                    compile_stack[x].getId());
              }
            }

            compile_stack[x].compile();
            LOG.debug("Successfully compiled " + compile_stack[x]);
          }
        } else {
          for (final ExpressionIterator ei : expressions.values()) {
            ei.compile();
            LOG.debug("Successfully compiled " + ei);
          }
        }
        LOG.debug("Finished compilations in " + 
            (DateTime.currentTimeMillis() - intersect_start) + " ms");

        return serialize().addCallback(new CompleteCB()).addErrback(new ErrorCB());
      }
    }
johann8384 commented 8 years ago

Well, at least now I have captured there condition and can return a sensible error + 500 status code.

{
    "error": {
        "code": 500,
        "message": " Internal Error: Less expressions where added to the compile stack than expressions.size (3 instead of 4)",
        "trace": "java.io.IOException:  Internal Error: Less expressions where added to the compile stack than expressions.size (3 instead of 4)
manolama commented 7 years ago

You fix looked good and I had some tweaks that I'm upstreaming now in that branch. Will have this one resolved shortly I hope.

manolama commented 7 years ago

Merged up into the "next" branch. @katameru could you give this a try now please?

johann8384 commented 7 years ago

This "works on my machine" now also

gnydick commented 5 years ago

This is happening on 2.4.0, should we reopen or create a new ticket?

manolama commented 5 years ago

Yeah I'll re-open. @gnydick do you have an example please?

gnydick commented 5 years ago

I found that it's when a filter is used, sometimes the

.getFilter(xyz).value() in ExpressionIterator ends up being null. I actually ran OpenTSDB attached to a debugger to find it. I may not be remembering the statement perfectly, but I think that should point you in the right direction. G On Fri, Feb 1, 2019, 2:41 PM Chris Larsen Yeah I'll re-open. @gnydick do you have an > example please? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > , > or mute the thread > > . >