OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1
4.99k stars 1.25k forks source link

When appends are disabled, /api/query/last gives error for certain timestamps: Offset of [0] is out of bounds for the qualifier length of [1] #2186

Open twegener-embertec opened 2 years ago

twegener-embertec commented 2 years ago

Since trying out a relatively recent checkout of opentsdb master (post 2.4.0), the following error sometimes occurs, depending on the timestamp of the last value for the given series:

http://tsdb-01:4242/api/query/last?back_scan=96&timeseries=redacted{redacted}: cannot GET /api/query/last?back_scan=96&timeseries=redacted%7Bredacted%7D (400)

message: Offset of [0] is out of bounds for the qualifier length of [1]

Looking at the opentsdb source code, this appears to be a regression introduced by: https://github.com/OpenTSDB/opentsdb/pull/1634/commits/967e4a03fc0f3f2e7b4ddc06d6e8612714575b25 Fix OpenTSDB#1632

I.e. a regression when run with appends disabled (tsd.storage.enable_appends false, i.e. the default case).

Consider the following code from src/core/Interval.java:

  public static ArrayList<Cell> extractDataPoints(final ArrayList<KeyValue> row,
...
      // when enable_appends set to true, should get qualifier and value from the HBase Column Value
      if (kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {
        int idx = 0;
        int q_length = 0;
        int v_length = 0;
        while (idx < kv.value().length) {
          q_length = Internal.getQualifierLength(kv.value(), idx);
          v_length = Internal.getValueLengthFromQualifier(kv.value(), idx);
...

From src/core/AppendDataPoints.java:

  public static final byte APPEND_COLUMN_PREFIX = 0x05;

The key line is:

      if (kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {

Note that it doesn't check the config to see whether appends are enabled, and so misinterprets the qualifier.

When appends are not enabled, the first 12 bits of the qualifier will be the offset (in seconds) from the row timestamp (which is on an hour boundary).

For timestamp offsets from 80 to 95 seconds, the first byte of the qualifier will be 0x05, which is the APPEND_COLUMN_PREFIX, and so they get misinterpreted as an append column prefix here.

In that case, it then interprets kv.value() as an embedded qualifier. In my case, a value that is 1 byte long was being stored, so when the following line is called:

          q_length = Internal.getQualifierLength(kv.value(), idx);

...which calls:

  public static short getQualifierLength(final byte[] qualifier, 
      final int offset) {
    validateQualifier(qualifier, offset);  

...which calls:

  private static void validateQualifier(final byte[] qualifier, 
      final int offset) {
    if (offset < 0 || offset >= qualifier.length - 1) {
      throw new IllegalDataException("Offset of [" + offset + 
          "] is out of bounds for the qualifier length of [" + 
          qualifier.length + "]");
    }
  }

...where offset will be 0 and qualifier.length will be 1 (due to the 1 byte value), it results in the error: Offset of [0] is out of bounds for the qualifier length of [1]

This also affects other areas that call extractDataPoints, such as tsdb scan when dumping points with timestamps that have an offset from the hour such that it hits this problem.

Note that the unit tests in test/tsd/TestQueryRpcLastDataPoint.java only appear to test nice round timestamps that fall on the hour, and so they miss this issue. I have created a unit test that reproduces this problem when using a carefully chosen timestamp.

Presumably, the fix would be to add checks for append mode being enabled in the appropriate places.

twegener-embertec commented 2 years ago

Some unit tests (added to test/tsd/TestQueryRpcLastDataPoint.java) that first test a non-hour-boundary timestamp that is not affected by this issue (which passes), and then a similar test where the timestamp offset is such that it hits this problem, and so the test fails:

  @Test
  public void qsMetricBackscanNonRoundStamp() throws Exception {
    PowerMockito.mockStatic(DateTime.class);
    // 40 seconds past the hour                                                                                                                         
    PowerMockito.when(DateTime.currentTimeMillis()).thenReturn(1356998440000L);
    tsdb.addPoint("sys.cpu.user", 1356998440L, 45, tags);

    final HttpQuery query = NettyMocks.getQuery(tsdb,
        "/api/query/last?timeseries=sys.cpu.user{host=web01}&back_scan=1");
    rpc.execute(tsdb, query);
    final String json = getContent(query);
    assertEquals(HttpResponseStatus.OK, query.response().getStatus());
    assertTrue(json.contains("\"timestamp\":1356998440000"));
    assertTrue(json.contains("\"value\":\"45\""));
    assertTrue(json.contains("\"tsuid\":\"000001000001000001\""));
    assertFalse(json.contains("\"metric\""));
    assertFalse(json.contains("\"tags\""));
  }

  @Test
  public void qsMetricBackscanNastyStamp() throws Exception {
    PowerMockito.mockStatic(DateTime.class);
    // 80 seconds past the hour, which looks like qualifier starting with 0x05                                                                          
    PowerMockito.when(DateTime.currentTimeMillis()).thenReturn(1356998480000L);
    tsdb.addPoint("sys.cpu.user", 1356998480L, 43, tags);

    final HttpQuery query = NettyMocks.getQuery(tsdb,
        "/api/query/last?timeseries=sys.cpu.user{host=web01}&back_scan=1");
    rpc.execute(tsdb, query);
    final String json = getContent(query);
    assertEquals(HttpResponseStatus.OK, query.response().getStatus());
    assertTrue(json.contains("\"timestamp\":1356998480000"));
    assertTrue(json.contains("\"value\":\"43\""));
    assertTrue(json.contains("\"tsuid\":\"000001000001000001\""));
    assertFalse(json.contains("\"metric\""));
    assertFalse(json.contains("\"tags\""));
  }

And the failure result:

1) qsMetricBackscanNastyStamp(net.opentsdb.tsd.TestQueryRpcLastDataPoint)
net.opentsdb.tsd.BadRequestException: Offset of [0] is out of bounds for the qualifier length of [1]
    at net.opentsdb.tsd.QueryRpc.handleLastDataPointQuery(QueryRpc.java:506)
    at net.opentsdb.tsd.QueryRpc.execute(QueryRpc.java:110)
    at net.opentsdb.tsd.TestQueryRpcLastDataPoint.qsMetricBackscanNastyStamp(TestQueryRpcLastDataPoint.java:225)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:310)
    at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:88)
    at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:96)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:294)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:127)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:282)
    at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:86)
    at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:49)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:207)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:146)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:120)
    at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:33)
    at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:45)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:118)
    at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:104)
    at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:53)
    at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:53)
    at org.junit.runners.Suite.runChild(Suite.java:127)
    at org.junit.runners.Suite.runChild(Suite.java:26)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
    at org.junit.runner.JUnitCore.runMain(JUnitCore.java:96)
    at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:47)
    at org.junit.runner.JUnitCore.main(JUnitCore.java:40)
Caused by: net.opentsdb.tsd.BadRequestException: Offset of [0] is out of bounds for the qualifier length of [1]
    at net.opentsdb.tsd.QueryRpc$1ErrBack.call(QueryRpc.java:388)
    at net.opentsdb.tsd.QueryRpc$1ErrBack.call(QueryRpc.java:377)
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
    at com.stumbleupon.async.Deferred.addCallbacks(Deferred.java:688)
    at com.stumbleupon.async.Deferred.addErrback(Deferred.java:756)
    at net.opentsdb.tsd.QueryRpc.handleLastDataPointQuery(QueryRpc.java:493)
    ... 39 more
Caused by: net.opentsdb.core.IllegalDataException: Offset of [0] is out of bounds for the qualifier length of [1]
    at net.opentsdb.core.Internal.validateQualifier(Internal.java:877)
    at net.opentsdb.core.Internal.getQualifierLength(Internal.java:714)
    at net.opentsdb.core.Internal.extractDataPoints(Internal.java:251)
    at net.opentsdb.core.Internal$GetLastDataPointCB.call(Internal.java:484)
    at net.opentsdb.meta.TSUIDQuery$LastPointCB.call(TSUIDQuery.java:634)
    at net.opentsdb.meta.TSUIDQuery$LastPointCB.call(TSUIDQuery.java:609)
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
    at com.stumbleupon.async.Deferred.addCallbacks(Deferred.java:688)
    at com.stumbleupon.async.Deferred.addCallbackDeferring(Deferred.java:738)
    at net.opentsdb.meta.TSUIDQuery$1TSUIDCB.call(TSUIDQuery.java:199)
    at net.opentsdb.meta.TSUIDQuery$1TSUIDCB.call(TSUIDQuery.java:174)
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
    at com.stumbleupon.async.Deferred.addCallbacks(Deferred.java:688)
    at com.stumbleupon.async.Deferred.addCallbackDeferring(Deferred.java:738)
    at net.opentsdb.meta.TSUIDQuery.getLastPoint(TSUIDQuery.java:209)
    at net.opentsdb.tsd.QueryRpc.handleLastDataPointQuery(QueryRpc.java:478)
    ... 39 more

FAILURES!!!
Tests run: 48,  Failures: 1
twegener-embertec commented 2 years ago

Here's an example of where I hit this with tsdb scan when it was attempting to dump a point that had a timestamp that would hit this:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at net.opentsdb.core.Internal.extractDataPoints(Internal.java:256)
    at net.opentsdb.core.Internal.extractDataPoints(Internal.java:219)
    at net.opentsdb.tools.DumpSeries.formatKeyValue(DumpSeries.java:215)
    at net.opentsdb.tools.DumpSeries.doDump(DumpSeries.java:136)
    at net.opentsdb.tools.DumpSeries.main(DumpSeries.java:85)

Relevant code from src/core/Internal.java:

  public static ArrayList<Cell> extractDataPoints(final ArrayList<KeyValue> row,
      final int estimated_nvalues) {
    final ArrayList<Cell> cells = new ArrayList<Cell>(estimated_nvalues);
    for (final KeyValue kv : row) {
      final byte[] qual = kv.qualifier();
      final int len = qual.length;
      final byte[] val = kv.value();

      // when enable_appends set to true, should get qualifier and value from the HBase Column Value
      if (kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {
        int idx = 0;
        int q_length = 0;
        int v_length = 0;
        while (idx < kv.value().length) {
          q_length = Internal.getQualifierLength(kv.value(), idx);
          v_length = Internal.getValueLengthFromQualifier(kv.value(), idx);
          final byte[] q = new byte[q_length];
          final byte[] v = new byte[v_length];
          System.arraycopy(kv.value(),idx,q,0,q_length);
          System.arraycopy(kv.value(),idx + q_length,v, 0, v_length);
          idx += q_length + v_length;

          final Cell cell = new Cell(q, v);
          cells.add(cell);
        }

Specifically:

          System.arraycopy(kv.value(),idx + q_length,v, 0, v_length);
twegener-embertec commented 2 years ago

Here's an example of where I hit this with tsdb fsck --full-scan:

2021-08-13 02:45:05,296 [Fsck #1] ERROR [Fsck.run] - Shouldn't be here
    at net.opentsdb.tools.Fsck$FsckWorker.fsckRow(Fsck.java:563) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]
    at net.opentsdb.tools.Fsck$FsckWorker.run(Fsck.java:330) ~[tsdb-2.4.1-SNAPSHOT.jar:a945263]

Relevant code in src/tools/Fsck.java:

    private void fsckRow(final ArrayList<KeyValue> row,
...
            final ArrayList<Cell> cells = Internal.extractDataPoints(kv);
twegener-embertec commented 2 years ago

Presumably, the fix would be to add checks for append mode being enabled in the appropriate places.

Perhaps an alternative fix would be to check that the qualifier is 3 bytes long, along with checking that the first byte is APPEND_COLUMN_PREFIX.

twegener-embertec commented 2 years ago

Some other errors that came up during tsdb fsck --full-scan that did not occur with opentsdb 2.3.2:

2021-08-13 02:45:04,992 [Fsck #1] ERROR [Fsck.fsckRow] - Compacted column was out of order or requires a fixup: KeyValue(key=[0, 0, 43, 87, -112, 2, 112, 0, 0, 1, 0, 6, -71, 0, 0, 3, 0, 0, 11, 0, 0, 4, 0, 4, 4, 0, 0, 5, 0, 4, 2, 0, 0, 6, 0, 6, -67, 0, 0, 11, 0, 4, 1], family="t", qualifier=[5, -32, 5, -16, 6, 64], value=[22, 25, 27, 0], timestamp=1469059214749)
2021-08-13 02:45:04,992 [Fsck #1] ERROR [Fsck.fsckFloat] - This floating point value must be encoded either on 4 or 8 bytes, but it's on 2 bytes.
2021-08-13 02:45:04,992 [Fsck #1] ERROR [Fsck.fsckFloat] - The previous value was in a compacted column. This should not be possible.
2
twegener-embertec commented 2 years ago

The following patch made the problems with /api/query/last / tsdb scan / tsdb fsck --full-scan go away for me:

--- a/src/core/Internal.java                                                                                                                            
+++ b/src/core/Internal.java                                                                                                                            
@@ -243,7 +243,7 @@ public final class Internal {
       final byte[] val = kv.value();                                                                                                                   

       // when enable_appends set to true, should get qualifier and value from the HBase Column Value                                                   
-      if (kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {                                                                                
+      if (len == 3 && kv.qualifier()[0] == AppendDataPoints.APPEND_COLUMN_PREFIX) {                                                                    
         int idx = 0;                                                                                                                                   
         int q_length = 0;                                                                                                                              
         int v_length = 0;                                                                                                                              
manolama commented 2 years ago

Yup, excellent walkthrough and good catch. Just checking the length is all that is needed as per your patch. Would you like to submit a PR or would you like me to just apply the patch as-is? And thank you for your work!

twegener-embertec commented 2 years ago

would you like me to just apply the patch as-is?

Sure, no worries.

And thank you for your work!

You're welcome, and thank you for yours. :-)