OpenTSDB / opentsdb

A scalable, distributed Time Series Database.
http://opentsdb.net
GNU Lesser General Public License v2.1

Batch importing data results in IllegalArgumentException when data is unordered in time #168

Open mbranden opened 11 years ago

mbranden commented 11 years ago

My workflow for feeding OpenTSDB involves offline processing of numerous log files and generating a single stream of importable data for 'tsdb import'. This data may not be ordered in time, which results in an exception being thrown from addPointInternal():

2013-02-19 15:37:16,218 INFO  [New I/O  worker #1] HBaseClient: Added client for region RegionInfo(table="tsdb-uid", region_name="tsdb-uid,,1361288131750.802dddc3726d461678a822323c4e28f6.", stop_key=""), which was added to the regions cache.  Now we know that RegionClient@1738275632(chan=[id: 0x7be11b78, /127.0.0.1:44445 => /127.0.0.1:50564], #pending_rpcs=0, #batched=0, #rpcs_inflight=0) is hosting 2 regions.
2013-02-19 15:37:16,223 INFO  [New I/O  worker #1] HBaseClient: Added client for region RegionInfo(table="tsdb", region_name="tsdb,,1361288133012.99acbf016aabf8f48d33904aa6be4052.", stop_key=""), which was added to the regions cache.  Now we know that RegionClient@1738275632(chan=[id: 0x7be11b78, /127.0.0.1:44445 => /127.0.0.1:50564], #pending_rpcs=0, #batched=0, #rpcs_inflight=1) is hosting 3 regions.
2013-02-19 15:37:16,417 ERROR [main] TextImporter: Exception caught while processing file tsdb-feb-14-import.txt line=apache2.resp_time 1360736752 0.001000 host=sim7000.nandi.lindenlab.com scheme=http instance=top
2013-02-19 15:37:16,428 INFO  [New I/O  worker #1] HBaseClient: Lost connection with the -ROOT- region
Exception in thread "main" java.lang.IllegalArgumentException: New timestamp=1360736752 is less than previous=1360861392 when trying to add value=[58, -125, 18, 111] to IncomingDataPoints([0, 0, 1, 81, 29, 24, 16, 0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0, 0, 3, 0, 0, 3] (metric=apache2.resp_time), base_time=1360861200 (Thu Feb 14 17:00:00 UTC 2013), [+12:float(0.0010000000474974513), +72:float(0.0010000000474974513), +132:float(0.0010000000474974513), +192:float(0.0010000000474974513)])
        at net.opentsdb.core.IncomingDataPoints.addPointInternal(IncomingDataPoints.java:201)
        at net.opentsdb.core.IncomingDataPoints.addPoint(IncomingDataPoints.java:283)
        at net.opentsdb.tools.TextImporter.importFile(TextImporter.java:153)
        at net.opentsdb.tools.TextImporter.main(TextImporter.java:72) 

I haven't looked at this closely enough to tell whether this check is restricted to a single time series or has wider scope.

I'm dealing with this by simply sorting the unified stream before importing. So this may be taken as a bug, a feature request, or just a documentation request if the check works as intended.
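
For reference, a minimal sketch of that sorting workaround in Java (the file names, and the assumption that each line follows the usual 'metric timestamp value tag=value ...' import layout, are mine and not part of the original report): sort the unified stream numerically by the timestamp field before handing it to 'tsdb import'. A global sort by timestamp also guarantees the per-series ordering that the importer checks.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortImportFile {
  public static void main(String[] args) throws IOException {
    // Hypothetical file names; substitute the actual batch files.
    List<String> lines = Files.readAllLines(
        Paths.get("unified-import.txt"), StandardCharsets.UTF_8);

    // Each line is assumed to look like: metric timestamp value tag1=v1 tag2=v2 ...
    // Sorting the whole stream by the timestamp field (second column) also puts
    // every individual series in order, which is what the import check requires.
    List<String> sorted = lines.stream()
        .sorted(Comparator.comparingLong(
            (String line) -> Long.parseLong(line.split("\\s+")[1])))
        .collect(Collectors.toList());

    Files.write(Paths.get("unified-import-sorted.txt"), sorted,
        StandardCharsets.UTF_8);
  }
}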

tsuna commented 11 years ago

The test applies only to a single time series (remember a time series is uniquely defined by a metric name and a set of tags).

This check is intentional: importing data in order allows for various optimizations, which are especially important for batch imports. Is it OK for you to keep sorting the data before you batch-import it?
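
To make the scope of that check concrete, here is a small sketch (the class name, file name, and import-line layout are assumptions for illustration, not OpenTSDB code) that flags points which are out of order within a single series, where the series key is the metric name plus its tag set:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CheckSeriesOrdering {
  public static void main(String[] args) throws IOException {
    // Hypothetical input file in "metric timestamp value tag1=v1 ..." format.
    Map<String, Long> lastSeen = new HashMap<>();

    for (String line : Files.readAllLines(Paths.get("unified-import.txt"))) {
      String[] fields = line.split("\\s+");
      long timestamp = Long.parseLong(fields[1]);

      // A series is identified by the metric name plus its tag set, so build
      // the key from field 0 and the (sorted) tags, ignoring the value field.
      String key = fields[0] + " " + Arrays.stream(fields, 3, fields.length)
          .sorted().collect(Collectors.joining(" "));

      Long previous = lastSeen.put(key, timestamp);
      if (previous != null && timestamp < previous) {
        System.err.println("Out of order within series '" + key + "': "
            + timestamp + " < " + previous);
      }
    }
  }
}

Points from different series may be interleaved freely; only the relative order within each key matters, which is why a simple global sort by timestamp is a sufficient (if stronger than necessary) fix.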

IDerr commented 6 years ago

Hi @mbranden, if your question has been answered, could you please close this issue?

thanks :)