Open bobrik opened 8 years ago
Okay, I think I have what I need implemented locally:
```java
SparkConf sc = new SparkConf().setAppName("Whatever");
sc.setMaster("local[24]");
sc.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
JavaSparkContext ctx = new JavaSparkContext(sc);

// OpenTSDB config: point at the target HBase tables and disable compaction
// so the scan sees raw data points.
Config config = new Config(false);
config.overrideConfig("tsd.storage.hbase.zk_quorum", "myzk:2181");
config.overrideConfig("tsd.storage.hbase.zk_basedir", "/hbase/metrics");
config.overrideConfig("tsd.storage.hbase.data_table", "fallback-tsdb");
config.overrideConfig("tsd.storage.hbase.uid_table", "fallback-tsdb-uid");
config.overrideConfig("tsd.storage.enable_compaction", "false");

// HBase scan config: restrict the scan to the last 24 hours and bump
// timeouts/retries, since a full scan can run for a long time.
Configuration hbaseConfiguration = new Configuration();
hbaseConfiguration.setLong(TableInputFormat.SCAN_TIMERANGE_START, System.currentTimeMillis() - 86400 * 1000L);
hbaseConfiguration.setLong(TableInputFormat.SCAN_TIMERANGE_END, System.currentTimeMillis());
hbaseConfiguration.setInt("hbase.client.scanner.caching", 1000);
hbaseConfiguration.setInt("hbase.rpc.timeout", 86400 * 1000);
hbaseConfiguration.setInt("hbase.client.scanner.timeout.period", 86400 * 1000);
hbaseConfiguration.setInt("hbase.client.retries.number", 1000000);

JavaRDD<OpenTSDBInput.DataPoint> rdd = OpenTSDBInput.rdd(ctx, config, hbaseConfiguration);

List<OpenTSDBInput.DataPoint> dps = rdd
        .filter(x -> x.getMetric().startsWith("tsd."))
        .takeOrdered(50, new DataPointComparator());

for (OpenTSDBInput.DataPoint dp : dps) {
    System.out.println(dp);
}
```
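The `DataPointComparator` above isn't part of OpenTSDB or Spark; a minimal self-contained sketch could look like the following, with an illustrative stand-in for `OpenTSDBInput.DataPoint` (real field names may differ, and ordering by timestamp is my assumption):

```java
import java.io.Serializable;
import java.util.Comparator;

// Illustrative stand-in for OpenTSDBInput.DataPoint (real fields may differ).
record DataPoint(String metric, long timestamp) {}

// Comparators passed to takeOrdered must be Serializable, because Spark ships
// them to executors; forgetting this is a common NotSerializableException.
class DataPointComparator implements Comparator<DataPoint>, Serializable {
    @Override
    public int compare(DataPoint a, DataPoint b) {
        return Long.compare(a.timestamp(), b.timestamp());
    }
}
```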
But here's another issue that stops me from running it on Spark properly: OpenTSDB/asynchbase#99.
Posting OpenTSDBInput, if anyone needs it:
@bobrik you should post an article online, great job.
Hi @bobrik, if your problem has been resolved, could you please close this issue?
Thanks :)
I ran into the need to reindex OpenTSDB. I started by doing CopyTable, then reimplemented CopyTable on Spark, then began wondering whether I needed something better. Now I want to try reindexing with different settings that cannot be changed after data is written: salting and metric/tag UID width.
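For reference, these are the write-time settings I mean (key names as I understand them from OpenTSDB 2.2+, values purely illustrative). Because they are baked into every row key, changing them means rewriting all existing data:

```
# Baked into row keys at write time; changing them requires a full rewrite.
tsd.storage.salt.width = 1
tsd.storage.salt.buckets = 20
tsd.storage.uid.width.metric = 3
tsd.storage.uid.width.tagk = 3
tsd.storage.uid.width.tagv = 3
```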
It seems that doing map-reduce over OpenTSDB's data in TSDB's own terms is a good idea: you could feed data directly into a new OpenTSDB, save it in a different format, do cool things in Spark, etc. However, there is currently no support for this on OpenTSDB's side.
I ended up doing a POC that iterates over the `tsdb` table and emits metrics (ts + metric + tags + value):

Poking around `Internal` things and reading from HBase directly doesn't look like good practice. I think a class with similar functionality should live in OpenTSDB itself. I want to be able to get something like this to iterate over:
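Something along these lines, as a toy in-memory sketch (all names invented, just to show the shape of the API I'd want): filter by time range and metric prefix, get decoded points back, with no hand-rolled HBase scans and no reaching into `Internal`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical decoded data point, the way OpenTSDB could hand it out.
record DataPoint(String metric, Map<String, String> tags, long timestamp, double value) {}

// Toy in-memory stand-in for the iteration API I'd like OpenTSDB to expose.
class DataPointScanner {
    private final List<DataPoint> points;

    DataPointScanner(List<DataPoint> points) {
        this.points = points;
    }

    // Return decoded points in [startMs, endMs) whose metric matches a prefix.
    List<DataPoint> scan(long startMs, long endMs, String metricPrefix) {
        return points.stream()
                .filter(p -> p.timestamp() >= startMs && p.timestamp() < endMs)
                .filter(p -> p.metric().startsWith(metricPrefix))
                .collect(Collectors.toList());
    }
}
```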