filodb / FiloDB

Distributed Prometheus time series database
Apache License 2.0

[Long term] Look into Supersonic query API #11

Open velvia opened 9 years ago

velvia commented 9 years ago

https://slack-files.com/files-pri-safe/T03BMF0R2-F0A3LCQ3C/api-presentation_1_.pdf?c=1441299236-4641d956f1354dd200dd184c1f1fc76fc59b9d2c

samklr commented 9 years ago

Link expired?

velvia commented 9 years ago

@samklr try this?

https://slack-files.com/files-pri-safe/T03BMF0R2-F0AFBB892/jethrodata_white_paper.pdf?c=1441927242-e4340a9d9477dca46000bf030eb89fddb468fd58

darkjh commented 9 years ago

@velvia still expired

samklr commented 9 years ago

Lol. Still expired ...

velvia commented 9 years ago

I finally found a live link, though I'm not sure how much longer this one will stay up either. Download the PDF while you can. https://code.google.com/p/supersonic/downloads/list

velvia commented 8 years ago

So, Supersonic is C++. There is also Apache Drill, which is written in Java.

velvia commented 8 years ago

I think that, in the short term, playing with Spark's Catalyst optimizer to get columnar or at least vectorized execution is the best bet. Here is a video:

http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/

Some thoughts:

velvia commented 8 years ago

More notes on where in the Spark codebase to look for SQL optimizer stages (Spark 1.5.x):

Custom execution strategies can be inserted: see the SQLContext.experimental variable and its extraStrategies field.

Changing the optimizer steps might require a custom optimizer and a custom SQLContext/QueryExecution class.
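
To make the idea of pluggable execution strategies concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: `Planner`, `Scan`, `Aggregate`, `default_strategy` and `columnar_aggregate` are all hypothetical names made up for illustration. It only shows the general pattern of a planner that consults user-supplied strategies before falling back to its built-in one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Scan:
    """Logical node: read every row of a table."""
    table: str


@dataclass
class Aggregate:
    """Logical node: apply an aggregate function to one column of a child plan."""
    func: str
    column: str
    child: "Plan"


Plan = Union[Scan, Aggregate]
# A strategy returns a physical plan description, or None if it does not apply.
Strategy = Callable[["Planner", Plan], Optional[str]]


def default_strategy(planner: "Planner", node: Plan) -> Optional[str]:
    """Built-in fallback: plan every node as a row-at-a-time operator."""
    if isinstance(node, Scan):
        return f"RowScan({node.table})"
    if isinstance(node, Aggregate):
        child = planner.plan(node.child)
        return f"RowAggregate({node.func}({node.column}), {child})"
    return None


def columnar_aggregate(planner: "Planner", node: Plan) -> Optional[str]:
    """Hypothetical custom strategy: only handles an aggregate directly over a scan."""
    if isinstance(node, Aggregate) and isinstance(node.child, Scan):
        return f"ColumnarAggregate({node.func}({node.column}), {node.child.table})"
    return None


class Planner:
    def __init__(self) -> None:
        # User-supplied strategies are tried first, in order.
        self.extra_strategies: List[Strategy] = []

    def plan(self, node: Plan) -> str:
        for strategy in self.extra_strategies + [default_strategy]:
            physical = strategy(self, node)
            if physical is not None:
                return physical
        raise ValueError(f"no strategy could plan {node!r}")


planner = Planner()
query = Aggregate("sum", "value", Scan("metrics"))
print(planner.plan(query))                       # planned by the default strategy
planner.extra_strategies.append(columnar_aggregate)
print(planner.plan(query))                       # now planned by the custom strategy
```

The point of the pattern is that the custom strategy only has to recognize the plan shapes it can execute better; everything else still falls through to the default planner.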

velvia commented 8 years ago

A current Spark ticket for pushing down aggregations into DataSources:

https://issues.apache.org/jira/browse/SPARK-12449

See Santiago's comment right above mine for links to how Druid, Magellan, HBase and other folks are modifying Spark Catalyst plans to get aggregation done on the server side.
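
For anyone wondering why pushing aggregations down matters, here is a rough toy sketch in plain Python. None of it is Spark or data source code: `PartitionedSource`, `partial_sums`, `sum_without_pushdown` and `sum_with_pushdown` are hypothetical names, and the data is invented. It compares pulling every row to the engine against letting each partition pre-aggregate and shipping only the partial results.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, value)


class PartitionedSource:
    """Toy data source holding rows in partitions, as a storage server might."""

    def __init__(self, partitions: List[List[Row]]) -> None:
        self.partitions = partitions

    def scan(self) -> Iterator[Row]:
        """No pushdown: ship every raw row to the query engine."""
        for partition in self.partitions:
            yield from partition

    def partial_sums(self) -> Iterator[Dict[str, int]]:
        """Pushdown: each partition sums its own rows and ships one entry per key."""
        for partition in self.partitions:
            sums: Dict[str, int] = defaultdict(int)
            for key, value in partition:
                sums[key] += value
            yield dict(sums)


def sum_without_pushdown(source: PartitionedSource) -> Tuple[Dict[str, int], int]:
    """Aggregate in the engine; also return how many records were shipped."""
    totals: Dict[str, int] = defaultdict(int)
    shipped = 0
    for key, value in source.scan():
        totals[key] += value
        shipped += 1
    return dict(totals), shipped


def sum_with_pushdown(source: PartitionedSource) -> Tuple[Dict[str, int], int]:
    """Merge per-partition partial sums; also return how many records were shipped."""
    totals: Dict[str, int] = defaultdict(int)
    shipped = 0
    for partial in source.partial_sums():
        for key, value in partial.items():
            totals[key] += value
            shipped += 1
    return dict(totals), shipped


source = PartitionedSource([
    [("a", 1), ("b", 2), ("a", 3), ("a", 4)],
    [("b", 5), ("a", 6), ("b", 8)],
])
print(sum_without_pushdown(source))
print(sum_with_pushdown(source))
```

Both functions return the same totals, but the pushdown version ships one record per key per partition instead of one per row, which is the saving the linked ticket is after.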