ChronixDB / chronix.server

The Chronix Server implementation that is based on Apache Solr.
Apache License 2.0

Using SAX with Chronix #138

Closed andyfangdz closed 6 years ago

andyfangdz commented 6 years ago

Hi,

Here at Georgia Tech, we're trying to use Chronix in NIH's MD2K research project (https://md2k.org). One feature in Chronix that's particularly interesting to us is a DB-native SAX implementation. However, I did not find any documentation on using SAX with Chronix.

So here are my questions:

  1. How should one use Chronix to query the SAX representation of time-series?
  2. How should one get the calculated SAX representation (maybe pre-calculate with Chronix?)? With function queries?
  3. I saw in this issue (https://github.com/ChronixDB/chronix.examples/issues/14) that it's suggested we use the filtering method to directly find relevant representations. Is this approach preferred to storing the SAX representation back to Solr for searching/matching?
  4. I think SAX is only available in the GPL branch. Are there any features in the master branch that are missing from that branch?

Thanks so much for the help! Chronix truly seems like an amazing TSDB so far. /cc @FlorianLautenschlager

FlorianLautenschlager commented 6 years ago

Hi @andyfangdz, thanks for your issue. Great to hear that you are trying to use Chronix for your research project ;-). First of all, Chronix is a very good fit if you have a read-intensive scenario (few batch writes, lots of reads).

So let's get quickly into your questions:

  1. How should one use Chronix to query the SAX representation of time-series?
  2. How should one get the calculated SAX representation (maybe pre-calculate with Chronix?)? With function queries?

These two questions depend on each other. You can use Chronix to calculate the SAX representation and query for a specific representation (Case 1). Or you can pre-calculate and index the SAX representation, and then use the full range of Solr's search features (Case 2). Case 2 works well if your data won't change (one large import) and you can pre-calculate the representation for the whole time series.

Case 1: search for time series that contain the pattern af somewhere:

//Sax(String regex, int paaSize, int alphabetSize, double threshold) 
metric{sax:*af*,10,60,0.01}
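To make "the pattern af somewhere" concrete, here is a minimal Python sketch of matching a wildcard pattern against SAX words. It assumes glob-style semantics for `*af*`; the signature comment above calls the argument a regex, so Chronix's actual matching may differ. The series names and SAX words are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(sax_word, pattern):
    """Return True if the SAX word matches a glob-style pattern (assumed semantics)."""
    return fnmatchcase(sax_word, pattern)


# Hypothetical SAX words, keyed by a hypothetical series name.
words = {
    "series-1": "bbafcc",  # contains "af" as adjacent symbols
    "series-2": "abcdef",  # contains "a" and "f", but not next to each other
}

# Keep only the series whose word contains "af" somewhere.
hits = [name for name, word in words.items() if matches(word, "*af*")]
```

Under these assumptions only `series-1` is a hit, because the pattern requires `a` and `f` to be adjacent symbols in the word.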

Case 2:

//pre-calculate on the client-side
//define a field in the schema.xml: <field name="sax" type="string" indexed="true" stored="true"/> (stored is optional)
//store it as attribute, e.g. sax
//use a solr query
q=sax:*af* //blazing fast ;-)
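The client-side pre-calculation in Case 2 could look roughly like the Python sketch below. It follows the usual textbook SAX steps (z-normalization, piecewise aggregate approximation, discretization against standard-normal breakpoints) and is not taken from Chronix's own implementation, so the words it produces may differ from what Chronix computes. The function name `sax_word` and the restriction to the letters a-z are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
import string


def sax_word(values, paa_size, alphabet_size):
    """Return a SAX word for a list of numeric values (illustrative sketch)."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26 for a-z symbols")
    if not 1 <= paa_size <= len(values):
        raise ValueError("paa_size must be between 1 and len(values)")

    # 1. Z-normalize so that standard-normal breakpoints apply.
    mean, std = fmean(values), pstdev(values)
    if std == 0:
        normalized = [0.0] * len(values)
    else:
        normalized = [(v - mean) / std for v in values]

    # 2. Piecewise aggregate approximation: average over paa_size segments
    #    of (nearly) equal length.
    n = len(normalized)
    bounds = [i * n // paa_size for i in range(paa_size + 1)]
    segments = [fmean(normalized[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

    # 3. Breakpoints that split the standard normal into equiprobable regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the region it falls into.
    symbols = string.ascii_lowercase
    return "".join(symbols[bisect_right(breakpoints, s)] for s in segments)


word = sax_word([1, 2, 3, 4, 5, 6, 7, 8], paa_size=4, alphabet_size=4)
```

For this rising series the sketch yields the word `abcd`. A string like this is what would be stored in the `sax` field and then searched with a plain Solr query.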

A few thoughts about the pre-calculation. Given the time series

ts = [(1,10), (2,20), (3,30), ... (10, 100)]

If we calculate the SAX representation for the whole time series -> a,b,c, ..., j, the following queries match:

So you have to think carefully about what it means to pre-calculate the SAX representation.

  3. I saw in this issue (ChronixDB/chronix.examples#14) that it's suggested we use the filtering method to directly find relevant representations. Is this approach preferred to storing the SAX representation back to Solr for searching/matching?

See above. It depends on how you analyze the data. If your data won't change, you can pre-calculate the representation. If it does change, I suggest calculating the representation on the fly.

  4. I think SAX is only available in the GPL branch. Are there any features in the master branch that are missing from that branch?

No, the branches are equal. But I think I have to update them to the latest version. I can do this for you.

andyfangdz commented 6 years ago

Hi @FlorianLautenschlager, thanks for the reply!

These suggestions are really helpful. I'm sure I'll come up with additional questions, but these seem to be enough to get us started.

We expect the rewrite to take a few months. Until then, please enjoy a video of our prototype (that's not using Chronix yet :P):

mHealth Discovery Dashboard

https://youtu.be/vpvozWf1aCc

Could you help us by updating to the latest version on both branches? Thanks!

FlorianLautenschlager commented 6 years ago

Cool. I will update the branch, hopefully this evening (Europe ;-)), and I will give you a ping.

andyfangdz commented 6 years ago

@FlorianLautenschlager Any chance I can get an update today? Thanks :)

FlorianLautenschlager commented 6 years ago

@andyfangdz Yes. I will release the update this evening. Sorry for the delay. Lots on my plate...

andyfangdz commented 6 years ago

@FlorianLautenschlager no worries! You've helped so much already!

FlorianLautenschlager commented 6 years ago

So it's updated, but the integration test does not work due to the Gradle update. I will fix this this evening and let you know.

FlorianLautenschlager commented 6 years ago

@andyfangdz I have migrated it. Tomorrow I will update to the latest Solr version, and then it's done.

andyfangdz commented 6 years ago

Thank you!

FlorianLautenschlager commented 6 years ago

For further questions, I suggest that we use Gitter.