ChronixDB / chronix.server

The Chronix Server implementation that is based on Apache Solr.
Apache License 2.0

writing data directly to Solr #135

Closed: afrozl closed this issue 6 years ago

afrozl commented 7 years ago

Given that there is no node.js client available for ChronixDB, is it possible to write data points directly to Solr using one of the solr clients?

FlorianLautenschlager commented 7 years ago

Hi, there are multiple ways to write data to Chronix (via the InfluxDB/Graphite/OpenTSDB/KairosDB protocols), and of course you can write data points directly to Solr. If you write data directly into Solr, you have to ensure that the data is serialized properly. If you want to use the "standard" time series type, you have to ensure the data is written in the correct format. But you can also implement your own custom time series type (and plug it into Chronix) that uses your specific serialization format.

Currently a handful of fields are required; they are defined in the schema.xml of the Chronix Solr core.
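
For illustration, a minimal SolrJ sketch of indexing one chunk directly. The field names (metric, start, end, data) are assumptions taken from the Ruby example later in this thread; verify them against the schema of your Chronix core before relying on this.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DirectWrite {
    public static void main(String[] args) throws Exception {
        // Assumed URL of the default "chronix" core.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/chronix").build();

        SolrInputDocument doc = new SolrInputDocument();
        // Field names are assumptions; check schema.xml for the real ones.
        doc.addField("metric", "cpu-active");
        doc.addField("start", 1500071738L);
        doc.addField("end", 1500071758L);
        // "data" must hold the points in Chronix's binary serialization
        // format; this plain string is only a placeholder.
        doc.addField("data", "<serialized-and-compressed-points>");

        solr.add(doc);
        solr.commit();
        solr.close();
    }
}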

afrozl commented 6 years ago

So, looking at a Ruby client, I get the following. Input data:

1500071738,cpu-active,79.58
1500071748,cpu-active,9.05
1500071758,cpu-active,7.68

Which gets converted to:

{"cpu-active"=>{"startTime"=>1500071738, "lastTimestamp"=>1500071758, "points"=>#, #, #]>, "prevDelta"=>10, "timeSinceLastDelta"=>2, "lastStoredDate"=>1500071748}}

and finally, the points are converted to binary data:

{:metric=>"cpu-active", :start=>1500071738, :end=>1500071758, :data=>"H4sIAAAAAAAA/+Pi5mAQZACBB8EOXNwcXIKzZgKBpBKIwyC4Q671deAOOQcARULF4CcAAAA="}

Is that correct? Where do the tags/dimensions get stored?

Thanks for the help
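
As a side note on the :data blob above: the "H4sI" prefix is the Base64 encoding of the gzip magic bytes, so the Ruby client evidently gzip-compresses the delta-encoded points and Base64-encodes the result. A rough Java sketch of that general idea follows, assuming a naive text layout for the points; Chronix's real binary layout is defined by its serializers, so treat this purely as an illustration of the compression step.

import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

public class CompressSketch {
    public static void main(String[] args) throws Exception {
        long[] timestamps = {1500071738L, 1500071748L, 1500071758L};
        double[] values = {79.58, 9.05, 7.68};

        // Delta-encode timestamps against the previous point, then write a
        // naive text representation. Only the gzip+Base64 step matches what
        // the Ruby client does; the point layout here is made up.
        StringBuilder sb = new StringBuilder();
        long prev = timestamps[0];
        for (int i = 0; i < timestamps.length; i++) {
            sb.append(timestamps[i] - prev).append(':').append(values[i]).append('\n');
            prev = timestamps[i];
        }

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(sb.toString().getBytes("UTF-8"));
        }
        String data = Base64.getEncoder().encodeToString(bos.toByteArray());
        System.out.println(data); // starts with "H4sI", like the blob above
    }
}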

FlorianLautenschlager commented 6 years ago

I suggest you use one of the provided HTTP APIs: https://chronix.gitbooks.io/chronix/content/http-api.html

Otherwise you have to implement all of the serialization and compression yourself in order to use the server-side functions.

Chronix stores everything in Solr. Tags are simply key-value pairs on the Solr document.
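
Because tags are ordinary fields on the Solr document, filtering by tag is just a Solr filter query. A small SolrJ sketch, where the metric and host field names are assumptions to be checked against your schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TagQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/chronix").build();

        // "metric" and "host" are assumed field names; use whatever your
        // schema.xml actually defines.
        SolrQuery query = new SolrQuery("metric:cpu-active");
        query.addFilterQuery("host:server1"); // tags are plain document fields

        QueryResponse response = solr.query(query);
        response.getResults().forEach(System.out::println);
        solr.close();
    }
}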

afrozl commented 6 years ago

That works quite well. The only issue I have is when writing tags:

[
  {
      "name": "cpu-active",
      "datapoints": [[1359788400000, 23.1], [1359788300000, 13.2], [1359788410000, 23.1]],
      "tags": {
          "hostname": "server1"
      }
  }
]

I am not sure why it expects predefined tags. For example, anything other than a tag for 'host' fails:

{"responseHeader":{"status":400,"QTime":3},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"ERROR: [doc=50d244e1-85c7-459a-b728-3c04e7861e0a] unknown field 'hostname'","code":400}}
FlorianLautenschlager commented 6 years ago

Tags are defined in the schema.xml of the Solr core (the default core is chronix; see https://github.com/ChronixDB/chronix.server/blob/master/chronix-server-test-integration/src/inttest/resources/de/qaware/chronix/solr/chronix/conf/schema.xml). It also defines the required fields, optional fields, dynamic fields, etc.
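
For the failing hostname tag from the example above, the field has to exist in that schema.xml first. A hypothetical sketch following standard Solr schema conventions (the concrete names and types are up to you):

<!-- Added to the schema.xml of the chronix core. This exact field
     definition is an assumption; adapt the type to your needs. -->
<field name="hostname" type="string" indexed="true" stored="true"/>

<!-- Alternatively, a dynamic field allows arbitrary string tags without
     touching the schema for each new one. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>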

Let me know if you need further help. :)

afrozl commented 6 years ago

Got it. Sorry for all the questions; I am very new to Solr, although a relatively old hand at Elasticsearch.

BTW, unrelated but out of curiosity: is it possible to run ChronixDB on top of Elasticsearch? Given that ES is also Lucene-based, it should be theoretically possible?

FlorianLautenschlager commented 6 years ago

No problem. It is not well documented...

Yeah. Both are based on Lucene, and it is possible to replace the storage backend with little effort. But that manual effort is needed. 😉

afrozl commented 6 years ago

One more question. Am I correct in assuming that the storage benefits of ChronixDB are best attained with large-ish batches of data? In my use case I expect to be ingesting real-time data from many IoT sources. If I can only afford to delay ingestion by 30 seconds at most, will I still see any storage/compression benefits? Most metrics would be at a 5-second granularity, so at best I could batch 6 data points per metric.
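
For concreteness, a minimal hypothetical client-side buffering sketch of that constraint: one chunk per metric, flushed after 6 points or 30 seconds, whichever comes first. Timestamps are assumed to be epoch milliseconds, as in the KairosDB-style JSON above.

import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer for a single metric: flush a chunk when either
// maxPoints or maxAgeMillis is reached. With 5-second metrics and a
// 30-second budget, a chunk holds at most 6 points.
public class ChunkBuffer {
    private final int maxPoints = 6;
    private final long maxAgeMillis = 30_000;
    private final List<double[]> points = new ArrayList<>(); // {timestamp, value}
    private long firstTimestamp = -1;

    public synchronized void add(long timestamp, double value) {
        if (firstTimestamp < 0) firstTimestamp = timestamp;
        points.add(new double[]{timestamp, value});
        if (points.size() >= maxPoints || timestamp - firstTimestamp >= maxAgeMillis) {
            flush();
        }
    }

    private void flush() {
        // Serialize the buffered points into one Chronix chunk and send it,
        // e.g. via the HTTP API mentioned earlier in the thread.
        System.out.println("flushing chunk with " + points.size() + " points");
        points.clear();
        firstTimestamp = -1;
    }
}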

FlorianLautenschlager commented 6 years ago

Six points per chunk is not the size Chronix is designed for. Chronix is designed as long-term storage with an operational workload of few batch writes and frequent reads.

I think you need a storage engine that focuses on fast writes. Prometheus in combination with the Push-Gateway might be a solution, with Chronix as the long-term storage ;-). Check this spreadsheet, which compares a bunch of TSDBs: https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0
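
If you follow the Prometheus + Push-Gateway suggestion, the write path looks roughly like this with the official Java simpleclient; the gateway address, metric name, and job name are made-up values for this sketch.

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

public class PushSketch {
    public static void main(String[] args) throws Exception {
        CollectorRegistry registry = new CollectorRegistry();
        Gauge cpuActive = Gauge.build()
                .name("cpu_active")
                .help("CPU active percentage")
                .labelNames("hostname")
                .register(registry);
        cpuActive.labels("server1").set(79.58);

        // Push to a locally running Pushgateway; the address and job name
        // are assumptions for this sketch.
        PushGateway gateway = new PushGateway("localhost:9091");
        gateway.pushAdd(registry, "iot_metrics");
    }
}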