ChronixDB / chronix.server

The Chronix Server implementation that is based on Apache Solr.
Apache License 2.0

Data-mining or real-time? #78

Closed brettwooldridge closed 7 years ago

brettwooldridge commented 7 years ago

Asked in Gitter...

@FlorianLautenschlager I have a question about Chronix. Maybe about chronix-storage in particular...

It seems like Chronix is designed more for data-mining than real-time use, is that correct?

I ask because it seems that a time series should only be added once a sufficient number of data points has been collected.

For example, in order to benefit from the compression it seems that "chunks" of data points need to be accumulated before adding the total series to Solr. If this is true, the "recent" values would not be available for query. Correct?

Or can I collect a set of metrics every 5 seconds, and add them through the storage service, whereby they can be queried? Does something underlying in Chronix "merge" them in some way into a document of "significant size" over time to achieve better compression and query performance?

My concern is that we are building a monitoring system with thousands (or tens of thousands) of disparate metrics collected every 5 seconds, but for any given host/metric pair there would only be 12 per minute -- but they need to be available "immediately" for query to display on real-time dashboards.

FlorianLautenschlager commented 7 years ago

The simple answer is: it depends ;-)

It seems like Chronix is designed more for data-mining than real-time use, is that correct? I ask because it seems that a time series should only be added once a sufficient number of data points has been collected.

Yes. Chronix is designed as a long-term storage. But it depends on what real-time means to you. Storing records with only a few data points reduces both storage efficiency and query performance.
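The chunking effect is easy to demonstrate outside Chronix. The sketch below uses plain GZIP (not Chronix's actual codec) on made-up sample data: it compresses one hour of 5-second samples once as a single 720-point chunk and once as 720 single-point chunks. The per-chunk header overhead makes the tiny chunks far larger in total:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Demonstrates the chunking effect with plain GZIP (not Chronix's actual
// codec); the sample data below is made up: one hour of 5-second samples.
public class ChunkCompression {

    // Compress a byte array with GZIP and return the compressed size.
    static int gzipSize(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        }
        return out.size();
    }

    // Returns { size of one 720-point chunk, total size of 720 one-point chunks }.
    static int[] sizes() throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 720; i++) {
            sb.append(1_000_000L + i * 5_000L).append(';')
              .append(42.0 + (i % 7) * 0.1).append('\n');
        }
        String points = sb.toString();

        int oneChunk = gzipSize(points.getBytes());

        int tinyChunks = 0;
        for (String line : points.split("\n")) {
            tinyChunks += gzipSize((line + "\n").getBytes());
        }
        return new int[] { oneChunk, tinyChunks };
    }

    public static void main(String[] args) throws IOException {
        int[] s = sizes();
        System.out.println("one chunk: " + s[0] + " bytes, 720 tiny chunks: " + s[1] + " bytes");
    }
}
```

The same asymmetry applies to Lucene document overhead: one document per data point multiplies the fixed per-document cost.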

For example, in order to benefit from the compression it seems that "chunks" of data points need to be accumulated before adding the total series to Solr. If this is true, the "recent" values would not be available for query. Correct? Or can I collect a set of metrics every 5 seconds, and add them through the storage service, whereby they can be queried?

Recent values are available after a commit is performed. But commits are expensive and blocking, so frequent commits will hurt write performance, and query performance as well if queries run in parallel.
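As an illustration of why batching helps, here is a minimal, hypothetical buffered writer (not part of Chronix; the `BatchingWriter` name and the sink callback are made up) that hands whole batches to one expensive commit instead of committing per point:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffered writer (not a Chronix API): documents are
// collected in memory and handed to the sink in whole batches, so the
// expensive commit runs once per batch instead of once per data point.
public class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. "add batch to Solr, then commit"
    private final List<T> buffer = new ArrayList<>();

    public BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // One sink call (one commit) per full batch; call once more on shutdown.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With 5-second samples and a batch size of 100, this commits roughly every 8 minutes per series instead of every 5 seconds; the trade-off is that buffered points are not queryable until the next flush.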

Does something underlying in Chronix "merge" them in some way into a document of "significant size" over time to achieve better compression and query performance?

A record compaction is planned (see #49), but there is currently no implementation.
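To sketch what such a compaction could do (purely illustrative; this is not the #49 implementation): merge many small per-series records into one large record, sorted by timestamp:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; this is NOT the compaction tracked in #49.
// Many small records of the same series, each a timestamp->value map,
// are merged into one large record covering the whole time range.
public class Compaction {
    static TreeMap<Long, Double> compact(List<Map<Long, Double>> smallRecords) {
        TreeMap<Long, Double> merged = new TreeMap<>();
        for (Map<Long, Double> record : smallRecords) {
            merged.putAll(record); // TreeMap keeps points sorted by timestamp
        }
        return merged;
    }
}
```

After a merge like this, the small source records could be deleted and the large record re-compressed as one chunk.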

My concern is that we are building a monitoring system with thousands (or tens of thousands) of disparate metrics collected every 5 seconds, but for any given host/metric pair there would only be 12 per minute -- but they need to be available "immediately" for query to display on real-time dashboards.

Well, a short-term store in front of Chronix might be a good choice. Another option is to store multivariate time series. A record could look like:

record {
  start: 4711
  end: 4711
  data: [
    ts1_v, ts2_v, ts3_v, ...
  ]
  metrics: [ts1, ts2, ts3]
}
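In Java, that record could be sketched like this (illustrative only; `MultivariateRecord` is not a Chronix class): two parallel lists keep metric names and values aligned by index:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the multivariate record above (illustrative; not a Chronix
// class): one time range plus two parallel lists, where metrics.get(i)
// names the series that data.get(i) belongs to.
public class MultivariateRecord {
    final long start;
    final long end;
    final List<String> metrics = new ArrayList<>();
    final List<Double> data = new ArrayList<>();

    MultivariateRecord(long start, long end) {
        this.start = start;
        this.end = end;
    }

    void put(String metric, double value) {
        metrics.add(metric);
        data.add(value);
    }

    // Resolve a value through the parallel index of its metric name.
    double get(String metric) {
        return data.get(metrics.indexOf(metric));
    }
}
```

One such record per collection interval bundles all metrics of a host into a single document, which amortizes the per-document cost across thousands of metrics.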

Hope this helps.