VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

Some questions about cluster version #58

Closed oOHenry closed 5 years ago

oOHenry commented 5 years ago

Hi, I have a few questions about the cluster version of vm:

  1. How efficient is the consistent hashing? Is storage balanced by usage, or are there cases where one node runs out of space while the other nodes in the cluster are fine, because of a bad distribution of the data?

  2. Is the cluster version production ready? Is it planned to offer pre-compiled stable releases, as with the single-node version?

  3. How is the cluster version versioned?

  4. How should the hardware be sized? If I understood the cluster version correctly, vmstorage is a dumb storage backend that just stores the data (not much CPU power is needed), and the calculation of Prometheus rate, sum, etc. all happens in vmselect?

  5. I saw a -cacheDataPath option in vmselect, what type of data is cached here?

valyala commented 5 years ago

How efficient is the consistent hashing? Is storage balanced by usage, or are there cases where one node runs out of space while the other nodes in the cluster are fine, because of a bad distribution of the data?

Incoming time series are spread across the available vmstorage nodes by calculating a jump consistent hash over the time series name plus all of its labels. The returned value points to the vmstorage node for that time series. This is fast and efficient consistent hashing, which evenly distributes time series among the available vmstorage nodes and minimizes reshuffling of time series placement when new vmstorage nodes are added. Unbalanced data between storage nodes is possible, but the imbalance decreases as the number of stored time series grows.
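To make the placement scheme concrete, here is a minimal Go sketch of jump consistent hashing (the published Lamping and Veach algorithm) applied to a series key. It is not the actual vminsert code: the `seriesKey` helper, its byte layout, and the choice of FNV-1a as the underlying hash are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// jumpHash maps key to a bucket in [0, numBuckets) using the jump
// consistent hash algorithm. It returns -1 if numBuckets <= 0.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// seriesKey is a hypothetical helper: it hashes the metric name plus all
// labels (sorted by label name, so map order does not matter) with FNV-1a.
// The real cluster code may build and hash the key differently.
func seriesKey(name string, labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	sb.WriteString(name)
	for _, k := range keys {
		sb.WriteByte(0) // separator between fields
		sb.WriteString(k)
		sb.WriteByte(0)
		sb.WriteString(labels[k])
	}

	h := fnv.New64a()
	h.Write([]byte(sb.String()))
	return h.Sum64()
}

func main() {
	// Count how many synthetic series land on each of 3 storage nodes.
	counts := make([]int, 3)
	for i := 0; i < 100000; i++ {
		labels := map[string]string{"instance": fmt.Sprintf("host-%d", i)}
		k := seriesKey("http_requests_total", labels)
		counts[jumpHash(k, 3)]++
	}
	fmt.Println(counts)
}
```

Two properties are worth noting. Growing the cluster from n to n+1 nodes either leaves a series on its current node or moves it to the new node, which is why adding nodes causes little reshuffling. And the per-node counts printed by `main` get closer to each other, in relative terms, as the number of series grows.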

Is the cluster version production ready? Is it planned to offer pre-compiled stable releases, as with the single-node version?

The cluster version is successfully used in production. Pre-compiled stable releases with semantic versioning will be published in the future. The head of the cluster branch is quite stable, so it is safe to build production binaries from it.

How is the cluster version versioned?

Currently it is versioned by commit hash. In the future we plan to use semantic versioning.

How should the hardware be sized? If I understood the cluster version correctly, vmstorage is a dumb storage backend that just stores the data (not much CPU power is needed), and the calculation of Prometheus rate, sum, etc. all happens in vmselect?

The required hardware depends heavily on the workload. General observations:

The general advice is to size the hardware for each node type based on its current resource usage.

I saw a -cacheDataPath option in vmselect, what type of data is cached here?

oOHenry commented 5 years ago

Thanks for your answer. I'm currently doing a POC with the VictoriaMetrics cluster version. Currently I have two servers:

I benchmarked with tsbs as in your example. With one vminsert node I got 500K metrics per second. The vminsert node and the vmstorage node both had low CPU/disk utilization. I deployed a second vminsert instance on another server and reached one million metrics per second. Is there a bottleneck somewhere in vminsert? I also increased the max open files limit.

version: vminsert-20190605-152733-heads-cluster-0-g2ff0d59

valyala commented 5 years ago

The bottleneck is likely the single connection between each pair of vminsert and vmstorage nodes. Each connection is bound to a single vCPU core, so the other vCPU cores may stay idle. Possible solutions:

Do not forget to monitor network bandwidth usage while testing.

Also note that the cluster version will give lower performance numbers per CPU core compared to the single-node version, due to the RPC overhead of passing data between nodes over the network.

oOHenry commented 5 years ago

Thanks for all your answers. I will close this ticket; further questions will be asked in the Slack channel :)