Some questions about cluster version

oOHenry commented 5 years ago

Hi, I have a few questions about the cluster version of vm:

How efficient is the consistent hashing? Is the storage balanced by usage, or are there cases where a certain node runs out of space while other nodes in the cluster are doing fine because of a bad distribution of the data?
Is the cluster version production ready? Is it planned to offer a pre-compiled stable release, as for the single-node version?
How is the cluster version versioned?
How should the hardware be sized? If I understood the cluster version correctly, vmstorage is a dumb storage backend that just stores the data (not much CPU power is needed), and the calculation of Prometheus rate, sum, etc. all happens in vmselect?
I saw a -cacheDataPath option in vmselect. What type of data is cached here?

valyala commented 5 years ago

How efficient is the consistent hashing? Is the storage balanced by usage, or are there cases where a certain node runs out of space while other nodes in the cluster are doing fine because of a bad distribution of the data?

Incoming time series are spread across the available vmstorage nodes by calculating a jump consistent hash over the time series name plus all of its labels. The returned value points to the vmstorage node for the time series. This is a fast and efficient consistent hashing scheme, which evenly distributes time series among the available vmstorage nodes and minimizes the reshuffling of time series when new vmstorage nodes are added. Unbalanced data between storage nodes is possible, but the imbalance decreases as the number of stored time series grows.
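The placement described above can be sketched in a few lines of Python. This is only an illustration, not the VictoriaMetrics implementation: `jump_consistent_hash` follows the published jump consistent hash algorithm (Lamping and Veach), while `series_key`, `pick_node`, the BLAKE2b hash, and the way the name and labels are joined are assumptions made for the example.

```python
import hashlib
from collections import Counter

MASK64 = (1 << 64) - 1


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets) with jump consistent hash."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, wrapped to emulate unsigned overflow.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def series_key(name: str, labels: dict[str, str]) -> int:
    """Hypothetical 64-bit key built from the series name plus all its labels."""
    # Assumption: labels are sorted so that label order does not change the key.
    canonical = name + "".join(f"\x00{k}\x01{v}" for k, v in sorted(labels.items()))
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def pick_node(name: str, labels: dict[str, str], num_nodes: int) -> int:
    """Return the index of the storage node that would receive this series."""
    return jump_consistent_hash(series_key(name, labels), num_nodes)


if __name__ == "__main__":
    series = [("cpu_usage", {"host": f"host-{i}", "dc": "eu"}) for i in range(20000)]
    before = [pick_node(n, l, 4) for n, l in series]
    after = [pick_node(n, l, 5) for n, l in series]
    print("series per node with 4 nodes:", sorted(Counter(before).items()))
    moved = sum(a != b for a, b in zip(before, after))
    print(f"series moved after adding a fifth node: {moved} of {len(series)}")
```

Running the script prints the per-node counts and how many series change nodes when a fifth node is added. With jump consistent hash, a series either stays on its node or moves to the newly added one, which is why reshuffling stays small.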

Is the cluster version production ready? Is it planned to offer a pre-compiled stable release, as for the single-node version?

The cluster version is successfully used in production. Pre-compiled stable releases with semantic versioning will be published in the future. The head of the cluster branch is quite stable, so it is safe to build production-ready binaries from it.

How is the cluster version versioned?

Currently it is versioned by commit hash. In the future we plan to use semantic versioning.

How should the hardware be sized? If I understood the cluster version correctly, vmstorage is a dumb storage backend that just stores the data (not much CPU power is needed), and the calculation of Prometheus rate, sum, etc. all happens in vmselect?

The required hardware depends heavily on the workload. General observations:

vmstorage nodes usually require 2x as many CPU cores as vminsert nodes to handle a high ingestion rate.
vmselect nodes usually require more CPU cores than all other node types, since the majority of calculations for select requests are performed on vmselect nodes.
vmselect and vmstorage nodes usually require more RAM than vminsert nodes, since they contain various caches aimed at improving performance.

The general advice is to size the hardware for each node type based on its current resource usage.

I saw a -cacheDataPath option in vmselect. What type of data is cached here?

Temporary files may be stored there during heavy queries in order to reduce RAM usage.
The response cache is stored there on graceful shutdown and is read from this directory on startup. This allows keeping caches warm during upgrades.

oOHenry commented 5 years ago

Thanks for your answer. I'm currently doing a POC with the VictoriaMetrics cluster version. Currently I have two servers:

vmstorage node: 8-core Xeon + HT
vminsert/vmselect node: 6-core Xeon + HT

I benchmarked with tsbs as in your example. With one vminsert node I got 500K metrics per second. The vminsert node and the vmstorage node both had low CPU and disk utilization. I deployed a second vminsert instance on another server and reached one million metrics per second. Is there a bottleneck somewhere in vminsert? I also increased the max open files limit.

version: vminsert-20190605-152733-heads-cluster-0-g2ff0d59

valyala commented 5 years ago

The bottleneck is likely the single connection between each pair of vminsert and vmstorage nodes. Each connection is bound to a single vCPU core, so the other vCPU cores may stay idle. Possible solutions:

Increase the number of vmstorage nodes to match the number of vCPU cores in each vminsert node: 6 cores x 2 (HT) = 12 in your case.
Run vminsert nodes with the -rpc.disableCompression option. This option lowers the CPU usage for serving each vminsert->vmstorage connection at the cost of higher network bandwidth usage.
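The arithmetic behind the first suggestion can be written down as a small Python sketch. It assumes the simplified model stated above: exactly one connection per vminsert/vmstorage pair, and each connection limited to one vCPU. The function names `vcpus_with_ht` and `busy_vminsert_cores` are hypothetical and exist only for this illustration.

```python
def vcpus_with_ht(physical_cores: int, threads_per_core: int = 2) -> int:
    """Number of vCPUs exposed by a CPU with hyper-threading enabled."""
    if physical_cores <= 0 or threads_per_core <= 0:
        raise ValueError("core and thread counts must be positive")
    return physical_cores * threads_per_core


def busy_vminsert_cores(vcpus: int, vmstorage_nodes: int) -> int:
    """Upper bound on the vCPUs one vminsert node can keep busy.

    Assumption: one connection per vmstorage node, one vCPU per connection.
    """
    if vcpus <= 0 or vmstorage_nodes <= 0:
        raise ValueError("vcpus and vmstorage_nodes must be positive")
    return min(vcpus, vmstorage_nodes)


if __name__ == "__main__":
    vcpus = vcpus_with_ht(6)  # the 6-core Xeon + HT from the question
    for nodes in (1, 6, 12):
        busy = busy_vminsert_cores(vcpus, nodes)
        print(f"{nodes:2d} vmstorage node(s): up to {busy} of {vcpus} vCPUs busy")
```

Under this model a single vmstorage node leaves most of the vminsert vCPUs idle, which would be consistent with the low CPU utilization seen in the benchmark.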

Do not forget to monitor network bandwidth usage while testing.

Also note that the cluster version will give lower performance numbers per CPU core compared to the single-node version, due to the RPC overhead of passing data between nodes over the network.

oOHenry commented 5 years ago

Thanks for all your answers. I will close this ticket; further questions will be asked in the Slack channel :)

VictoriaMetrics / VictoriaMetrics

Some questions about cluster version #58