VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

Attempts to configure VM for small memory footprint don't yield expected results #6276

Open aprospero opened 6 months ago

aprospero commented 6 months ago

Is your question request related to a specific component?

VictoriaMetrics

Describe the question in detail

Abstract

I am evaluating VM on an embedded device with limited resources. My ultimate goal is to allocate only 20-30 MiB of RAM to VM.

Setup

Platform:

OS:

VM Version:

Test data:

Test setup:

Command line flags:

Also tried, but only sporadically:

Regardless of the combination of the listed command-line flags, the result is always pretty much the same (see below).

Observed behavior

The RSS of the VM process is around 60 MB right after startup (which is already far more than expected) and rises continually as the benchmark proceeds. This continues until the system runs out of free memory pages and the kernel kills the VM process.
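For anyone who wants to reproduce RSS readings like these on Linux, one option is to sample the `VmRSS` field of `/proc/<pid>/status`. The following is a minimal sketch and not necessarily how the figures in this issue were collected; the function names and the sample text are hypothetical.

```python
import re


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status", "r", encoding="ascii") as f:
        return parse_vm_rss_kib(f.read())


# Hand-written status excerpt; the values are illustrative, not measured.
sample = "Name:\tvm\nVmPeak:\t  812345 kB\nVmRSS:\t   60416 kB\nThreads:\t9\n"
print(parse_vm_rss_kib(sample) / 1024, "MiB")
```

Calling `read_rss_kib(pid)` once per benchmark phase would give a series comparable to the RSS table further down.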

A typical VM startup log looks like this:

 /usr/bin/vm -opentsdbHTTPListenAddr=:4242 -retentionPeriod=5 -memory.allowedBytes=30MiB -search.maxConcurrentRequests=1 -search.maxMemoryPerQuery=4MiB
2024-05-15T10:13:54.082Z    info    VictoriaMetrics/lib/logger/flag.go:12   build version: victoria-metrics-20240301-013527-tags-v1.99.0-0-g9cd4b0537
2024-05-15T10:13:54.085Z    info    VictoriaMetrics/lib/logger/flag.go:13   command-line flags
2024-05-15T10:13:54.089Z    info    VictoriaMetrics/lib/logger/flag.go:20     -memory.allowedBytes="30MiB"
2024-05-15T10:13:54.091Z    info    VictoriaMetrics/lib/logger/flag.go:20     -opentsdbHTTPListenAddr=":4242"
2024-05-15T10:13:54.095Z    info    VictoriaMetrics/lib/logger/flag.go:20     -retentionPeriod="5"
2024-05-15T10:13:54.097Z    info    VictoriaMetrics/lib/logger/flag.go:20     -search.maxConcurrentRequests="1"
2024-05-15T10:13:54.098Z    info    VictoriaMetrics/lib/logger/flag.go:20     -search.maxMemoryPerQuery="4MiB"
2024-05-15T10:13:54.099Z    info    VictoriaMetrics/app/victoria-metrics/main.go:73 starting VictoriaMetrics at "[:8428]"...
2024-05-15T10:13:54.101Z    info    VictoriaMetrics/app/vmstorage/main.go:106   opening storage at "victoria-metrics-data" with -retentionPeriod=5
2024-05-15T10:13:54.130Z    info    VictoriaMetrics/lib/memory/memory.go:46 limiting caches to 31457280 bytes, leaving 492752896 bytes to the OS according to -memory.allowedBytes=30MiB
2024-05-15T10:13:55.855Z    info    VictoriaMetrics/lib/storage/storage.go:958  discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/curr_hour_metric_ids, since it contains outdated hour; got 476583; want 476602
2024-05-15T10:13:55.859Z    info    VictoriaMetrics/lib/storage/storage.go:958  discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/prev_hour_metric_ids, since it contains outdated hour; got 476582; want 476601
2024-05-15T10:13:56.138Z    info    VictoriaMetrics/lib/storage/storage.go:919  discarding /mnt/data/fld-prototype/victoria-metrics-data/cache/next_day_metric_ids_v2, since it contains data for stale date; got 19857; want 19858
2024-05-15T10:13:56.834Z    info    VictoriaMetrics/app/vmstorage/main.go:120   successfully opened storage "victoria-metrics-data" in 2.731 seconds; partsCount: 34; blocksCount: 4998; rowsCount: 1817056; sizeBytes: 1351094
2024-05-15T10:13:56.852Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:126  loading rollupResult cache from "victoria-metrics-data/cache/rollupResult"...
2024-05-15T10:13:58.504Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:155  loaded rollupResult cache from "victoria-metrics-data/cache/rollupResult" in 1.644 seconds; entriesCount: 459, sizeBytes: 20119552
2024-05-15T10:13:58.508Z    info    VictoriaMetrics/lib/ingestserver/opentsdbhttp/server.go:35  starting HTTP OpenTSDB server at ":4242"
2024-05-15T10:13:58.516Z    info    VictoriaMetrics/app/victoria-metrics/main.go:84 started VictoriaMetrics in 4.415 seconds
2024-05-15T10:13:58.526Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:118    starting server at http://127.0.0.1:8428/
2024-05-15T10:13:58.528Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:119    pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/
2024/05/15 10:30:33 ERROR: metrics: cannot read process_io_* metrics from "/proc/self/io", so these metrics won't be updated until the error is fixed; see https://github.com/VictoriaMetrics/metrics/issues/42 ; The error: open /proc/self/io: no such file or directory
Killed

The typical output of our Benchmark looks like this:

VictoriaMetrics Benchmark
The system clock ticks at 0.001 µs, steadiness false. The steady clock ticks at 0.001 µs.

10000 Queries for   1 measurements over     5 minutes took (ms) min/avg/max:     2.63/    7.87/  496.86, median:     6.80, standard deviation:     6.73. Resultcount was (pts) min/avg/max:        0/      0.96/       1, median:        1, standard deviation:     0.20.
10000 Queries for   1 measurements over    60 minutes took (ms) min/avg/max:     2.89/    8.19/   66.47, median:     7.15, standard deviation:     4.24. Resultcount was (pts) min/avg/max:        0/     11.54/      17, median:       12, standard deviation:     2.41.
10000 Queries for   1 measurements over  1440 minutes took (ms) min/avg/max:     3.32/   15.31/  153.49, median:    13.12, standard deviation:     8.42. Resultcount was (pts) min/avg/max:        0/    275.96/     333, median:      288, standard deviation:    58.26.
  100 Queries for   1 measurements over 86400 minutes took (ms) min/avg/max:   527.22/ 1109.86/ 1741.76, median:  1107.97, standard deviation:   220.34. Resultcount was (pts) min/avg/max:        1/  11438.42/   12778, median:    11771, standard deviation:  2015.65.
10000 Queries for   4 measurements over     5 minutes took (ms) min/avg/max:     8.77/   23.99/  147.99, median:    19.02, standard deviation:    14.52. Resultcount was (pts) min/avg/max:        1/      3.83/       4, median:        4, standard deviation:     0.40.
10000 Queries for   4 measurements over    60 minutes took (ms) min/avg/max:     9.22/   27.19/  233.38, median:    21.39, standard deviation:    16.00. Resultcount was (pts) min/avg/max:       12/     45.90/      53, median:       48, standard deviation:     5.08.
 5871 - 1152Request error: (1) Failed to connect to localhost port 8428: Connection refused

The benchmark starts with tiny queries for a single series over the minimum timespan of 5 minutes. It then increases the timespan and the series count step by step.

| Phase | RSS |
|---|---|
| startup | 59 MByte |
| after 10,000 queries, 1 series over last 5 min | 64 MByte |
| after 10,000 queries, 1 series over last 1 hour | 67 MByte |
| after 10,000 queries, 1 series over last 1 day | 78 MByte |
| after 100 queries, 1 series over last 1.5 months [1] | 87 MByte |
| after 10,000 queries, 4 series over last 5 min | 128 MByte |
| after 10,000 queries, 4 series over last 1 hour | 168 MByte |
| after 10,000 queries, 4 series over last 1 day | 178 MByte |

[1] Divided into 45 smaller consecutive 1-day queries.
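To make the growth pattern in the RSS figures above easier to see, the per-phase increase can be computed directly from the reported values. This is only a small arithmetic sketch; the shortened phase labels and the helper name are mine.

```python
# (phase, RSS in MB) as reported in the table above
phases = [
    ("startup", 59),
    ("1 series, 5 min", 64),
    ("1 series, 1 hour", 67),
    ("1 series, 1 day", 78),
    ("1 series, 1.5 months", 87),
    ("4 series, 5 min", 128),
    ("4 series, 1 hour", 168),
    ("4 series, 1 day", 178),
]


def rss_deltas(rows):
    """Return (phase, growth versus the previous phase) for each row after the first."""
    return [(name, rss - prev) for (_, prev), (name, rss) in zip(rows, rows[1:])]


deltas = rss_deltas(phases)
total_growth = phases[-1][1] - phases[0][1]

for name, delta in deltas:
    print(f"{name:<22} +{delta} MB")
print("total growth since startup:", total_growth, "MB")
```

The total growth is 119 MB, and the two phases that first query 4 series (5 min and 1 hour) account for 81 MB of it.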

Observed VM Metrics

I was asked to add the following metrics to the issue description. I'm happy to provide more if necessary.

| Metric | Value |
|---|---|
| vm_allowed_memory_bytes | 31457280 |
| vm_available_memory_bytes | 524210176 |
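As a sanity check, these two metrics can be compared with the `-memory.allowedBytes=30MiB` flag and with the startup log line "limiting caches to 31457280 bytes, leaving 492752896 bytes to the OS". A short sketch of the arithmetic, with variable names chosen only for illustration:

```python
MIB = 1024 * 1024

allowed_flag_mib = 30          # from -memory.allowedBytes=30MiB
allowed_bytes = 31457280       # vm_allowed_memory_bytes
available_bytes = 524210176    # vm_available_memory_bytes
left_to_os_bytes = 492752896   # "leaving ... bytes to the OS" in the startup log

# The flag value and the metric agree exactly.
assert allowed_flag_mib * MIB == allowed_bytes
# Allowed bytes plus the remainder left to the OS equals the available memory.
assert allowed_bytes + left_to_os_bytes == available_bytes

print(f"available memory: {available_bytes / MIB:.1f} MiB")
print(f"allowed share:    {allowed_bytes / available_bytes:.1%}")
```

So the flag was parsed as intended, and it covers roughly 6% of about 500 MiB of available memory. Note that the log line speaks of limiting caches, which suggests the setting is not a hard cap on the RSS of the whole process.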

Expected Behaviour

I understand that real memory consumption does not depend on the provided command-line flags alone, but what was definitely unexpected was seeing the RSS rise indefinitely until no free memory pages are left.

I have expected VM would

Further comments

Epilog

I'm out of ideas on how to tame VM's memory consumption. I don't think I'm expecting too much from it: all queries are limited to 1-day timespans and only a handful of series. Even the biggest queries return only around 300 data points.

If anyone has an idea, or even just a comment that my efforts are in vain because VM won't run with that fistful of RAM, I'd appreciate it!

Troubleshooting docs

AndrewChubatiuk commented 6 months ago

Hey @aprospero, thanks for the question. Could you please share a memory profile?

aprospero commented 6 months ago

Hey AndrewChubatiuk, thanks for having a look into it!

Could you specify in more detail what you need? Do you mean some specific VM profiler output? I'm not familiar with Go, so how can I extract that info?
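For readers in the same position: the startup log above states that pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/, and a Go memory (heap) profile is normally served at the `heap` path below that prefix. Here is a minimal Python sketch for saving it to a file, assuming the instance is reachable at that address while memory usage is high; the function names and the output file name are hypothetical.

```python
import urllib.request


def heap_profile_url(base_url: str) -> str:
    """Build the heap profile URL below the pprof prefix shown in the startup log."""
    return base_url.rstrip("/") + "/debug/pprof/heap"


def save_heap_profile(base_url: str, out_path: str, timeout_s: float = 30.0) -> int:
    """Download the heap profile, write it to out_path, and return the byte count."""
    with urllib.request.urlopen(heap_profile_url(base_url), timeout=timeout_s) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)


# Usage (address taken from the startup log; the file name is arbitrary):
#   save_heap_profile("http://127.0.0.1:8428", "mem.pprof")
```

The resulting file can be attached to the issue or inspected locally with `go tool pprof`.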

aprospero commented 5 months ago

@AndrewChubatiuk Bump