VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

The memory usage keeps increasing after starting and finally results in an OOM error #6044

Closed. escapekyg closed this issue 1 week ago.

escapekyg commented 1 month ago

Describe the bug

We start VictoriaMetrics and the memory usage gradually increases, finally resulting in an OOM (Out of Memory) error. We didn't use any special command-line options (attached below).

The data path /data/victoriaMetrics holds about 73 GB. The following shows the memory usage metrics for a period right after VM was started:

In the meantime, the memory requested from the system suddenly jumped to a new level, while the rest of the metrics increased gradually.

Before VM encountered the OOM error:

Other information is attached below.

According to the official documentation, it is recommended to have 50% free memory. In this scenario:

  1. Should the server have around 32GB of free memory?
  2. Will the memory usage continue to increase? Does a memory leak exist?
  3. Is it possible to maintain memory usage at a stable level and prevent OOM errors, to avoid the need for continuous memory expansion?

To Reproduce

Restart and wait for several days.

Version

1.85.3

Logs

2024-03-30T18:01:27.108Z info VictoriaMetrics/lib/storage/partition.go:1366 merged (14 parts, 1104145313 rows, 14869628 blocks, 2074361386 bytes) into (1 part, 1104145313 rows, 3378025 blocks, 1487565696 bytes) in 82.407 seconds at 13398663 rows/sec to "/data/victoriaMetrics/data/big/2024_03/1104145313_3378025_20240328192345.254_20240330180010.170_17BE82CFAABB1D5F"
2024-03-30T22:42:40.868Z error VictoriaMetrics/lib/vmselectapi/server.go:166 cannot perform vmselect handshake with client "10.67.17.70:41200": cannot read isCompressed flag: cannot read message with size 1: EOF; read only 0 bytes
2024-03-30T22:42:40.869Z error VictoriaMetrics/lib/vmselectapi/server.go:166 cannot perform vmselect handshake with client "10.67.17.69:26958": cannot read isCompressed flag: cannot read message with size 1: EOF; read only 0 bytes
fatal error: runtime: out of memory

runtime stack:
runtime.throw({0x90df79?, 0x2030?})
    runtime/panic.go:1047 +0x5d fp=0x7efbd1d29c78 sp=0x7efbd1d29c48 pc=0x4381dd
runtime.sysMapOS(0xcb2a800000, 0xcdc00000?)
    runtime/mem_linux.go:187 +0x11b fp=0x7efbd1d29cc0 sp=0x7efbd1d29c78 pc=0x4193bb
runtime.sysMap(0xca2b40?, 0x7f0375c50000?, 0x42c8e0?)
    runtime/mem.go:142 +0x35 fp=0x7efbd1d29cf0 sp=0x7efbd1d29cc0 pc=0x418d95
runtime.(*mheap).grow(0xca2b40, 0x66daf?)
    runtime/mheap.go:1468 +0x23d fp=0x7efbd1d29d60 sp=0x7efbd1d29cf0 pc=0x42999d
runtime.(*mheap).allocSpan(0xca2b40, 0x66daf, 0x0, 0x1)
    runtime/mheap.go:1199 +0x1be fp=0x7efbd1d29df8 sp=0x7efbd1d29d60 pc=0x4290de
runtime.(*mheap).alloc.func1()
    runtime/mheap.go:918 +0x65 fp=0x7efbd1d29e40 sp=0x7efbd1d29df8 pc=0x428b65
runtime.systemstack()
    runtime/asm_amd64.s:492 +0x49 fp=0x7efbd1d29e48 sp=0x7efbd1d29e40 pc=0x4699a9

goroutine 56 [running]:
runtime.systemstack_switch()
    runtime/asm_amd64.s:459 fp=0xc3d9e8d348 sp=0xc3d9e8d340 pc=0x469940
runtime.(*mheap).alloc(0xcdb5e000?, 0x66daf?, 0x5b?)
    runtime/mheap.go:912 +0x65 fp=0xc3d9e8d390 sp=0xc3d9e8d348 pc=0x428aa5
runtime.(*mcache).allocLarge(0x404bf1?, 0xcdb5e000, 0x1)
    runtime/mcache.go:233 +0x85 fp=0xc3d9e8d3e0 sp=0xc3d9e8d390 pc=0x417d25
runtime.mallocgc(0xcdb5e000, 0x0, 0x0)
    runtime/malloc.go:1029 +0x57e fp=0xc3d9e8d458 sp=0xc3d9e8d3e0 pc=0x40e43e
runtime.growslice(0xc1b54a00e8?, {0xcba170?, 0xc8050a8000?, 0xcdb5c403?}, 0xcdb5e000?)
    runtime/slice.go:284 +0x4ac fp=0xc3d9e8d4c0 sp=0xc3d9e8d458 pc=0x4504ac
github.com/valyala/gozstd.decompress(0xc45d40?, 0xc87580?, {0x0, 0x0, 0x0}, {0xc6b1660000, 0xc64236, 0xc98000}, 0x68c76d?)
    github.com/valyala/gozstd@v1.17.0/gozstd.go:280 +0x32b fp=0xc3d9e8d590 sp=0xc3d9e8d4c0 pc=0x65c78b
github.com/valyala/gozstd.DecompressDict({0x0, 0x0, 0x0}, {0xc6b1660000, 0xc64236, 0xc98000}, 0x0)
    github.com/valyala/gozstd@v1.17.0/gozstd.go:199 +0xf5 fp=0xc3d9e8d620 sp=0xc3d9e8d590 pc=0x65c1d5
github.com/valyala/gozstd.Decompress(...)
    github.com/valyala/gozstd@v1.17.0/gozstd.go:184
github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd.Decompress({0x0?, 0xc6b1660000?, 0xc3d9e8d638?}, {0xc6b1660000?, 0x49d246?, 0xc462b81dd0?})
    github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd/zstd_cgo.go:12 +0x26 fp=0xc3d9e8d668 sp=0xc3d9e8d620 pc=0x65f3c6
github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding.DecompressZSTD({0x0?, 0x0, 0x0?}, {0xc6b1660000, 0xc64236, 0xc98000})
    github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/compress.go:27 +0x65 fp=0xc3d9e8d710 sp=0xc3d9e8d668 pc=0x695c85
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.unmarshalMetaindexRows({0x0, 0x0, 0x0}, {0xa2a320?, 0xc462b81dd0?})
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/metaindex_row.go:89 +0xe8 fp=0xc3d9e8d830 sp=0xc3d9e8d710 pc=0x6d4ca8
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.newPart(0xc3d9e8da48, {0xc25af34660, 0x57}, 0xc8956438, {0xa2bda0?, 0xc462b81dd0}, {0xa2be68?, 0xc44b6a42e0}, {0xa2be68, 0xc44b6a4300}, ...)
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/part.go:103 +0x9c fp=0xc3d9e8d920 sp=0xc3d9e8d830 pc=0x6d583c
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.openFilePart({0xc25af34660?, 0xc0000e0900?})
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/part.go:98 +0x385 fp=0xc3d9e8da98 sp=0xc3d9e8d920 pc=0x6d5765
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).openCreatedPart(0xc0000fc140, 0xc2231f5ae0?, {0xc236fa2d00?, 0xf, 0x10}, 0x0, {0xc2231f5ae0, 0x4b}, 0x3?)
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:1295 +0x585 fp=0xc3d9e8dba8 sp=0xc3d9e8da98 pc=0x6e0965
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).mergeParts(0xc0000fc140, {0xc236fa2d00?, 0xf, 0x10}, 0x0?, 0x0)
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:1139 +0x8b3 fp=0xc3d9e8de48 sp=0xc3d9e8dba8 pc=0x6df353
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).mergeExistingParts(0xc0000fc140, 0x50?)
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:967 +0x26a fp=0xc3d9e8def0 sp=0xc3d9e8de48 pc=0x6de5ea
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).mergeWorker(0xc0000fc140)
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:984 +0x7e fp=0xc3d9e8dfb8 sp=0xc3d9e8def0 pc=0x6de6be
github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).startMergeWorkers.func1()
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:910 +0x25 fp=0xc3d9e8dfe0 sp=0xc3d9e8dfb8 pc=0x6de0e5
runtime.goexit()
    runtime/asm_amd64.s:1594 +0x1 fp=0xc3d9e8dfe8 sp=0xc3d9e8dfe0 pc=0x46bb61
created by github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).startMergeWorkers
    github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset/table.go:909 +0x37

Screenshots

Attached: Snipaste_2024-04-01_16-53-02, Snipaste_2024-04-01_16-55-16, Snipaste_2024-04-01_16-55-33, Snipaste_2024-04-01_16-57-28, Snipaste_2024-04-01_16-58-39


Used command-line flags

./vmstorage -storageDataPath=/data/victoriaMetrics -httpListenAddr=10.0.0.1:18000 -vminsertAddr=10.0.0.1:18001 -vmselectAddr=10.0.0.1:18002 -retentionPeriod=1y
./vminsert -replicationFactor=2 -httpListenAddr=10.0.0.1:18004 -storageNode=10.0.0.1,10.0.0.2,10.0.0.3
./vmselect -replicationFactor=2 -cacheDataPath=/vm/tmp -httpListenAddr=10.0.0.1:18003 -dedup.minScrapInterval=15s -storageNode=10.0.0.1,10.0.0.2,10.0.0.3

Additional information

No response

zekker6 commented 1 month ago

Hello @escapekyg

Could you elaborate on the deployment architecture you're using? Based on the IP addresses in the command-line flags, it seems there are 3 machines which host all three components; is that correct?

Also, it seems you're using quite an outdated release. Could you try upgrading to a more recent release and check whether the issue persists? Recent releases include a lot of bug fixes and improvements, including ones related to memory usage.

escapekyg commented 1 month ago

Hello @escapekyg

Could you elaborate on the deployment architecture you're using? Based on the IP addresses in the command-line flags, it seems there are 3 machines which host all three components; is that correct?

Also, it seems you're using quite an outdated release. Could you try upgrading to a more recent release and check whether the issue persists? Recent releases include a lot of bug fixes and improvements, including ones related to memory usage.

Hello @zekker6, thanks for your reply. Yes, you're right, it's a cluster with 3 nodes. Which version do you recommend upgrading to? Can version 1.99.0 be directly compatible with 1.85.3?

zekker6 commented 1 month ago

Yes, you're right, it's a cluster with 3 nodes.

Such a deployment is usually not recommended, because each component tries to use up to the full memory size for its operations. If all components run on the same machine, please make sure to adjust the -memory.allowedPercent value. By default it is 60%, which means a component will try to fit into 60% of memory, leaving 40% for OS-level caches. When all components are hosted on a single machine, each of them will try to use up to 60% of the overall memory, which can lead to OOM issues. Depending on the deployment type, it might also be possible to limit the amount of memory available to the processes, which will also work.
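For illustration, a minimal sketch of setting the flag when all three components share one host; the percentage split below is an assumption to tune, not an official recommendation, and the remaining flags are copied from the first comment:

```sh
# Illustrative split for a single 64 GB host running all three cluster components;
# the percentages are assumptions, other flags are as posted in this issue.
./vmstorage -storageDataPath=/data/victoriaMetrics -httpListenAddr=10.0.0.1:18000 \
  -vminsertAddr=10.0.0.1:18001 -vmselectAddr=10.0.0.1:18002 -retentionPeriod=1y \
  -memory.allowedPercent=40
./vminsert -replicationFactor=2 -httpListenAddr=10.0.0.1:18004 \
  -storageNode=10.0.0.1,10.0.0.2,10.0.0.3 -memory.allowedPercent=10
./vmselect -replicationFactor=2 -cacheDataPath=/vm/tmp -httpListenAddr=10.0.0.1:18003 \
  -storageNode=10.0.0.1,10.0.0.2,10.0.0.3 -memory.allowedPercent=10
```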

Which version do you recommend upgrading to? Can version 1.99.0 be directly compatible with 1.85.3?

Ideally I would recommend waiting for 1.100.0 to be released (expected early this week), as 1.99.0 had a bug which could lead to inconsistent data reads (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959).

It is possible to upgrade directly to the latest releases. Please note that there were releases with backward-incompatible changes, which means it will not be possible to roll back to versions older than these. The releases are:

Also note that there were other deprecation warnings between v1.85.3 and the current releases which deprecated metrics and flags used by VictoriaMetrics and its components; you can find all changes in the changelog - https://docs.victoriametrics.com/changelog/

escapekyg commented 2 weeks ago

Ideally I would recommend waiting for 1.100.0 to be released (expected early this week), as 1.99.0 had a bug which could lead to inconsistent data reads (see #5959).

Hello, we've upgraded to the latest version, 1.100.1. However, we've noticed that the memory usage gradually increases after startup, eventually leading to an out-of-memory (OOM) error several days later. The official documentation recommends having 50% of free memory. Does this mean that the memory usage after startup should occupy 50% of the system's memory (considering that it increases after startup)?

Additionally, we've deployed the vminsert and vmselect on the same node as the vmstorage. Despite this, we've observed that vminsert and vmselect don't consume much memory. Is this still a concern?

jiekun commented 2 weeks ago

A small suggestion for memory and CPU usage issues: posting the issue along with a pprof result from the affected component might be helpful.
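For example, a minimal sketch of collecting profiles from vmstorage via its -httpListenAddr (the address below is taken from the flags posted in this issue; the same /debug/pprof endpoints are exposed by vminsert and vmselect):

```sh
# Heap (memory) profile from vmstorage
curl -s http://10.0.0.1:18000/debug/pprof/heap > vmstorage_mem.pprof
# CPU profile, collected over ~30 seconds
curl -s http://10.0.0.1:18000/debug/pprof/profile > vmstorage_cpu.pprof
```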

Also some useful links:

hagen1778 commented 2 weeks ago

Hello @escapekyg! How many vmstorage/vmselect/vminsert instances do you have? I'd suggest reading https://docs.victoriametrics.com/troubleshooting/#out-of-memory-errors in order to understand what could be the root cause of the high memory consumption. From the screenshots you provided it seems like the percentage of slow inserts is 30% on average, which is quite high. If you check the description of that panel you'll find the following info:

The percentage of slow inserts compared to the total insertion rate during the last 5 minutes. The lower the value, the better. If the percentage remains high (>10%) during extended periods of time, then it is likely more RAM is needed for optimal handling of the current number of active time series. In general, VictoriaMetrics requires ~1KB of RAM per active time series, so it should be easy to calculate the required amount of RAM for the current workload according to the capacity planning docs. But the resulting number may be far from the real number, because the required amount of memory depends on many other factors such as the number of labels per time series and the length of label values. See also this issue for details.
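As a rough illustration of that rule of thumb (the series count below is a placeholder, not a measurement from this cluster):

```sh
# Back-of-the-envelope estimate from the ~1KB-per-active-series rule of thumb.
ACTIVE_SERIES=1200000        # placeholder series count, not a measurement
BYTES_PER_SERIES=1024        # ~1KB of RAM per active time series
echo "$(( ACTIVE_SERIES * BYTES_PER_SERIES / 1024 / 1024 )) MiB for active time series"
# The docs additionally recommend keeping ~50% of RAM free for merges, queries and page cache.
```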

Let me know if this helps.

escapekyg commented 1 week ago

@hagen1778 thanks! It's a cluster with 3 nodes, and each node runs vmstorage, vmselect and vminsert; I listed the command lines in the first comment.

@jiekun thanks! The attachment contains vmstorage's memory and CPU pprof files; they were collected after vmstorage started and before it crashed. vmstorage_pprof.zip

We use the latest version, 1.100.1, but the memory increases even faster, and it crashes with an OOM shortly after starting; it consumes over 50-60 GB, even though we shut down all the write and read traffic and deleted the metric name with the highest number of series. This week, I observed that the memory usage after starting grew from 30+ GB to 50+ GB, while the data storage path on each node grew from 90 GB to 100 GB.

The following are some metrics before the crash (screenshots attached).

  1. Can you give me some suggestions about how to plan the resource capacity for this cluster?
  2. The official documentation recommends having 50% of free memory. Does this mean that we need 120+ GB of RAM, considering the memory usage is almost 60 GB after startup?
  3. We've deployed vminsert and vmselect on the same nodes as vmstorage. Despite this, we've observed that vminsert and vmselect don't consume much memory. Is this still a concern?

escapekyg commented 1 week ago

@zekker6 @hagen1778 @jiekun hello, thanks for your replies again! Currently, we have increased the machines' memory from 64 GB to 128 GB and restarted vmstorage. With the 30+ GB indexdb files, the memory increased to 60-70 GB and then gradually dropped to 40+ GB. I think we still need to deal with the high slow-insert and churn rates. I just didn't expect that the memory usage could be so much higher than the indexdb size after starting.

hagen1778 commented 1 week ago

Hello @escapekyg!

We use the latest version, 1.100.1, but the memory increases even faster, and it crashes with an OOM shortly after starting; it consumes over 50-60 GB

This is not expected memory usage based on the stats from your screenshots:

  1. Active time series: 1.2 Mil
  2. Churn rate 24h: 2-4 Mil

You can see our playground installation, which has a similar number of active series and churn, but it uses 8 GiB of RSS in total (see vmstorage mem usage).

I suspect something is wrong with the data you write. On the screenshot for caches: do you show utilization or miss ratio? Without panel titles it is unclear what I am looking at.

and deleted the metric name with the highest number of series

Could you please clarify what you mean by "deleted the metric name"? Did you drop it via relabeling, or stop exposing it from your apps?

escapekyg commented 1 week ago

hello @hagen1778

I suspect something is wrong with the data you write. On the screenshot for caches: do you show utilization or miss ratio? Without panel titles it is unclear what I am looking at.

The panel with the missing title was Cache usage %. After adding memory to 128 GB, this panel looks different from that day.

(screenshots attached)

Could you please clarify what you mean by "deleted the metric name"? Did you drop it via relabeling, or stop exposing it from your apps?

Sorry for my poor English. At that time, I found some metrics with a high number of series in the cardinality explorer and deleted them via curl, but vmstorage still crashed with an OOM within one minute of starting. We also stopped ingesting those metrics that day. Now the machines have 128 GB, but the slow insert rate is still as high as before.

Glorytianzhao commented 1 week ago

Hello, @hagen1778

I'm a colleague from the same team as the original inquirer, and I have several additional questions that require clarification:

  1. Upon reviewing the source code, we've observed that when dealing with a 30GB indexdb file, it seems the entire indexdb file is loaded during the startup process. Subsequently, pertinent label data is extracted and loaded into various cache types. Does this indicate that we would necessitate 1 to 2 times the memory to initiate the entire program?
  2. After the indexdb file has been loaded and integrated into the various cache types, is the memory reclaimed? We've observed that following startup, even in the absence of any write or query operations, the memory usage persists at or exceeds the size of the indexdb file.
  3. In scenarios where a significant volume of historical data points with high churn rates results in an enlarged indexdb size, how can we purge the data from the indexdb without disrupting the program's functionality? We've attempted to remove the entire indexdb directory, only to find that the subsequently ingested indexdb data does not support queries for historical data points. Additionally, we've tried utilizing /prometheus/api/v1/admin/tsdb/delete_series, but noticed no substantial reduction in the indexdb size.

We eagerly await your insights and recommendations. Thank you!

hagen1778 commented 1 week ago

the missing tiltle panel was Cache usage %, after adding memory to 128GB, this panel is different from that day

Thanks! Then it is weirder than I thought. If you have 40% tsid cache utilization and still many SlowInserts, it means that 40% of the ingested data is new time series. This number should correlate with the churn rate, as the churn rate reflects the amount of new time series registered. But based on the screenshots you provided, it doesn't correlate.

I found some metrics with high number of series in cardinalty explorer, and I deleted them by curl. but it still crashed with OOM after starting in one minute. And we have stop ingesting those metrics that day.

Deleting metrics via the DELETE API won't help reduce memory usage:

Note that VictoriaMetrics doesn’t delete entries from inverted index (aka indexdb) for the deleted time series. Inverted index is cleaned up once per the configured retention.

See https://docs.victoriametrics.com/#how-to-delete-time-series

But stopping ingesting those metrics will have an effect.
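For reference, a sketch of the kind of deletion call mentioned above, using the cluster delete endpoint on vmselect with the address from this issue's flags; the metric name and accountID 0 are placeholders. As noted, this does not shrink indexdb:

```sh
# Placeholder metric name; accountID 0 assumed. This unregisters the series from
# queries but does not reclaim indexdb space until the retention-based cleanup.
curl -s 'http://10.0.0.1:18003/delete/0/prometheus/api/v1/admin/tsdb/delete_series?match[]=some_high_cardinality_metric'
```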

hagen1778 commented 1 week ago

Upon reviewing the source code, we've observed that when dealing with a 30GB indexdb file, it seems the entire indexdb file is loaded during the startup process. Subsequently, pertinent label data is extracted and loaded into various cache types. Does this indicate that we would necessitate 1 to 2 times the memory to initiate the entire program?

No, indexdb consists of parts, and parts contain an arbitrary number of unique time series. VictoriaMetrics will load into memory only those parts which contain time series that were ingested or queried recently; in other words, only active time series.

After the indexdb file has been loaded and integrated into the various cache types, is the memory reclaimed?

Caches are populated only on requests: reads or writes. If you don't perform any write or read operations, the caches will stay as they are. However, VictoriaMetrics persists the caches on disk during graceful shutdown. This is needed so that it can start with "hot" caches. You can remove the caches, though: https://docs.victoriametrics.com/#cache-removal
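A minimal sketch of that procedure with the -storageDataPath used in this issue; how vmstorage is supervised is an assumption, so adjust the stop/start commands to your setup:

```sh
# Stop vmstorage before touching the cache directory (supervision via systemd is assumed).
systemctl stop vmstorage
rm -rf /data/victoriaMetrics/cache
systemctl start vmstorage
```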

In scenarios where a significant volume of historical data points with high churn rates results in an enlarged indexdb size, how can we purge the data from the indexdb without disrupting the program's functionality?

The data can't be purged from indexdb. If you want to protect VM from cardinality explosions, please limit the cardinality on the clients which push data to VM. For example, the VM collector vmagent supports cardinality limiting: https://docs.victoriametrics.com/vmagent/#cardinality-limiter
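For illustration, a sketch of vmagent's cardinality-limiting flags; the limit values and the remote-write URL (pointing at the vminsert address from this issue, accountID 0) are assumptions to adapt:

```sh
# Limit values are placeholders; the remote-write URL assumes the vminsert
# address from this issue and accountID 0.
./vmagent -remoteWrite.url=http://10.0.0.1:18004/insert/0/prometheus/api/v1/write \
  -remoteWrite.maxHourlySeries=1000000 \
  -remoteWrite.maxDailySeries=5000000
```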

Additionally, we've tried utilizing /prometheus/api/v1/admin/tsdb/delete_series, but noticed no substantial reduction in the indexdb size.

Deleting metrics via the DELETE API won't help reduce memory usage or remove data from disk:

Note that VictoriaMetrics doesn’t delete entries from inverted index (aka indexdb) for the deleted time series. Inverted index is cleaned up once per the configured retention.

See https://docs.victoriametrics.com/#how-to-delete-time-series

Glorytianzhao commented 1 week ago

@hagen1778 Thank you for your reply.

We have clarified the third question. However, we still have some concerns about questions 1 and 2. We observed the size of the <-storageDataPath>/cache directory, which seems to be only around 300~600 MB. Attached is a profile exported from our service when it crashed after memory grew to 50 GB upon startup. By analyzing it with the Go tools, we can see that "mergeset.unmarshalMetaindexRows" consumed 35 GB of memory, which seems inevitable. Moreover, we couldn't see where the remaining 20 GB of memory went. (screenshot attached)
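For reference, roughly how such an inspection can be done with the standard pprof tooling, assuming the exported heap profile is saved as vmstorage_mem4.pprof:

```sh
go tool pprof -top vmstorage_mem4.pprof        # top allocators, flat in-use memory
go tool pprof -cum -top vmstorage_mem4.pprof   # sorted by cumulative allocations
```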

The service only starts normally when we increase the memory to 70GB. Is this situation normal?

[Uploading vmstorage_mem4.pprof.txt…]()