VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

vmstorage custom build crashes for versions 1.87.3, 1.87.5, 1.87.14 #6208

Open jinlongwang opened 2 weeks ago

jinlongwang commented 2 weeks ago

Describe the bug

crash snapshot

image

main crash info

runtime: marked free object in span 0x7fce6fa713e8, elemsize=320 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr)

some context

  1. This problem occurs in the version we compiled ourselves.
  2. There are no code changes.
  3. This problem does not occur when using the community version binary directly.
  4. Our build environment (using our own Docker image):
    • Go version: FROM golang:1.21-alpine as builder
    • system version: debian-stretch
    • others:
        RUN apt-get update && apt-get install -y make rsync gcc g++ pkg-config cmake build-essential bzip2 nasm \
            zlib1g-dev libelf-dev libelf1 libglib2.0-dev autoconf automake libtool libmagickwand-dev python-dev libcrypto++-dev clang curl libssl-dev libpq-dev \
            openssl default-libmysqlclient-dev && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

To Reproduce

random crash

Version

vmstorage-20240428-135734-heads-cluster_1.87.14-0-g7e9bc13f8

base: cluster_1.87.14, with no code changes

Logs

No response

Screenshots

No response

Used command-line flags

No response

Additional information

No response

hagen1778 commented 2 weeks ago

@jinlongwang I'm removing the bug label because you said official pre-compiled binaries are Ok.

@f41gh7 @zekker6 do you have any opinion on why this problem could happen?

zekker6 commented 1 week ago

@jinlongwang Could you also provide a go build command which you're using to create a build?

As far as I can see based on the shared command, you're using gcc to perform the build while official binaries are using musl. This can cause some issues in code which is using CGO. Is it possible to either use musl or switch to official build scripts to produce a binary?
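
To compare a custom build against the official one, it may help to print the build settings that the Go toolchain records in the binary. The following is a minimal sketch, not part of VictoriaMetrics: the function name buildSettings and the "GoVersion" map key are made up for illustration, and the exact set of recorded keys (for example CGO_ENABLED) depends on the Go version and on how the binary was built.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// buildSettings (hypothetical helper) collects the Go toolchain version and
// the build settings embedded in the running binary, such as CGO_ENABLED,
// GOOS and GOARCH when the toolchain recorded them.
func buildSettings() map[string]string {
	out := map[string]string{"GoVersion": runtime.Version()}
	info, ok := debug.ReadBuildInfo()
	if !ok {
		// No build info embedded; only the runtime version is known.
		return out
	}
	for _, s := range info.Settings {
		out[s.Key] = s.Value
	}
	return out
}

func main() {
	for k, v := range buildSettings() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Running `go version -m` against each vmstorage binary should report similar information without modifying the code.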

f41gh7 commented 1 week ago

The given error relates to the case where an object was marked as free by the garbage collector, but at the next GC cycle a pointer to it is still in use.

It could happen in the following cases:

  1. Incorrect usage of raw pointers. This is the trivial case, and most linters would catch it. We follow the unsafe rules, so it is very unlikely.
  2. A missing runtime.KeepAlive call for an object created by unsafe code. Potentially, when you create an object from bytes unsafely converted from a string, the garbage collector may mark the original string as free. This relies on the compiler's escape analysis and could cause a problem if the compiler handles the noinline marker incorrectly.
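
The second case can be illustrated with a small sketch. This is not VictoriaMetrics code: unsafeBytes, sumBytes and checksum are hypothetical names, and the example assumes Go 1.20 or newer for unsafe.StringData. With a correctly working compiler, the slice's own pointer already keeps the string data reachable, so the runtime.KeepAlive call here is a defensive marker that states explicitly how long the original string must stay alive.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// unsafeBytes returns a []byte view over the bytes of s without copying.
// The result must never be modified, because strings are immutable.
func unsafeBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// sumBytes is a stand-in for any code that reads the unsafe view.
func sumBytes(b []byte) int {
	n := 0
	for _, c := range b {
		n += int(c)
	}
	return n
}

// checksum reads s through the unsafe view and keeps s alive until the
// last read of the view has finished.
func checksum(s string) int {
	b := unsafeBytes(s)
	n := sumBytes(b)
	runtime.KeepAlive(s) // s must not be collected before this point
	return n
}

func main() {
	fmt.Println(checksum("abc")) // 97 + 98 + 99 = 294
}
```

The runtime message in the report also suggests -d=checkptr. Building with `-gcflags=all=-d=checkptr` enables extra checks on unsafe.Pointer conversions and may help rule out the first case.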

Both cases are possible if there is an issue with the Go compiler. I'd recommend pinning the minor version number for the build image, e.g. golang:1.22.2-alpine instead of golang:1.22-alpine.

jinlongwang commented 1 week ago

This looks most likely to be an issue with the gcc version. The gcc in the VM compilation image is 13.2.1, while the version we previously had issues with was 6.x. I have now recompiled using gcc 12 and am currently monitoring it. Due to company environment constraints, I can't use gcc 13.2.1.