golang / go

runtime: significant heap profiler memory usage increase in Go 1.23 #69590

Open nsrip-dd opened 6 days ago

nsrip-dd commented 6 days ago

Go version

go1.23.1

Output of go env in your module/workspace:

n/a

What did you do?

We upgraded our Go services to Go 1.23.1. All of our services use continuous profiling and have the heap profiler enabled. Go 1.23 increased the default call stack depth for the heap profiler (and others) from 32 frames to 128 frames.

What did you see happen?

We saw a significant increase in memory usage for one of our services, in particular the /memory/classes/profiling/buckets:bytes runtime metric:

[Screenshot, 2024-09-23 10:37:58; image not preserved]

The maximum went from ~50MiB to almost 4GiB, an 80x increase. We also saw a significant increase in the time to serialize the heap profile, from <1 second to over 20 seconds.

We set the environment variable GODEBUG=profstackdepth=32 to get the old limit, and the profiling bucket memory usage went back down.
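For anyone who wants to check their own services, here is a minimal sketch of reading that metric through the standard `runtime/metrics` package. The helper name `profilingBucketBytes` is made up for illustration; the metric name is the one quoted above.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// profilingBucketBytes (hypothetical helper name) returns the current value of
// the /memory/classes/profiling/buckets:bytes runtime metric. The boolean is
// false if the running Go version does not report this metric as a uint64.
func profilingBucketBytes() (uint64, bool) {
	sample := []metrics.Sample{{Name: "/memory/classes/profiling/buckets:bytes"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	if b, ok := profilingBucketBytes(); ok {
		fmt.Printf("profiling buckets: %d bytes\n", b)
	} else {
		fmt.Println("metric not available in this Go version")
	}
}
```

Logging this value periodically before and after setting GODEBUG=profstackdepth=32 should show whether a given service is affected.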

What did you expect to see?

We were surprised at first to see such a significant memory usage increase. However, the affected program is doing just about the worst-case thing for the heap profiler. It parses complex, deeply-nested XML. This results in a massive number of unique, deep stack traces due to the mutual recursion in the XML parser. And the heap profiler never frees any stack trace it collects, so the cumulative size of the buckets becomes significant as more and more unique stack traces are observed.
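To illustrate the mechanism, here is a rough sketch (not the gist mentioned below, and not our production code) that uses two mutually recursive functions to produce one distinct call stack per bit pattern, then reports how much the profiling bucket metric grew. The function names are invented for the example, and setting `runtime.MemProfileRate = 1` to sample every allocation is an assumption made only so the effect shows up quickly.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// sink forces the leaf allocation onto the heap.
var sink []byte

//go:noinline
func left(path uint, depth int) { descend(path, depth) }

//go:noinline
func right(path uint, depth int) { descend(path, depth) }

// descend consumes one bit of path per level and goes through left or right,
// so each distinct path value yields a distinct call stack at the leaf.
//
//go:noinline
func descend(path uint, depth int) {
	if depth == 0 {
		sink = make([]byte, 64)
		return
	}
	if path&1 == 0 {
		left(path>>1, depth-1)
	} else {
		right(path>>1, depth-1)
	}
}

// bucketBytes reads the profiling bucket metric, returning 0 if unavailable.
func bucketBytes() uint64 {
	s := []metrics.Sample{{Name: "/memory/classes/profiling/buckets:bytes"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0
	}
	return s[0].Value.Uint64()
}

// bucketGrowth allocates once at the bottom of each of the 2^depth paths and
// returns the increase in profiling bucket memory. It leaves MemProfileRate
// set to 1 (assumption for illustration: sample every allocation).
func bucketGrowth(depth int) uint64 {
	runtime.MemProfileRate = 1
	before := bucketBytes()
	n := uint(1) << uint(depth)
	for path := uint(0); path < n; path++ {
		descend(path, depth)
	}
	return bucketBytes() - before
}

func main() {
	const depth = 10
	fmt.Printf("2^%d unique stacks grew profiling buckets by %d bytes\n",
		depth, bucketGrowth(depth))
}
```

Each level adds two frames here, so raising `depth` pushes the stacks past the 32-frame limit; at that point the reported growth should depend on the profstackdepth setting.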

See this gist for a (kind of kludgy) example program which sees a 100x increase in bucket size from Go 1.22 to Go 1.23.

I'm mainly filing this issue to document this behavior. Manually setting GODEBUG=profstackdepth=32 mitigates the issue. I don't think anything necessarily needs to change in the runtime right now, unless this turns out to be a widespread problem.

cc @felixge

ianlancetaylor commented 6 days ago

CC @golang/runtime @felixge

mknyszek commented 4 days ago

Thanks for the detailed issue! I'm glad this seems localized for now; I agree we should wait and see if this is more widespread.

robert-thille-cb commented 3 days ago

Just wanted to confirm that the GODEBUG=profstackdepth=32 environment setting needs to be done at runtime, not at build time. Correct?

nsrip-dd commented 2 days ago

Yes, that's correct

prattmic commented 2 days ago

Copying from https://github.com/golang/go/issues/57175#issuecomment-2377500656:

This is theoretically a problem even with 32 frames. For example, with left and right recursive frames, there are 2^32 combinations of those frames. Storing that many unique samples would likely use ~128GB of memory (2^32 samples × 32 frames), assuming 1 byte per frame in a sample.

The reproducer above triggers extreme usage only with a higher stack limit because it has 16 constant leaf frames. That limits it to 2^16 combinations with a 32 frame limit, but allows 2^112 with a 128 frame limit. The production application that encountered this was presumably similar.
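The arithmetic above can be checked with a short sketch. It assumes, as the comment does, that every non-constant frame is one of two functions and that a stored sample costs a fixed number of bytes per frame; the function names and the `bytesPerFrame` parameter are illustrative, not runtime internals.

```go
package main

import (
	"fmt"
	"math/big"
)

// combinations returns 2^(limit-constLeaf): the number of distinct truncated
// stacks when constLeaf of the limit frames are fixed and every other frame
// is one of two functions.
func combinations(limit, constLeaf uint) *big.Int {
	return new(big.Int).Lsh(big.NewInt(1), limit-constLeaf)
}

// totalBytes estimates the memory needed to store every combination once,
// at limit*bytesPerFrame bytes per sample.
func totalBytes(limit, constLeaf, bytesPerFrame uint) *big.Int {
	n := combinations(limit, constLeaf)
	return n.Mul(n, big.NewInt(int64(limit*bytesPerFrame)))
}

func main() {
	// 32-frame limit with no constant frames, 1 byte per frame.
	fmt.Println("stacks:", combinations(32, 0), "bytes:", totalBytes(32, 0, 1))

	// 16 constant leaf frames, as in the reproducer.
	fmt.Println("32-frame limit: ", combinations(32, 16))
	fmt.Println("128-frame limit:", combinations(128, 16))
}
```

With 1 byte per frame, the first case works out to 2^32 × 32 = 2^37 bytes, which is the ~128GB figure; passing a larger `bytesPerFrame` shows how the estimate scales if each frame takes more space.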