golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.99k stars 17.54k forks source link

unique: large string still referenced, after interning only a small substring #69370

Closed Deleplace closed 1 week ago

Deleplace commented 1 week ago

Go version

go version go1.23.1 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/deleplace/Library/Caches/go-build'
GOENV='/Users/deleplace/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/deleplace/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/deleplace/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/deleplace/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/p4/g7fnpss96g5708_mmv3cxzgh0000gn/T/go-build3365171445=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

What did you see happen?

Program is still using 300MB

What did you expect to see?

Program should be using less than 10MB

Discussion

This issue has already been fixed by @mknyszek in CL 610738. The change will be included in the future Go 1.24.

The current issue is about documenting the behavior of Go 1.23.0 and Go 1.23.1 when interning strings, and retaining more memory than expected.

The problem arises only with strings, not with other types comparable with ==. String is the only comparable type that supports slicing.

When interning a small substring y of a large string x, 2 strategies are possible in the implementation of unique.Make: 1) Keep the original memory location of y, within x. x will not be garbage collected, because the object pool in unique is keeping a reference to a part of it. The user is responsible about the memory consequences of interning a substring. The user may decide to call strings.Clone beforehand themself. 2) Clone y into a fresh string y2, and intern y2. x may then be garbage collected.

unique implements strategy (2).

Before the fix 610738, unique.Make did proactively clone input strings. However, an internal map was still referencing the original input strings, which was unintended.

image

(in this diagram, Bwords is a []unique.Handle[string])

After the fix 610738, unique.Make proactively clones input strings, and then keeps only references to the clones. The large original input strings are then properly garbage collected.

image
gabyhelp commented 1 week ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

mknyszek commented 1 week ago

As @Deleplace states, this is already fixed. The only question is whether or not to backport it. I'll come back with a decision today.

mknyszek commented 1 week ago

Ah and, thank you @Deleplace for filing an issue. :)

ianlancetaylor commented 1 week ago

@gopherbot Please open backport to 1.23

Since the unique package is new in 1.23, we should implement this optimization there as well, for consistent behavior.

gopherbot commented 1 week ago

Backport issue(s) opened: #69383 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

mknyszek commented 1 week ago

Thanks @ianlancetaylor!

gopherbot commented 1 week ago

Change https://go.dev/cl/612295 mentions this issue: [release-branch.go1.23] unique: don't retain uncloned input as key