Closed akalenyu closed 2 weeks ago
If a program runs out of memory, it can't continue. 1G is not going to be enough for a large program. What are you suggesting that we do here?
I am suggesting that something leaks, since it's very unlikely that a simple program would run out of 1G of memory.
I don't think it's unlikely at all. You are limiting the size of virtual memory, but the program itself takes up virtual memory.
Hmm, the core dump size is also nowhere near 1G. What am I missing here?
```
$ ls -sh ./virt-chroot.core
136M ./virt-chroot.core
```
The Go runtime reserves large portions of address space, see https://go.dev/doc/gc-guide#A_note_about_virtual_memory. RLIMIT_AS is not a good way to limit the memory of Go programs.
cc @mknyszek
> Hmm, the core dump size is also nowhere near 1G. What am I missing here?
`RLIMIT_AS` is about address space reservation, not actually committed memory. I see 24 mappings that are larger than 50 megabytes. The address space limit is already exceeded at the time of the `syscall.Rlimit` call. Linux does not return failure in this case. The limit is still applied, but future allocation (actually: address space reservation) from the kernel will fail. But there is still plenty of unused Go heap, so most of the time, the program succeeds. There's probably some heap expansion heuristic that kicks in very rarely, and that produces the sporadic failure. But in truth, the process is in a bad state during every run.
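To make the distinction between reserved address space and committed memory concrete, here is a minimal sketch that compares the two for the current process. It assumes Linux and reads `/proc/self/status`; the helper name `statusKB` is made up for this example. `VmSize` is roughly the figure that `RLIMIT_AS` is checked against, while `VmRSS` is what is actually resident.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statusKB extracts a "<key>:   <n> kB" field from text formatted like
// /proc/<pid>/status and returns the value in kB.
// (Hypothetical helper, not part of any library.)
func statusKB(status, key string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok || name != key {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, false
		}
		n, err := strconv.ParseUint(fields[0], 10, 64)
		return n, err == nil
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read status:", err)
		return
	}
	vsz, _ := statusKB(string(data), "VmSize")
	rss, _ := statusKB(string(data), "VmRSS")
	fmt.Printf("VmSize (address space): %d kB\n", vsz)
	fmt.Printf("VmRSS  (resident):      %d kB\n", rss)
}
```

In a Go binary with large dependencies the first number is typically far larger than the second, which is consistent with a small core dump alongside an exceeded address space limit.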
> The Go runtime reserves large portions of address space, see https://go.dev/doc/gc-guide#A_note_about_virtual_memory. RLIMIT_AS is not a good way to limit the memory of Go programs.
>
> cc @mknyszek
Thank you! This link is super useful. From a quick search I see that we unfortunately have no better options, since SetMemoryLimit is a soft limit and we are trying to protect against indirect malicious usage of a binary we call, which is out of our control.
> `RLIMIT_AS` is about address space reservation, not actually committed memory. I see 24 mappings that are larger than 50 megabytes. The address space limit is already exceeded at the time of the `syscall.Rlimit` call. Linux does not return failure in this case. The limit is still applied, but future allocation (actually: address space reservation) from the kernel will fail. But there is still plenty of unused Go heap, so most of the time, the program succeeds. There's probably some heap expansion heuristic that kicks in very rarely, and that produces the sporadic failure. But in truth, the process is in a bad state during every run.
Thank you. I am assuming those come from the "sufficiently large dependencies" but I am particularly interested in 2 mmaps for 512Mi that IIUC already exceed the disclaimers in https://go.dev/doc/gc-guide#A_note_about_virtual_memory
If you are running on Linux, I recommend using a memory cgroup to limit memory use of an application. Memory cgroups measure actual memory usage more precisely and aren't quite so trivially circumvented (RLIMIT_AS is immediately circumvented by fork()).
> If you are running on Linux, I recommend using a memory cgroup to limit memory use of an application. Memory cgroups measure actual memory usage more precisely and aren't quite so trivially circumvented (RLIMIT_AS is immediately circumvented by fork()).
So we follow up on the `Setrlimit` call with a `syscall.Exec`, so it should not be circumvented.
I am thinking the same about cgroups being our best bet, but, if we don't, I guess we could do some nasty things with a short cgo func that calls `setrlimit` and `execve` directly.
Ah, do you actually want to apply the rlimit to a (non-Go) process you are exec'ing to? I think there is reasonable room for a proposal to add rlimits to https://pkg.go.dev/syscall#SysProcAttr so that `os.StartProcess` / `os/exec` could set rlimits for the child process.
Note that at least on Linux you can also do that by executing the process via the prlimit command.
So playing around a bit with this new finding, I am getting larger VSS numbers in the core dump than I was expecting. I acknowledge that I might be doing something terribly wrong, but, would 1.4Gi be expected with the below program?
```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	fmt.Println(http.Client{})
	panic("test")
}
```
```
$ eu-readelf -l reprducer-cgo-pthread.core | awk '{sz=strtonum($6); if (sz > 50 * 1024 * 1024) {print sz}}' | awk '{n += $1}; END{print n}'
1505267712
```
Timed out in state WaitingForInfo. Closing.
(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
Go version
go version go1.22.6 linux/amd64
Output of `go env` in your module/workspace:
What did you do?
Running a simple program that sets a 1G address space rlimit (RLIMIT_AS) and does basically nothing, with a dependency on a sufficiently large library (k8s for the purpose of demonstration): https://github.com/akalenyu/kubernetes/commit/b88d05b4892ce16634200b54cf84a7e1396f32cd
To reproduce:
What did you see happen?
What did you expect to see?
No sporadic ENOMEM