golang / go

The Go programming language
https://go.dev

debug/elf: NewFile can be tricked into reading the entire file into memory if it is corrupted #68454

Open umanwizard opened 3 months ago

umanwizard commented 3 months ago

Go version

go version go1.22.2 linux/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/brennan/.cache/go-build'
GOENV='/home/brennan/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/brennan/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/brennan/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/gnu/store/vcp6w6z0syjwv383s0nywhy0svxk2d05-go-1.22.2/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/gnu/store/vcp6w6z0syjwv383s0nywhy0svxk2d05-go-1.22.2/lib/go/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build369678066=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Take a 64-bit little-endian ELF file at least 1GiB in size, and run the following script to corrupt it (it points shstrndx at section 1 and rewrites that section's header to claim a 1GiB string table starting at offset 0):

package main

import (
    "encoding/binary"
    "fmt"
    "os"
)

func main() {
    if len(os.Args) < 2 {
        fmt.Println("Usage: go run main.go <filename>")
        return
    }

    filename := os.Args[1]
    file, err := os.OpenFile(filename, os.O_RDWR, 0)
    if err != nil {
        fmt.Println("Error opening file:", err)
        return
    }
    defer file.Close()

    // Set e_shstrndx (the 16-bit field at offset 62 of the ELF64 header) to 1
    _, err = file.WriteAt([]byte{1, 0}, 62)
    if err != nil {
        fmt.Println("Error writing to file:", err)
        return
    }

    // Read e_shoff (offset 40) and compute the file offset of section header 1
    var data [8]byte
    _, err = file.ReadAt(data[:], 40)
    if err != nil {
        fmt.Println("Error reading from file:", err)
        return
    }
    offset := int64(binary.LittleEndian.Uint64(data[:])) + 64

    // Change sec1's type to 3 (SHT_STRTAB), its offset to 0, and its size to 1GiB
    file.WriteAt([]byte{3, 0, 0, 0}, offset + 4)                 // sh_type is a 4-byte field
    file.WriteAt([]byte{0, 0, 0, 0, 0, 0, 0, 0}, offset + 24)    // sh_offset = 0
    file.WriteAt([]byte{0, 0, 0, 0x40, 0, 0, 0, 0}, offset + 32) // sh_size = 0x40000000 (1GiB)
}

Then try to get the buildinfo, e.g. using https://github.com/kelseyhightower/buildinfo, which uses elf.NewFile under the hood.
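
A minimal reproduction that skips the buildinfo tool (a sketch; it just calls elf.Open, which wraps elf.NewFile, on the corrupted binary):

package main

import (
    "debug/elf"
    "fmt"
    "os"
)

func main() {
    // elf.Open wraps elf.NewFile; NewFile reads the entire section name
    // string table (shstrtab) into memory while parsing the headers.
    f, err := elf.Open(os.Args[1])
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    defer f.Close()
    fmt.Println("parsed", len(f.Sections), "sections")
}

Running it against the corrupted file should show the same ~1GiB allocation during header parsing.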

What did you see happen?

The program consumes 1GiB of memory before failing.

What did you expect to see?

Memory use should be bounded.

The reason is that debug/elf reads the entire shstrtab into memory in order to get section names. It uses saferio.ReadData to do this.

ReadData's doc comment reads as follows:

// ReadData reads n bytes from the input stream, but avoids allocating
// all n bytes if n is large. This avoids crashing the program by
// allocating all n bytes in cases where n is incorrect.

However, this is misleading: it does not actually prevent a total allocation of n bytes. It only avoids allocating all n bytes at once; it allocates and reads 10MB chunks in a loop, appending to the result, until it has read n bytes or reached the end of the file.
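
For illustration, here is a simplified sketch of that chunked-read pattern (not the actual internal/saferio source; it just shows why the total allocation still reaches n):

package sketch

import "io"

// readData is a simplified stand-in for internal/saferio.ReadData: it never
// allocates more than one 10MB chunk at a time, but the appended result
// still grows until n bytes have been read or the reader is exhausted.
func readData(r io.Reader, n uint64) ([]byte, error) {
    const chunk = 10 << 20 // 10MB per read
    var buf []byte
    tmp := make([]byte, chunk)
    for n > 0 {
        next := n
        if next > chunk {
            next = chunk
        }
        m, err := io.ReadFull(r, tmp[:next])
        buf = append(buf, tmp[:m]...)
        if err != nil {
            return buf, err
        }
        n -= uint64(m)
    }
    return buf, nil
}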


ianlancetaylor commented 3 months ago

It would be a bug if there were a way to trick debug/elf into allocating more memory than exists in the file. That is what the saferio package avoids. That kind of thing is bad because an attacker can provide a small file and trick the program into allocating a great deal of memory.

I think it's less interesting when the program doesn't read more than the contents of a file. That means that for an attacker to get the program to allocate a lot of memory, the attacker has to provide a large file. That kind of attack doesn't scale, so it's not all that dangerous.

Of course I'm not opposed to fixing this if we can. But it doesn't seem high priority. In particular I don't think we should reduce the efficiency of debug/elf to fix this kind of problem.

umanwizard commented 3 months ago

Note that e.g. with truncate you can create a sparse file of arbitrary length without actually occupying disk space.
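
For example, something like the following produces a multi-gigabyte file that occupies essentially no disk space on filesystems with sparse-file support (a sketch; a real repro would start from a valid ELF and only extend the tail):

package main

import "os"

func main() {
    // Extend an empty file to 2 GiB. On filesystems that support sparse
    // files this allocates no data blocks; reads of the hole return zeros.
    f, err := os.Create("sparse.bin")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := f.Truncate(2 << 30); err != nil {
        panic(err)
    }
}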

ianlancetaylor commented 3 months ago

That is true, but it doesn't seem like a particularly likely remote attack. And a local attacker can already allocate essentially arbitrary amounts of memory. Perhaps I am missing something.

umanwizard commented 3 months ago

I think we are on the same page that it's not a critical security issue. I was actually not even thinking about security when I opened the issue, but rather performance/efficiency. It is annoying if malformed files can cause unexpected OOMs or GC pauses, even if it's not a security issue.

Also, I question the premise that it's not exploitable if the amount of memory an attacker can cause to be allocated is limited by the size of the files. Imagine a service that lets customers upload ELF files, e.g. in order to extract debug symbols. It wouldn't be hard to upload a big enough file to cause the service to consume more memory than is available to it.

The easiest ways to solve this would be to either (1) use normal file read APIs to fetch data from the shstrtab on demand, rather than slurping the whole thing in, or (2) mmap it. Both would run afoul of your requirement not to make debug/elf less efficient: the first inflates the number of syscalls required, and the second makes goroutines block on I/O. Another very simple solution is to just fail if the section is bigger than some value like 10 MiB; it's hard to imagine a legitimate ELF file whose shstrtab is bigger than that.
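
A rough sketch of option (1): a hypothetical helper that reads one NUL-terminated name at a time through an io.ReaderAt, so memory stays bounded regardless of the section size claimed in the header (this is not existing debug/elf code):

package sketch

import (
    "bytes"
    "io"
)

// sectionName reads the NUL-terminated name at offset nameOff inside a
// string table that starts at strtabOff in the file, one small buffer at a
// time. Hypothetical helper, not part of debug/elf today.
func sectionName(r io.ReaderAt, strtabOff, nameOff int64) (string, error) {
    var name []byte
    buf := make([]byte, 256)
    off := strtabOff + nameOff
    for {
        n, err := r.ReadAt(buf, off)
        if i := bytes.IndexByte(buf[:n], 0); i >= 0 {
            return string(append(name, buf[:i]...)), nil
        }
        name = append(name, buf[:n]...)
        if err != nil {
            return "", err
        }
        off += int64(n)
    }
}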

ianlancetaylor commented 3 months ago

It seems to me that there are other places in debug/elf where this class of problem can arise, such as the symbol table, the symbol string table, the symbol version sections, the dynamic section, and the DWARF sections. What drew your attention particularly to the shstrtab section?

umanwizard commented 3 months ago

Because that is the only section that is read into memory in full when elf.NewFile is called (unless I misread the code).

ianlancetaylor commented 3 months ago

I expect that is true, but most users will also call the Symbols method or the Data method of some section or some other method that reads a section. They won't just call NewFile and not do anything else.

prattmic commented 2 months ago

Regardless of whether this counts as a bug, security issue, etc., it is certainly an annoyance that getting symbols from a binary with a large symbol table requires a lot of memory. It would be nice if there were a way to search for a specific symbol or iterate over symbols without consuming as much memory (see the sketch below).

cc @mknyszek @zpavlinovic
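
A rough sketch of what such streaming iteration could look like, decoding one raw Sym64 entry at a time through the section's reader instead of materializing the whole table (hypothetical helper; symbol names are not resolved here, which would require additional reads into the linked string table):

package sketch

import (
    "debug/elf"
    "encoding/binary"
    "io"
)

// eachSym64 streams raw 64-bit symbol entries from the .symtab section,
// decoding one elf.Sym64 at a time instead of loading the whole table.
// Hypothetical helper; it does not resolve symbol names.
func eachSym64(f *elf.File, fn func(elf.Sym64) bool) error {
    sec := f.Section(".symtab")
    if sec == nil {
        return elf.ErrNoSymbols
    }
    r := sec.Open() // io.ReadSeeker over just this section
    for {
        var sym elf.Sym64
        err := binary.Read(r, f.ByteOrder, &sym)
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        if !fn(sym) {
            return nil
        }
    }
}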

prattmic commented 2 months ago

I took a look at a couple of large (500+ MB) C++ binaries I have lying around (as one does...). They had symtab sections of around 40MB, which is a bit less than I expected. I'd say that amount of memory usage is not a huge issue, but it could be painful for some applications.