golang / go

The Go programming language
https://go.dev

proposal: encoding/csv: limit record size with Reader.MaxRecordSize #67536

Open personnumber3377 opened 3 months ago

personnumber3377 commented 3 months ago

Go version

go version go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/oof/.cache/go-build'
GOENV='/home/oof/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/oof/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/oof/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/oof/go/oof/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3135575456=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The encoding/csv readRecord function has this line:

        dst = make([]string, len(r.fieldIndexes))

Here is a Go program that demonstrates this bug: https://go.dev/play/p/u7OUTS71q2Y
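
For reference, a minimal standalone version of the demonstration (a sketch along the lines of the playground program; the exact playground code may differ):

    package main

    import (
        "encoding/csv"
        "fmt"
        "strings"
    )

    func main() {
        // A single CSV line of n commas parses as n+1 empty fields, so one
        // call to Read allocates an (n+1)-element []string from tiny input.
        const n = 20000
        r := csv.NewReader(strings.NewReader(strings.Repeat(",", n)))

        record, err := r.Read()
        if err != nil {
            panic(err)
        }
        fmt.Println("fields:", len(record)) // 20001
    }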

This bug is very similar to this: https://github.com/advisories/GHSA-8v5j-pwr7-w5f8 (aka CVE-2023-24534)

I reported this through the Google bug bounty program, but it was judged not to have a big enough impact to be considered a vulnerability.

What did you see happen?

The program allocates enormous amounts of memory based on attacker controlled input.

What did you expect to see?

I expected the CSV reader to limit the arguments passed to make, so that the program doesn't allocate too much memory.

randall77 commented 3 months ago

You might want to look at internal/saferio.SliceCap; could we just use that instead of make?
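
(Editorial note: internal/saferio is internal to the standard library and cannot be imported by user code. The general idea it implements is to cap the up-front allocation and let the slice grow as data actually arrives. A rough sketch of that pattern, with a hypothetical helper and an arbitrary limit chosen for illustration:)

    package main

    import "fmt"

    // growStrings mimics the saferio idea: instead of make([]string, n) with an
    // attacker-influenced n, start from a capped capacity and append, so a huge
    // claimed n cannot force a huge allocation before any real data is seen.
    // Hypothetical helper; not the actual internal/saferio API.
    func growStrings(n int) []string {
        const maxInitialCap = 1024 // illustrative limit, not from the stdlib
        c := n
        if c > maxInitialCap {
            c = maxInitialCap
        }
        s := make([]string, 0, c)
        for i := 0; i < n; i++ {
            s = append(s, "")
        }
        return s
    }

    func main() {
        fmt.Println(len(growStrings(20001))) // 20001, grown incrementally
    }

Note that in readRecord's case the fields have already been parsed by the time the slice is allocated, so this pattern would not shrink the final footprint; it mainly helps when a count is claimed before the data exists, which is the distinction drawn below.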

personnumber3377 commented 3 months ago

@randall77 I don't know. Probably. I am just a bug bounty hunter with very limited knowledge of Go. Maybe someone with more experience can take a look? I am just trying to work out the reasoning for why this wasn't considered a vulnerability even though the other bug (the CVE) was, given that they have the exact same exploitation mechanism behind them.

seankhliao commented 3 months ago

cc @golang/security

neild commented 3 months ago

Thanks for the report.

For future reference, the best way to report potential Go security issues is by sending mail to security@golang.org (https://go.dev/doc/security/policy#reporting-a-security-bug). This ensures the Go security team will see it.

I've taken a look at *encoding/csv.Reader.readRecord. I think this is not something we would consider a vulnerability, but also that encoding/csv.Reader is inherently unsafe for use on arbitrary untrusted inputs (with some caveats, see below).

The Reader.Read function returns a single record (line) from the input as a []string containing the fields in the record. The line you reference above is allocating that []string:

        dst = make([]string, len(r.fieldIndexes))

If the Reader.FieldsPerRecord configuration parameter is not set, there is no limit on the number of fields, and Read can therefore allocate an arbitrarily long slice. In the linked playground example, you're constructing a CSV file consisting of a single line containing 20001 fields, and Read returns a 20001-element slice.
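
(Editorial sketch on the existing knob: setting FieldsPerRecord makes Read report ErrFieldCount for a record with the wrong field count, but Read is documented to return the record along with that error, so the oversized slice is still allocated first. It flags the problem; it does not bound the allocation.)

    package main

    import (
        "encoding/csv"
        "fmt"
        "strings"
    )

    func main() {
        r := csv.NewReader(strings.NewReader(strings.Repeat(",", 20000)))
        r.FieldsPerRecord = 5 // expect exactly 5 fields per record

        record, err := r.Read()
        // The record is returned along with ErrFieldCount, so the
        // 20001-element slice was allocated before the check fired.
        fmt.Println(len(record), err != nil) // 20001 true
    }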

A string header is 16 bytes on 64-bit platforms, so a Read of a CSV line containing nothing but commas (",,,,,,") will allocate about 16x the memory of the input line in string headers alone. That's a fairly sizable expansion.
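
(Concretely: the 20001-field record from the playground example occupies 20001 × 16 ≈ 320 KB in string headers, produced from roughly 20 KB of input.)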

This is not something that we would consider a vulnerability, however, because the allocation is inherent in the design of the encoding/csv.Reader API. If it returns a []string, it needs to allocate a []string. This may make the Reader API unsuitable for parsing untrusted inputs in some situations, but it isn't something that can be fixed without changing the API.
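
One coarse workaround available today (an editorial sketch, not something proposed in this thread) is to bound the total input with io.LimitReader, which also bounds the size of any single record and therefore the size of any single Read allocation:

    package main

    import (
        "encoding/csv"
        "fmt"
        "io"
        "strings"
    )

    func main() {
        untrusted := strings.NewReader(strings.Repeat(",", 1<<20))

        // Cap total input at 64 KiB: no record can be longer than that,
        // so one Read allocates at most ~16x that in string headers.
        const maxInput = 64 << 10
        r := csv.NewReader(io.LimitReader(untrusted, maxInput))

        record, err := r.Read()
        fmt.Println(len(record), err) // at most maxInput+1 fields; here 65537 <nil>
    }

The drawback is that the oversized record is silently truncated at the limit rather than rejected, which is part of why a first-class limit like the one proposed below would be preferable.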

We could set a default cap on the maximum size of a field, but any cap is going to either be too small or break existing users who depend on the ability to parse CSV records with many fields.

In contrast, CVE-2023-24534 applies to a case where MIME header parsing could allocate substantially more memory than the amount required to hold the parsed data. This does not apply here; encoding/csv is allocating exactly enough memory to hold the returned data.

I do think we should consider adding the ability to limit the size of records to encoding/csv.Reader, to make it easier to use safely with untrusted inputs. (Perhaps Reader.MaxRecordSize, applying to the sum of the field lengths in the record plus per-entry overhead?)
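
(If that were adopted, usage might look something like the following. This is hypothetical: Reader.MaxRecordSize does not exist in encoding/csv today, its name and semantics are only what this proposal suggests, and untrustedInput and process are placeholders.)

    r := csv.NewReader(untrustedInput)
    r.MaxRecordSize = 1 << 20 // hypothetical field: reject records over ~1 MiB
    for {
        record, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            // an over-limit record would surface here as an error,
            // instead of as an unbounded allocation
            return err
        }
        process(record) // placeholder consumer
    }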