encoding/pem: should allow whitespace in the base64 body

jimsnab commented 2 years ago

What version of Go are you using (`go version`)?

$ go version
go version go1.18.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package main

import (
    "encoding/pem"
    "fmt"
)

func main() {

    p := `-----BEGIN CERTIFICATE-----
        MIIGATCCA+mgAwIBAgIUCn3
         . . . valid pem data . . .
        ib7Jp7Q=
        -----END CERTIFICATE-----`

    blk, rest := pem.Decode([]byte(p))

    fmt.Printf("blk: %p\n", blk)
    fmt.Printf("rest len: %d\n", len(rest))
}

Paste valid PEM data in the example above. The whitespace indentation must be included. Typical output:

blk: 0x0
rest len: 2210

What did you expect to see?

Take the whitespace out of the PEM and see something like this:

blk: 0xc00010e180
rest len: 0

Per RFC 7468 says whitespace SHOULD be ignored. It's not a bug for this module to reject whitespace. But the RFC is absolutely clear that PEMs with whitespace can exist in the world, and is tolerated by some parsers, and I would expect this general purpose module to tolerate the whitespace as well.

Further, RFC 2045 is more explicit and says non-base64 characters MUST be ignored. Since the inner body of the PEM is base64, I would expect whitespace to be ignored by the golang parser.

What did you see instead?

Leading whitespace within the lines of the PEM octets causes the parser to fail.

seankhliao commented 2 years ago

cc @rsc

ianlancetaylor commented 2 years ago

CC @FiloSottile @agl

dsnet commented 2 years ago

This is related to #53845, which proposes the opposite, where it locks down the set of allowed characters. We could modify that proposal to instead be something more versatile like:

// WithIgnored specifies a set of non-alphabet characters that are ignored
// when parsing the input. An empty string causes the encoder to reject
// all characters that are not part of the encoding alphabet.
// A newly created Encoder ignores '\r' and '\n' by default.
func (enc Encoding) WithIgnored(chars string) *Encoding

Thus, for my original purposes for #53845, you could do enc.WithIgnored("") to be in a strict mode that rejects any non-alphabet characters. On the other hand, for the use-cases of PEM, the package could do enc.WithIgnored("\t\v\f \r\n") based on the list specified in RFC 7468, section 2.

Even though RFC 2045 says:

Any characters outside of the base64 alphabet are to be ignored in base64-encoded data.

IMO, that's a dangerous semantic. If someone were accidentally using the base64 URL alphabet (RFC 4648, section 5.), which uses '-' and '_' instead of '+' and '/', then that would not be detected as an error, but would be silently parsed as valid data. Ignoring whitespace, on the other hand, has a more obviously correct semantic.

gopherbot commented 1 year ago

Change https://go.dev/cl/532295 mentions this issue: encoding: support WithIgnored in base32 and base64

golang / go