golang / go

proposal: encoding/baseXX: add Encoding.RejectLineFeeds #53845

Open dsnet opened 2 years ago

dsnet commented 2 years ago

Currently, base32 and base64 ignore carriage returns and linefeeds by default.
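A minimal sketch of the current behavior, for illustration. The helper name decodeStd is hypothetical; only base64.StdEncoding.DecodeString comes from the standard library. It shows a CRLF in the middle of the input being skipped without any error:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeStd is a hypothetical helper that decodes s with the standard,
// padded base64 encoding and returns the payload as a string.
func decodeStd(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	return string(b), err
}

func main() {
	// "aGVsbG8=" encodes "hello"; the embedded CRLF is skipped silently.
	out, err := decodeStd("aGVs\r\nbG8=")
	fmt.Printf("%q %v\n", out, err)

	// Other non-alphabet characters, such as a space, are rejected.
	_, err = decodeStd("aGVs bG8=")
	fmt.Println(err != nil)
}
```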

This behavior goes against RFC 4648, section 3.3, which states:

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute "non-alphabet characters" and are ignored.

Rejection of "characters outside the base encoding alphabet" (including carriage returns and line feeds) should be the default, unless specified otherwise by some higher-level specification (e.g., MIME). The decision to allow \r or \n should not have been made by the base32 and base64 packages, but rather by their users.

Today, base32 and base64 already ignore \r and \n by default and we can't change that, but we should expose control over this behavior:

// RejectLineFeeds creates a new encoding identical to enc except that
// it rejects the presence of carriage returns and line feeds as
// described in RFC 4648, sections 3.1 and 3.3.
func (enc Encoding) RejectLineFeeds() *Encoding

dsnet commented 2 years ago

An alternative and more flexible API is (per https://github.com/golang/go/issues/54054#issuecomment-1194998778):

// WithIgnored specifies a set of non-alphabet characters that are ignored
// when parsing the input. An empty string causes the decoder to reject
// all characters that are not part of the encoding alphabet.
// A newly created Encoding ignores '\r' and '\n' by default.
func (enc Encoding) WithIgnored(chars string) *Encoding

My original proposal would be equivalent to enc.WithIgnored(""), while #54054 could be accomplished using enc.WithIgnored("\t\v\f \r\n").

gopherbot commented 1 year ago

Change https://go.dev/cl/532295 mentions this issue: encoding: support WithIgnored in base32 and base64

dsnet commented 1 year ago

This feature combined with #53844 makes it possible to implement a truly bijective mapping between baseXX and binary data. This would allow the use of base32 and base64 to produce a truly canonical encoding per RFC 4648, section 3.5.

dsnet commented 3 months ago

golang/protobuf#1626 arose because the "google.golang.org/protobuf/encoding/protojson" package implicitly allowed newlines and carriage returns because the default behavior of the "base64" package is to ignore such characters. Having this option to begin with would have avoided that problem.

puellanivis commented 3 months ago

I like the idea of WithIgnored, since all sorts of whitespace are common across the various uses of base64. This option also gives the greatest flexibility: it allows strict decoding that accepts nothing but valid alphabet characters, the current behavior of ignoring only newlines and carriage returns, and more permissive whitespace ignoring in general.