golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.68k stars 17.62k forks source link

proposal: net/url: add QueryParser and QueryEncoder interfaces #56300

Open chrisguiney opened 2 years ago

chrisguiney commented 2 years ago

Summary

I would like to propose adding new interfaces to net/url to allow users to have control over url query string parsing and encoding implementations.

This can be done in a backwards compatible manner by

  1. introducing the interface
  2. providing a default implementation of the interface that implements current behavior
  3. exposing the default implementations as package level variables
  4. updating call sites in the standard library to accept interface values to prefer over the default implementation

With the DefaultQueryParser and DefaultQueryEncoder being package level variables, a user can set the implementations once in main or init. This allows the user to have confidence that their chosen implementation will be used, even if some usages have not yet been updated to accept a custom implementation.

This approach allows users to gradually widen their API to accept a QueryParser or QueryEncoder, allowing for implementation to be spread over multiple releases if necessary.

Proposed Interface Addition

var (
    DefaultQueryParser QueryParser = qsParser{}
    DefaultQueryEncoder QueryEncoder = qsParser{}
)

type QueryParser interface {
    Parse(string) (Values, error)
}

type QueryEncoder interface {
    Encode(Values) (string, error)
}

type qsParser struct{}

func (qsParser) Parse(s string) (Values, error) {
    // body would be the current body of ParseQuery
}

func (qsParser) Encode(v Values) (string, error) {
    // body would be the current body of Values.Encode
}

// update existing apis to use the DefaultQueryParser
// and DefaultQueryEncoder

func ParseQuery(s string) (Values, error) {
    return DefaultQueryParser.Parse(s)
}

func (v Values) Encode() string {
     s, _ := DefaultQueryEncoder.Encode()
    return s
}

Rationale

Other systems have their own interpretations of the url RFCs, which are often at odds with how go interprets them. This creates situations where a behavior can be a security issue for one user, but an intentional and relied upon behavior for others. There is no single implementation that can serve all users.

By allowing custom implementations to be supplied, we're able to provide a safe default that covers the majority of use cases, but still allow users with very specific needs the flexibility to provide their own implementations.

Users of the package include http.Server and httputil.ReverseProxy`. Both are too large to reasonably expect users to fork in order to call their own parsing and encoding implementations.

Historical Issues

There have been a number of issues surrounding query string parsing and encoding. Security issues have caused a number of backwards incompatible changes, which then motivated other changes to allow prior behavior to be restored.

For users sensitive to these changes, some of the above issues caused operational disruption. In other cases, the backwards compatibility breakages were in point releases.

Issues since proposal

edit: forgot interface keyword on QueryParser and QueryEncoder

2022/11/21 edit: added Issues since proposal

neild commented 2 years ago

I don't think a global variable is a good idea: What happens if something parses a URL at init time? Or in a goroutine started at init time? Or if two places attempt to set the variable?

http.Request automatically calls ParseQuery to populate Request.Form on demand, but code that wants a different parser can always parse the query itself.

func httpHandler(w http.ResponseWriter, req *http.Request) {
  req.Form = myQueryParser(req.URL.RawQuery)
  // ...
}

ReverseProxy uses ParseQuery to sanitize forwarded URLs, but this behavior can be disabled:

// Go 1.20 Rewrite func:
proxy.Rewrite = func(r *httputil.ProxyRequest) {
  r.Out.URL.RawQuery = r.In.URL.RawQuery // preserve raw query string from inbound request
}
// Pre-1.20 Director func:
proxy.Director = func(r *http.Request) {
  // ...

  // Clear Request.Form before returning to disable query string sanitization.
  // Note that sanitization is not performed if Form is not set, so Director functions
  // that do not set Form (possibly implicitly by calling Request.FormValue) will never
  // sanitize the forwarded query.
  r.Form = nil
}
chrisguiney commented 2 years ago

What happens if something parses a URL at init time? Or in a goroutine started at init time? Or if two places attempt to set the variable?

The simplest answer would be "don't do that". The globals haven't been an issue in other places in the standard library such as:

It's not my first choice in design decisions, but given the go1 compatibility promise, I think it's the most failsafe way for a user to divorce themselves from the default implementation.

If concurrency is a concern, I think it'd be reasonable to keep the variables themselves at the package level and have a package level mutex and SetDefaultParser(QueryParser) and SetDefaultEncoder(QueryEncoder) functions. This doesn't necessarily take care of the case that an init function could spawn a goroutine while another is setting it, and it may be indeterminate which parser is used. I think this is a reasonable risk because no such code currently exists, so it can be warned against from the start.

As a general rule I would consider it a faux pas to set it anywhere other than main


I should also note, and perhaps i wasn't clear in the proposal, that users of url.ParseQuery should be updated to accept a query parser/encoder, so they're not reliant on the global default.

That is, http server should be extended to have a field:

type Server struct {
    ... existing fields...
   QueryParser url.QueryParser
}

which should be used if set, falling back to url.ParseQuery.

This might actually give a means to disable the log message that gets printed when http.Server receives a url with a semicolon in it, but without using http.AllowSemicolons. If a user has a custom url parser set, it could omit the message.

(the log message was introduced in https://github.com/golang/go/issues/25192)


http.Request automatically calls ParseQuery to populate Request.Form on demand, but code that wants a different parser can always parse the query itself.

This means that users need to maintain their own ParseForm, just to be able to use a different query string parser.

ReverseProxy uses ParseQuery to sanitize forwarded URLs, but this behavior can be disabled

Honestly, it didn't occur to me that it could be avoided in 1.19.2, I was under the impression that there was no way to restore prior behavior before 1.20. The release notes barely mentions it, let alone that it's breaking, and no public documentation was updated for it. The only place it's noted is on a comment made in the issue that initiated the change. I'm glad to know that there's an upgrade path until 1.20 comes out.


The list of hacks is adding up though:

There's also the overhead of having document and teach all of this to anyone onboarding onto a codebase.

At the end of the day, my program should be able to easily have a single consistent url parser that obeys the parsing rules I need it to follow. And I shouldn't have to fork net/url and net/http to do so. Nor any third party library that may use url.ParseQuery.

Life would be so much easier if I could easily set a global parser & encoder once, and just explain what behavior that's needed out of it that the default parser and encoder don't provide.

After two backwards incompatible changes in roughly a year, it'd feel much safer knowing that the go team can make whatever decisions around the default parser without impacting code that's very sensitive to parser changes.