PuerkitoBio / purell

tiny Go library to normalize URLs
BSD 3-Clause "New" or "Revised" License

Reserved characters should not be percent-encoded #7

Closed (ghost closed 9 years ago)

ghost commented 9 years ago

purell should not percent-encode reserved characters, per RFC 3986:

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

package main

import (
    "fmt"

    "github.com/PuerkitoBio/purell"
)

func main() {
    fmt.Println(purell.MustNormalizeURLString("my_(url)", purell.FlagsSafe))
}

The code above prints my_%28url%29, whereas it should print my_(url). The cause is a bug in Go's standard library (Go issue 5684).
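
For reference, the reserved set quoted from RFC 3986 can be written down as a small Go check. This is only a sketch: isReserved is a hypothetical helper name, not something purell or net/url exports.

```go
package main

import (
	"fmt"
	"strings"
)

// RFC 3986 section 2.2: reserved = gen-delims / sub-delims.
const (
	genDelims = ":/?#[]@"
	subDelims = "!$&'()*+,;="
)

// isReserved reports whether c is a reserved character per RFC 3986.
// Hypothetical helper for illustration only.
func isReserved(c byte) bool {
	return strings.IndexByte(genDelims+subDelims, c) >= 0
}

func main() {
	for _, c := range []byte("my_(url)") {
		fmt.Printf("%q reserved=%v\n", c, isReserved(c))
	}
}
```

With this check, "(" and ")" are reported as reserved (they are sub-delims), while letters and "_" are not.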

mna commented 9 years ago

Totally agree, but as you mention, this comes from the parsing and escaping done by Go's stdlib. If and when the bug is fixed in Go, it will be fixed here too. It is not a purell bug per se.

ghost commented 9 years ago

I believe purell should use its own implementation of url.Parse/url.String rather than rely on the buggy stdlib.
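
A rough idea of what the escaping half of such an implementation might look like, as a sketch only. escapePathKeepReserved and pathSafe are hypothetical names, it assumes the input is an already-decoded path, and it covers the path component only (no query, fragment or host handling).

```go
package main

import (
	"fmt"
	"strings"
)

// pathSafe lists the non-alphanumeric bytes written verbatim in a path:
// unreserved marks, sub-delims, ":" and "@" (RFC 3986 pchar), plus "/".
const pathSafe = "-._~!$&'()*+,;=:@/"

// isAlnum reports whether c is an ASCII letter or digit.
func isAlnum(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9'
}

// escapePathKeepReserved percent-encodes every byte of a decoded path
// except alphanumerics and the bytes in pathSafe.
// Hypothetical helper; not part of purell or net/url.
func escapePathKeepReserved(path string) string {
	var b strings.Builder
	for i := 0; i < len(path); i++ {
		c := path[i]
		if isAlnum(c) || strings.IndexByte(pathSafe, c) >= 0 {
			b.WriteByte(c)
			continue
		}
		fmt.Fprintf(&b, "%%%02X", c) // e.g. ' ' becomes %20
	}
	return b.String()
}

func main() {
	fmt.Println(escapePathKeepReserved("my_(url)")) // my_(url)
	fmt.Println(escapePathKeepReserved("a b?c"))    // a%20b%3Fc
}
```

Note that "?" is still encoded here, since a literal "?" inside a path would otherwise start the query.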

mna commented 9 years ago

Pull requests welcome.