golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

mime.ParseMediaType can't handle non-ASCII filename #1119

Closed gopherbot closed 9 years ago

gopherbot commented 14 years ago

by maddogfyg:

What steps will reproduce the problem?
1. Upload a file from <input type="file" name="someinput" ......
2. If the file name includes non-ASCII characters (such as "付云阁.jpg", my
Chinese name), the posted data looks like this:

Content-Disposition: form-data; name="someinput";
filename="付云阁.jpg"

But mime.ParseMediaType just returns "", nil.

In the mime package, two funcs cause this problem:

func consumeValue(v string) (value, rest string) {
    if !strings.HasPrefix(v, `"`) {
        return consumeToken(v)
    }

    // parse a quoted-string
    rest = v[1:] // consume the leading quote
    buffer := new(bytes.Buffer)
    var idx, rune int
    var nextIsLiteral bool
    for idx, rune = range rest {
        switch {
        case nextIsLiteral:
            if rune >= 0x80 {
                return "", v
            }
            buffer.WriteRune(rune)
            nextIsLiteral = false
        case rune == '"':
            return buffer.String(), rest[idx+1:]
        case IsQText(rune):
            buffer.WriteRune(rune)
        case rune == '\\':
            nextIsLiteral = true
        default:
            return "", v
        }
    }
    return "", v
}

// IsQText returns true if rune is in 'qtext' as defined by RFC 822.
func IsQText(rune int) bool {
    // CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
    // qtext       =  <any CHAR excepting <">,     ; => may be folded
    //                "\" & CR, and including
    //                linear-white-space>
    switch rune {
    case '"', '\\', '\r':
        return false
    }
    //return rune < 0x80
    return true
}

RFC 822 was published in 1982 and was designed for ASCII-based systems. Go
handles UTF-8 very well, so the 0x80 limit is unnecessary.
So the last statement in func IsQText, "return rune < 0x80", should be
"return true".
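
For reference, the behavior requested above can be checked with a small program. This is a minimal sketch, assuming a current Go release in which mime.ParseMediaType accepts non-ASCII bytes inside a quoted string; the helper name filenameParam is hypothetical, and the header is written on a single line rather than folded.

```go
package main

import (
	"fmt"
	"mime"
)

// filenameParam parses a Content-Disposition value and returns its
// "filename" parameter. The helper name is hypothetical and exists
// only for this illustration.
func filenameParam(header string) (string, error) {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", err
	}
	return params["filename"], nil
}

func main() {
	// Same header as in the report, joined onto one line.
	header := `form-data; name="someinput"; filename="付云阁.jpg"`
	name, err := filenameParam(header)
	fmt.Printf("filename=%q err=%v\n", name, err)
}
```

With the 0x80 check in place, the quoted value is rejected as soon as the first non-ASCII rune is seen, which is why the original code returned an empty media type.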

adg commented 14 years ago

Comment 1:

Owner changed to steph...@golang.org.

gopherbot commented 14 years ago

Comment 2 by stephenm@golang.org:

RFC 2616 (the HTTP/1.1 spec) states that quoted strings in header values can contain
only ISO 8859-1 characters, and that non-ISO 8859-1 text needs to be encoded
using RFC 2047. This is repeated in RFC 5987.
Having said that, support for UTF-8 encoding using RFC 5987 should be added. Also, if
there's a way to correctly handle ISO 8859-1 characters in the filename, it
would make sense to implement that as well.
For exhaustive background reading, see http://greenbytes.de/tech/tc2231/
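
To illustrate the RFC 5987 form mentioned above, here is a minimal sketch of a header that carries the file name as an extended parameter (filename*=UTF-8''...) and of reading it back with mime.ParseMediaType. The helper names extendedDisposition and decodeFilename are hypothetical. Using url.PathEscape for the percent-encoding is a simplification that is adequate for this file name; a strict RFC 5987 encoder would escape a few more characters (for example "=", ":" and "@").

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
)

// extendedDisposition builds an attachment header whose filename uses the
// charset'language'percent-encoded form. Hypothetical helper; PathEscape is
// a simplification, not a strict RFC 5987 encoder.
func extendedDisposition(filename string) string {
	return "attachment; filename*=UTF-8''" + url.PathEscape(filename)
}

// decodeFilename parses the header and returns the decoded "filename"
// parameter. Hypothetical helper for this illustration.
func decodeFilename(header string) (string, error) {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", err
	}
	return params["filename"], nil
}

func main() {
	header := extendedDisposition("付云阁.jpg")
	fmt.Println(header)

	name, err := decodeFilename(header)
	fmt.Printf("filename=%q err=%v\n", name, err)
}
```

The header itself stays pure ASCII on the wire, which is what keeps it compatible with the quoted-string restrictions described in this comment.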

gopherbot commented 14 years ago

Comment 3 by maddogfyg:

Hi,
Thanks for your reply.
So far I have had no problems with non-Latin-1 characters since changing the
source code. I'll keep checking and let you know if something goes wrong.
Thanks.
Yunge

rsc commented 14 years ago

Comment 4:

Status changed to Accepted.

bradfitz commented 13 years ago

Comment 5:

Owner changed to @bradfitz.

Status changed to Started.

bradfitz commented 13 years ago

Comment 6:

This issue was closed by revision 98176b7760bbdc592396f7fed6af0bc1a3a70a1.

Status changed to Fixed.