emersion / go-message

✉️ A streaming Go library for the Internet Message Format and mail messages
MIT License

NextPart() fails with quoted-printable encoding and = in boundary #48

Closed BluePsyduck closed 5 years ago

BluePsyduck commented 5 years ago

When I tried to parse a mail with some attachments using emersion/go-imap, I came across a strange behavior: if the mail's top-level Content-Transfer-Encoding is "quoted-printable" and the boundary contains any = signs, then Reader.NextPart() fails to detect the parts of the message, and the first call returns an "unexpected EOF" error.

Changing either the encoding to something else or changing the boundary to not contain the = sign avoids the error.

Is this an actual bug in the package, or am I doing something wrong?

Example code:

The following code reproduces the unexpected behavior of the MultiPartReader:

package main

import (
    "fmt"
    "github.com/emersion/go-message/mail"
    "io/ioutil"
    "strings"
)

func main() {
    content := `Content-Type: multipart/mixed;charset="utf-8";boundary="BrokenBoundary=="
Content-Transfer-Encoding: quoted-printable

--BrokenBoundary==
Content-Type: text/html;charset="utf-8"
Content-Transfer-Encoding: 8bit

Some Fancy Content.

--BrokenBoundary==--
`

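    // Report a failure from CreateReader instead of silently discarding it.
    mr, err := mail.CreateReader(strings.NewReader(content))
    if err != nil {
        fmt.Println(err)
        return
    }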
    for {
        part, err := mr.NextPart()
        if err != nil {
            fmt.Println(err)
            break
        }
        b, _ := ioutil.ReadAll(part.Body)
        fmt.Println(string(b))
    }
}

Expected output:

Some Fancy Content.

EOF

Actual output:

multipart: NextPart: unexpected EOF

emersion commented 5 years ago

Can you print mr.Header.ContentType()?

BluePsyduck commented 5 years ago

Output of ContentType is multipart/mixed and the parameter map is map[boundary:BrokenBoundary== charset:utf-8]

The problem is also reproducible on the playground: https://play.golang.org/p/sJ0fa_T3ehQ
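For readers who want to check the header parsing on its own, here is a minimal sketch using only the standard library's mime.ParseMediaType as a stand-in for mr.Header.ContentType(). The helper name parseContentType is made up for this illustration, and this is not how go-message itself parses the header; it only suggests that a boundary containing = survives header parsing, so the failure has to come from a later step.

```go
package main

import (
	"fmt"
	"mime"
)

// parseContentType is a hypothetical helper for this illustration. It wraps
// the standard library parser and returns the media type and its parameters.
func parseContentType(value string) (string, map[string]string, error) {
	return mime.ParseMediaType(value)
}

func main() {
	// The Content-Type value from the reproduction above.
	value := `multipart/mixed;charset="utf-8";boundary="BrokenBoundary=="`

	mediaType, params, err := parseContentType(value)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(mediaType)
	fmt.Println(params["boundary"])
}
```

The boundary parameter comes back with both = signs intact, which matches the parameter map reported above.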

emersion commented 5 years ago

Seems like a bug to me.

emersion commented 5 years ago

Hmm, m1sdirection made me realize on IRC that the topmost part has the Content-Transfer-Encoding: quoted-printable header field. This means that the whole multipart body is treated as quoted-printable encoded. This doesn't make a lot of sense, and I'm not sure what the best way to fix it is.
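To illustrate why decoding the whole multipart body as quoted-printable is a problem, here is a rough sketch using the standard library's mime/quotedprintable decoder. The helpers decodeQP and boundarySurvives are hypothetical names for this example, and it assumes the body is passed through such a decoder before the boundary search, which is what the comment above suggests rather than something verified against the go-message source. In quoted-printable, a = at the end of a line is a soft line break, so a delimiter line ending in = cannot come out of the decoder unchanged.

```go
package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
)

// decodeQP runs s through the standard library quoted-printable decoder and
// returns whatever was decoded before any error occurred.
func decodeQP(s string) (string, error) {
	b, err := io.ReadAll(quotedprintable.NewReader(strings.NewReader(s)))
	return string(b), err
}

// boundarySurvives reports whether the delimiter line is still present
// verbatim after the body has been quoted-printable decoded.
func boundarySurvives(body, delimiter string) bool {
	decoded, _ := decodeQP(body) // a decode error also means the delimiter is lost
	return strings.Contains(decoded, delimiter)
}

func main() {
	broken := "--BrokenBoundary==\r\nContent-Type: text/html\r\n\r\n" +
		"Some Fancy Content.\r\n--BrokenBoundary==--\r\n"
	plain := "--PlainBoundary\r\nContent-Type: text/html\r\n\r\n" +
		"Some Fancy Content.\r\n--PlainBoundary--\r\n"

	fmt.Println(boundarySurvives(broken, "--BrokenBoundary==")) // false
	fmt.Println(boundarySurvives(plain, "--PlainBoundary"))     // true
}
```

This is consistent with the observation that changing either the encoding or the boundary avoids the error: without = in the delimiter, the decoder passes the line through untouched.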

emersion commented 5 years ago

Alright, found the relevant section of the RFC:

RFC 2045 6.4: If an entity is of type "multipart" the Content-Transfer-Encoding is not permitted to have any value other than "7bit", "8bit" or "binary".

I guess we'll need to add a quirk for those non-conformant messages.
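As a rough idea of what such a quirk could look like, here is a standalone sketch that uses only the standard library. The function effectiveEncoding is a hypothetical name and this is not the actual patch that went into go-message; it only applies the RFC 2045 rule quoted above: for a multipart entity, any transfer encoding other than 7bit, 8bit or binary is ignored and the body is read as-is.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// effectiveEncoding is a hypothetical helper. It returns the transfer
// encoding a parser could apply to an entity's body, given the raw
// Content-Type and Content-Transfer-Encoding header values.
func effectiveEncoding(contentType, cte string) string {
	enc := strings.ToLower(strings.TrimSpace(cte))
	if enc == "" {
		enc = "7bit" // RFC 2045 default when the header field is absent
	}

	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil || !strings.HasPrefix(mediaType, "multipart/") {
		return enc // not a multipart entity: honor the declared encoding
	}

	switch enc {
	case "7bit", "8bit", "binary":
		return enc
	default:
		// Non-conformant message: a multipart entity must not be
		// quoted-printable or base64 encoded, so skip decoding.
		// Returning "7bit" here is an arbitrary choice of identity encoding.
		return "7bit"
	}
}

func main() {
	ct := `multipart/mixed;charset="utf-8";boundary="BrokenBoundary=="`
	fmt.Println(effectiveEncoding(ct, "quoted-printable"))          // 7bit
	fmt.Println(effectiveEncoding("text/html", "quoted-printable")) // quoted-printable
}
```

With a check like this in front of the decoder, the reproduction message would have its body scanned for boundaries without quoted-printable decoding, while leaf parts keep their declared encoding.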

BluePsyduck commented 5 years ago

Reading the attachments is working fine now. Thank you for the very fast fix :)

swagftw commented 9 months ago

Late to the party but I am getting this error again. I am using v1.2.1

Content-Type: multipart/mixed; boundary="-----SECBOUND"

-------SECBOUND
Content-Type: text/html
Content-Transfer-Encoding: 8bit

// html content

-------SECBOUND--

emersion commented 9 months ago

Please open a new issue with full details, this doesn't seem like the same bug to me.

swagftw commented 9 months ago

> Please open a new issue with full details, this doesn't seem like the same bug to me.

Opened a new issue #174