Closed dvcrn closed 1 year ago
Only UTF-8 is supported for writing. Creating new non-UTF-8 messages is doing a disservice to the ecosystem.
While I understand that, it's sadly often not up to us to decide that, especially when integrating with existing services
Any quick recommendation to re-encode an entire message into UTF8? Was hoping WriteTo would just do that because reading is already decoding all the charsets
t, _, err := entity.Header.ContentType()
if err != nil {
fmt.Println("error reading content type", err.Error())
return nil
}
entity.Header.SetContentType(t, map[string]string{"charset": "utf-8"})
In my quick test that resulted in a lot of chaos when those emails were rendered
Hm, yeah, go-message should encode everything as UTF-8. What kind of chaos were you running into?
I'm using the library in combination with go-imap. Adding SetContentType is getting rid of the "Unhandled charset" error, but causing emails to get rendered like this:
Full code:
func RewriteHeader(emailBytes []byte, targetHeader string, targetValue string) ([]byte, error) {
targetHeader = strings.TrimSpace(targetHeader)
targetValue = strings.TrimSpace(targetValue)
entity, err := message.Read(bytes.NewReader(emailBytes))
if err != nil {
fmt.Println("error reading message into entity", err.Error())
return nil, err
}
entity.Header.Set(targetHeader, targetValue)
// reconstruct email with rewritten headers
out := bytes.Buffer{}
//t, _, err := entity.Header.ContentType()
//if err != nil {
// fmt.Println("error reading content type", err.Error())
// return nil, err
//}
//entity.Header.SetContentType(t, map[string]string{"charset": "utf-8"})
if err := entity.WriteTo(&out); err != nil {
fmt.Println("error writing message into entity", err.Error())
return nil, err
}
return out.Bytes(), nil
}
I didn't have time to properly debug this yet, may be something obvious as well
I don't think it's correct to set the Content-Type
on the whole entity: it's a multipart message.
Although it's a bit surprising to me, WriteTo
will not decode charsets for nested parts... So I wonder what's going on here. Maybe you're hitting the "invalid charset" issue only for single-part messages?
A quick fix would be to check strings.HasPrefix(t, "multipart/")
before setting the utf-8 charset. Also I'd recommend retaining the params returned by ContentType()
, only overwriting the charset
but keeping the rest of the params as-is.
But if you just want to set a header value, you don't need any of go-message's charset stuff: you can just use the lower-level go-message/textproto package instead.
Thanks, let me give that a try and report back!
But if you just want to set a header value, you don't need any of go-message's charset stuff: you can just use the lower-level go-message/textproto package instead.
Yes, but the charset stuff is what makes it so nice to extract some headers, act on them and conditionally overwrite, otherwise I would need to decode the headers manually based on the charset that they're in. go-message does all of that for me 😄
If you are only interested in decoding header fields, and can treat the body as an opaque blob, then you can use go-message/textproto to read the header, then wrap it with message.Header
to get the charset-enabled functions.
Forgot to update - using textproto directly, then wrapping with message.Header works perfectly for extracting stuff, thanks for the tip. Skipping content-type headers that start with multipart/
stopped causing rendering weirdness in email clients, but go-message refuses to write messages with those into a non-utf8 charset as described earlier
eg: unhandled charset "windows-1252" header: multipart/alternative map[boundary:----=_NextPart_033_6113F784.6113F784 charset:windows-1252]
, where the charset is part of the multipart
My final solution was more or less what you suggested - only touch the headers with lowlevel textproto and don't do anything with the body. Something like this:
hdr, err := textproto.ReadHeader(bufio.NewReader(bytes.NewBuffer(emailBytes)))
if err != nil {
panic(err)
}
wrappedHdr := message.Header{Header: hdr}
wrappedHdr.Set(targetHeader, targetValue)
mail, err := mail.ReadMessage(bytes.NewReader(emailBytes))
if err != nil {
panic(err)
}
out := bytes.Buffer{}
if err := textproto.WriteHeader(&out, wrappedHdr.Header); err != nil {
panic(err)
}
if _, err := io.Copy(&out, mail.Body); err != nil {
panic(err)
}
Note, charset
is meaningless for multipart messages.
Hi,
first of all thanks for maintaining this! It takes away a huge amount of headache with parsing messages. I keep getting logs messages about unhandled charsets:
Importing
at the beginning of the file. Also tried manually specifying the ISO-2022-JP charset (though that's already done with the same encoding package in the packages
encoding
uses under the hood):ISO2022JP is not a small cryptic charset and should be supported out of the box in the charset package, eg : https://cs.opensource.google/go/x/text/+/refs/tags/v0.10.0:encoding/ianaindex/ianaindex.go;l=206-209
Is this supposed to work as is, or does
writeTo
not consider the charset the same wayRead()
does?