andyedinborough / aenetmail

C# POP/IMAP Mail Client
Quoted printable subject decoded wrong #66

Closed meehi closed 12 years ago

meehi commented 12 years ago

Received a mail with the following subject: H=C3=BAsv=C3=A9ti=20=C3=9Cnnepeket!

According to this decoder: http://www.convertstring.com/EncodeDecode/QuotedPrintableDecode it should read: Húsvéti Ünnepeket!

The function internal static string DecodeWords(string encodedWords) in Utilities.cs returns the following instead: Húsvéti ??nnepeket!

So this function does not decode the Ü character correctly.

meehi commented 12 years ago

Here is the complete subject value: "=?UTF-8?Q?H=C3=BAsv=C3=A9ti=20=C3=9Cnnepeket!?="
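
For reference, here is a minimal sketch of how a single RFC 2047 "Q" encoded-word like the one above can be decoded. It is not the library's implementation: `QWordSketch` and `DecodeQWord` are hypothetical names, and the sketch assumes exactly one well-formed encoded-word as input. The point it illustrates is that all `=XX` escapes are collected as raw bytes first and only then decoded with the declared charset, so a two-byte UTF-8 sequence such as `=C3=9C` (Ü) stays intact.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QWordSketch
{
    // Hypothetical helper, not part of AE.Net.Mail.
    // Decodes one encoded-word of the form =?charset?Q?encoded-text?=
    public static string DecodeQWord(string word)
    {
        if (word == null || word.Length < 4 ||
            !word.StartsWith("=?", StringComparison.Ordinal) ||
            !word.EndsWith("?=", StringComparison.Ordinal))
            return word;

        // Strip the leading "=?" and trailing "?=", then split charset / encoding / text.
        string[] parts = word.Substring(2, word.Length - 4).Split('?');
        if (parts.Length != 3 || !parts[1].Equals("Q", StringComparison.OrdinalIgnoreCase))
            return word;

        string text = parts[2];
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '=' && i + 2 < text.Length &&
                Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                // =XX is one raw byte; do not turn it into a character yet.
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else if (c == '_')
            {
                // In the "Q" encoding an underscore stands for a space.
                bytes.Add((byte)' ');
            }
            else
            {
                // Encoded-text is expected to be plain ASCII here.
                bytes.Add((byte)c);
            }
        }

        // Decode the whole byte sequence at once with the declared charset.
        return Encoding.GetEncoding(parts[0]).GetString(bytes.ToArray());
    }
}
```

Calling `QWordSketch.DecodeQWord("=?UTF-8?Q?H=C3=BAsv=C3=A9ti=20=C3=9Cnnepeket!?=")` should give back the expected subject, Húsvéti Ünnepeket!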

meehi commented 12 years ago

I have fixed this issue with the following code block replacement:

        internal static string DecodeWords(string encodedWords)
        {
            if (string.IsNullOrEmpty(encodedWords))
                return string.Empty;

            string decodedWords = encodedWords;

            // Notice that RFC2231 redefines the BNF to
            // encoded-word := "=?" charset ["*" language] "?" encoded-text "?="
            // but no usage of this BNF has been spotted yet. It is here to
            // ease debugging if such a case is discovered.

            // This is the regex that should fit the BNF
            // RFC Says that NO WHITESPACE is allowed in this encoding, but there are examples
            // where whitespace is there, and therefore this regex allows for such.
            const string strRegEx = @"\=\?(?<Charset>\S+?)\?(?<Encoding>\w)\?(?<Content>.+?)\?\=";
            // \w   Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
            // \S   Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]".
            // +?   non-greedy equivalent of +
            // (?<NAME>REGEX) is a named group with name NAME and regular expression REGEX

            var matches = Regex.Matches(encodedWords, strRegEx);
            foreach (Match match in matches)
            {
                // If this match was not a success, we should not use it
                if (!match.Success)
                    continue;

                string fullMatchValue = match.Value;
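                //new version start
                // Let System.Net.Mail decode the encoded-word via the attachment name,
                // and dispose the temporary attachment once the name has been read.
                string decodedText;
                using (var attachment = System.Net.Mail.Attachment.CreateAttachmentFromString("", fullMatchValue))
                {
                    decodedText = attachment.Name;
                }
                //new version end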

                /*remove start
                string encodedText = match.Groups["Content"].Value;
                string encoding = match.Groups["Encoding"].Value;
                string charset = match.Groups["Charset"].Value;

                // Get the encoding which corresponds to the character set
                Encoding charsetEncoding = ParseCharsetToEncoding(charset);

                // Store decoded text here when done
                string decodedText;

                // Encoding may also be written in lowercase
                switch (encoding.ToUpperInvariant())
                {
                    // RFC:
                    // The "B" encoding is identical to the "BASE64" 
                    // encoding defined by RFC 2045.
                    // http://tools.ietf.org/html/rfc2045#section-6.8
                    case "B":
                        decodedText = DecodeBase64(encodedText, charsetEncoding);
                        break;

                    // RFC:
                    // The "Q" encoding is similar to the "Quoted-Printable" content-
                    // transfer-encoding defined in RFC 2045.
                    // There are more details to this. Please check
                    // http://tools.ietf.org/html/rfc2047#section-4.2
                    // 
                    case "Q":
                        decodedText = DecodeQuotedPrintable(encodedText, charsetEncoding);
                        break;

                    default:
                        throw new ArgumentException("The encoding " + encoding + " was not recognized");
                }
                remove end*/

                // Replace our encoded value with our decoded value
                decodedWords = decodedWords.Replace(fullMatchValue, decodedText);
            }

            return decodedWords;
        }

andyedinborough commented 12 years ago

Thanks for submitting this. The function that decodes quoted printable was fairly messy anyway. I've rewritten the bulk of it, removing the dependency on regular expressions.

meehi commented 12 years ago

Thanks, that's working.

meehi commented 12 years ago

The decoder is working fine, but it now removes whitespace characters such as \r\n. May I reopen this issue? :)

andyedinborough commented 12 years ago

Sorry about that, there were a couple lines I forgot to remove. Working now?

meehi commented 12 years ago

Yes, thanks. Now it's perfect.