Wrong body length ruins getResponse();

piher commented 12 years ago

Hi,

There is a mail in my mailbox that causes the getMessage(i,true); method to just stop. After digging into it, it appears that the last lines of the stream are :

) then

``` xm0 and finally 76 OK Success

whereas the last two lines expected are : ) and then xm076 OK Success

Since the code just tests if the last line contains xm076 OK Success by doing

          break; ```
when the last line is ```76 OK Success``` the code just rolls back to GetResponse() wich causes the readLine() method to just sit there waiting for an imaginary end of the stream. (Eventually it will return null after a very long timeout)

I'm guessing that the the body length answered by the server is too much by one charachter.
Here's the size response :
     ```* 73 FETCH (UID 106 RFC822.SIZE 1765 FLAGS (\Seen) BODY[HEADER] {1372}```

If that is, I would suggest something like this :

public MailMessage[] GetMessages(string start, string end, bool uid, bool headersonly, bool setseen) { CheckMailboxSelected(); IdlePause();

        string UID, HEADERS, SETSEEN;
        UID = HEADERS = SETSEEN = String.Empty;
        if (uid)
            UID = "UID ";
        if (headersonly)
            HEADERS = "HEADER";
        if (!setseen)
            SETSEEN = ".PEEK";
        string tag = GetTag();
        string command = tag + UID + "FETCH " + start + ":" + end + " (UID RFC822.SIZE FLAGS BODY" + SETSEEN + "[" + HEADERS + "])";
        string response;

        string endOfStreamTmp = String.Empty;
        string endOfStreamTag = tag + "OK Success";

        var x = new List<MailMessage>();
        SendCommand(command);
        while (true)
        {
            response = GetResponse();
            // 3) We apend the latest response to the potential beginning
            // if no potential beginning was found in 2) or if this is the first time the code gets here
            // then endOfStreamTmp is empty which means we are just testing the latest response
            endOfStreamTmp += response;
            if (endOfStreamTmp.Contains(endOfStreamTag))
            {
                break;
            }
            endOfStreamTmp = String.Empty;
            var m = rxGetBodyLength.Match(response);
            if (!m.Success)
                continue;
            int length = m.Groups[1].Value.ToInt();
            var mail = new MailMessage();
            var body = new StringBuilder();
            var buffer = new char[8192];
            int read;

            string bodyStr = string.Empty;

            while (length > 0)
            {
                read = _Reader.Read(buffer, 0, Math.Min(length, buffer.Length));
                body.Append(buffer, 0, read);
                length -= read;
            }
            bodyStr = body.ToString();

            // 1) We are going to check if the body doesn't end with the beginning of the endOfStreamTag
            bool tagSubStrNotFound = true;
            for (int i = 1; i <= endOfStreamTag.Length && tagSubStrNotFound; i++)
            {
                if (bodyStr.EndsWith(endOfStreamTag.Substring(0, i)))
                {
                    // 2) if it does, we store the matching end wich is potentially the beginning of the endOfStreamTag
                    endOfStreamTmp = bodyStr.Substring(bodyStr.Length - 1 - i);
                    tagSubStrNotFound = false;
                }
            }
            mail.Load(bodyStr, headersonly);
            var m2 = rxUID.Match(response);
            if (m2.Groups[1] != null)
                mail.Uid = m2.Groups[1].ToString();
            m2 = rxFlags.Match(response);
            if (m2.Groups[1] != null)
                mail.SetFlags(m2.Groups[1].ToString());
            m2 = rxSize.Match(response);
            if (m2.Groups[1] != null)
                mail.Size = Convert.ToInt32(m2.Groups[1].ToString());

            x.Add(mail);
        }

        IdleResume();
        return x.ToArray();
    }


What solves the problem but does not remove the extra chars from the body ( ```xm0```)

piher commented 12 years ago

I've found a second message with this problem this time the headersOnly method worked fine but the full message method received a body ending with the full xmXXX OK Success tag so the following getResponse() had nothing to get. The addition of the code above also solved this problem.

piher commented 12 years ago

I'm going nuts, now I have a mail for which the size answered by the server is so wrong that the code doesn't even get out of the while (length > 0) loop because it's waiting for the rest of the stream which is atually already finished :-/ I'm starting to think that i'm doing something wrong but I can't see where, has anyone encountered these problems yet ? Is it plausible that the gmail imap server sometimes returns a wrong size ?

andyedinborough commented 12 years ago

Is this still an issue?

piher commented 12 years ago

Yes, and I believe issue 42 is talking about the same thing. I will try to find an example to test on.

piher commented 12 years ago

Update : I downloaded the latest version and ran the exact same code with it on the exact same mailbox (except for a few new mails but i'm testing on old mails that did not change). The program crashes on the same message as it did before but the behavior is a little bit different : Since the xm005 OK Success tag is received while building the body, the next getresponse returns an empty string which causes the if (response[0] != '*' || !response.Contains("FETCH (")) line to throw an IndexOutOfRangeException ...

piher commented 12 years ago

Please take a look at https://github.com/piher/AENetTest It's a very light project emulating the problematic server response with a file I uploaded on a file hoster. It reproduces the exact error I described right above.

piher commented 12 years ago

Didn't see your latest commit. The solution proposed does avoid the exception but it does not solve the problem since the body of the email still ends with the tag... Sorry for the thousands of posts today... !

andyedinborough commented 12 years ago

:] No problem. I think of got a solution. So the issue is the mismatch between RFC822.SIZE and the size reported by BODY[...]--apparently a classic problem (how has humanity managed this long with problems like this!?).

Anyway. I'm testing something now. In effect, it will pick the smaller of these two sizes, and read that much body. If the value of _Reader.Peek() != ')', then it continues reading the larger size.

andyedinborough commented 12 years ago

Was this example message fetched when the code was reading the body line by line?

piher commented 12 years ago

I'm not sure I understood your question. To record the full stream I added some code just before the while(true) loop. I deleted it but i think it was : response = String.Empty; while(!response.Contains(tag+"OK"){ response+=_Reader.readLine() + Environment.NewLine; } File.WriteAllText(somePath,response);

andyedinborough commented 12 years ago

Ok, that makes sense. I'm wondering if the stream you recorded had different line endings. The way you've recorded it would normalize them according to your local environment--\r\n. The code now will preserve them, which I think is better. I need to find an example message where the body size is reported different from the RFC822 size and the line-endings have been preserved. I'll try sending myself an email from my mac--that should do it. :]

piher commented 12 years ago

It's official, I hate emails. I tried out the new library on the particular mailbox that was causing my issues and guess what... Still can't download the same message ! In the header the RFC size is 2472, and the body size is 2522. At first the loop reads 2472 chars like it's supposed to. But as we saw, this number is not right and there are still a few chars of the body that where not read. So then the code updates the size to 2522-2572 = 50 and reads the remaining 50 chars. But problem is that these chars include the success tag and are followed by a new line char ! So the code resets the size to 50 and tries again but there's nothing else to read so it waits indefinitly.... :-/

andyedinborough commented 12 years ago

Yeah... https://twitter.com/andyedinborough/status/175627996554723329. :]. I fixed that issue with it retrying continually, but that doesn't help when body size and rfc822 size are different and both are wrong. I've tried several things and can't get anything to match up to this one rather odd email I found. I'll be researching this more. Let me know if you find anything.

Sent from my iPhone

On Mar 4, 2012, at 7:57 AM, "piher" reply@reply.github.com wrote:

It's official, I hate emails. I tried out the new library on the particular mailbox that was causing my issues and guess what... Still can't download the same message ! In the header the RFC size is 2472, and the body size is 2522. At first the loop reads 2472 chars like it's supposed to. But as we saw, this number is not right and there are still a few chars of the body that where not read. So then the code updates the size to 2522-2572 = 50 and reads the remaining 50 chars. But problem is that these chars include the success tag and are followed by a new line char ! So the code resets the size to 50 and tries again but there's nothing else to read so it waits indefinitly.... :-/

Reply to this email directly or view it on GitHub: https://github.com/andyedinborough/aenetmail/issues/34#issuecomment-4309861

piher commented 12 years ago

Yeah it's really weird. And although this situation is not supposed to happen (if servers don't even know what they are sending us how will The Internet survive I ask you ? ), it would be great if the code could at least detect there has been a mistake and do something rather than wait... I'm thinking we could read the stream line by line while the length we've read is inferior to the bodysize, store the lines in an array, then check if the next line to read is ")". If it is then we join the strings together and that's our body. If not, we know the size was wrong and we have to find what to do with that (i have no idea so far). But i'm scared that using a string array would really slow down code...

andyedinborough commented 12 years ago

I'm planning to do a bunch of testing and analysis for this. A string array wouldn't really be any more or less efficient than it is now because the entire email is loaded into memory before parsing (although I do have a plan and method to not have to do that in the future).

Unfortunately, we can't rely on a single line with just ')' either because there is no escaping for that in the email body.

One of the body sizes has to be correct. I'll be testing various ways of computing size, including different ways of counting line endings, whether quoted printable characters are counted as 1 or 3 bytes, and whether the encoding is causing the text length is different from byte length.

Sent from my iPhone

On Mar 4, 2012, at 3:11 PM, "piher" reply@reply.github.com wrote:

Yeah it's really weird. And although this situation is not supposed to happen (if servers don't even know what they are sending us how will The Internet survive I ask you ? ), it would be great if the code could at least detect there has been a mistake and do something rather than wait... I'm thinking we could read the stream line by line while the length we've read is inferior to the bodysize, store the lines in an array, then check if the next line to read is ")". If it is then we join the strings together and that's our body. If not, we know the size was wrong and we have to find what to do with that (i have no idea so far). But i'm scared that using a string array would really slow down code...

Reply to this email directly or view it on GitHub: https://github.com/andyedinborough/aenetmail/issues/34#issuecomment-4313348

piher commented 12 years ago

[EDIT : ] Better with the attached file : http://cjoint.com/12ma/BCfj3Dr0tDY_annoying_msgs.zip

Hi, I hadn't realised how big the problem is... I tried downloading my mailbox skipping the famous message, but then I encountered another one a few messages later. So i skipped it, but it kept going. Like that, I found 4 msgs with the same properties in only 80 messages ! I've recorded them for further testing.

Recording procedure to ensure original line endings :

                StringBuilder sb = new StringBuilder();
                String resp = string.Empty;
                while (!resp.Contains(tag + "OK Success"))
                {
                    int read = _Reader.Read(buff, 0, 8192);
                    sb.Append(buff, 0, read);
                    resp = sb.ToString();
                }
                System.IO.File.WriteAllText(somePath, resp);
                _Reader.Dispose();
                Environment.Exit(0);

Then for confidentiality I did that (it's VB):


Dim sb As New StringBuilder(theStreamText.Length)
        For Each c As Char In theStreamText.ToCharArray
            If (Char.IsLetterOrDigit(c)) Then
                sb.Append("x")
            Else
                sb.Append(c, 1)
            End If
        Next

Then built the string and cautiously pasted back into it the * Fetch and xm OK Success lines.

Here's a zip file containing four records. For each buggy msg, I've recorded the previous, the msg, and the next one. E.g. : msg 26 was buggy, I recorded the response on getmessages(25,27,false) and saved it into "25-27.txt"

Hope this helps !

By the way, It's surprising this issue wasn't brought up before ! This makes me think that I'm french and therefor I may receive msgs containing different characters, but that's just a thought.

andyedinborough commented 12 years ago

You're French!? Well, there's your problem! :]] j/k

So, I just downloaded 1000 messages from GMail successfully. As I suspected, multi-byte characters are the culprit, text length != byte length. I removed the _Reader and so far everything matches up beautifully.

The only potential issue I see here still is that I'm only supporting CRLF and LF line-endings for communicating with the server. The message can be anything, but the code requires that responses from the server are terminated with LF (CR is just ignored). From what I've read, the RFC says the server should terminate responses with CRLF so this shouldn't be an issue... we'll just have to do some testing.

piher commented 12 years ago

Wow this is great ! I just downloaded 300 mails in a row on the same mailbox, and this includes the previous ones ! I feel like you just saved humanity so thanks a lot :-D

andyedinborough / aenetmail

Wrong body length ruins getResponse(); #34