Closed piher closed 12 years ago
I've found a second message with this problem
this time the headersOnly method worked fine but the full message method received a body ending with the full xmXXX OK Success
tag so the following getResponse() had nothing to get.
The addition of the code above also solved this problem.
I'm going nuts, now I have a mail for which the size answered by the server is so wrong that the code doesn't even get out of the while (length > 0)
loop because it's waiting for the rest of the stream which is atually already finished :-/
I'm starting to think that i'm doing something wrong but I can't see where, has anyone encountered these problems yet ?
Is it plausible that the gmail imap server sometimes returns a wrong size ?
Is this still an issue?
Yes, and I believe issue 42 is talking about the same thing. I will try to find an example to test on.
Update : I downloaded the latest version and ran the exact same code with it on the exact same mailbox (except for a few new mails but i'm testing on old mails that did not change).
The program crashes on the same message as it did before but the behavior is a little bit different :
Since the xm005 OK Success
tag is received while building the body, the next getresponse returns an empty string which causes the if (response[0] != '*' || !response.Contains("FETCH ("))
line to throw an IndexOutOfRangeException ...
Please take a look at https://github.com/piher/AENetTest It's a very light project emulating the problematic server response with a file I uploaded on a file hoster. It reproduces the exact error I described right above.
Didn't see your latest commit. The solution proposed does avoid the exception but it does not solve the problem since the body of the email still ends with the tag... Sorry for the thousands of posts today... !
:] No problem. I think of got a solution. So the issue is the mismatch between RFC822.SIZE and the size reported by BODY[...]--apparently a classic problem (how has humanity managed this long with problems like this!?).
Anyway. I'm testing something now. In effect, it will pick the smaller of these two sizes, and read that much body. If the value of _Reader.Peek() != ')'
, then it continues reading the larger size.
Was this example message fetched when the code was reading the body line by line?
I'm not sure I understood your question.
To record the full stream I added some code just before the while(true) loop. I deleted it but i think it was :
response = String.Empty; while(!response.Contains(tag+"OK"){ response+=_Reader.readLine() + Environment.NewLine; } File.WriteAllText(somePath,response);
Ok, that makes sense. I'm wondering if the stream you recorded had different line endings. The way you've recorded it would normalize them according to your local environment--\r\n. The code now will preserve them, which I think is better. I need to find an example message where the body size is reported different from the RFC822 size and the line-endings have been preserved. I'll try sending myself an email from my mac--that should do it. :]
It's official, I hate emails. I tried out the new library on the particular mailbox that was causing my issues and guess what... Still can't download the same message ! In the header the RFC size is 2472, and the body size is 2522. At first the loop reads 2472 chars like it's supposed to. But as we saw, this number is not right and there are still a few chars of the body that where not read. So then the code updates the size to 2522-2572 = 50 and reads the remaining 50 chars. But problem is that these chars include the success tag and are followed by a new line char ! So the code resets the size to 50 and tries again but there's nothing else to read so it waits indefinitly.... :-/
Yeah... https://twitter.com/andyedinborough/status/175627996554723329. :]. I fixed that issue with it retrying continually, but that doesn't help when body size and rfc822 size are different and both are wrong. I've tried several things and can't get anything to match up to this one rather odd email I found. I'll be researching this more. Let me know if you find anything.
Sent from my iPhone
On Mar 4, 2012, at 7:57 AM, "piher" reply@reply.github.com wrote:
It's official, I hate emails. I tried out the new library on the particular mailbox that was causing my issues and guess what... Still can't download the same message ! In the header the RFC size is 2472, and the body size is 2522. At first the loop reads 2472 chars like it's supposed to. But as we saw, this number is not right and there are still a few chars of the body that where not read. So then the code updates the size to 2522-2572 = 50 and reads the remaining 50 chars. But problem is that these chars include the success tag and are followed by a new line char ! So the code resets the size to 50 and tries again but there's nothing else to read so it waits indefinitly.... :-/
Reply to this email directly or view it on GitHub: https://github.com/andyedinborough/aenetmail/issues/34#issuecomment-4309861
Yeah it's really weird. And although this situation is not supposed to happen (if servers don't even know what they are sending us how will The Internet survive I ask you ? ), it would be great if the code could at least detect there has been a mistake and do something rather than wait... I'm thinking we could read the stream line by line while the length we've read is inferior to the bodysize, store the lines in an array, then check if the next line to read is ")". If it is then we join the strings together and that's our body. If not, we know the size was wrong and we have to find what to do with that (i have no idea so far). But i'm scared that using a string array would really slow down code...
I'm planning to do a bunch of testing and analysis for this. A string array wouldn't really be any more or less efficient than it is now because the entire email is loaded into memory before parsing (although I do have a plan and method to not have to do that in the future).
Unfortunately, we can't rely on a single line with just ')' either because there is no escaping for that in the email body.
One of the body sizes has to be correct. I'll be testing various ways of computing size, including different ways of counting line endings, whether quoted printable characters are counted as 1 or 3 bytes, and whether the encoding is causing the text length is different from byte length.
Sent from my iPhone
On Mar 4, 2012, at 3:11 PM, "piher" reply@reply.github.com wrote:
Yeah it's really weird. And although this situation is not supposed to happen (if servers don't even know what they are sending us how will The Internet survive I ask you ? ), it would be great if the code could at least detect there has been a mistake and do something rather than wait... I'm thinking we could read the stream line by line while the length we've read is inferior to the bodysize, store the lines in an array, then check if the next line to read is ")". If it is then we join the strings together and that's our body. If not, we know the size was wrong and we have to find what to do with that (i have no idea so far). But i'm scared that using a string array would really slow down code...
Reply to this email directly or view it on GitHub: https://github.com/andyedinborough/aenetmail/issues/34#issuecomment-4313348
[EDIT : ] Better with the attached file : http://cjoint.com/12ma/BCfj3Dr0tDY_annoying_msgs.zip
Hi, I hadn't realised how big the problem is... I tried downloading my mailbox skipping the famous message, but then I encountered another one a few messages later. So i skipped it, but it kept going. Like that, I found 4 msgs with the same properties in only 80 messages ! I've recorded them for further testing.
Recording procedure to ensure original line endings :
StringBuilder sb = new StringBuilder();
String resp = string.Empty;
while (!resp.Contains(tag + "OK Success"))
{
int read = _Reader.Read(buff, 0, 8192);
sb.Append(buff, 0, read);
resp = sb.ToString();
}
System.IO.File.WriteAllText(somePath, resp);
_Reader.Dispose();
Environment.Exit(0);
Then for confidentiality I did that (it's VB):
Dim sb As New StringBuilder(theStreamText.Length)
For Each c As Char In theStreamText.ToCharArray
If (Char.IsLetterOrDigit(c)) Then
sb.Append("x")
Else
sb.Append(c, 1)
End If
Next
Then built the string and cautiously pasted back into it the * Fetch and xm OK Success lines.
Here's a zip file containing four records. For each buggy msg, I've recorded the previous, the msg, and the next one. E.g. : msg 26 was buggy, I recorded the response on getmessages(25,27,false) and saved it into "25-27.txt"
Hope this helps !
By the way, It's surprising this issue wasn't brought up before ! This makes me think that I'm french and therefor I may receive msgs containing different characters, but that's just a thought.
You're French!? Well, there's your problem! :]] j/k
So, I just downloaded 1000 messages from GMail successfully. As I suspected, multi-byte characters are the culprit, text length != byte length. I removed the _Reader
and so far everything matches up beautifully.
The only potential issue I see here still is that I'm only supporting CRLF and LF line-endings for communicating with the server. The message can be anything, but the code requires that responses from the server are terminated with LF (CR is just ignored). From what I've read, the RFC says the server should terminate responses with CRLF so this shouldn't be an issue... we'll just have to do some testing.
Wow this is great ! I just downloaded 300 mails in a row on the same mailbox, and this includes the previous ones ! I feel like you just saved humanity so thanks a lot :-D
Hi,
There is a mail in my mailbox that causes the getMessage(i,true); method to just stop. After digging into it, it appears that the last lines of the stream are :
)
then``` xm0
and finally76 OK Success
whereas the last two lines expected are :
)
and thenxm076 OK Success
Since the code just tests if the last line contains
xm076 OK Success
by doingpublic MailMessage[] GetMessages(string start, string end, bool uid, bool headersonly, bool setseen) { CheckMailboxSelected(); IdlePause();