Closed rickhall closed 6 years ago
Local test results:
$ npm run test
> emailjs-imap-client@2.0.8 test /Users/rickhall/Projects/emailjs-imap-client
> grunt
[...output trimmed...]
182 passing (2s)
Done, without errors.
Just a comment on my patch. I decided to simply create a single buffer out of all incoming data, since I think it makes things simpler. The only thing I didn't really like is that, to do so, I had to take the incoming data and temporarily put it into a Uint8Array so that I could append it to the new complete buffer.
I couldn't find a way to append it directly into the new complete buffer, so that is sort of a waste of an allocation, but other than that it is fairly clean.
Also, I don't claim to be any kind of expert in the IMAP protocol. I took my lead from how the existing code was trying to chunk server response lines and just tried to make it smarter based on my [limited] understanding of what the content of the server response should be.
The wasted allocation and unnecessary copy operations of incrementally creating the buffer like that are rather problematic. Say the total size is 1MB and the buffers are 4k: the intermediate wasted allocation is then about 128MB. For a 10MB attachment, that's about 12G of allocation, unnecessary copying, and GC.
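To make the arithmetic behind those figures concrete, here is a rough sketch. It assumes fixed-size 4k chunks and that every append allocates a fresh buffer holding everything received so far; the function name is made up for illustration and is not part of emailjs-imap-client.

```javascript
// Total bytes allocated when the complete buffer is rebuilt after every chunk.
// Assumption: each append allocates a new buffer sized to all data so far,
// so the allocations are chunkBytes * (1 + 2 + ... + chunks).
function bytesAllocatedByIncrementalConcat(totalBytes, chunkBytes) {
  const chunks = Math.ceil(totalBytes / chunkBytes);
  return chunkBytes * chunks * (chunks + 1) / 2;
}

const KiB = 1024;
const MiB = 1024 * KiB;
const GiB = 1024 * MiB;

const oneMiB = bytesAllocatedByIncrementalConcat(1 * MiB, 4 * KiB);
const tenMiB = bytesAllocatedByIncrementalConcat(10 * MiB, 4 * KiB);

console.log((oneMiB / MiB).toFixed(1) + ' MiB'); // 128.5 MiB
console.log((tenMiB / GiB).toFixed(1) + ' GiB'); // 12.5 GiB
```

The cost grows quadratically with the number of chunks, which is why a tenfold larger message costs roughly a hundred times more allocation.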
I was thinking about that some more. I think I could do it in a similar fashion as the old approach, where I use an array of buffers.
I could look into that and open a separate pull request to enhance this one. What do you think? Or I guess it should be possible to generate a new, complete pull request.
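For reference, a minimal sketch of the array-of-buffers idea: keep the chunks as they arrive and copy each one exactly once, only when a complete result is needed. It assumes every chunk is already a Uint8Array; `concatChunks` is a hypothetical helper, not a function from this library.

```javascript
// Hypothetical helper: join an array of Uint8Array chunks with one allocation.
function concatChunks(chunks) {
  // Size the result once, up front.
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset); // each chunk is copied exactly once
    offset += chunk.byteLength;
  }
  return result;
}

const encoder = new TextEncoder();
const chunks = [encoder.encode('* 1 FETCH '), encoder.encode('(UID 7)\r\n')];
const joined = concatChunks(chunks);
console.log(joined.byteLength);
```

With this shape the total copying is linear in the message size, instead of growing with the square of the number of chunks.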
You got me thinking this could be structured more like a state machine, is that the direction you were thinking of taking this? I'm going to test an idea of a refactor, but I think your original pull request #148 is satisfactory to close this issue.
Well, there is a state machine in there, but it has a little bit of parser sprinkled in too, since it has to understand escaping and whatnot. So it's a bit of a hybrid, but it could possibly be made more pure. It certainly seems to work better for me too.
I have another version mostly working where I don't concat the buffers into a single buffer, but I need to resolve one test case failure and then test it on real data.
Yes, #147. Feeding tokens to the parser would be best: it gets rid of double parsing and takes less memory than creating a buffer with the whole command.
I agree that using my basic approach to directly parse the data as it comes in would be smarter than breaking it into chunks and parsing the chunks, since breaking it into chunks requires knowledge of how to parse it anyway. Not only would that help resource consumption, it would be a little more efficient too, since the current approach has to start over from scratch each time the buffer doesn't contain enough information for it to continue.
Regardless, I am going to submit a new pull request, which is an alternate version of this pull request, which does not do extra copying of the buffers.
Ok, apparently GitHub automatically adds my new commit to this pull request rather than letting me create another. Fine.
Despite what the CI failure says, all tests still pass for me locally and I have used the modified algorithm to sync 175k messages (minus attachments) without issue.
I don't know if you are going to work on a refactor or not, but if that isn't going to happen soon, I think it would be helpful to apply this PR in the meantime, since the original code is fairly broken as the two new test cases demonstrate. Thanks.
Any progress on this?
I pushed cb477365e9eea57f2d6588ec827b93fe2fff7b03 where _iterateIncomingBuffer handles the buffer explicitly as a state machine.
I also committed your zero-length literal test with it, as that fixed #147. The other test wasn't set up correctly (it ran appendIncomingBuffer twice in a row without calling iterator.next() in between), so I skipped it.
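As an illustration of the general idea only (this is not the code from the commit above), here is a simplified literal-aware scan. It walks the input once, and when a line ends in a literal marker such as {1024} it skips that many characters before looking for the next line ending. The function name `splitResponses` is invented, and the sketch assumes ASCII-only string input so that string length matches the octet count; it does not handle non-synchronizing literals ({n+}).

```javascript
// Hypothetical sketch: split complete IMAP responses out of `input`.
// Returns the complete responses and the unconsumed tail that needs more data.
function splitResponses(input) {
  const commands = [];
  let start = 0; // where the current response begins
  let pos = 0;   // where to resume searching for CRLF
  for (;;) {
    const eol = input.indexOf('\r\n', pos);
    if (eol === -1) break; // no line ending yet: wait for more data
    const literal = /\{(\d+)\}$/.exec(input.slice(pos, eol));
    if (literal) {
      // The CRLF introduces a literal: its content may itself contain CRLFs.
      const literalEnd = eol + 2 + Number(literal[1]);
      if (literalEnd > input.length) break; // literal incomplete: wait
      pos = literalEnd; // skip the literal; the response continues after it
      continue;
    }
    commands.push(input.slice(start, eol));
    start = pos = eol + 2;
  }
  return { commands, rest: input.slice(start) };
}

// An incomplete literal containing a CRLF yields no response.
const partial = splitResponses(
  '* 1 FETCH (UID {1024}\r\nThis is a partial literal.' +
  'It should return undefined\r\nsince it is not complete.');
console.log(partial.commands.length); // 0

// A complete literal with an embedded CRLF stays inside one response.
const complete = splitResponses(
  '* 1 FETCH (BODY[] {5}\r\na\r\nbc)\r\n* 2 EXISTS\r\nA1 OK');
console.log(complete.commands.length, JSON.stringify(complete.rest));
```

Because the scan never treats a CRLF inside a counted literal as a line terminator, a partially received literal simply stays in `rest` until more data arrives.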
First of all, thanks!
Regarding the second test case, it is not clear to me why appendIncomingBuffer needs to be interleaved with iterator.next(), since it is just setting up a data structure (i.e., an array of two buffers) which is a legal structure.
Regardless, that test is very important since it demonstrates the most egregious bug that was fixed by the pull request, so I'll try to reformulate it. I'll open a separate pull request for that, ok?
If you look at _onData you see that every time a buffer is pushed to this._incomingBuffers, it is processed in _iterateIncomingBuffer. Therefore the test doesn't describe a situation that can happen in real code. The test passes when it looks like this:
    it('should ignore incomplete literals with line feeds', () => {
      appendIncomingBuffer('* 1 FETCH (UID {1024}\r\nThis is a partial literal.');
      var iterator1 = client._iterateIncomingBuffer();
      expect(iterator1.next().value).to.be.undefined;
      appendIncomingBuffer('It should return undefined\r\nsince it is not complete.');
      var iterator2 = client._iterateIncomingBuffer();
      expect(iterator2.next().value).to.be.undefined;
    });
Ok, I see what you are saying, but then I guess I don't know how to create a test case that fails then. If you look at #143, there is another more complicated test case in there, but it is still formulated incorrectly because I don't iterate after each append.
However, those three strings correspond to the incomingBuffers array that I saw in the wild when the parsing failed. Not sure if that helps, but that's all I have for now.
I see the same with those three strings when iterating after each push. Is it possible that it was something else that caused the issue?
Yeah, I agree, if I iterate after each it doesn't give me an error either. But my point was, those strings represent the three entries in incomingBuffers when the parse failure occurred. I could attempt to go back and recreate with the old version, but I am fairly certain that the issue I described is what was happening (i.e., it was seeing a linefeed/carriage return in an incomplete buffer and thinking it was terminating the line).
Regardless, I'm testing the new version of my 175k messages right now and will see how that goes and will potentially go back and see if I can further pin down the error on the old version. Thanks.
BTW, are you planning on doing a release?
Just a follow up on the other test case. I am still not sure how to best recreate the issue, especially since it happens randomly depending on how data is received from the socket. However, I tried to dig up some more information in hopes that you might be able to figure out how to define a test case.
So, with some debug statements added in emailjs-imap-client-imap.js, I see this situation:
In a call to onData() before calling iterateIncomingBuffer() I see the following entries in _incomingBuffers:
0: "* 174642 FETCH (UID 635725 BODY[2.1] {3019}\r\n<div style=3D"font: normal 13px Arial; color:rgb(0, 0, 0);"><br><br class=\r\n=3D""><div style=3D"font: normal 13px Arial; color:rgb(0, 0, 0);"><span><=\r\nimg apple-inline=3D"yes" id=3D"BE0BE358-782C-45D4-8E5A-F145C1A8E30A" src=3D=\r\n"cid:1456510376" _djrealurl=3D"" class=3D""></span><br class=3D""><div cl=\r\nass=3D"">Geertjan Wielenga | Principal Product Manager<br class=\r\n=3D"">Phone: +31620320056 | <br class=3D"">Oracle Dev=\r\neloper Tools<br class=3D""><br class=3D"">ORACLE Netherlands | Herto=\r\ngswetering 163-167 | 3543 AS Utrecht | Netherlands<br class=3D"=\r\n"><span><img apple-inline=3D"yes" id=3D"4E0B2199-761A-43F7-97C3-6B3BB3943=\r\n344" src=3D"cid:868247369" _djrealurl=3D"" class=3D""></span><br class=3D=\r\n""><br class=3D"">Oracle is committed to developing practices and product=\r\ns that help protect the environment<br class=3D""></div><br>=3D=3D=3D=3D=3D=\r\n=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br>thank you Geertjan.<br><br>I'll"
1: " take a =\r\nlook there, but I am beginning with OpenShift, <br><span id=3D"result_box=\r\n" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D"">for this reason=\r\n do not expect much from me at the moment :-)<br><br><br>regards<br><br>A=\r\nngelo<br><br>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br><br></span></span><span=\r\n id=3D"msgText_new_message_draft1508485647979_b2acb486-d1e9-1c11-9848-294=\r\n836415103" dir=3D"ltr">Hi Geertjan,<br></span><br><span id=3D"resul=\r\nt_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D"">sorry to d=\r\nisturb you, but having take a look to your code <br><br>I found there a p=\r\nroblem that I can not solve :<br>----------<br>In the package <org.net=\r\nbeans.modules.cloud.openshift.serverplugin>, the class<br><OpenShif=\r\ntServerInstanceImplementation><br>has an annotation : <br><br>@S=\r\ntaticResource<br> " private static final String RUNNING_ICON&=\r\nnbsp; =3D "org/netbeans/modules/cloud/OpenShift/serverplugin/resources/ru=\r\nnning.png"; // NOI18N "<br><br><span style=3D"text-decoration: underline;=\r\n"><span style=3D"font-weight: bold;">For that line of code I give the err=\r\nor:<br>cannot find resource org/netbeans/modules/cloud/OpenShift/serverpl=\r\nugin/resources/waiting.png<br></span></span>----<br><br></span></span><sp=\r\nan id="
2: "3D"result_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D=\r\n""><span id=3D"result_box" class=3D"short_text" tabindex=3D"-1" lang=3D"e=\r\nn"><span class=3D""><span id=3D"result_box" class=3D"" tabindex=3D"-1" la=\r\nng=3D"en"><span class=3D"">I was not able to</span></span> find any infor=\r\nmation about the annotation</span></span> : </span></span><span id=3D"res=\r\nult_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D""><span id=\r\n=3D"result_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D"">@=\r\nStaticResource<br><br>could you give me some short reference to understan=\r\nd the reason of the error ?<br><br>thank you <br><br>Angelo<br><br><=\r\nbr></span></span> </span></span></div>=0A=0A</div> BODY[1] {14"
3: "98}\r\n=0A=0A=0AGeertjan Wielenga | Principal=C2=A0Product=C2=A0Manager=0APhone:=\r\n=C2=A0+31620320056=C2=A0|=C2=A0=0AOracle=C2=A0Developer Tools=0A=0AORACLE=\r\n Netherlands |=C2=A0Hertogswetering 163-167 |=C2=A03543 AS=C2=A0Utrecht |=\r\n Netherlands=0A=0A=0AOracle is committed to developing practices and prod=\r\nucts that help protect the environment=0A=0A=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=\r\n=3D=3D=3D=3D=3D=0Athank you Geertjan.=0A=0AI'll take a look there, but I =\r\nam beginning with OpenShift, =0Afor this reason do not expect much from m=\r\ne at the moment :-)=0A=0A=0Aregards=0A=0AAngelo=0A=0A=3D=3D=3D=3D=3D=3D=3D=\r\n=3D=3D=3D=3D=0A=0AHi=C2=A0 Geertjan,=0A=0Asorry to disturb you, but havin=\r\ng take a look to your code =0A=0AI found there a problem that I can not s=\r\nolve :=0A----------=0AIn the package <org.netbeans.modules.cloud.openshif=\r\nt.serverplugin>, the class=0A<OpenShiftServerInstanceImplementation>=0Aha=\r\ns an annotation :=C2=A0 =0A=0A@StaticResource=0A=C2=A0 "=C2=A0 private st=\r\natic final String RUNNING_ICON=C2=A0 =3D "org/netbeans/modules/cloud/Open=\r\nShift/serverplugin/resources/running.png"; // NOI18N "=0A=0AFor that line=\r\n of code I give the error:=0Acannot find resource org/netbeans/modules/cl=\r\noud/OpenShift/serverplugin/resources/waiting.png=0A----=0A=0AI was not ab=\r\nle to find"
I'm not sure exactly how it gets into this state, but you can see by scrolling to the end of the last buffer that it is not a complete command. This is the type of incomingBuffers I was trying to create with my test case. Again, the issue here is that these errors are dependent on how the data arrives.
The command that was retrieved then from inside _parseIncomingCommands() via _iterateIncomingBuffer() was:
command = "* 174642 FETCH (UID 635725 BODY[2.1] {3019}\r\n<div style=3D"font: normal 13px Arial; color:rgb(0, 0, 0);"><br><br class=\r\n=3D""><div style=3D"font: normal 13px Arial; color:rgb(0, 0, 0);"><span><=\r\nimg apple-inline=3D"yes" id=3D"BE0BE358-782C-45D4-8E5A-F145C1A8E30A" src=3D=\r\n"cid:1456510376" _djrealurl=3D"" class=3D""></span><br class=3D""><div cl=\r\nass=3D"">Geertjan Wielenga | Principal Product Manager<br class=\r\n=3D"">Phone: +31620320056 | <br class=3D"">Oracle Dev=\r\neloper Tools<br class=3D""><br class=3D"">ORACLE Netherlands | Herto=\r\ngswetering 163-167 | 3543 AS Utrecht | Netherlands<br class=3D"=\r\n"><span><img apple-inline=3D"yes" id=3D"4E0B2199-761A-43F7-97C3-6B3BB3943=\r\n344" src=3D"cid:868247369" _djrealurl=3D"" class=3D""></span><br class=3D=\r\n""><br class=3D"">Oracle is committed to developing practices and product=\r\ns that help protect the environment<br class=3D""></div><br>=3D=3D=3D=3D=3D=\r\n=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br>thank you Geertjan.<br><br>I'll take a =\r\nlook there, but I am beginning with OpenShift, <br><span id=3D"result_box=\r\n" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D"">for this reason=\r\n do not expect much from me at the moment :-)<br><br><br>regards<br><br>A=\r\nngelo<br><br>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br><br></span></span><span=\r\n id=3D"msgText_new_message_draft1508485647979_b2acb486-d1e9-1c11-9848-294=\r\n836415103" dir=3D"ltr">Hi Geertjan,<br></span><br><span id=3D"resul=\r\nt_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D"">sorry to d=\r\nisturb you, but having take a look to your code <br><br>I found there a p=\r\nroblem that I can not solve :<br>----------<br>In the package <org.net=\r\nbeans.modules.cloud.openshift.serverplugin>, the class<br><OpenShif=\r\ntServerInstanceImplementation><br>has an annotation : <br><br>@S=\r\ntaticResource<br> " private static final String RUNNING_ICON&=\r\nnbsp; =3D 
"org/netbeans/modules/cloud/OpenShift/serverplugin/resources/ru=\r\nnning.png"; // NOI18N "<br><br><span style=3D"text-decoration: underline;=\r\n"><span style=3D"font-weight: bold;">For that line of code I give the err=\r\nor:<br>cannot find resource org/netbeans/modules/cloud/OpenShift/serverpl=\r\nugin/resources/waiting.png<br></span></span>----<br><br></span></span><sp=\r\nan id=3D"result_box" class=3D"" tabindex=3D"-1" lang=3D"en"><span class=3D="
This results in a parse error in _parseIncomingCommands(). This is a fairly important bug to have a test case for. I understand the argument that iterate is called in onData each time data is received, but regardless the incomingBuffers look like what I have above and resulted in an exception.
By calling appendIncomingBuffer without interleaving iterates, I'm able to recreate this situation in the test case and my patch worked correctly in that case where the original code did not.
The problem is already there in the first entry of _incomingBuffers: {3019}\n and not {3019}\r\n, as the standard says it should be (https://tools.ietf.org/html/rfc3501#page-85). Are the higher-level emailjs libraries forgiving of this kind of IMAP command, and do they parse these kinds of literals?
Sorry, that might have been a mistake in my translating the buffers to strings. In this case, the email server is Gmail, so I'm sure it's doing the right thing. My simplified example in the submitted test case demonstrated the same issue, but was properly formatted.
One thing I should point out, after some tests, I have not reproduced this error with master. However, if I add my submitted test case, then master does fail, but my patch didn't fail in that case, for what it's worth.
So, it is possible that your patch potentially prevents it from getting into that state in the wild. I could easily reproduce this prior to your patch.
I edited my previous comment to fix the \r\n issues.
Rewrite _iterateIncomingBuffer() to parse from the beginning of the buffer to the end in an attempt to make it easier to understand and to better cope with content handling while trying to chunk the server response into lines.
This addresses issue #147 for me. I don't know if it addresses the reported issue in #143, since I didn't report it, but I think it might. It does address the other similar issue that I reported in the comments of #143.