Errors when issuing multiple POST requests simultaneously

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Extract the attached zip file
2. Build and run the demo application
3. Open IE9 and type http://localhost:8080/test.html
4. Press F12 to show the IE debugger and show the console
5. Press the button 'Test' and wait until you see 'LOG: complete, error 
count...'

What is the expected output? What do you see instead?
Expected output in the IE debugger console:
no timeouts, no errors

Actual output in the IE debugger console:
LOG: id: 2 (9/10), status: success 
LOG: id: 1 (8/10), status: success 
LOG: id: 4 (7/10), status: success 
LOG: id: 5 (6/10), status: success 
LOG: id: 3 (5/10), status: success 
LOG: id: 9 (4/10), status: error 
LOG: id: 6 (3/10), status: success 
LOG: id: 7 (2/10), status: timeout 
LOG: id: 8 (1/10), status: timeout 
LOG: id: 10 (0/10), status: timeout 
LOG: complete, error count: 4 

What version of the product are you using? On what operating system?
The complete VS2010 project, including the website, is attached to demonstrate 
the problem.

mongoose rev: 8cdb0d40ce44
OS: Windows 7 32bit
Browser: IE9

Please provide any additional information below.
The demo program intercepts POST requests for the url /echo and sends back the 
received post data.
The test.html page issues several POST requests (using jQuery) 
simultaneously and waits for each result (success, error, timeout).

Two problems are occurring (not every time, but often):
1. The request is answered with an error
2. The request is not answered and therefore the timeout occurs

Observations:
- The problem seems to be rooted in a multithreading issue - probably related 
to using POST.
- Using the browser and server on the same machine (localhost) seems to 
intensify the problem.

Original issue reported on code.google.com by nullable...@gmail.com on 26 Apr 2012 at 12:28

Attachments:

mongoose_test.zip

GoogleCodeExporter commented 8 years ago


I ran quite a few tests with your Win32 version, and indeed all POST issues seem 
to be solved for WinXP/32/IE8 as well. Thank you very much for that.
However, it seems that it does not work at all for Linux. The server starts and 
opens a port (according to netstat), but it does not seem to handle any request?
There is also no error log. Did you use any special build options for Linux?

Original comment by bel2...@gmail.com on 26 Jun 2012 at 4:13

GoogleCodeExporter commented 8 years ago

Issue confirmed. Is related to the IPv6 code somehow; investigating further 
while getting rid of all those pesky GCC warnings too.

<rant>Why can't google code pick up basic email reply to issue messages like 
github can - you /have/ to go to the website? Me stupid or g.c. dumb?</rant>

Will message again when fix is available.

Original comment by ger.hobbelt on 28 Jun 2012 at 1:15

GoogleCodeExporter commented 8 years ago

Okay, located the culprit. (It was me.) It wasn't the IPv6 code, but the 
select() calls not getting a correct first argument (off-by-one error). 
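
For reference, a minimal sketch of the kind of off-by-one described above, 
assuming the fix amounts to passing "highest descriptor + 1" as the first 
argument of select(). wait_readable() is a hypothetical helper written for 
this illustration, not code from mongoose. On POSIX systems the first argument 
must be one more than the highest descriptor in any of the sets; Winsock 
ignores that argument, which would be consistent with the Win32 build working 
while the Linux build handled no requests.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not part of mongoose): wait until at least one of the
 * given descriptors is readable. Returns select()'s result: the number of
 * ready descriptors, 0 on timeout, -1 on error. */
static int wait_readable(const int *fds, int count, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int max_fd = -1;
    int i;

    FD_ZERO(&readable);
    for (i = 0; i < count; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor PLUS ONE. Passing max_fd
     * itself makes select() ignore that descriptor, so a listening socket
     * that happens to be the highest fd would never be reported as ready. */
    return select(max_fd + 1, &readable, NULL, NULL, &tv);
}
```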

Updated the github repo (master branch) and the issue349 hg repo. Tested with 
mingw and ubuntu 10.

N.B. there are a few more hairy issues with mongoose connections but those are 
rather fringe-or-MSIE-specific and I'd like to close this issue, i.e. see if 
this code is acceptable. It's a big improvement on the current state of affairs 
anyhow.
Backporting isn't always fun. ;-) Besides, the fix for those is another chunk 
of edits and I _guess_ Sergey might want to see them separately.

Original comment by ger.hobbelt on 28 Jun 2012 at 4:35

GoogleCodeExporter commented 8 years ago

Successfully tested on WinXP (VC2010), i586 Linux and ARM Linux. Without any 
doubt, it is now a huge improvement. 

If you run it on WinXP it will stop with “InitializeSRWLock not found in 
KERNEL32.dll”, a possible fix is attached.

Original comment by bel2...@gmail.com on 2 Jul 2012 at 9:01

Attachments:

WinXP_No_InitializeSRWLock_in_KERNEL32_dll.patch

GoogleCodeExporter commented 8 years ago

If anything, this proves the worth of multi-platform testing.

Thanks for the feedback; driving off towards applying the fix like the dark 
rider in Ronal Barbaren.

Original comment by ger.hobbelt on 2 Jul 2012 at 9:27

GoogleCodeExporter commented 8 years ago

Grepping through the code and comments, but still can't figure out what is 
wrong with the current implementation.  Something is definitely broken.
If somebody gives me a summary of what is broken, I'd appreciate that, thanks!

Original comment by valenok on 16 Aug 2012 at 10:31

GoogleCodeExporter commented 8 years ago

It is a compounded issue; comments #33, #30, #46, #48 would about cover it.

To see and understand what's wrong with the current TCP code, there are two 
inroads:

1) run the tests to observe the failures: bel2125's browser Ajax/POST tests and 
augmented testclient are included in 
https://github.com/GerHobbelt/mongoose/tree/master/test/ajax and 
https://github.com/GerHobbelt/mongoose/tree/master/testclient
(easiest is to grab the repo, build and run against your own mongoose)

(WARNING: current GerHobbelt HEAD rev on github does not work; I'm working on 
fixing it since the latest merges + RFC2616 sec 4.2 conformance and will give a 
holler when it's done; meanwhile, the AJAX tests in /test/ajax do not depend on 
any mongoose C code, so should be useful in at least viewing part of the 
problem set until then)

2) there are several issues which have been addressed in the mentioned repo's 
349 fix branch: first the ones which aren't about 'graceful close' directly:

a) mongoose doesn't handle situations where 'HTTP keep-alive' 
connections receive requests which include content data, but where the server 
side, either via callback or CGI, doesn't consume all received content, i.e. 
where a response is generated and finalized before all received *content* has 
been read. Current mongoose does a bit of fixup when the internal header-fetch 
buffer still has some data, but that's flaky in two senses: (1) it doesn't 
fetch any later data that's part of the same received content - only calling 
mg_read() until it returns 0 would do that, and (2) there's the anomaly when 
the network traffic and server 'speed' are such that multiple requests are 
received at once from the POV of recv(), i.e. where said header-fetch buffer 
also contains part or whole of a *subsequent* request, which the current code 
will nuke, thus corrupting the entire keep-alive req/resp chain from then on.
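
A rough sketch of the cleanup that (a) calls for, under stated assumptions: 
struct conn and discard_unread_body() are hypothetical names invented for this 
illustration and are not mongoose's struct mg_connection or its actual fix. The 
idea is to drop only the unread bytes of the *current* request body, keep any 
bytes of a *subsequent* request that already landed in the buffer, and then 
read the remainder of the body from the socket. The recv() here blocks without 
a timeout, which real server code would not want.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical connection state, for illustration only. */
struct conn {
    int sock;
    char buf[4096];    /* bytes already recv()'d but not yet consumed */
    size_t buf_len;    /* number of valid bytes in buf */
    size_t body_left;  /* bytes of the current request body still unread */
};

/* Discard the unread rest of the current request body so that the next
 * request on a keep-alive connection starts at a clean boundary.
 * Returns 0 on success, -1 if the peer closed early or recv() failed. */
static int discard_unread_body(struct conn *c)
{
    char scratch[1024];
    size_t in_buf;

    assert(c->buf_len <= sizeof(c->buf));

    /* 1. Drop the body bytes sitting in the buffer, but keep whatever
     *    follows them: that is the start of the next pipelined request. */
    in_buf = c->body_left < c->buf_len ? c->body_left : c->buf_len;
    memmove(c->buf, c->buf + in_buf, c->buf_len - in_buf);
    c->buf_len -= in_buf;
    c->body_left -= in_buf;

    /* 2. Read the rest of the body from the socket, never asking for more
     *    than body_left so the next request is not swallowed. */
    while (c->body_left > 0) {
        size_t want = c->body_left < sizeof(scratch) ? c->body_left
                                                      : sizeof(scratch);
        ssize_t n = recv(c->sock, scratch, want, 0);
        if (n <= 0)
            return -1;
        c->body_left -= (size_t)n;
    }
    return 0;
}
```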

b) MSIE doesn't like it *at all* when you, as a server, take the initiative to 
close a connection which you just declared 'keep-alive' in your own server 
response you sent last. I didn't go and create the 'push back onto the listen 
queue' for fun; it's mandatory to prevent mongoose from being a very easy DoS 
target while the server code must permit the browser to 'time out' on such 
connections: the 'push back onto queue' code ensures that the mongoose threads 
can do work on *any* pending and *active* request, i.e. for those requests for 
which we know data has been received by our server TCP stack, while very slow 
keep-alive connections and MSIE-like browsers don't 'occupy' the server threads 
which would have to wait for quite a long time (multiple seconds). The way MSIE 
acts, this behaviour is mandatory, anything less has been shown to fail. (Run 
as many MSIE clients as you have mongoose worker threads and it b0rks very 
quickly.)

Just set the default mongoose config option 'keep-alive' to 'yes' and the 
errors will come flying.

c) 'graceful close' isn't just a 'SO_LINGER' and be done with it. 
Theoretically, yes, it would, but again, different browsers, different minds, 
and there are those which do *not* appreciate it (MSIE again, for one) when you 
just SO_LINGER and close: some browsers only recognize a graceful close as one 
when *all* *their* *transmitted* data has been 'received': SO_LINGER doesn't do 
this, so you need to split it up: you'll have to ensure that you recv()'d all 
incoming data for the connection while the 'graceful' timeout tick-tocks down 
to zero, and *only* once you've concluded that no more data will be incoming 
(or your own 'graceful' timeout has expired) do you proceed to a 
SO_LINGER-based close. Of course when your own timeout has expired, you don't 
go and SO_LINGER some more but forcibly close the connection anyway as it's 
taken too long: that procedure is completely accounted for in the 
connection_close logic in https://github.com/GerHobbelt/mongoose :: mongoose.c
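
The sequence in (c) could be sketched roughly as follows. This is an 
illustration under assumptions, not the connection_close logic from the 
repository: graceful_close() and now_ms() are hypothetical names, and the 
choice to half-close with shutdown() first and to force the close via a 
zero-second SO_LINGER on timeout is this sketch's reading of the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: close 'sock' after draining whatever the peer still
 * sends, for at most timeout_ms. Returns 1 if the peer finished sending
 * (EOF seen) before the timeout, 0 if the close had to be forced.
 * The socket is closed in both cases. */
static int graceful_close(int sock, int timeout_ms)
{
    char scratch[1024];
    long long deadline = now_ms() + timeout_ms;
    int peer_done = 0;

    assert(sock >= 0 && sock < FD_SETSIZE);

    /* We are done sending; the peer sees EOF after our last response. */
    shutdown(sock, SHUT_WR);

    /* Keep recv()'ing (and discarding) incoming data until the peer is
     * done or our own 'graceful' timeout has ticked down to zero. */
    while (!peer_done) {
        long long left = deadline - now_ms();
        struct timeval tv;
        fd_set rd;
        ssize_t n;

        if (left <= 0)
            break;
        FD_ZERO(&rd);
        FD_SET(sock, &rd);
        tv.tv_sec = (long)(left / 1000);
        tv.tv_usec = (long)(left % 1000) * 1000;
        if (select(sock + 1, &rd, NULL, NULL, &tv) <= 0)
            break;                 /* timeout or error: stop waiting */
        n = recv(sock, scratch, sizeof(scratch), 0);
        if (n == 0)
            peer_done = 1;         /* peer closed its side */
        else if (n < 0)
            break;
    }

    if (!peer_done) {
        /* Took too long: do not linger any further, force the close. */
        struct linger lg;
        lg.l_onoff = 1;
        lg.l_linger = 0;
        setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    }
    close(sock);
    return peer_done;
}
```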

These are the big ones that I recall off the top of my head; the mentioned 
tests (ajax and testclient) are the reference material for any failure, as both 
should pass on any decent web server (which has a similar /echo URI handler for 
testclient).

When analyzing, you'll need to test with various networks and clients as some 
failure modes only trigger in particular circumstances (e.g. where the mongoose 
server is slow enough to have the keep-alive client submit 2 more requests 
while mongoose answers the first, in order to trigger the 'cleanup' issue in 
(a)). This can be very hard to do (it happened only randomly for me), so a code 
review / code flow analysis might be a faster way to recognize the error and 
validate the fix (which is to mg_read() until it returns 0).

---

I know my code has quite a few edits compared to the current code, but it would 
be good to see them merged into the mainline; particularly because additional, 
non-trivial work such as full HTTP/1.1 chunked transfer support is built on top.

Aside: since bel2125's testclient is a very good test client, in that it doesn't 
play nice all the time, there's also the mutual lockup due to buffers being 
filled and not flushed: this happens in the scenario when mongoose does not 
first collect the entire request (content data) before starting to send the 
corresponding response, which happened in the custom '/echo/' handler. Not a 
'mongoose per se' issue, but definitely something to keep in mind while working 
with the test code. My version of the custom '/echo' handler interleaves 
mg_read and mg_write to prevent such a buffer-based lockup from happening.
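
As an illustration of what such interleaving might look like, here is a small 
sketch in which plain file descriptors and read()/write() stand in for 
mg_read() and mg_write(). echo_body() is a hypothetical function written for 
this example and is not the handler from the repository. Each chunk is written 
back as soon as it has been read, so at most one chunk is ever held in memory.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical echo loop: copy content_length bytes from in_fd to out_fd,
 * one chunk at a time. Returns the number of bytes echoed, or -1 on a
 * read/write error or if the input ends early. */
static long echo_body(int in_fd, int out_fd, size_t content_length)
{
    char chunk[512];
    size_t done = 0;

    while (done < content_length) {
        size_t want = content_length - done;
        ssize_t n;
        ssize_t off = 0;

        if (want > sizeof(chunk))
            want = sizeof(chunk);

        /* Read one chunk of the request content... */
        n = read(in_fd, chunk, want);
        if (n <= 0)
            return -1;

        /* ...and send it back right away, handling partial writes. */
        while (off < n) {
            ssize_t w = write(out_fd, chunk + off, (size_t)(n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
        done += (size_t)n;
    }
    return (long)done;
}
```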

Original comment by ger.hobbelt on 16 Aug 2012 at 3:42

Letractively / mongoose

Errors when issuing multiple POST requests simultaneously #349