Killerduty / mongoose

Automatically exported from code.google.com/p/mongoose
MIT License
0 stars 0 forks source link

Errors when issuing multiple POST request simultaneous #349

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Extract the attached zip file
2. Build and run the demo application
3. Open IE9 and type http://localhost:8080/test.html
4. Press F12 to show the IE debugger and show the console
5. Press the button 'Test' and wait until you see 'LOG: complete, error 
count...'

What is the expected output? What do you see instead?
Expected output in the IE debugger console:
no timeouts, no errors

Actual output in the IE debugger console:
LOG: id: 2 (9/10), status: success 
LOG: id: 1 (8/10), status: success 
LOG: id: 4 (7/10), status: success 
LOG: id: 5 (6/10), status: success 
LOG: id: 3 (5/10), status: success 
LOG: id: 9 (4/10), status: error 
LOG: id: 6 (3/10), status: success 
LOG: id: 7 (2/10), status: timeout 
LOG: id: 8 (1/10), status: timeout 
LOG: id: 10 (0/10), status: timeout 
LOG: complete, error count: 4 

What version of the product are you using? On what operating system?
The whole VS2010 project inclusive website is attached to demonstrate the 
problem.

mongoose rev: 8cdb0d40ce44
OS: Windows 7 32bit
Browser: IE9

Please provide any additional information below.
The demo program intrecepts POST requests for the url /echo and sends back the 
received post data.
The test.html website will issue several POST request (by using jquery) 
simultaneously and waits for the result (success, error, timeout).

Two problems are occurring (not everytime, but often):
1. The request is answered with an error
2. The request is not answered and therefore the timeout occurs

Observations:
- The problem seems to be rooted at a multithreading issue - probably related 
to using POST.
- Using the browser and server on the same machine (localhost) seems to 
intensify the problem.

Original issue reported on code.google.com by nullable...@gmail.com on 26 Apr 2012 at 12:28

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately your test does not seem to run with Internet Explorer 8, and 
Internet Explorer 9 does not run on Windows XP. 

So I made a modified Version of the test which runs for IE8 (and many other 
Browsers) and does not require a debug console in the browser. Furthermore it 
uses a CGI script "echo.cgi" in the same folder instead of a callback, so it 
can be tested with an unmodified mongoose. It actually does not matter what the 
cgi script sends, as long as it is valid a json object.

The test HTML is attached.

I tried it with 8 different browsers and the server running on both, win32 and 
linux, localhost and remote, with rather bad results: Actually all major 
browsers had troubles with the async post requests. Browsers based on Presto 
(Opera) and Trident (IE) had a lot of problems, but I also saw troubles with 
WebKit (Safari, Chrome, ..) and sometimes Gecko (Firefox) browsers. Since the 
error can be reproduced with different web engines, and with CGI as well as 
with callbacks, it is very likely an issue in the mongoose core implementation.

@nullable: Could you please have a quick look if the essential meaning of your 
test and mine is still the same (if not I will open another issue for this), 
and maybe try it with IE9.

Original comment by bel2...@gmail.com on 2 May 2012 at 12:53

Attachments:

GoogleCodeExporter commented 9 years ago
> @nullable: Could you please have a quick look if the essential meaning of 
your test and mine is still the same (if not I will open another issue for 
this), and maybe try it with IE9.

Your test HTML is showing (as i expect) the same problem. Thank you for 
investigating on this issue.

I forgot to say, that i tested also with the precompiled binary from this site 
(v3.1) in a similar manner like you and it shows the same problems. Probably 
you could attach the cgi script also?

BTW: IE9 can emulate IE8 and IE7.. :-)

The problem gets intensified the more CPU cores are available. On my Core 2 Duo 
it shows the problem relatively seldom - but on another machine with 8 cores it 
shows the problem very often. Defintly a multithreading issue.

Original comment by nullable...@gmail.com on 2 May 2012 at 7:28

GoogleCodeExporter commented 9 years ago
> Probably you could attach the cgi script also?

All files required for the test are here:
http://code.google.com/r/bel2125-mongoose/source/browse#hg%2Ftest%2Fajax
In case I make any improvements to the test, they will be available there as 
well.

The problem does also occur on single core ARM CPUs and forcing the server to a 
single CPU in Win32 (by SetProcessAffinityMask(GetCurrentProcess(), 1);) does 
not seem to make it better. So it is at least not a "multicore only" issue.
Also reducing the number of worker thread to one does not eliminate the 
problem, so I think it must be somewhere in the interaction between master 
thread and worker thread and/or the sockets. 

The result of a test with IE8, just one worker thread and process affinity set 
to one core is attached as an image. The server log that belongs to the image 
is also attached. The test was performed on localhost at a WinXP machine which 
is strong and idle enough that there should be no timeout.

The log shows several errors:
- The test makes 10 async requests, but according to the 
  log there are just 5 accesses to the resource
- "accepted socket", "queued socket" and "grabbed socket"
  occurs 9 times each
- The log shows one error 577 and one error 400

It seems that the POST data does not reach the right socket?

Original comment by bel2...@gmail.com on 3 May 2012 at 7:29

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for looking at this!
I am repeating bel2125's tests on macos, and don't see errors.
Let's narrow down the problem -- are we seeing this only on mongoose running on 
windows?
Do you guys see any trouble when mongoose is running on linux/bsd/mac?

Original comment by valenok on 5 May 2012 at 4:41

GoogleCodeExporter commented 9 years ago
The errors can be observed on the following OS:
- Win7 (original post from nullable with IE9)
- WinXP (dual core X86 with my modified test; IE8 and other browsers)
- Linux (dual core X86, Debian V6)
- Linux (single core ARM)

I did the last three tests and found how often the problem occurs does not seem 
to depend on the platform but the browser. 
Opera does not seem to understand the test at all, so I got only errors from 
that browser (or I made something wrong with this browser). Internet Explorer 
(I used 8, nullable used 9) showed a lot of errors, also Safari and Chrome did 
not do well. Firefox worked well with the 10 test cases, but instead of the 
"done" message below the result table it keeps showing "running" - other 
browsers do not. Embedded browsers and Aurora seem to work better. However, it 
is not an "Internet Explorer only" problem, it is just more frequent there - if 
you repeat the test more often you could see the problem with any major browser.

I also tested with LunarScape, which is a browser that supports three different 
web engines and found the same results.

In all my tests the server was running on real machines, not VMs. A virtual 
machine may show a different timing, so it is probably not ideal for this kind 
of test.
I do not have a MacOS computer available so I did not test it there.

Original comment by bel2...@gmail.com on 5 May 2012 at 9:59

GoogleCodeExporter commented 9 years ago
I changed my test a bit by giving each request another url - but still just 
returning the post data. I attached the original debug log of a session with a 
failure (one timeout) - and the same log grouped by requests.

What seems strange to me are the different get_request_len values:
2917 (/echo_x) or 2918 (/echo_xy) seems to be the correct length, but there 
also 2707 (/echo_1), 415 and 416.

I attached again my changed test - just for completness. There some other small 
changes as reporting no longer to the browsers debug console but displaying the 
result on the website.

Original comment by nullable...@gmail.com on 6 May 2012 at 4:06

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately your test does again not run with Internet Explorer 8, which is 
the most recent version available for WinXP ("Object required, line 77, row 1").
Also my Qt Browser gets stuck after the first of ten retries.

Since my test shows basically the same phenomena I continued using mine.
In my CGI script I tried to store the post data in a file, just for analysis, 
but when I do this the test seems to work all the time. Write to files changes 
the timing or leads to a serialization of the requests at the file system 
operations.
Certainly this cannot be a solution to the issue, but only instrumentation. 

Furthermore, the problem seems to depend on the size of the post data. On the 
other hand, the reply data is not relevant.

Original comment by bel2...@gmail.com on 7 May 2012 at 7:17

Attachments:

GoogleCodeExporter commented 9 years ago
Have you similar observations about the mentioned len stated in the debug 
output of get_request_len(..)?

I have noticed the same about the size of the post data.

Original comment by nullable...@gmail.com on 7 May 2012 at 8:25

GoogleCodeExporter commented 9 years ago
I made a simplified test with reduced data, basically just the ID:

data: { 'id' : id }

in this test I always get a CONTENT_LENGTH of 4 bytes (5 bytes for id=10), but 
there seems to be some error when receiving the post data in the CGI script - I 
did not analyze that in more detail yet, since my method of just writing all 
data in a file in the CGI script seems to change the timing too much.

Original comment by bel2...@gmail.com on 7 May 2012 at 2:12

GoogleCodeExporter commented 9 years ago
I made an additional observation:
I set num_threads to 1 and tried to load the initial web-site 
(http://localhost:8080/test.html) of the second test provided by me.

Now the following files should be transferred:
- test.html
- jquery.js
- favicon.ico -> Not Found

If i make break point in consume_socket(..) and "synchronize" the calls this 
way, all is fine. If i don't have a break point, the client thread hang 
immediately in the recv(..) call of pull(..). It can't serve the files.

1. Can someone confirm this behaviour?
2. Is it probably related to the bug we are discussing here?

Original comment by nullable...@gmail.com on 8 May 2012 at 6:54

GoogleCodeExporter commented 9 years ago
I have some more test results, with the test scripts from 
http://code.google.com/r/bel2125-mongoose/source/browse#hg%2Ftest%2Fajax, 
Version rcf8cef88d4ba. In contrast to the earlier tests, the test data is no 
longer generated but specified explicitly in test.html.

In all Tests the "async POST" button was used - then nothing was done for about 
a minute - then again "async POST" was pressed ... 

In case the test is used unmodified (test.html contains six lines of 
'longText', echo.cgi does not write the data to file), almost no test works. 
One has to try very often to find one "success" in the test log.
 -> see bad_case.log

In case the data is stored to a file (change line 47 in echo.cgi from doLogging 
= false to doLogging = true), the POST requests work well. I did not observe 
any errors yet.
 -> see good_case.log

In case the data is not stored to a file but very short (comment out all 
"longtext" -lines in test.html, i.e. line 52-57), it works much better. The 
data is now basically just "id:1" and "async:true". The "errors" are relatively 
rare, but they are still there.
 -> see short_post_data.log

Maybe the result depends on the fragmentation of the post data - at least it 
depends on the size. Indeed when looking at the logs, the lenghts differ much 
more in the bad case.

Original comment by bel2...@gmail.com on 8 May 2012 at 8:00

Attachments:

GoogleCodeExporter commented 9 years ago
> 1. Can someone confirm this behaviour?

I never had problems with any browser when loading the test page (test.html + 
jquery.js), not for t=1, not for t=10 - not with my test, not with yours. But I 
can only use IE8 - IE8 is not able to execute your scripts.

In my test sometimes threads are blocked for quite a while during the test, and 
if only one is available of course the entire server is blocked. I made most of 
the tests with my own clone (http://code.google.com/r/bel2125-mongoose/), 
beside some fixes for keep-alive it can be configured to close a socket 
forcefully when it is idle for 2 minutes. This way the server is never blocked 
completely for a longer time.

Original comment by bel2...@gmail.com on 8 May 2012 at 9:38

GoogleCodeExporter commented 9 years ago
I'm a bit confused... i recognized, that the problem is tight coupled with the 
number of CPUs you have. On my machine with 8 CPU i have big problems, as 
described also by simply loading the files (html, js, ..) without executing the 
javascript part. But even once loaded successfully, executing the javascript 
tests are failing (at a big part).

This is much less the case on my 2 CPU machine.

To get a bit deeper into this, i made a shortcut and processed every request 
directly on the master thread. I was very surprised to see the same problems 
even with only one (the master) thread! I attached my changed code (which is 
marked with a "DEBUG" comment). Unfortunatly i didn't yet converted the 
javascripts to be able running on IE8.

I hope i didn't messed up the code and my change is valid - otherwise let me 
know.

Original comment by nullable...@gmail.com on 9 May 2012 at 4:13

Attachments:

GoogleCodeExporter commented 9 years ago
I always had the problem on single core machines as well.
At a first glance your code changes look OK for me, but I did not test it yet. 

I made a test with the standard server and 10 threads using the "fiddler2" web 
debugger. It can be used at the client PC to log the communication between 
server and client (probably not for localhost). One can clearly see that in 
some cases the server sents a 577 error (see attachment).

@nullable: Maybe you could provide such a log for your case too? It would be 
interesting if your test on your machine with your IE will also yield in a 577 
server response.

Original comment by bel2...@gmail.com on 10 May 2012 at 1:14

Attachments:

GoogleCodeExporter commented 9 years ago
Here are some test results:

With fiddler and browser running on a VM:
Tests from comment 6: no problems, always 200 OK
Tests from comment 13: no problems, always 200 OK

Without fiddler - browser running on a VM:
Tests from comment 6: often timeouts
Tests from comment 13: often timeouts

With fiddler, browser and server running on the same machine (not a VM):
Tests from comment 6: no problems, always 200 OK
Tests from comment 13: no problems, always 200 OK

Without fiddler - browser and server running on the same machine (not a VM):
Tests from comment 6: often timeouts, some errors (XHR status code 12152)
Tests from comment 13: mostly could not load test.html and jquery.js, if 
loaded: often timeouts, some errors (XHR status code 12152)

I never got error 577. XHR status code 12152 seems to come from IE when the 
response is not valid.

Original comment by nullable...@gmail.com on 11 May 2012 at 7:37

GoogleCodeExporter commented 9 years ago
Hmm. Then the fiddler web debugging proxy changes the timing too much on your 
machine.

Did you see any error codes in the server debug log?

Original comment by bel2...@gmail.com on 11 May 2012 at 11:30

GoogleCodeExporter commented 9 years ago
The server debug log shows 404 (after the timeout) - i guess that is when the 
ajax timeout closes the connection, but the request was not fully received.

On monday i will try the same tests with mongoose version 3.1 and 2.12.

Original comment by nullable...@gmail.com on 12 May 2012 at 11:45

GoogleCodeExporter commented 9 years ago
I tested now version 3.2, 3.1 and 2.12 - the sources and binaries are contained 
in tests.zip.

The reports (server debug log and screenshot of the browser window after a 
test) are attached.

The tests were made on my 8 CPU machine with Win7 64bit using IE9.

All versions shows the problem.

Original comment by nullable...@gmail.com on 14 May 2012 at 7:27

Attachments:

GoogleCodeExporter commented 9 years ago
With some instrumentation I found that the errors arise somewhere in push and 
pull.
That there were no 577 errors in the test performed by nullable is because 577s 
are set in "forward_body_data" only, which is called for CGI pages but not for 
callbacks. Still the source of these errors could be the same, in the push/pull 
functions.

My push and pull functions with additional traces can be found as an attachment 
(together with forward_body_data), as well as the debug traces.

Still this does not explain WHY the functions fwrite and recv sometimes fail in 
push and pull. 

Original comment by bel2...@gmail.com on 16 May 2012 at 6:41

Attachments:

GoogleCodeExporter commented 9 years ago
Since it is not really possible to see inside Internet Explorer, I made a test 
client to reproduce this issue. It can be found here: 
http://code.google.com/r/bel2125-mongoose/source/browse/#hg%2Ftestclient
The test client sends multiple parallel requests to the web server, and indeed 
the issue can be reproduced.

It turns out that it there are errors not only in simultaneous POST requests, 
but also in GET Requests (although not that frequent). I made tests with GET 
requests (0 bytes of data) as well as POST requests with 3, 1000 and 10000 
bytes of data. The more data is sent with the POST requests, the higher the 
number of errors – but there are errors basically everywhere, even for GET.

/ajax/echo.cgi GET

CPUs (bit masks): process=3, system=3
Preparing test ... OK: 77 bytes of data
Starting multi client test: 20 cycles, 20 clients each

Test cylce 1 completed

Error: Client 2002 got 0 bytes instead of 77
Error: Client 2018 got 0 bytes instead of 77
Test cylce 2 completed

Test cylce 3 completed

Test cylce 4 completed
...

/ajax/echo.cgi POST 3

CPUs (bit masks): process=3, system=3
Preparing test ... OK: 95 bytes of data
Starting multi client test: 20 cycles, 20 clients each

Error: Client 1005 got 0 bytes instead of 95
Test cylce 1 completed

Test cylce 2 completed

Test cylce 3 completed

Test cylce 4 completed

Test cylce 5 completed

Error: Client 6007 got 0 bytes instead of 95
Error: Client 6013 got 0 bytes instead of 95
Test cylce 6 completed
...

/ajax/echo.cgi POST 1000

CPUs (bit masks): process=3, system=3
Preparing test ... OK: 98 bytes of data
Starting multi client test: 20 cycles, 20 clients each

Error: Client 1000 got 33 bytes instead of 98
Test cylce 1 completed

Test cylce 2 completed

Error: Client 3002 got 33 bytes instead of 98
Error: Client 3006 got 0 bytes instead of 98
Test cylce 3 completed

Test cylce 4 completed

Test cylce 5 completed

Test cylce 6 completed
...

/ajax/echo.cgi POST 10000

CPUs (bit masks): process=3, system=3
Preparing test ... OK: 99 bytes of data
Starting multi client test: 20 cycles, 20 clients each

Error: Client 1000 got 33 bytes instead of 99
Error: Client 1003 got 33 bytes instead of 99
Error: Client 1007 got 33 bytes instead of 99
Test cylce 1 completed

Error: Client 2000 got 33 bytes instead of 99
Error: Client 2002 got 33 bytes instead of 99
Error: Client 2001 got 33 bytes instead of 99
Error: Client 2008 got 33 bytes instead of 99
Error: Client 2006 got 33 bytes instead of 99
Test cylce 2 completed

Thread 3000 did not finish!
Test cylce 3 completed

Test cylce 4 completed

Error: Client 5002 got 33 bytes instead of 99
Thread 5000 did not finish!
Test cylce 5 completed

Error: Client 6004 got 33 bytes instead of 99
Error: Client 6006 got 33 bytes instead of 99
Error: Client 6001 got 33 bytes instead of 99
Error: Client 6008 got 33 bytes instead of 99
Error: Client 6010 got 33 bytes instead of 99
Error: Client 6012 got 33 bytes instead of 99
Error: Client 6018 got 33 bytes instead of 99
Thread 6000 did not finish!
Test cylce 6 completed

Error: Client 7001 got 33 bytes instead of 99
Error: Client 7006 got 33 bytes instead of 99
Error: Client 7003 got 33 bytes instead of 99
Error: Client 7005 got 33 bytes instead of 99
Error: Client 7007 got 33 bytes instead of 99
Error: Client 7009 got 33 bytes instead of 99
Error: Client 7011 got 33 bytes instead of 99
Test cylce 7 completed

Error: Client 8005 got 33 bytes instead of 99
Error: Client 8003 got 33 bytes instead of 99
Error: Client 8000 got 33 bytes instead of 99
Error: Client 8001 got 33 bytes instead of 99
Error: Client 8002 got 33 bytes instead of 99
Error: Client 8004 got 33 bytes instead of 99
Error: Client 8006 got 33 bytes instead of 99
Test cylce 8 completed

Error: Client 9001 got 33 bytes instead of 99
Error: Client 9003 got 33 bytes instead of 99
Error: Client 9002 got 33 bytes instead of 99
Error: Client 9000 got 33 bytes instead of 99
Error: Client 9004 got 33 bytes instead of 99
Error: Client 9008 got 33 bytes instead of 99
Test cylce 9 completed

Error: Client 10002 got 33 bytes instead of 99
Error: Client 10000 got 0 bytes instead of 99
Error: Client 10001 got 33 bytes instead of 99
Error: Client 10004 got 33 bytes instead of 99
Test cylce 10 completed

Error: Client 11003 got 33 bytes instead of 99
Error: Client 11001 got 33 bytes instead of 99
Test cylce 11 completed

Error: Client 12000 got 33 bytes instead of 99
Error: Client 12018 got 33 bytes instead of 99
Test cylce 12 completed
…

Original comment by bel2...@gmail.com on 21 May 2012 at 9:50

GoogleCodeExporter commented 9 years ago
I modified the test so that it runs automatically and tests POST requests with 
different content lengths. The test case can be found at 
http://code.google.com/r/bel2125-mongoose/source/browse/#hg%2Ftestclient as 
version r7ffffb3606d1.

The test logs are attached.

Errors can be found for ANY content length, there are just more frequent for 
larger POST requests. There are at least two kinds of errors. One with a 33 
byte response, another with no response at all (0 bytes).

Original comment by bel2...@gmail.com on 22 May 2012 at 10:04

Attachments:

GoogleCodeExporter commented 9 years ago
When looking closer at the log, it seems like the misbehavior of mongoose is 
not limited to parallel requests from many clients.

There are also several error situations with just one client making a POST 
request, e.g.,
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021001&na
me=testrun1.txt&token=cKV2NU5T5QLIZoW8EHzi4QtEju0%3A1337681079528#1216
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021001&na
me=testrun1.txt&token=cKV2NU5T5QLIZoW8EHzi4QtEju0%3A1337681079528#2305
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021001&na
me=testrun1.txt&token=cKV2NU5T5QLIZoW8EHzi4QtEju0%3A1337681079528#3403
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021001&na
me=testrun1.txt&token=cKV2NU5T5QLIZoW8EHzi4QtEju0%3A1337681079528#4498
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021002&na
me=testrun2.txt&token=JO8nywuKrKMn8rD9oR60Kq_404I%3A1337681079528#432
- 
http://code.google.com/p/mongoose/issues/attachmentText?id=349&aid=3490021002&na
me=testrun2.txt&token=JO8nywuKrKMn8rD9oR60Kq_404I%3A1337681079528#1466

In other words, mongoose is unreliable in handling POST requests in all 
situations, more threads and a larger amount of data just increases the 
frequency of the errors. However, this error may strike you even for a single 
client with a very simple request.

Original comment by bel2...@gmail.com on 22 May 2012 at 10:27

GoogleCodeExporter commented 9 years ago
These are bad news... do you know whether this applies also GET requests in the 
way stated as in comment 22?

Original comment by nullable...@gmail.com on 22 May 2012 at 12:16

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
The POST test above was running for several hours, and then it took some time 
to evaluate it and to ensure the test was running correctly. I am doing the 
same for GET requests, but it's not completed yet.

In contrast to POST requests, GET requests may not have any body data. Only the 
size of the request head and the length of an optional query string may vary. 
Some extra data in the request head and an additional query string may actually 
be used for GET, POST and any other request.

In contrast to increasing the length of POST body data, increasing the length 
of the head data and using longer query strings does not seem to yield in a 
significant increase in the frequency of this error. Query strings up to 
somewhere between 3 and 4 kB can be used – for longer strings the server does 
not seem work at all, but 3 kB is well above the limit for Internet Explorer 
(2083 bytes according to 
http://en.wikipedia.org/wiki/Query_string#Compatibility_issues), so this should 
be sufficient. Adding some 10 kB of useless data fields in the request header 
does not seem to cause the same issues as adding the same amount of body data.

To find out if at least a non concurrent GET request is 100% reliable, I will 
first run a long term test (for several hours) with small request headers and 
just one client thread.

Original comment by bel2...@gmail.com on 22 May 2012 at 4:07

GoogleCodeExporter commented 9 years ago
I made another modified version of the test which performs 1 GET request every 
10 seconds (revision 8fabfb035854, 
http://code.google.com/r/bel2125-mongoose/source/browse/testclient/testclient.c?
r=8fabfb035854746c902d23e2981a4ef80de04304).

The result after 3000 tests (about 9 hours) is:
GET  998 2   (998 of 1000 OK, two cases of "Client 0 got %u bytes instead of 
77")
GET 1000 0  (1000 of 1000 OK)
GET  998 2   (998 of 1000 OK, two cases of "Client 0 got %u bytes instead of 
77")
... that means 4 of 3000 GET requests got lost (0.133%).

This is about the same ratio as for concurrent post request with less than a 
few hundred bytes of body data (see the first 10 lines of testclient.log in 
comment 21). There 14 of 10000 small POST requests got lost (0.14%).

So there seems to be a basic flaw that hits 0.14% of all requests, no matter if 
they are GET or POST requests and no matter how large they are ("got 0 bytes 
instead xx"). In addition there is a second error that depends on the post data 
size ("got 33 bytes instead xx").

Original comment by bel2...@gmail.com on 23 May 2012 at 7:06

GoogleCodeExporter commented 9 years ago
Some more test results:

A non-concurrent POST request with 0 bytes of data (Test Version 478bda26593b):

 OK  Errors
1000   0
 999   1
 999   1
 994   6

... not much difference to the GET request.

Original comment by bel2...@gmail.com on 25 May 2012 at 6:38

GoogleCodeExporter commented 9 years ago
Since both, GET and POST requests to CGI generated scripts are unreliable, I 
tested a GET request to static contents (Test version 5af5ab6c3e45):

Both, the single client and the 20-client concurrent test did not show any 
losses in 3000 (single client) and 33000 (multiclient) test requests.

In order to check if the flaw is in mongoose or the CGI process, I tested with 
a different CGI process. In fact I mofidied the line 

if (CreateProcessA(NULL, cmdline, NULL, NULL, TRUE,

to

if (CreateProcessA(NULL, "cmd /c type reply.txt", NULL, NULL, TRUE,

The result is the same as for the CGI process before:

Multiclient test:
997 3
996 4
992 8
996 4
998 2
1000    0
999 1
989 11
994 6

single client test:
CPUs (bit masks): process=3, system=3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
test cycle 0: 995 good, 5 bad
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
test cycle 1: 993 good, 7 bad
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
Error: Client 1 got 0 bytes instead of 3
test cycle 2: 939 good, 11 bad

Original comment by bel2...@gmail.com on 25 May 2012 at 7:01

Attachments:

GoogleCodeExporter commented 9 years ago
All my tests before used either CGI (GET, POST) or static content (GET). 

I modified the test to check callback generated content 
(http://code.google.com/r/bel2125-mongoose/source/browse/testclient/testclient.c
?r=334dd2cf9681994ee69e66646ea3ed34ab118a3a). The callback is here: 
http://code.google.com/r/bel2125-mongoose/source/browse/main.c?r=4d4559368c99c17
a003456a4767fdee2b265fee1

It seems out that both, POST and GET requests to this callback are reliable ...

DATASIZE   OK  FAILED
       1  1000  0
       2  1000  0
       4  1000  0
       8  1000  0
      16  1000  0
      32  1000  0
      64  1000  0
     128  1000  0
     256  1000  0
     512  1000  0
    1024  1000  0
    2048  1000  0
    4096  1000  0
    8192  1000  0
   16384  1000  0
   32768  1000  0
   65536  1000  0
  131072  1000  0
  262144  all failed

... until a POST content length of about 200 kB. So this must be a third error.

Original comment by bel2...@gmail.com on 29 May 2012 at 6:59

GoogleCodeExporter commented 9 years ago
TL;DR:

major offender is incorrect 'graceful close', but that doesn't seem to cover it 
all. Yet.

.....................

/TL;DR:

Bugger: old message doesn't show up in here. Anyway, not really important as I 
was able to reproduce the issue with static content and callback + POST/GET.

There's more than one problem here - including issues in the test app 
'testclient' itself - but the major biggy is incorrect handling of the 
'graceful close', at least on Windows (tested on Win7/64/Winsock2).

I have a mongoose clone over at https://github.com/GerHobbelt/mongoose/network 
where branch bel2125-multitransfer-test now contains code that at least 
resolves this issue in mongoose.

Doesn't mean everything is well, as I still get some odd responses which are 
very timing dependent, so more analysis is being done.

Anyway, relevant URLs and code at 
https://github.com/GerHobbelt/mongoose/blob/c69da4db0f2fc6b0d60d984e6c839504fc12
421a/mongoose.c#L4920  - note the MSDN and particularly the netherlabs URL in 
the comments there if you want/need more in depth intel on this thing.

.......................

Other issues which popped up:

- testclient would lock up for large-and-beyond POSTs, etc. due to the TCP 
buffers filling up and not getting read (pulled) at the other end. This goes 
for both the client and the server side of the /_echo code.
( https://github.com/GerHobbelt/mongoose/commits/bel2125-multitransfer-test -- 
work started about near commit c81c3d / 29 May 2012)
  --> fixed by having client 'slurp' content while it is still sending POST data and server main.c code fetching data in chunks and spitting it back again.

- testclient code has been overhauled to reduce CPU load; even very small 
transfers would max out the CPUs.

- UNRESOLVED: 'testclient 35 1 .' (that'd be 1 thread testing POST /_echo with 
N bytes) _can_ quit prematurely due to 'no bytes received'; this is an as yet 
unresolved timing issue that is related to the 'graceful close' issue that's 
the major offender. Exact cause undetermined, (of course) does not exhibit this 
behaviour in the debugger; effect is clearly very timing dependent.

- UNRESOLVED: 'testclient' running the entire test set will sometimes show a 
lock up, which lasts until timeout; exact cause not yet determined.

Original comment by ger.hobbelt on 31 May 2012 at 8:13

GoogleCodeExporter commented 9 years ago
oh-kay. Turned out the remaining 'odd behaviour' items over here were all due 
to bugs in testclient - all my own mistakes (heck, I altered it so much I can't 
blame anyone else :-) ).

Anyway, all tests in testclient PASS for 
https://github.com/GerHobbelt/mongoose/commits/bel2125-multitransfer-test , 
latest commit.

TODO: cherrypick/backport and then push back into gc/hg, so this can be turned 
into pull request.

Original comment by ger.hobbelt on 31 May 2012 at 12:35

GoogleCodeExporter commented 9 years ago
Thank you for your work on this topic. Since you made quite some changes, it 
will take me a while to really look through the code.

I know that the testclient uses some unusual way of closing the connection, 
basically waiting for 10 seconds if no more data is received. On the one hand, 
if the server does not respond well the client will not hang, on the other hand 
I think it is realistic that a user presses reload or close if there is no 
response for 10 seconds anyway. I also thought that I should not have any 
problem with the Nagle algorithm, since it will send any remaining data after 
some 100 ms anyway – maybe this was not a correct assumption?

In any case, the problem must not be in the testclient alone, since the entire 
issue started with missing responses to post requests within Internet Explorer. 
I just used the testclient to see a bit deeper inside, deeper that it is 
possible with Internet Explorer. Especially since web debugging tools like 
Fiddler alter the behavior of the system too much – maybe by serializing the 
requests?
So there must be at least some issue in the server code.

I have seen that there are quite some differences between your version of 
mongoose.c and mine, and even more differences to the mainline. I think many of 
them are structure and coding style. What would you think is the main 
functional difference regarding to this issue? 

Original comment by bel2...@gmail.com on 4 Jun 2012 at 4:02

GoogleCodeExporter commented 9 years ago
Grmbl >:-((  Had written a reply explaining what's changed in testclient and 
why, plus changes in testcode in mongoose event callback, then Safari crashes 
and it's bye bye. 

Anyway, bottom line: 
- TCP is a stream protocol (and yes, I had to dust off my W/ Richard Stevens'es 
and be reminded of that) and Nagle in combination with HTTP is not a fix, but 
rather bad ducktape for a botched interchange. (There's no 'end of message 
flush moment' in a stream protocol other than close (or more properly: 
shutdown(SHUT_RW) so other side knows you're done as client, then wait for 
server to see this (instantly! no Nagle needed!) and have him do the same 
(shutdown(SHUT_RW) or full 'graceful' close. Since the graceful close is 
botched in mongoose...)

- testclient would cause a connection and thus thread lockup in mongoose - this 
happened if data sent is larger than SO_RCVBUF of client + 
SO_SNDUF+internal-buffer-sizes of mongoose as client will block in send() 
because it's not done yet, mongoose will have got a return from recv() already 
and jumped to the event callback code, which will do further recv(), which is 
fine, but it'll try to send() the data received so far as part of the echo code 
and since testclient isn't fetching any data yet (still busy blocking in the 
large send() action itself), buffers fill up and then both lock up in blocking 
calls: testclient isn't done sending yet, while mongoose needs to get its 
written data out the door as everything is full.

If you haven't got the Stevens' books, here's a few links which got me going 
the right way (contrasting with a lot of 'disable Nagle' distraction on the 
Net):

http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why
-is-my-tcp-not-reliable

http://www.techrepublic.com/article/tcpip-options-for-high-performance-data-tran
smission/1050878

More to follow later. Getting late again :-(

Original comment by ger.hobbelt on 4 Jun 2012 at 10:22

GoogleCodeExporter commented 9 years ago
And in answer to what's the main functional differences:

- main.c (hence mongoose) has been changed in the echo event callback to listen 
for both remaining incoming data and send buffer available simultaneously as 
you can't predict how much data is incoming (read: waiting in the TCP stack 
buffers; you only know the _total_ when it's a _proper_ HTTP/1.1 client 
transmit, but that doesn't help you as you need to be able to send when 
possible and read when required or you'ld get the previously mentioned 
connection lockup again)

- mongoose.c has a corrected graceful close logic + corrected should_keep_alive 
logic (yes, I am aggressive in closing a connection in that no-one can 
guarantee full awareness of server state when a 5xx error happens -- which in 
original mongoose also means a status of -1 can arrive at should_keep_alive.
Thank you for reminding me about the should_keep_alive logic; mine was still 
flawed, and fatally in case of a 401! That's corrected: 
https://github.com/GerHobbelt/mongoose/blob/master/mongoose.c#L1383

- testclient now acts as a correct client by sending request, which will be 
followed by close anyway as we send Connection:close, so we know what to do: 
shutdown(write side) and then fetch the remainder of the server response.

I have to go back in and create a mode in testclient where it's BAD behaved by 
NOT doing a shutdown(WR); not doing that would make the TCP stack at the client 
side wait before sending the request thus leading to slow performance.

(When HTTP requests are pipelined they shouldn't depend on the previous 
responses in the pipeline as the stream protocol gives no guarantees whatsoever 
when data will arrive so you can't have a 'message-based' approach like send 
request, wait for response to complete, send another request, on a single TCP 
connection. That mistake is made very often and Nagle is that used to 'fix' 
that mistake.)

- mongoose.c also has timeouts (SO_RCVTIMEO et al) set for the _listening_ 
socket (and then sets them again on the socket produced by accept() in case the 
stack isn't conform specs: options of a listened socket propagate into the 
sockets produced by accept()). Original mongoose didn't do this and just set 
timeouts on the accepted ones. Minor difference and probably not relevant to 
this issue.

- also note the comments at the graceful close site: sockets are explicitly 
switched back to BLOCKING mode before doing the graceful close dance; like MSDN 
docs and others say: don't do this on a non-blocking socket.

The rest of the edits would probably be irrelevant to this case.

Original comment by ger.hobbelt on 4 Jun 2012 at 10:49

GoogleCodeExporter commented 9 years ago
The source of a minimal Qt WebKit based browser is available here: 
http://code.google.com/r/bel2125-mongoose/source/browse#hg%2FBrowser

Do not think of it as a fully featured browser that could be a replacement for 
your favorit PC/Mac/desktop browser. It could be a starting point for building 
a browser for an embedded device, e.g., on one of those embedded-something 
boards you may get at an electronics components store for less than 50 $ or €.
It could also be a a starting point for a browser window you may add to an 
application, e.g., to show help pages or dynamic contents generated from that 
application. It does not have a menu or navigation bar, just specify the 
starting url on the command line. It does not contain a window resize handler, 
so the size is fixed and can be set from the command line by using -w###x### 
(e.g., -w600x400).

Original comment by bel2...@gmail.com on 8 Jun 2012 at 6:17

GoogleCodeExporter commented 9 years ago

Studying the links from ger.hobbelt (comment 33), it seems that TCP has quite 
some properties that are undesired for HTTP. After reading that, I felt like I 
have to write down some important use cases one would have for a HTTP server: 
U1) the classical interactive use by a human user
U2) as a protocol for devices that interchange information (no humans)
U3) use in a bad network (mobile internet in rural areas)
U4) exposure to the internet (an unfriendly to vicious environment)
U5) use it mainly in keep-alive mode
U6) use it on embedded devices (small memory, non-standard operating systems)
U7) use it embedded in other applications

U1 means it has to be reasonably responsive. There the must not be long or 
frequent delays in interactive user sessions.
U2 means it has to be reliable. A user may tolerate a loading fault once in a 
while and push the reload button, a fully automatic client may not.
U3 means not every connection is shut down nicely, sometimes the network is 
rather slow, sometimes there are connection leaks for a few seconds and 
sometimes it breaks down completely for an hour, because of weather conditions.
U4 means that you have to expect unknown and bad behaved clients. Probably not 
attacks from experienced hackers (I would not use it as a government, banking 
or large corporate web server anyway), but it should no crash when some kid 
that happened to find the IP address fiddles around with the server. 
Anyway, not only in the internet environment and bad networks, but as a general 
design principle, a server should be robust against rude clients. Not every 
rude behavior is an attack.
U5: keep-alive mode is recommended for HTTP/1.1 and all browsers that I know 
use it (that my testclient used “connection: close” was just meant to 
simply the situation). Multiple requests reuse the same connection, therefore 
one cannot shutdown the socket after the first request. Regarding to this, I 
need to study if Ajax (http://en.wikipedia.org/wiki/Ajax_(programming)) 
actually uses the same connection for every request of opens a different one. 
Remember, the entire issue started with Ajax POST requests. In case they reuse 
the connection, Ajax is doing exactly what should not be done according to the 
above considerations: the second Ajax request may depend on the data received 
from the first Ajax request (it does not in the test, but often it will – 
depending on the web site content).
U6 means, that I must consider situations other than a recent desktop Windows 
and a major Linux distribution as well. There might be some differences in the 
TCP/IP stack implementations as well, and there are no books, mailing lists, 
experts, .. available on the details. In addition, one must try to keep 
everything “small”. 
U7 means one must not change any setting that affects the entire process. Of 
course one must not do anything that makes the process crash or hang. 

Ideally all of the above requirements are fulfilled at the same time. Is that 
unrealistic? I hope not. However, I am aware that some of them cannot be 
fulfilled by using TCP in the most friendly way possible. That is where my 
timeouts and ungraceful close after some seconds came from.

Original comment by bel2...@gmail.com on 8 Jun 2012 at 7:47

GoogleCodeExporter commented 9 years ago
I started to merge some of the fixes by ger.hobbelt into my clone. A first 
merge is available in 
http://code.google.com/r/bel2125-mongoose/source/browse#hg, but that work is 
not yet finished. I was not able to merge everything, because there are quite 
some differences, some functions are not available on all my platforms, and I 
simply did not yet study every difference. I am not sure if I already got the 
most important differences merged.

I tried the shutdown code from ger.hobbelt in mongoose, but not in testclient 
(a server should not rely on only having nice clients, see comment 36). Still, 
I added a shutdown call. 

With this merge I repeated the test, and found that there are still errors when 
using mongoose and testclient for 2000 byte POST requests to CGI, as well as 
when using the Ajax test with Internet Explorer 8. 

One of the documents cited in comment 33 mentions that the Apache web server 
disables Nagle on all sockets. Ignoring all warnings I just tried that and added

{int disable_Nagle = 1; setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, 
(char*)disable_Nagle, sizeof(disable_Nagle));} 

in mongoose.c (where I set the socket timeout). Doing that in the merged 
version did not fix the errors, neither for the testclient, nor for the Ajax 
POST test. However, it seems to work if both, mongoose and testclient disable 
Nagle for every socket (no error in a quick test). Anyhow, this is no fix 
because the reference is Internet Explorer, not testclient. 

Original comment by bel2...@gmail.com on 8 Jun 2012 at 8:09

GoogleCodeExporter commented 9 years ago
I think by now I merged all the essential parts of mongoose.c from 
gerhobbelt-mongoose-issue-360 into my clone. Unfortunately not all the errors 
are completely gone. There are still some errors with testclient (that I did 
not merge completly). But, much more important, the POST test issues with 
Internet Explorer/Trident and WebKit/QtBrowser are still there (see images).

Original comment by bel2...@gmail.com on 12 Jun 2012 at 10:12

GoogleCodeExporter commented 9 years ago
.. here are the images mentioned in comment 38

Original comment by bel2...@gmail.com on 12 Jun 2012 at 10:13

Attachments:

GoogleCodeExporter commented 9 years ago
I think I've nailed the POST issue as well now; at least it tests OK (success) 
with IE8/IE9/FF with mongoose built on Win7/64/MSVC2010, Win7/32/MingW and 
Ubuntu10/gcc4.

http://code.google.com/r/gerhobbelt-issue349/

POST issue consists of several issues, depending on where you are, 
platform-wise.

1) On Windows, you're basically screwed if the CGI interpreter is in a path 
with spaces in it (e.g. c:\Program Files (x86)\Lua\5.1\lua.exe). Fixed by 
properly quoting the commandline.

2) Any CGI run will fail when your mongoose + website and CGI interpreter are 
on different drives and document_root is a relative path (issue 364: fixed)

3) When CGI scripts are DOS format (CRLF instead of LF in the script file 
itself), Linux at least will SILENTLY fail: nothing to stderr, nothing to 
stdout.

4) The echo.cgi script ignores content data that's part of the request, 
resulting in 'broken pipe' as soon as you're going to forward that data to the 
script, because the script has already terminated since it found it had enough 
input from the HTTP headers alone. (you WANT forward_content_data() to 
rigorously check I/O operations, because that's what you want for PUT, but in 
case of a GET+content or POST+content and a CGI script that ignores the 
incoming anyway, you DON'T want the same code to send a 5xx response. Fixed.

5) When you've got a persistent connection and CGI scripts or NEW_REQUEST 
callback handlers or whatnot IGNORING the request content, i.e. not 
mg_read()ing ALL request content, than for 'large-ish and beyond' content sizes 
the discard function would horribly fail - as that one believed it only had to 
adjust the buffer. Fixed by simply letting mg_read() handle it; replicating the 
logic in mg_read() -- which will produce return value 0 when the request 
content has been read in its entirety -- was found to be too much hassle: it's 
subtle so replicating it elsewhere in the same code is begging for bugs. Fixed.

6) the test.html page would b0rk fatally with IE8/9 since IE doesn't define the 
'console' object/methods when you're not running the IE debugger (F12). Note 
for the wicked: once you're invoked the [F12] debugger once and closed it 
again, the console object remains available. Nevertheless, JS code now checks 
for a valid console instance before console.log()ging. Fixed.

7) To work, you need all the #360 fixes as well (which concern themselves with 
TCP connectivity in a HTTP scenario: graceful close et al). These fixes are all 
included in the issue349 clone.

Original comment by ger.hobbelt on 13 Jun 2012 at 7:10

GoogleCodeExporter commented 9 years ago
Hi

I try to run the issue349 clone against my tests provided in comment 18 
(http://code.google.com/p/mongoose/issues/detail?id=349#c18).

Unfortunatley i got a linker error until changed the usage of mg_cry(..) in 
mg_start(..) to cry(..) (line 5197 mongoose.c).

After that, the test are still failing either with error 12152 or when the 
assert in the user callback kicks in because mg_read(..) returned -1.

Note that these tests make no use of CGI and console.log(..).

Original comment by nullable...@gmail.com on 17 Jun 2012 at 8:50

GoogleCodeExporter commented 9 years ago
Weird; I must've done something wrong then in the last commits; going to fetch 
& run this on a new machine. Hold one...

Original comment by ger.hobbelt on 17 Jun 2012 at 9:55

GoogleCodeExporter commented 9 years ago
Confirmed; mg_cry-->cry edit applied. Tests from #18 (at least the #18 
test.html + ajax code, note IE8 corrections in there) are located in issue349 
clone in ./test/ajax/ subdirectory - tested just now for MSVC build: these fail 
when lua isn't in the expected (UNIXy) spot; run
  mongoose -I "'C:\Program Files (x86)\Lua\5.1\lua.exe'"
after installing LuaForWindows (5.1.4) for all test/ajax/test.html tests to 
succeed.

(note the single quotes! This is to allow -I to accept interpreter paths with 
embedded space and/or interpreter commandline arguments, without Windows cmd 
screwing it up for you.)

Will look into the C code included in #18 test.zip now...

Original comment by ger.hobbelt on 17 Jun 2012 at 11:14

GoogleCodeExporter commented 9 years ago
:-(  I notice that my ajax/test.html isn't your #18 www/test.html in that #18 
zip.

Original comment by ger.hobbelt on 17 Jun 2012 at 11:21

GoogleCodeExporter commented 9 years ago
Indeed its not the same. For my tests (of comment #18) i just copied 
mongoose.h, mongoose.c and mongoose_sys_porting.h of the issue349 clone into a 
copy of echo-v3.2 (from the tests.zip). So, i went the otherway round.

Original comment by nullable...@gmail.com on 17 Jun 2012 at 11:32

GoogleCodeExporter commented 9 years ago
confirmed the 12152 errors for the test.zip included test.html code for IE9. 
'Funny thing' is FF and the others don't error out.

Did some hair pulling (and renamed your test.zip:www/test.html to test_18.html 
and dropped it in my issue349 repo), found this one:

http://stackoverflow.com/questions/3731420/why-does-ie-issue-random-xhr-408-1215
2-responses-using-jquery-post

did some more testing with the /echo handler code ripped from your main and 
found:

1- with the handler code as-is, the issue of hanging POST requests and random 
12152 errors recur. Hence: issue reproducible.

2- Adding a 'Connection: close' in the response output 'fixes' the issue as per 
stackoverflow suggestion. But that's not good enough, of course ;-) because the 
explanation there is sound and pointing straight at IE+WinSock themselves. 
Network sniff confirms this: the 12152 items are 'correct' on the wire so the 
browser screws up.

3- Turns out that the 12152 which means something like 'malformed HTTP 
response' is IE9 saying they _demand_ a Connection: header: when the code is 
changed to send a 'Connection: keep-alive', suddenly IE9 succeeds always, at 
least on this machine. (Win7/64 with latest MS updates)

(BTW/OT: if you want the access log to log the 'correct' status, you might want 
to 'request_info->status_code = 200' in any custom event handler. Otherwise the 
access log is filled with lines listing their status code as '-1'.) 

issue349 clone to be updated shortly...

Original comment by ger.hobbelt on 17 Jun 2012 at 12:45

GoogleCodeExporter commented 9 years ago
issue349 clone and github repo 'master' branch updated. But take note: it gets 
weirder and weirder...

My issue349 hg clone ONLY WORKS OK when you always follow the stackoverflow 
approach, i.e. Connection: close. 

Meanwhile the main github master branch /seemed/ to work OK with the very same 
test_18.html (renamed from your #18 www/test.html) and Connection: Keep-Alive.

However, when hammering [F5] and the [test] button, I can only state that the 
12152 errors are gone, but you still get 'pending' issues in IE9 for arbitrary 
requests (including serving 'static pages' such as the jquery.js file).

Network sniffing shows that IE9 somehow opens connections without sending any 
request through it, so you get to see a mongoose error log about '[Error 580: 
Mongoose Internal Server Error or client closed connection prematurely]' then 
while IE9 waits for about 5 seconds and then retries - with success for the 
static pages.
The POSTs of course fail when this happens to them, as there's no retry 
mechanism there.

This is an IE9 (or at least IE*, haven't tested with IE8, etc. yet) specific 
issue, and now that the 12152 errors are gone over here, I still remain with 
those pesky (Pending...) issues which seem to be entirely client-side now.

I must be missing something very subtle about HTTP/1.1 keep-alive IE9 logic... 
:-S

Original comment by ger.hobbelt on 17 Jun 2012 at 1:38

GoogleCodeExporter commented 9 years ago
updated the issue349 clone after extensive testing with another Win7/64/IE9 
machine. 
Note that commit 
http://code.google.com/r/gerhobbelt-issue349/source/detail?r=2722e77ff55c642877b
79085976925b57e1749d2 is the last one exhibiting the IE9 12152/pending/timeout 
issue: you can toggle the #if 0 in the next commit there 
http://code.google.com/r/gerhobbelt-issue349/source/detail?r=21ff9988b9bd2d668c0
0af91df4afc47be51aaf0 to observe the issue going away (#if 0) and recurring 
(#if 01).

It boils down to this: when you send IE9 a response with Connection: keep-alive 
yet close the connection immediately afterwards, then IE9 will get confused and 
throw the 12152 tantrum at random intervals. When you keep the connection open 
to serve the next request like your Connection: header said you would, 
everything succeeds.... with a little snag: 
http://support.microsoft.com/kb/813827 : note the mention of the '1 minute' in 
there.
This (combined with any socket timeout set up mongoose=server-side) determines 
how long your threads will be locked up by that single IE9 browser: in my case 
I see a usage of 3..4 threads in parallel to serve a single IE9 browser, so at 
10 threads you can DoS a mongoose with a few IE9 browsers pretty darn quickly.

The solution to *that* problem would be to push back inactive (i.e. waiting) 
kept-alive connections onto the queue, but that's a work yet to be done. I hope 
to get it done in the issue349 somewhere today. Till then, that last commit 
should be the point to verify my tests, work and observations.

Original comment by ger.hobbelt on 18 Jun 2012 at 11:52

GoogleCodeExporter commented 9 years ago
Implemented and tested the 'push back idle connections so HTTP keep-alive yet 
idle connections don't occupy mongoose threads for a long time' solution in the 
github repo (master branch):
https://github.com/GerHobbelt/mongoose/commit/d76e6f13b3c1e93cb7cea0eb13ae253d2d
f5c45f
+
https://github.com/GerHobbelt/mongoose/commit/c92ace3916a1afe5293198462f2b524d46
d2084d

Tested on Win7/64 with IE9 and latest FF with bel2125 #18 test page and /echo 
server-side event handler (in the github repo these are test/ajax/test_18.html 
and examples/bel2125-issue349.c respectively): all tests succeed (and quickly! 
:-) ), even when you hammer the mongoose server.

Tested with both 1(!) and 5 thread mongoose servers, with keep_alive support 
enabled.

Will backport to the hg issue349 clone later.

N.B.: consequence of this 'test & swap' is twofold:
+ well-behaved browsers/clients with keep-alive don't occupy threads for a long 
time (~ 1 minute) any more, so all incoming connections / requests get served 
ASAP.
+ few threads can serve *many* connections in parallel: requests are served in 
Round Robin fashion (i.e. we all get served slower together when the load grows)
+ INIT_CONNECTION and EXIT_CONNECTION event callbacks happen when they should
+ you can bind user data to the 'request_info' record for a particular 
connection: it will survive idle periods of that connection.

- as a test&swap (think of it as a 'task switch') can occur after having served 
a request, do NOT expect subsequent requests on the same connection to be 
processed by the very same thread: this is by design as another interleaved 
request can be long-running all by itself, e.g. when it invokes special user 
code in the NEW_REQUEST event callback.

Naive user event callback implementations may suffer from the latter point as 
old assumptions won't hold.

Original comment by ger.hobbelt on 19 Jun 2012 at 2:17

GoogleCodeExporter commented 9 years ago
http://code.google.com/r/gerhobbelt-issue349/source/detail?r=2936d125aed5cc31244
c6e075435c84aa4a5f2cc

(Tested to compile and run with the MSVC2010 solution; test_18.html test passes 
as with the github version.)

Original comment by ger.hobbelt on 19 Jun 2012 at 2:43