cesanta / mongoose

Embedded Web Server
https://mongoose.ws
Other
10.93k stars 2.7k forks source link

[Windows] 'unit_test.exe' failure #1605

Closed gvanem closed 2 years ago

gvanem commented 2 years ago

The unit_test.exe program always fails for me. Running as set V=5 & unit_test.exe I get this output:

-> [9876543210,7]
...
3cab2d23 3 sock.c:301:write_conn        4 0x2b8 29:0 29 err 0
3cab2d33 4 sock.c:608:mg_mgr_poll       1 -- tchrc
3cab2d43 4 sock.c:608:mg_mgr_poll       5 r- tchrc
3cab2d52 3 sock.c:291:read_conn         5 0x2b4 0:0 29 err 0
3cab2d62 4 http.c:660:uri_to_path2      5 /foo/version.h -> ./src/version.h 3
3cab2d81 4 sock.c:608:mg_mgr_poll       4 -- tchrc
3cab2d91 4 sock.c:608:mg_mgr_poll       1 -- tchrc
3cab2da0 4 sock.c:608:mg_mgr_poll       5 -w tchrc
3cab2db0 3 sock.c:301:write_conn        5 0x2b4 129:0 129 err 0
3cab2dcf 4 sock.c:608:mg_mgr_poll       4 -- tchrc
3cab2def 4 sock.c:608:mg_mgr_poll       1 -- tchrc
3cab2dfe 4 sock.c:608:mg_mgr_poll       5 -- tchrc
3cab2dfe 4 sock.c:608:mg_mgr_poll       4 r- tchrc
3cab2dfe 3 sock.c:291:read_conn         4 0x2b8 0:0 129 err 0
3cab2dfe 3 net.c:163:mg_close_conn      4 closed
3cab2e0e 4 sock.c:608:mg_mgr_poll       1 -- tchrc
3cab2e0e 4 sock.c:608:mg_mgr_poll       5 r- tchrc
3cab2e0e 3 sock.c:291:read_conn         5 0x2b4 0:0 -1 err 0
3cab2e0e 3 net.c:163:mg_close_conn      5 closed
3cab2e0e 4 sock.c:608:mg_mgr_poll       1 -- tchrc
FAILURE test/unit_test.c:2270: cmpbody(buf, expected) == 0

Seems the cause is DOS line-ending of src/version.h. If I first do a dos2unix src\version.h, it passes. Can the unit_test be suited for this case? Or are we expected to have git always convert files to Unix line-endings? core.autocrlf=false or something?

The 2nd problem is on the file test/data/київ.txt which also has a DOS file-ending. But my dos2unix program is unable to convert it.
A change to core.autocrlf=false and git pull after a check-out does not modify such files. Ref. https://github.com/atom/github/issues/1830

What to do with these things?

cpq commented 2 years ago

@gvanem those unit tests are run on linux under wine. Maybe that's the reason? But regardless where they run, the version.h is a repo file. Maybe your git client under windows automatically converts line endings? Can you download version.h from the repo as a binary, and compare to your local version.h ?

gvanem commented 2 years ago

Maybe your git client under windows automatically converts line endings?

It did. But now I've set core.autocrlf = false in my %APPDATA%/.gitconfig. So all is good with these file tests.

Only trouble is with -DMG_ENABLE_IPV6=1 since my ISP (Telenor) sux at their IPv6 setup. No IPv6 connections possible ATM. So with -DMG_ENABLE_IPV6=0, the failure now is at:

FAILURE test/unit_test.c:739: errored == 7

What is so special with "tcp://127.0.0.1:55117"?

cpq commented 2 years ago

On boxes without IPv6, do make clean all IPV6=0

Nothing special about 55117, just some random ethemeral port that is unlikely to have any TCP listener. Used to test "connection refused" error.

gvanem commented 2 years ago

Aha. But I get no ev == MG_EV_ERROR in the call-back. Even when I increase the for-loop to 500.

gvanem commented 2 years ago

I feel that mg_iotest() should have a test for an exception fd_set to detect this situation.

cpq commented 2 years ago

Could you elaborate more on your thinking about mg_iotest please? What compiler do you use - is it VS 22 ? Can you run the test with the log level 4 and share the output please? (you can set log level 4 by setting environment variable V=4)

gvanem commented 2 years ago

Could you elaborate more on your thinking about mg_iotest please?

I did not get an error connecting to 127.0.0.1:55117 in ev_data even with 500 loops. I was thinking mg_iotest() should be more like:

   FD_SET(FD(c), &xset);
   ...
   if ((rc = select((int) maxfd + 1, &rset, &wset, &xset, &tv)) < 0) {
   ...
    if (FD(c) != INVALID_SOCKET && FD_ISSET(FD(c), &xset)) {
      char buf[50];
      mg_error(c, "error connecting to %s",
               mg_straddr(&c->rem, buf, sizeof(buf)));
    }

Not sure if it's related to the Windows Firewall or the ephemeral port-range in Windows.

is it VS 22 ?

Yes.

A set V=4 & unit_test.exe is attached.

gvanem commented 2 years ago

With this patch, all 946 tests passes (but -DMG_ENABLE_IPV6=0):

--- a/src/sock.c 2022-06-28 10:39:35
+++ b/src/sock.c 2022-06-29 13:28:35
@@ -539,12 +539,13 @@
 #else
   struct timeval tv = {ms / 1000, (ms % 1000) * 1000}, tv_zero = {0, 0};
   struct mg_connection *c;
-  fd_set rset, wset;
+  fd_set rset, wset, xset;
   SOCKET maxfd = 0;
   int rc;

   FD_ZERO(&rset);
   FD_ZERO(&wset);
+  FD_ZERO(&xset);

   for (c = mgr->conns; c != NULL; c = c->next) {
     c->is_readable = c->is_writable = 0;
@@ -552,10 +553,11 @@
     if (can_read(c)) FD_SET(FD(c), &rset);
     if (can_write(c)) FD_SET(FD(c), &wset);
     if (mg_tls_pending(c) > 0) tv = tv_zero;
+    FD_SET(FD(c), &xset);
     if (FD(c) > maxfd) maxfd = FD(c);
   }

-  if ((rc = select((int) maxfd + 1, &rset, &wset, NULL, &tv)) < 0) {
+  if ((rc = select((int) maxfd + 1, &rset, &wset, &xset, &tv)) < 0) {
 #if MG_ARCH == MG_ARCH_WIN32
     if (maxfd == 0) Sleep(ms);  // On Windows, select fails if no sockets
 #else
@@ -569,6 +571,11 @@
     c->is_readable = FD(c) != INVALID_SOCKET && FD_ISSET(FD(c), &rset);
     c->is_writable = FD(c) != INVALID_SOCKET && FD_ISSET(FD(c), &wset);
     if (mg_tls_pending(c) > 0) c->is_readable = 1;
+    if (FD_ISSET(FD(c), &xset)) {
+      char buf[50];
+      mg_error(c, "error connecting to %s",
+               mg_straddr(&c->rem, buf, sizeof(buf)));
+    }
   }
 #endif
 }
cpq commented 2 years ago

Pushed https://github.com/cesanta/mongoose/commit/017c7290d6fd0ed54c5618c204d0bc4071c56e9c

Please verify on your side.

gvanem commented 2 years ago

Please verify on your side.

Almost. Only if I increase the time to detect this:

for (i = 0; i < 100 && errored == 0; i++) mg_mgr_poll(&mgr, 10);

it works. Not sure why Winsock's select() is so slow responding with a eset FD-bit for this connect(). Approx 2 seconds. The same happens with curl:

c:\> timer & curl -4v http://localhost:55117 & timer
* Trying 127.0.0.1:55117...
* connect to 127.0.0.1 port 55117 failed: Connection refused
* Failed to connect to localhost port 55117 after 2046 ms: Connection refused
 Elapsed: 0:00:02,13

Perhaps using WSAPoll() would improve this?

cpq commented 2 years ago

I guess you can try: make clean vc22 MG_ENABLE_POLL=1

The vc22 target uses VS2022 build under wine, and run unit test exe under wine. Executes fast - maybe on native windows it is different. Can this slowness be a windows firewall protection against port scanners? What if you temporarily disable the firewall?

gvanem commented 2 years ago

I guess you can try: make clean vc22 MG_ENABLE_POLL=1

Huh? A -DMG_ENABLE_POLL=1 is not for MSVC:

src/sock.c(519): error C2057: expected constant expression
src/sock.c(519): error C2466: cannot allocate an array of constant size 0
src/sock.c(519): error C2133: 'fds': unknown size
src/sock.c(521): warning C4034: sizeof returns 0
src/sock.c(537): warning C4013: 'poll' undefined; assuming extern returning int

What if you temporarily disable the firewall?

No difference.

Issue with the unit_test.exe is probably it's too impatient; there is no way to determine there is no listener unless one wait an appropriate amount of time for the possible server to respond. A time-factor determined by RTT and what not tcp/ip Registry settings. Then there is things like ICMP Port Unreachable that other OS'es probably uses to determine much faster that's there's no listener on that port. Tricky on Windows.

cpq commented 2 years ago

@gvanem hm.. The TCP stack must respond immediately with RST if there is no listener. RTT kicks in when a retransmittion is required due to e.g. packet loss - but in case of no listener, it should be an immediate RST back (unless afirewall blocks it). UDP should respond with UDP port unreachable. There must not be any delays or anything.. Can you run a wireshark on that port to see what is happening? Re: windows: you're right, windows was not able to use poll. Until the latest commit. Now it can! Fetch it and try. Though, it does not pass the test ..

cpq commented 2 years ago

And now windows tests with poll pass, too.

make clean vc22 TFLAGS=-DMG_ENABLE_POLL=1
gvanem commented 2 years ago

Can you run a wireshark on that port to see what is happening?

My WinPcap cannot capture localhost traffic. Should need NPcap for that, but that's still buggy; resent BSoD reports.

And now windows tests with poll pass

I do not see that:

-> [9876543210,7]
fb725a8 3 net.c:209:mg_listen           1 0x278 http://LOCALHOST:12358
fb725a8 3 net.c:186:mg_connect          2 -1 http://LOCALHOST:12358
fb725a8 3 sock.c:371:mg_connect_resolve 2 0x27c -> 100007f:17968 pend
fb725b8 4 sock.c:624:mg_mgr_poll        2 -- tChrc
...
fb72ae8 3 net.c:163:mg_close_conn       2 closed
fb72ae8 4 sock.c:624:mg_mgr_poll        1 -- tchrc
FAILURE test/unit_test.c:2273: fetch(&mgr, buf, url, "GET /a.txt HTTP/1.0\n\n") == 200

All calls to WSAPoll() returns WSAEINVAL. So worse than before AFAICS.

gvanem commented 2 years ago

Ok, I figured WSAPoll() does not like the POLLERR event. And it needs at least 1 sec. to detected a no-listening port. So with these patches:

--- a/src/sock.c 2022-07-04 09:41:50
+++ b/src/sock.c 2022-07-04 15:27:03
@@ -522,7 +522,7 @@
   nfds_t n = 0;
   for (struct mg_connection *c = mgr->conns; c != NULL; c = c->next) n++;
   struct pollfd *fds = (struct pollfd *) alloca(n * sizeof(fds[0]));
-  if (n > 0) memset(fds, 0, sizeof(n * sizeof(fds[0])));
+  if (n > 0) memset(fds, 0, n * sizeof(fds[0]));
   n = 0;
   for (struct mg_connection *c = mgr->conns; c != NULL; c = c->next) {
     c->is_readable = c->is_writable = 0;
@@ -532,7 +532,9 @@
       ms = 1;  // Don't wait if TLS is ready
     } else {
       fds[n].fd = FD(c);
+#if MG_ARCH != MG_ARCH_WIN32
       fds[n].events = POLLERR;
+#endif
       if (can_read(c)) fds[n].events |= POLLIN;
       if (can_write(c)) fds[n].events |= POLLOUT;
       n++;

--- a/test/unit_test.c 2022-07-02 14:26:10
+++ b/test/unit_test.c 2022-07-04 15:29:09
@@ -739,7 +739,7 @@
     // Test connection refused
     int i, errored = 0;
     mg_connect(&mgr, "tcp://127.0.0.1:55117", eh9, &errored);
-    for (i = 0; i < 10 && errored == 0; i++) mg_mgr_poll(&mgr, 1);
+    for (i = 0; i < 100 && errored == 0; i++) mg_mgr_poll(&mgr, 10);
     MG_INFO(("errored: %d, expected: 7", errored));
     ASSERT(errored == 7);
   }

all 950 tests works here. (still MG_ENABLE_IPV6 == 0).

cpq commented 2 years ago

Opps w.r.t. sizeof ! BTW, what's the issue with the POLLERR ?

gvanem commented 2 years ago

BTW, what's the issue with the POLLERR ?

I would assume it is for an output revent only, since the the docs has this:

The WSAPOLLFD structure must only contain a combination of the above flags that are supported by the Winsock provider. Any other values are considered errors and WSAPoll will return SOCKET_ERROR.

And POLLERR is not among these AFAICS.

cpq commented 2 years ago

Thank you. Updated the code and the unit test

gvanem commented 2 years ago

Updated the code and the unit test

Tried it. Works fine; all 1051 tests succeeds. But the discussion has strayed too far from the original issue. Hence closing this (an opening another).

gvanem commented 1 year ago

I was hit by this silly line-ending issue once again. Can something like this please be added at top of main() (in test/unit_test.c): system ("dos2unix.exe --quiet src/version.h");