ARMmbed / sockets

mbed sockets library abstraction layer
Other
6 stars 18 forks source link

Listening socket problems #35

Closed lws-team closed 8 years ago

lws-team commented 8 years ago

I can make the TCP echo listening app serve some http, using the current stuff from git on K64F. So that's encouraging.

However when this code is elaborated a bit, it acts differently depending on whether a browser is the connecting client (data is sent immediately on accept) or if I telnet to the port (no data is sent until I type something).

With the browser, ReadableHandler_t never gets called on the accepted TCP Stream socket. If I anyway try a recv(), there is no data waiting. The data packet from the browser is lost.

With telnet, no problem.

Since this service happens in interrupt context atm is it possible there is some kind of race between the accept() and setting the handlers? Any related "known problem" that can lead to losing the incoming packet?

bremoran commented 8 years ago

Which version of Sockets are you using? Socket handlers no longer execute in interrupt context. That issue was solved in version 0.2.0.

Can you provide a test case so that I can reproduce the failure?

lws-team commented 8 years ago

v1.0.2... you need to update the README.md then, it currently says

Event handling currently occurs in the interrupt context. In an upcoming release, there will be provision for event handling in the main loop.

I'm porting a big library at the moment (https://libwebsockets.org) so the code that I have to reproduce this is a bit big, although it's just one small yotta project (the test app) with the library as another yotta module project inside.

I already have a simpler test app that works... let me try adding delays in there and see if can show the same problem.

lws-team commented 8 years ago

Yes it is very easy to reproduce here anyway.

Clone this

https://github.com/lws-team/mbed3-dumb-http-test

and build for frdm-k64f-gcc

He gets an IP on DHCP and listen for anything on port 80. He responds with a canned string and then close the connection if it comes.

As he is, he works OK on browser or telnet.

However add the delay in

diff --git a/source/app.cpp b/source/app.cpp
index 3c58edc..b7cd5ec 100644
--- a/source/app.cpp
+++ b/source/app.cpp
@@ -43,7 +43,7 @@ protected:

        void onIncoming(TCPListener *s, void *impl)
        {
-//             volatile int n;
+               volatile int n;

                if (!impl) {
                        onError(s, SOCKET_ERROR_NULL_PTR);
@@ -55,12 +55,12 @@ protected:
                        return;
                }

-#if 0
+#if 1
                /* this loses the first data packet from the peer */
                for (n = 0; n < 1000; n++)
                        ;
 #else
-//             (void)n;
+               (void)n;
 #endif

                ts->setOnError(TCPStream::ErrorHandler_t(this, &Lws::onError));

and he can only work on telnet.

At least this is true on my setup, which is arm-none-eabi-gcc-cs bits from Fedora, gcc 5.2.

bremoran commented 8 years ago

Thank you for the additional information. I'll look into it.

ciarmcom commented 8 years ago

ARM Internal Ref: IOTSFW-1162

lws-team commented 8 years ago

I looked into what is happening here since it is blocking me. At least one of the ways this can go wrong is:

1) The TCP packet gets ACKed by the lwip layer on the wire

2) Socket::_eventHandler gets called for the packet on the accepted TCPStream, with SOCKET_EVENT_RX_DONE, as it should

3) However it occurs at a time before the TCPStream's _onReadable function pointer has been set. So he cannot let the user's callback know and the event is lost.

I can apparently work around that by

I'd send you a patch but I don't think you can accept patches... there might better be a deeper solution anyway but this lets me limp on.

lws-team commented 8 years ago

Actually here's the patch anyway, since I'm probably not the only guy with this problem

diff --git a/source/v0/Socket.cpp b/source/v0/Socket.cpp
index ec12be4..336e8dc 100644
--- a/source/v0/Socket.cpp
+++ b/source/v0/Socket.cpp
@@ -23,13 +23,14 @@ using namespace mbed::Sockets::v0;

 Socket::Socket(const socket_stack_t stack) :
     _onDNS(NULL), _onError(NULL), _onReadable(NULL), _onSent(NULL),
-    _irq(this), _event(NULL)
+    _irq(this), _pending_on_readable(0), _event(NULL)
 {
     _irq.callback(&Socket::_nvEventHandler);
     _socket.handler = (socket_api_handler_t)_irq.entry();
     _socket.impl = NULL;
     _socket.stack = stack;
     _socket.api = socket_get_api(stack);
+
     if (_socket.api == NULL) {
         error_check(SOCKET_ERROR_NULL_PTR);
     }
@@ -71,8 +72,10 @@ void Socket::_eventHandler(struct socket_event *ev)
             minar::Scheduler::postCallback(_onError.bind(this, ev->i.e));
         break;
     case SOCKET_EVENT_RX_DONE:
-        if(_onReadable)
+        if (_onReadable)
             minar::Scheduler::postCallback(_onReadable.bind(this));
+   else
+       _pending_on_readable = 1;
         break;
     case SOCKET_EVENT_TX_DONE:
         if (_onSent)
@@ -102,6 +105,13 @@ void Socket::setOnReadable(const ReadableHandler_t &onReadable)
     __disable_irq();
     _onReadable = onReadable;
     __enable_irq();
+    
+    if (_onReadable) {
+   if (_pending_on_readable)
+       minar::Scheduler::postCallback(_onReadable.bind(this));
+   _pending_on_readable = 0;
+    }
+       
 }
 void Socket::setOnSent(const SentHandler_t &onSent)
 {
diff --git a/sockets/v0/Socket.h b/sockets/v0/Socket.h
index 50e429c..d231254 100644
--- a/sockets/v0/Socket.h
+++ b/sockets/v0/Socket.h
@@ -254,6 +254,7 @@ protected:

     CThunk<Socket> _irq;
     struct socket _socket;
+    bool _pending_on_readable;
 private:
     socket_event_t *_event;
     /**
bremoran commented 8 years ago

There is a simpler solution. When you do set onReadable, you can immediately call stream->recv(). If there's data waiting, you will get the data, otherwise, you will see SOCKET_ERROR_WOULD_BLOCK. This might be the simplest solution.

I can see how this can be a problem, and I think the solution is probably to add the ability to bind handlers right into TCPListener::accept. In the short term, however, calling stream->recv until you get a SOCKET_ERROR_WOULD_BLOCK should solve the problem.

bremoran commented 8 years ago

Here's an example block which should work without needing to modify the socket library:

{
    mbed::util::CriticalSectionLock lock;
    socket_error_t err = SOCKET_ERROR_NONE;
    while(1) {
        char * buf[size];
        size_t len = size;
        err = stream->recv(buf, &len);
        if (err != SOCKET_ERROR_NONE) 
            break;
        process(buf, len);
    }
    stream->setOnReadable(mbed::sockets::v0::Socket::ReadableHandler_t(handler);
}
lws-team commented 8 years ago

Thanks... I can get a connect correctly one time with that... there are other problems elsewhere about subsequent connects, maybe just in my code.

However to make that work, the data had to wrongly be drained from the listen socket, not the accepted socket. So the underlying problem seems to a race in the library about to which "account" the RX packet is applied to.

That could have dire security implications (yes it's all prerelease...) it seems to me... if you connect with a live cookie but an attacker is spamming connects with no cookie, since the payloads from all connection attempts may due to this bug illegally all pile up at the same listen socket in order of reception, ie, race, the attacker connection may end up draining the wrong, authenticated HTTP headers with the live cookie on an accepted and workable stream.

lws-team commented 8 years ago

There is still something else going on, depending on I guess timing? uninitialized data?, the first sent packet after the connection is accepted can be acked but never turn up anywhere as an mbed event. This is chrome connecting to port 7681 where there is a webserver

12:00:23.678035 IP build.warmcat.com.58962 > lwipk64f.warmcat.com.7681: Flags [S], seq 711910855, win 29200, options [mss 1460,sackOK,TS val 3543135891 ecr 0,nop,wscale 7], length 0
    0x0000:  4500 003c 1354 4000 4006 a061 c0a8 02e9  E..<.T@.@..a....
    0x0010:  c0a8 02cd e652 1e01 2a6e e5c7 0000 0000  .....R..*n......
    0x0020:  a002 7210 8735 0000 0204 05b4 0402 080a  ..r..5..........
    0x0030:  d32f f693 0000 0000 0103 0307            ./..........
12:00:23.680581 ARP, Request who-has build.warmcat.com tell lwipk64f.warmcat.com, length 46
    0x0000:  0001 0800 0604 0001 f81a 67a6 dedf c0a8  ..........g.....
    0x0010:  02cd 0000 0000 0000 c0a8 02e9 0000 0000  ................
    0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
12:00:23.680595 ARP, Reply build.warmcat.com is-at 78:24:af:8f:a4:5d (oui Unknown), length 28
    0x0000:  0001 0800 0604 0002 7824 af8f a45d c0a8  ........x$...]..
    0x0010:  02e9 f81a 67a6 dedf c0a8 02cd            ....g.......
12:00:23.682409 IP lwipk64f.warmcat.com.7681 > build.warmcat.com.58962: Flags [S.], seq 6510, ack 711910856, win 2920, options [mss 1460], length 0
    0x0000:  4500 002c 0004 0000 ff06 34c1 c0a8 02cd  E..,......4.....
    0x0010:  c0a8 02e9 1e01 e652 0000 196e 2a6e e5c8  .......R...n*n..
    0x0020:  6012 0b68 d7ae 0000 0204 05b4 0000       `..h..........
12:00:23.682427 IP build.warmcat.com.58962 > lwipk64f.warmcat.com.7681: Flags [.], ack 1, win 29200, length 0
    0x0000:  4500 0028 1355 4000 4006 a074 c0a8 02e9  E..(.U@.@..t....
    0x0010:  c0a8 02cd e652 1e01 2a6e e5c8 0000 196f  .....R..*n.....o
    0x0020:  5010 7210 8721 0000                      P.r..!..
12:00:23.682574 IP build.warmcat.com.58962 > lwipk64f.warmcat.com.7681: Flags [P.], seq 1:405, ack 1, win 29200, length 404
    0x0000:  4500 01bc 1356 4000 4006 9edf c0a8 02e9  E....V@.@.......
    0x0010:  c0a8 02cd e652 1e01 2a6e e5c8 0000 196f  .....R..*n.....o
    0x0020:  5018 7210 88b5 0000 4745 5420 2f20 4854  P.r.....GET./.HT
    0x0030:  5450 2f31 2e31 0d0a 486f 7374 3a20 3139  TP/1.1..Host:.19
    0x0040:  322e 3136 382e 322e 3230 353a 3736 3831  2.168.2.205:7681
    0x0050:  0d0a 436f 6e6e 6563 7469 6f6e 3a20 6b65  ..Connection:.ke
    0x0060:  6570 2d61 6c69 7665 0d0a 4361 6368 652d  ep-alive..Cache-
    0x0070:  436f 6e74 726f 6c3a 206d 6178 2d61 6765  Control:.max-age
    0x0080:  3d30 0d0a 4163 6365 7074 3a20 7465 7874  =0..Accept:.text
    0x0090:  2f68 746d 6c2c 6170 706c 6963 6174 696f  /html,applicatio
    0x00a0:  6e2f 7868 746d 6c2b 786d 6c2c 6170 706c  n/xhtml+xml,appl
    0x00b0:  6963 6174 696f 6e2f 786d 6c3b 713d 302e  ication/xml;q=0.
    0x00c0:  392c 696d 6167 652f 7765 6270 2c2a 2f2a  9,image/webp,*/*
    0x00d0:  3b71 3d30 2e38 0d0a 5570 6772 6164 652d  ;q=0.8..Upgrade-
    0x00e0:  496e 7365 6375 7265 2d52 6571 7565 7374  Insecure-Request
    0x00f0:  733a 2031 0d0a 5573 6572 2d41 6765 6e74  s:.1..User-Agent
    0x0100:  3a20 4d6f 7a69 6c6c 612f 352e 3020 2858  :.Mozilla/5.0.(X
    0x0110:  3131 3b20 4665 646f 7261 3b20 4c69 6e75  11;.Fedora;.Linu
    0x0120:  7820 7838 365f 3634 2920 4170 706c 6557  x.x86_64).AppleW
    0x0130:  6562 4b69 742f 3533 372e 3336 2028 4b48  ebKit/537.36.(KH
    0x0140:  544d 4c2c 206c 696b 6520 4765 636b 6f29  TML,.like.Gecko)
    0x0150:  2043 6872 6f6d 652f 3438 2e30 2e32 3534  .Chrome/48.0.254
    0x0160:  372e 3020 5361 6661 7269 2f35 3337 2e33  7.0.Safari/537.3
    0x0170:  360d 0a41 6363 6570 742d 456e 636f 6469  6..Accept-Encodi
    0x0180:  6e67 3a20 677a 6970 2c20 6465 666c 6174  ng:.gzip,.deflat
    0x0190:  652c 2073 6463 680d 0a41 6363 6570 742d  e,.sdch..Accept-
    0x01a0:  4c61 6e67 7561 6765 3a20 656e 2d55 532c  Language:.en-US,
    0x01b0:  656e 3b71 3d30 2e38 0d0a 0d0a            en;q=0.8....
12:00:23.866982 IP lwipk64f.warmcat.com.7681 > build.warmcat.com.58962: Flags [.], ack 405, win 2516, length 0
    0x0000:  4500 0028 0005 0000 ff06 34c4 c0a8 02cd  E..(......4.....
    0x0010:  c0a8 02e9 1e01 e652 0000 196f 2a6e e75c  .......R...o*n.\
    0x0020:  5010 09d4 ef6b 0000 0000 0000 0000       P....k........
12:00:27.549990 IP build.warmcat.com.58962 > lwipk64f.warmcat.com.7681: Flags [F.], seq 405, ack 1, win 29200, length 0
    0x0000:  4500 0028 1357 4000 4006 a072 c0a8 02e9  E..(.W@.@..r....
    0x0010:  c0a8 02cd e652 1e01 2a6e e75c 0000 196f  .....R..*n.\...o
    0x0020:  5011 7210 8721 0000                      P.r..!..
12:00:27.553734 IP lwipk64f.warmcat.com.7681 > build.warmcat.com.58962: Flags [.], ack 406, win 2515, length 0
    0x0000:  4500 0028 0006 0000 ff06 34c3 c0a8 02cd  E..(......4.....
    0x0010:  c0a8 02e9 1e01 e652 0000 196f 2a6e e75d  .......R...o*n.]
    0x0020:  5010 09d3 ef6b 0000 0000 0000 0000       P....k........

I added printfs to the event handlers in Socket, TCPListener, TCPStream, the only events coming are SOCKET_EVENT_ACCEPT on the listener and SOCKET_EVENT_DISCONNECT on the accepted TCPStream when I stop the browser window. It's the same with wget.

But again if I connect by telnet (where the first data packet gets sent some time after the connect, when I typed it) and give it a HTTP get request, the correct events occur at the K64F and the HTML page is delivered into telnet.

However this same code was largely working in the browser over the weekend, with the "get the RX payload from the listening socket" method. So it seems the symptom can change according to ????

Edit: Maybe summarizing is a good idea. The outcomes I have seen are:

1) It works. It's always able to work with telnet where there is a long gap between connect and first RX packet sent. Sometimes it has worked with the browser directly, but that was a while ago now.

2) The first RX packet after accept wrongly appears on the LISTEN socket, NOT the accepted socket (subsequent packets are delivered correctly)

3) The first RX packet after accept is delivered to the RX event handler, but there is no event handler registered at that time and it is lost

4) The first RX packet after accept is ACKed on the wire, but appears in no Socket or higher level event handler and is lost

It's always related to first RX packet after accept, and timing is critical to reproduce a bad outcome. And what outcome you are going to get is "sticky", it will tend to do the same thing over and over.

lws-team commented 8 years ago

Guys... any news? I have ported my library but as a server library, broken listen support in the OS is kind of fatal...

bremoran commented 8 years ago

Here's what I know: the socket API does not have any mechanism by which data can appear on the listen socket instead of the receive socket. That could be a problem in a lower layer--it needs more investigation.

The LwIP-level handler for incoming data isn't installed until accept() has been called, so data shouldn't even be ack'd until accept() is called. This was the original reason for leaving out any special handling of receive events. LwIP installs a null recv handler when accept is called. The null recv handler acks data. That should be the root cause of the problem you are seeing.

bremoran commented 8 years ago

I would like to know if the PR I have just added fixes the problem you are seeing. The handler that LwIP installs in all new sockets does what I believe to be the wrong thing for this situation.

https://github.com/ARMmbed/sal-stack-lwip/pull/32

lws-team commented 8 years ago

Thanks a lot for looking at it.

It's hard to be immediately completely certain, because there was a spread of outcomes depending on the exact code in my app, but that patch seems to have done something very good.

I can remove the listen socket getting RX hack and I could reorder (delay) the accept in the listen accept handler to suit my code better, and he is accepting 2 x HTTP and 2 x WS connections okay.

So I think you nailed the whole thing there. I'll close it and reopen if there's any resurgence of funny business. Thanks again.

bremoran commented 8 years ago

The new version is now published and should be usable directly from the yotta registry.