ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.66k stars 2.97k forks source link

Socket should provide information about what state change caused an event #3620

Closed marcuschangarm closed 6 years ago

marcuschangarm commented 7 years ago

Description

In the socket callback function:

    /** Register a callback on state change of the socket
     *
     *  The specified callback will be called on state changes such as when
     *  the socket can recv/send/accept successfully and on when an error
     *  occurs. The callback may also be called spuriously without reason.
     *
     *  The callback may be called in an interrupt context and should not
     *  perform expensive operations such as recv/send calls.
     *
     *  @param func     Function to call on state change
     */
    void attach(mbed::Callback<void()> func);

https://github.com/ARMmbed/mbed-os/blob/master/features/netsocket/Socket.h#L157-L168

it would be useful if either the callback passed the state change as a parameter or a separate function was available to read the state change that generated the callback in the first place.

This would allow the application to infer what commands the network stack is ready for next.

marcuschangarm commented 7 years ago

@sg- @geky @0xc0170

geky commented 7 years ago

Hi @marcuschangarm, looking at what options may be available. Access to the socket state has been a requested feature, but would add additional requirements to socket implementations, so it may be difficult to introduce a new api.

What are thoughts on adding these two functions? (based on existing functions in the serial api, comparable to the MSG_PEEK flag in bsd sockets)

/** Check if data is available to be recieved from the socket
 *  @return True if a recv would return data immediately
 */
bool readable();

/** Check if it is possible to send data to the socket
 *  @return True if a send would return data immediately
 */
bool writeable();

These functions would need a few limitations to be feasible. The readable/writeable functions would not be callable from an interrupt context, which would prevent a full asynchronous api driven by interrupts, but it would allow a user to implement an asynchronous api when paired with an event loop.

hakanernam commented 7 years ago

Functions readable/writable would be useful, but still knowing the reason for the callback is very valuable.

marcuschangarm commented 7 years ago

@hakanernam I agree. The callback could pass an enum of states as argument, one of which could be NOT_SUPPORTED for backwards compatibility.

@geky I see where you are coming from, and a set of functions like that would be useful. I would add more functions though, for example bool isConnected() and int getLastError() for checking connectivity and reading the latest error reason.

bogdanm commented 7 years ago

Agreed with the posts above. readable and writeable are certainly useful, but not enough for this. What if, after getting an event, both readable and writeable are true? What if the event is unrelated (for example, a "connect complete" or "remote disconnect" event)? Also, in the absence of an event type, The callback may also be called spuriously without reason pretty much makes it impossible to keep a sane state on the application side.

hakanernam commented 7 years ago

And I was thinking of exactly "remote disconnect". when commenting. Being notified if(when, more likely) remote side disconnects is quite necessary for a robust program.

geky commented 7 years ago

Hmmm...

My concern is with the overhead to provide context for the events, which would be present for all socket instances. However, with the readable/writeable (maybe closed?) functions in the implementation layer, it wouldn't be unreasonable to provide an asynchronous api like the one @marcuschangarm is proposing.

What are thoughts on providing an asynchronous api as a seperate set of socket classes?

/** Socket event enum
 */
enum nsapi_event {
    SOCKET_EVENT_CONNECTED,
    SOCKET_EVENT_SEND,
    SOCKET_EVENT_RECV,
    SOCKET_EVENT_CLOSED,
    SOCKET_EVENT_ERROR,
};

class AsyncTCPSocket : public TCPSocket {
    /** Register a callback on state change of the socket
     *
     *  The specified callback will be called on state changes such as when
     *  the socket can recv/send/accept successfully and on when an error
     *  occurs. The callback may also be called spuriously without reason.
     *
     *  @param func     Function to call on state change
     */
    void attach(Callback<void(nsapi_event_t)> func);
};

Also ccing @hasnainvirk, @kjbracey-arm who may be interested in this discussion

kjbracey commented 7 years ago

Feasibility of implementation is the main concern here. You must remember, this is an abstraction layer.

I agree absolutely, it would be nice to have a working poll() (the generalisation of readable/writable), and I'm working in this area for devices (FileHandle-likes).

The documentation of the callback you see above is not an arbitrary choice. It is a very tightly-constrained lowest-common denominator. We are 100% confident can be implemented on top of every possible network stack.

We are 100% confident because a valid implementation is "call it 10 times a second using a timer". That's a valid fallback.

I am not confident that we can accurately poll readability/writability on all possible network stacks (remembering we have to cover ESP8266-class devices).

And this is before you even consider other things like connect success, connect failure, sticky error conditions (eg ICMP errors to UDP connected sockets), etc etc etc.

Basically, the fine details of the event system is a massive can of worms. And there is no industry standard equivalent here to follow the model on.

The existing widely used model is BSD/Linux's SIGIO. And that's exactly what this is.

In BSD sockets you don't have a whole extra event API that has to cover all the complex cases. You just use the normal socket calls that already exist to interrogate the socket.

The expected model is:

event_handler:

SIGIO: trigger the event handler to run. Nothing else.

Simple as that. The state machine advances due to the socket calls, which are indirectly triggered by the wakeups from SIGIO. You do not directly do something because of the SIGIO.

Test your code by substituting the alternative dumb implementation - call your callback handler 10 times a second, disconnecting from the stack, and check it still works.

Now, I am looking at a system-wide poll() for FileHandles, and potentially integrating sockets with FileHandles to let it work for them, but my current confidence level that we can achieve that with all network stacks is low.

(What you get is actually already an extension, in that you get a socket number. SIGIO doesn't even give you that. The socket number is a substitute for lack of poll(), but increases the number of false positives - eg ESP8266 signals every socket simultaneously, due to lack of info available).

kjbracey commented 7 years ago

On the error reporting - you don't need a separate event API. The expectation is that any error condition wakes you with SIGIO.

Then you make a socket call because of that, and that then returns an error code (either a sticky one, or because the socket is now in a wrong state for that call).

Standard existing network stacks do not provide error callbacks - they do the above, and we have to implement NSAPI on top of them.

A higher-level transport layer, like http, coap, whatever, could certainly provide a higher level of aysnchronous error callbacks, but the socket abstraction layer is kind of stuck working like a standard socket layer, if it is going to be implementable on arbitrary stacks.

geky commented 7 years ago

I'm actually fairly confident (in threaded context, given enough time, not in the next minor release), we could add support for readable/writeable. At the very least I think it is a worthwhile goal to work towards in order to add support for poll-like mechanisms that Kevin points out to integrate with the filesystem api.

I should also clarify that adding async classes only require readable/writeable. The async classes can be constructed by binding an event loop to a socket, and driving a poll-like mechanism how Kevin describes.

It may be a crutch, but the async design is more familiar to users from event-based systems such as nodejs. It could be an additional feature of mbed OS to provide these apis (though it is a low priority).

And, the biggest benefit to me, is that the work to introduce async classes can come in parallel to the existing blocking/poll-based apis. Inevitable mistakes can be made during development of that layer without breaking the base socket apis.

kjbracey commented 7 years ago

If what's being proposed is specifying and implementing something fancier than any conventional system provides, I'm wary.

I personally expect people using socket interfaces to be familiar with standard ones, and using stackoverflow if necessary to learn how to use BSD sockets.

Creating a custom asynchronous socket API feels like a significant documentation and correctness burden.

But I certainly would like to implement poll() at least. But that won't give you error callbacks.

hasnainvirk commented 7 years ago

Lesson of the day. Do not invent more complicated system than POSIX has already invented :D. I think select functtionality like poll() along with readable() and writable() can serve the purpose if sockets are dealt with as file handles. I am working with @kjbracey-arm to extend existing FileHandle. Kevin is actually the architect of the extension plan. From my vantage point, I can see that the extension leading to sockets being treated as file handles for a myriad of stacks would not be a trivial task. Doable, yes, in theory. Would certainlly be a nice/valuable/soothing endeavour.

kjbracey commented 7 years ago

I can see that the extension leading to sockets being treated as file handles for a myriad of stacks would not be a trivial task.

If we were in control of the stack(s) and their internals, it would be relatively straightforward. The trick is doing it via an abstraction layer, based solely on whatever public API stacks give us.

Porting NSAPI onto, say, Linux, we'd be able to do all sorts of asynchronous trickery in user space, using SIGIO and non-blocking poll calls. (This is how glibc does asynchronous I/O calls like aio_read, I understand). But we'd have to do a serious review of what is possible in other stacks.

Just to cite one example - I know for a fact Nanostack does not provide enough API to implement readable()/writable(). We could extend it, yes, but that wouldn't be possible for deficient third-party or off-chip stacks.

My feeling is we will may well end up with integrated Socket/FileHandle but full functionality including poll() will not work reliably and efficiently on all stacks (or potentially even at all). Which rather limits the usefulness. It would encourage people to write application code that only works on a subset of platforms.

On the other hand, maybe we can just raise the bar for stack implementations - tell vendors that if you can't provide this more advanced control, we can't support you. Does feel like a significant philosophical shift from the current NSAPI though, which is effectively totally portable to any likely stack target.

hakanernam commented 7 years ago

I also believe in simplicity, but as they say "simple as possible, but no simpler". So I would feel raising the bar a little for stack implementations would be an very good step in the right direction. Discussion above mentions posix, and that leaves one with the impression nsapi has posix functionality, which I believe not the case.

My use case is an MQTT client based on mbed IBM MQTT example. After quite bit of refactoring I adapted to my needs. I still need to go a long way to consider it robust. Example:

I use eclipse mosquitto mqtt broker. When shut down msquitto with CTRL-C, all clients connected to broker ( mqtt-spy for example (https://github.com/eclipse/paho.mqtt-spy) ) understand the disconnect and react immediately. MBED TCPSocket get the disconnect too, and sends an event. (Two of them, actually). Even though I see it when I am connected with Terminal emulator, there seems to be no way to react to it since reads and other stuff generate the same event. I can not do a a socket write because it would go to mqtt broker/server and it must close the connection if what I send is not a valid MQTT message. I could send PING message, but then every message I get would also cause a PING (which has its own response, so not a solution). Trying to read is not good either, because then I may read part of a valid message just to see if we are still connected. For this I belive there is a PEEK option in Unix sockets when doing a read. Which we do not have.

So the half solution grudgingly implemented is shorten the keepalive period to 30 sec or so. So Which is only a stopgap since We will have many such devices, flooding the network with PING/PINGRESP messages. Obviously this is the preferred solution for half open connections, but in the case above, where socket is closed normally, Mbed board(K64F) realizes this. It just has no way to inform us via TCPSocket.

I apologize for the long comment. Tried to shorten as much as I could.

hakanernam commented 7 years ago

I personally believe strongly that having the philosopical shift from current NSAPI to a bit more functional network stack abstraction would be correct.

We selected MBED platform a long time ago over the likes of Arduino for serious work exactly because it was powerful and not feel like toy. I love it, but with a simple project I sometimes just use an Arduino and get over it. My broken microwave control card, for example. A simple circuit, an Atmega 328P-PU programmed with another Arduino board, two relays, two push buttons and a buzzer. Beatiful simplicity. Bu I would not put it into a serious customer project with its single UART, extremely limited interrupt capacities, no lack of ethernet controller, etc, etc.

So especially for people like me who set the minimum platform to be Cortex-M3 due to its ethernet capability, having a reasonably powerful system/stack is very important. And it seems quite logical to expect vendors to be able to support this reasonable level of minimum acceptable functionality.

Would it be too much of a burden on vendors to provide a more robust stack when having "MBED Enabled" label is becoming a badge of honor with many people? And also next question is, with many current vendors to choose from, why should we as users should/have to settle something less "good" as standard just because some small number of them (which we may never use anyways) cannot deliver something satisfactory on such an important feature as networking?

marcuschangarm commented 7 years ago

I agree with @hakanernam 's assessment. We have to set the bar high if we want to build better products.

hasnainvirk commented 7 years ago

@marcuschangarm Ofcourse we should. That's why a poll() method is being proposed. Its a lot better than semaphores based counters.

kjbracey commented 7 years ago

@hakanernam: "Trying to read is not good either, because then I may read part of a valid message just to see if we are still connected. "

I don't actually understand the problem here. You should just have one piece of code that reads from the socket, in response to events.

If there's data, it will read it. If the socket has been closed cleanly, it will read 0 (end of stream). If it's been reset, then you will get an error return. Simple.

You shouldn't have two pieces of code trying to read from one socket.

I agree peek might be generally useful, but it doesn't seem like the answer here.

kjbracey commented 7 years ago

@hakanernam: Think about how you would handle a connection using a thread and blocking socket calls. You'd just loop calling read() until you got 0 or an error, and never look at the callbacks. That would be simplest, if you're happy to rely on the RTOS.

If you want to be event-based rather than having a dedicated thread, you do basically exactly the same thing:

// note can't attach to socket callback directly
// - using EventQueue to defer to callback to thread would be appropriate
my_event_handler() 
{
     for (;;) {
          ret = socket.read(); // non-blocking
          if (ret > 0) {
                 process();
           } else if (ret == 0) {
                 // orderly end of stream - close session (and break)
           } else if (ret == EWOULDBLOCK) {
                 // no more data (for now) - just break to stop processing until next callback
                 break;
           } else {
                 // error - report and close session (and break)
           }
     }
}
geky commented 7 years ago

I think this discussion still has a bit to go, but given the current focus of mbed OS, I'm not sure when we will be able to resolve this issue (with either poll/peek or a proper attach function or both). In the meantime I'm going to a make a few tweaks to help reduce confusion with what the current functionality provides.

ghost commented 6 years ago

GitHib issue review: Closed due to inactivity. Please re-file if critical issues found.