Autobahn C Transport Layer Discussion

ericchapman commented 8 years ago

Hey @oberstet. I thought a lot about this and wanted to describe the issues I see (just to make sure we are on the same page) and make a design suggestion.

Writing a reusable library in C is very interesting because it has to support so many different environments, and there is no standard API for how the language interacts with the system. That is all environment dependent. Let's give a few examples for clarity:

C running in Linux - Include pthread and you have threading and mutex capabilities
C running on a device with an RTOS (such as FreeRTOS) - Threads, semaphores, and queues are available to synchronize between threads, interrupts, etc.
C running from "int main()" - Only the main thread and interrupts are available. You need to use a timer interrupt to create events (and at that point you are starting to write your own scheduler)

As one can see, this poses many issues when trying to write a library that is reusable across the 3 environments because there is no standard way to interface with the system.

The current implementations of WAMP use event loops of some sort but that requirement is really only introduced because the transport layer requires it, so it almost gets pushed up the entire stack as a requirement. Architecturally the current libraries are defined as follows

    +------------------------------+
    |          Connection          |
    +------------------------------+
    +----------------++------------+
    |   Transport    ||  Session   |
    +----------------++------------+

The connection basically ties together the transport and the session. Once the transport is established, the user code interfaces with the session directly. One interesting thing about this design is that technically, the session itself does NOT need the event loop in any way. If the user makes a call, it will ask the transport to send a packet. If a packet is received, it will process the packet (internally) and that may equate to some method callback or sending another packet. Regardless, the session does not need to schedule events, only the transport does. The session only needs callbacks when the state of the transport changes.

That all being said, I have created systems like this before where I actually had the application layer (the session essentially) send and receive over multiple transports at the same time (would have loved WAMP here. I probably half wrote it then ;)). For example, I was running the exact same application code in the following setups

ARM Cortex <--> UART <--> Host PC
ARM Cortex <--> Audio Port <--> iOS
ARM Cortex <--> BT <--> iOS
ARM Cortex <--> Socket (external WIFI) <--> iOS
ARM Cortex <--> RF <--> ARM Cortex
ARM Cortex <--> UART <--> ARM Cortex
etc.

You get the gist. One interesting thing is that I actually had the EXACT same library code running on the ARM Cortex, iOS, and the Host PC since they all support C/C++ (it was a fun project). A lot of these setups were also one-to-many connections. So I basically had the same code running on top of all sorts of different flavors of transport in all sorts of different environments.

Anyways, the way I was able to get it so they worked like Lego blocks was to not try and make it so the blocks just blindly connected together. I instead clearly specified all of the interfaces and then left it up to the integration code to tie the pieces together. The integration code for this system running on ARM Cortex with FreeRTOS (interrupts, semaphores, queues, waits, etc.) looked a lot different than it did running in iOS (callbacks), BUT the libraries themselves did not need to be modified in any fashion. The application layer was well tested and once the connection and transport logic was written, it "just worked".

Another issue I was having was that the system threads in an Embedded project are defined at the system level and not at the library level so my library could not make any assumptions about threads or mutexes since it needed to be environment agnostic. I ended up abstracting those into callbacks where the integration code could provide a "lock/unlock" mechanism if desired so I could have a TX and RX thread running simultaneously by sharing the same mutex.
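A minimal sketch of what such a "lock/unlock" abstraction could look like. All names here (lib_lock_hooks_t, lib_init, lib_note_tx, the demo_* helpers) are hypothetical and only illustrate the idea; they are not taken from any existing library. The integration code decides what the hooks do: wrap a pthread mutex, take an RTOS semaphore, disable interrupts, or nothing at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table filled in by the integration code. */
typedef struct {
    void (*lock)(void *ctx);   /* NULL means "no locking needed" */
    void (*unlock)(void *ctx);
    void *ctx;                 /* e.g. a mutex or semaphore handle */
} lib_lock_hooks_t;

typedef struct {
    lib_lock_hooks_t hooks;
    unsigned tx_count;         /* stand-in for state shared by TX and RX */
} lib_state_t;

/* Store the hooks; passing NULL selects the single-threaded case. */
void lib_init(lib_state_t *s, const lib_lock_hooks_t *hooks)
{
    static const lib_lock_hooks_t no_hooks = { NULL, NULL, NULL };
    assert(s != NULL);
    s->hooks = hooks ? *hooks : no_hooks;
    s->tx_count = 0;
}

/* Library code touches shared state only between lock and unlock. */
unsigned lib_note_tx(lib_state_t *s)
{
    unsigned n;
    assert(s != NULL);
    if (s->hooks.lock) s->hooks.lock(s->hooks.ctx);
    n = ++s->tx_count;
    if (s->hooks.unlock) s->hooks.unlock(s->hooks.ctx);
    return n;
}

/* Demo hooks that only count calls; a real integration would wrap
 * its own mutex or semaphore here instead. */
typedef struct { int locks; int unlocks; } demo_counter_t;
void demo_lock(void *ctx)   { ((demo_counter_t *)ctx)->locks++; }
void demo_unlock(void *ctx) { ((demo_counter_t *)ctx)->unlocks++; }
```

The library never names a threading primitive itself, so the same object code can be linked into a bare-metal image (hooks left NULL) or a threaded host build.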

The other and probably larger issue is that even if it is the same protocol (regardless of what it is), the implementation of it is going to look different between a Freescale, TI, Silicon Labs, etc. SoC even if they use the same processor. And a lot of times, the TX/RX device can be off chip with some serial interface controlling it.

All of that being said, from my experience, my suggestion would be to not focus on the transport or connection inside the Autobahn C library (for now) but rather define a very clear interface and documentation for the session and then leave it up to the implementor to connect the dots. I would totally agree we can make examples of using it over different interfaces in different designs but I would not try to include that logic in the library itself. There are just too many environments in the C language since you are "at the metal" so to speak and no way to really account for all of them.

oberstet commented 8 years ago

I think this is probably the most important issue/discussion design wise. At least, a crucial one.

Regarding clean abstractions, and decoupling from the endless diversity of different environments in the C and embedded world that you touch on above, here are my thoughts.

A WAMP session essentially can be modeled as an (extended) state machine, where the state transitions are either

1. triggered by user code performing an action on the session (eg publish()) and accompanied by the serialization/sending of a WAMP message on the transport (associated with the session) or
2. triggered by the transport when a WAMP message is received/deserialized and accompanied by user code being called (on_event())

This state machine is the core of a WAMP library. The transport is not. In this perspective, the library state machine is actually a "passive element" - as it is always driven by library external code: either user code (for sending) or transport implementation code (for receiving).
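A minimal sketch of such a passive state machine, assuming a heavily simplified session lifecycle. The state, event, and function names are made up for illustration and are not Autobahn C API; only the HELLO/WELCOME/ABORT/GOODBYE message names come from WAMP. The step function never blocks and never schedules anything: it is called either by user code or by the transport, and it returns which message (if any) the caller should serialize and send.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SESSION_CLOSED, SESSION_JOINING, SESSION_ESTABLISHED, SESSION_LEAVING
} session_state_t;

typedef enum {
    EV_USER_JOIN,   /* user code asks to open the session   */
    EV_USER_LEAVE,  /* user code asks to close the session  */
    EV_RX_WELCOME,  /* transport delivered a WELCOME        */
    EV_RX_ABORT,    /* transport delivered an ABORT         */
    EV_RX_GOODBYE   /* transport delivered a GOODBYE        */
} session_event_t;

/* What the caller should send on the transport after the step. */
typedef enum { OUT_NONE, OUT_HELLO, OUT_GOODBYE } session_output_t;

typedef struct { session_state_t state; } session_fsm_t;

/* Advance the state machine by one event; unexpected events are ignored. */
session_output_t session_fsm_step(session_fsm_t *fsm, session_event_t ev)
{
    assert(fsm != NULL);
    switch (fsm->state) {
    case SESSION_CLOSED:
        if (ev == EV_USER_JOIN) { fsm->state = SESSION_JOINING; return OUT_HELLO; }
        break;
    case SESSION_JOINING:
        if (ev == EV_RX_WELCOME)    { fsm->state = SESSION_ESTABLISHED; }
        else if (ev == EV_RX_ABORT) { fsm->state = SESSION_CLOSED; }
        break;
    case SESSION_ESTABLISHED:
        if (ev == EV_USER_LEAVE) { fsm->state = SESSION_LEAVING; return OUT_GOODBYE; }
        if (ev == EV_RX_GOODBYE) { fsm->state = SESSION_CLOSED;  return OUT_GOODBYE; }
        break;
    case SESSION_LEAVING:
        if (ev == EV_RX_GOODBYE) { fsm->state = SESSION_CLOSED; }
        break;
    }
    return OUT_NONE;
}
```

Because the function only maps (state, event) to (new state, output), it can be unit tested on a host PC without any transport attached.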

Following this, we need 2 interfaces: user code facing, and transport abstraction facing. And defining the syntax and semantics of these 2 interfaces for C (with the side requirements .. pure C, no malloc, etc) is the main design challenge we face.

Let's look at this in detail. For 1., the WAMP library needs to use a transport implementation provided function

void send_raw_bytes_message(transport_t *transport, const unsigned char* data, size_t len);

and the implementation of send_raw_bytes_message() is outside the core of the library, as it deeply depends on the specific environment. Different implementations of send_raw_bytes_message() can still ship as part of the library (eg POSIX sockets, serial, ..).

The library also needs to expose a function that is used by a transport implementation in turn:

void consume_raw_bytes_message(session_t* session, const unsigned char* data, size_t len);

I am still thinking about if that would be enough in the face of "no malloc" ..
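A rough sketch of how these two functions could be wired up without malloc. The two function signatures are the ones given above; everything else is an assumption made for illustration: the send function pointer inside transport_t, the MAX_MESSAGE_LEN limit, the counters in session_t, and the capture transport, which just copies the last frame into a fixed buffer. The key convention is that data is only borrowed for the duration of each call, so neither side has to allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MESSAGE_LEN 128u   /* assumed compile-time size limit */

typedef struct transport transport_t;
struct transport {
    /* Provided by the environment-specific transport implementation. */
    void (*send)(transport_t *self, const unsigned char *data, size_t len);
    void *impl;                /* implementation-private state */
};

typedef struct {
    transport_t *transport;
    size_t rx_messages;        /* stand-in for real message handling */
    size_t rx_bytes;
} session_t;

/* Library core -> transport: forward to whatever the platform provides. */
void send_raw_bytes_message(transport_t *transport,
                            const unsigned char *data, size_t len)
{
    assert(transport != NULL && transport->send != NULL);
    transport->send(transport, data, len);
}

/* Transport -> library core: data is borrowed and must be fully
 * processed (or copied) before this function returns. */
void consume_raw_bytes_message(session_t *session,
                               const unsigned char *data, size_t len)
{
    assert(session != NULL && data != NULL);
    if (len > MAX_MESSAGE_LEN) return;   /* oversize message: dropped */
    session->rx_messages++;
    session->rx_bytes += len;
}

/* Example transport that records the last frame in a static buffer. */
typedef struct {
    unsigned char last[MAX_MESSAGE_LEN];
    size_t last_len;
} capture_impl_t;

void capture_send(transport_t *self, const unsigned char *data, size_t len)
{
    capture_impl_t *c = (capture_impl_t *)self->impl;
    if (len > sizeof c->last) return;
    memcpy(c->last, data, len);
    c->last_len = len;
}
```

A POSIX socket or UART transport would fill in the same send pointer and call consume_raw_bytes_message() from its own receive path.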

oberstet commented 8 years ago

Another point: the state machine being driven by 2 sources (user code or transport code) isn't enough. There needs to be a third source of driving state transitions: timeouts.

The library core needs a way to set up a timer

void setup_timer(session_t *session, float timeout, int timer_id);

and the timer needs a way to call back into the library core

void on_timeout(session_t *session, int timer_id);

(or similar using function pointers or what)
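A sketch of the function-pointer variant, under a few assumptions that are not in the proposal above: the timeout is an integer number of milliseconds instead of a float (many small MCUs have no FPU), timers live in a fixed table inside the session, setup_timer() returns a status code, and the integration code reports elapsed time by calling a hypothetical timers_advance() from its main loop or timer interrupt handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 4u          /* assumed fixed table size */

typedef struct session session_t;
typedef void (*timeout_cb_t)(session_t *session, int timer_id);

typedef enum { TIMER_FREE, TIMER_ARMED, TIMER_EXPIRED } timer_state_t;

typedef struct {
    timer_state_t state;
    int timer_id;
    uint32_t remaining_ms;
} timer_slot_t;

struct session {
    timer_slot_t timers[MAX_TIMERS];
    timeout_cb_t on_timeout;   /* called back into the library core */
    int fired_count;           /* demo bookkeeping only */
    int last_fired;
};

void session_timers_init(session_t *session, timeout_cb_t cb)
{
    size_t i;
    assert(session != NULL);
    for (i = 0; i < MAX_TIMERS; i++) session->timers[i].state = TIMER_FREE;
    session->on_timeout = cb;
    session->fired_count = 0;
    session->last_fired = -1;
}

/* Arm a timer; returns 0 on success, -1 if the table is full. */
int setup_timer(session_t *session, uint32_t timeout_ms, int timer_id)
{
    size_t i;
    assert(session != NULL);
    for (i = 0; i < MAX_TIMERS; i++) {
        timer_slot_t *t = &session->timers[i];
        if (t->state == TIMER_FREE) {
            t->state = TIMER_ARMED;
            t->timer_id = timer_id;
            t->remaining_ms = timeout_ms;
            return 0;
        }
    }
    return -1;
}

/* Called by the integration code with the time that has passed. */
void timers_advance(session_t *session, uint32_t elapsed_ms)
{
    size_t i;
    assert(session != NULL);
    /* Pass 1: age all armed timers and mark the ones that ran out. */
    for (i = 0; i < MAX_TIMERS; i++) {
        timer_slot_t *t = &session->timers[i];
        if (t->state != TIMER_ARMED) continue;
        if (t->remaining_ms <= elapsed_ms) t->state = TIMER_EXPIRED;
        else t->remaining_ms -= elapsed_ms;
    }
    /* Pass 2: fire callbacks; a callback may safely re-arm a timer. */
    for (i = 0; i < MAX_TIMERS; i++) {
        timer_slot_t *t = &session->timers[i];
        if (t->state != TIMER_EXPIRED) continue;
        t->state = TIMER_FREE;
        if (session->on_timeout) session->on_timeout(session, t->timer_id);
    }
}

/* Demo callback that just records which timer fired. */
void record_timeout(session_t *session, int timer_id)
{
    session->fired_count++;
    session->last_fired = timer_id;
}
```

The two-pass loop is a design choice: a timer armed from inside a callback is not aged by the same elapsed_ms that triggered the callback.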

oberstet commented 8 years ago

Essentially, this line of thinking means we would create a "library core" thing, that interfaces with the multitude of different environments merely via 4 functions:

send_raw_bytes_message
consume_raw_bytes_message
setup_timer
on_timeout

The implementations of above would abstract away the diversity of environments. The library core itself can be pure C, with no dependencies whatsoever.

@ericchapman What do you think?

One more note: threads. No. Threads suck in general, and are not needed in this case. Without threads, there is no need for locking, and hence mutexes, semaphores, etc are non-issues. Threads might be needed in a transport implementation or outside the library core in certain situations, but the library core itself doesn't need them. A single session is supposed to be used by a single (same) thread only.

ericchapman commented 8 years ago

@oberstet That all makes sense. I had thought of the "user" and "transport" triggers but forgot about the timeout one.

If we can pass raw byte structures down, and receive them coming back up, and it is a single thread, then this should be enough. Only reason I bring up single thread is because that ensures the packet is processed before "consume_raw_bytes_message" could be called again. This will also eliminate the need for queues inside the library at all if all 3 interfaces are executing in the same thread.

oberstet commented 8 years ago

@ericchapman Regarding the 2 API surfaces of the core library (the user code facing, and the one facing downwards .. network/timers):

I do think that it might be good to incubate these in a real environment. And I would like to propose RIOT for that.

Here is what I tried: https://github.com/crossbario/autobahn-c#sending-over-udp-between-two-riot-nodes

RIOT can run on "native" (=Linux), instead of a real device, which probably allows us to iterate faster in the beginning.

Here is a proposal for next steps: https://github.com/crossbario/autobahn-c#milestones

Other stuff in RIOT I'd like to give a shot: the CBOR support they have, and timers.

oberstet commented 8 years ago

Regarding this idea: "Create a fuzzing UDP bridge that can drop, delay and reorder UDP datagrams".

This could be created in Twisted/Python quite easily. It would allow us to explore different ways of how to map WAMP to an unreliable, unordered, limited size datagram transport (as 6LoWPAN/UDP is).
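The bridge itself would be Twisted/Python as said above. Purely to illustrate the per-datagram policy such a bridge could apply, here is a small sketch in C (the language of the library under discussion). Everything in it is an assumption: the names, the drop rate expressed per 1000 datagrams, and the xorshift generator used so that test runs are reproducible. Reordering is not modeled explicitly; it falls out of giving each forwarded datagram an independent random delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LINK_DROP, LINK_FORWARD } link_action_t;

typedef struct {
    uint32_t rng_state;      /* seed, must be nonzero for xorshift32 */
    uint32_t drop_per_1000;  /* 0 = never drop, 1000 = always drop   */
    uint32_t max_delay_ms;   /* upper bound for the random delay     */
} lossy_link_t;

/* xorshift32: small deterministic PRNG, good enough for fuzzing. */
static uint32_t link_rand(lossy_link_t *link)
{
    uint32_t x = link->rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    link->rng_state = x;
    return x;
}

/* Decide the fate of one datagram. For LINK_FORWARD, *delay_ms says
 * how long the bridge should hold the datagram before sending it on. */
link_action_t lossy_link_decide(lossy_link_t *link, uint32_t *delay_ms)
{
    assert(link != NULL && delay_ms != NULL);
    assert(link->rng_state != 0u && link->max_delay_ms < UINT32_MAX);
    *delay_ms = 0u;
    if (link_rand(link) % 1000u < link->drop_per_1000) return LINK_DROP;
    if (link->max_delay_ms > 0u)
        *delay_ms = link_rand(link) % (link->max_delay_ms + 1u);
    return LINK_FORWARD;
}
```

Two datagrams sent back to back can come out in the opposite order whenever the first one draws a longer delay than the second.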

When we have that, we need a router that talks that of course. Naturally, I'd like to add that to Crossbar.io;)

The testbed would then look:

RIOT node <== WAMP-over-6LoWPAN/UDP ==> Crossbar.io <== WAMP-over-XXX (existing transports) ==> any WAMP client

ericchapman commented 8 years ago

@oberstet Oh good. I hadn't looked at RIOT yet but it would be great to emulate it without real hardware. That will definitely speed up development. I second that proposal.

I was thinking the same thing of adding 6LoWPAN (or whatever else we use) as another transport to Crossbar.io the same way it already supports Web Sockets and Raw Sockets.

oberstet commented 8 years ago

Yep, adding another transport to Crossbar.io. Twisted has UDP v6 support (which is required for 6LoWPAN).

FWIW:

I do care about open, IETF and in particular IP-based standards (that is 6LoWPAN)
I don't care at all about proprietary things like ZigBee. A non-goal, a won't go.
I almost don't care about semi-open standards like Bluetooth - there is a thing called "IP over Bluetooth (IPoBT)" though

ericchapman commented 8 years ago

@oberstet Sorry for the delay. I had some time to think about this.

Retransmission

The more I thought about this, the more I think I agree we should pull this up into WAMP itself like you were speculating. I fear that if we make some custom layer between WAMP and 6LoWPAN, we will be greatly affecting portability between environments. It would basically lead to creating another specification altogether.

Also from my experience with UDP, ~99% of the packets still make it meaning you would leave it up to the user and their application to decide if they need to even implement that advanced profile feature (assuming you would make it advanced profile).

Args/Kwargs

I think I figured out an idea for this. If we create a size limitation of the WAMP packets (make it a define in the library), we can then create a buffer that is maybe 1.5x that size and use that to dynamically allocate the values of the args/kwargs. So we would create "list" and "dict" structures that are basically linked lists, have a "type", and a pointer to a value. We allocate the links from a static pool of links and we allocate the values from that static buffer.

If the user's requirements are such that they need more links or a bigger buffer, they just tweak the defines to their application.

I think this gives us the best of both worlds: no malloc/free, DOM-like parsing, and user control to make the library as lightweight or as heavy as they desire. We would be sort of making our own malloc and heap, but I think it is lightweight enough that it doesn't bother me at all and isn't much overhead.
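A minimal sketch of the idea, assuming made-up names and sizes (WAMP_MAX_MESSAGE, MAX_LINKS, arg_pool_t and the list_* helpers are all hypothetical). Links come from a static array, values are copied into a static byte buffer that is 1.5x the message limit, and everything is released at once by resetting two counters. Only a list with int and string values is shown; a dict could use the same link with an extra key pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WAMP_MAX_MESSAGE 64u                                   /* tunable */
#define VALUE_BUF_SIZE   (WAMP_MAX_MESSAGE + WAMP_MAX_MESSAGE / 2u)
#define MAX_LINKS        16u                                   /* tunable */

typedef enum { VAL_INT, VAL_STRING } val_type_t;

typedef struct link {
    val_type_t type;
    void *value;               /* points into arg_pool_t.value_buf */
    struct link *next;
} link_t;

typedef struct {
    link_t links[MAX_LINKS];
    size_t links_used;
    unsigned char value_buf[VALUE_BUF_SIZE];
    size_t value_used;
} arg_pool_t;

/* Free everything at once. Any list head built from this pool must
 * be set back to NULL by the caller afterwards. */
void pool_reset(arg_pool_t *p)
{
    assert(p != NULL);
    p->links_used = 0;
    p->value_used = 0;
}

/* Copy len bytes into the pool and append a link to the list.
 * Returns NULL (and changes nothing) if the pool is exhausted. */
link_t *list_append(arg_pool_t *p, link_t **head, val_type_t type,
                    const void *src, size_t len)
{
    link_t *l;
    link_t **tail = head;
    assert(p != NULL && head != NULL && src != NULL);
    if (p->links_used >= MAX_LINKS) return NULL;
    if (len > VALUE_BUF_SIZE - p->value_used) return NULL;
    l = &p->links[p->links_used++];
    l->type = type;
    l->value = &p->value_buf[p->value_used];
    l->next = NULL;
    memcpy(l->value, src, len);
    p->value_used += len;
    while (*tail != NULL) tail = &(*tail)->next;
    *tail = l;
    return l;
}

link_t *list_append_int(arg_pool_t *p, link_t **head, long v)
{
    return list_append(p, head, VAL_INT, &v, sizeof v);
}

link_t *list_append_string(arg_pool_t *p, link_t **head, const char *s)
{
    return list_append(p, head, VAL_STRING, s, strlen(s) + 1u);
}

/* Values are packed unaligned, so read ints back via memcpy. */
long list_get_int(const link_t *l)
{
    long v;
    assert(l != NULL && l->type == VAL_INT);
    memcpy(&v, l->value, sizeof v);
    return v;
}
```

Whether 1.5x is enough depends on the payload: a message made of many small integers expands more when parsed than one made of a few long strings, which is why both sizes are left as defines.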

oberstet commented 8 years ago

@ericchapman Hi Dave, np;) I was "in the flow" doing some urgent cleanups and stuff - but now.

Retransmission: .. we should pull this up into WAMP itself .. custom layer .. creating another specification all together

Yes, I agree. That was also my gut feeling. We would invent some ad-hoc reliable UDP protocol. My thinking was: if we need timers / state machinery for say RPC timeouts (because the actual callee isn't responding, not that the transport has lost something), then why not expand it there so we can cope with lossy datagram transports too.
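To make that concrete, a sketch of what reusing the timeout machinery for retransmission could look like. This is only an assumption about one possible shape, not a WAMP feature or a decided design: a fixed table of pending requests, each with a timeout and a retry budget, advanced by a tick function. All names and limits are hypothetical, and a real implementation would re-send the stored message where the sketch merely counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 4u   /* assumed table size  */
#define MAX_RETRIES 3u   /* assumed retry limit */

typedef enum { REQ_FREE, REQ_WAITING } req_state_t;

typedef struct {
    req_state_t state;
    uint32_t request_id;
    uint32_t timeout_ms;
    uint32_t remaining_ms;
    unsigned retries_left;
} pending_req_t;

typedef struct {
    pending_req_t pending[MAX_PENDING];
    unsigned resend_count;   /* a real library would re-send here  */
    unsigned failed_count;   /* ... and report an error to the user */
} retx_table_t;

void retx_init(retx_table_t *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) t->pending[i].state = REQ_FREE;
    t->resend_count = 0;
    t->failed_count = 0;
}

/* Start tracking a request that was just sent; -1 if the table is full. */
int retx_track(retx_table_t *t, uint32_t request_id, uint32_t timeout_ms)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        pending_req_t *p = &t->pending[i];
        if (p->state == REQ_FREE) {
            p->state = REQ_WAITING;
            p->request_id = request_id;
            p->timeout_ms = timeout_ms;
            p->remaining_ms = timeout_ms;
            p->retries_left = MAX_RETRIES;
            return 0;
        }
    }
    return -1;
}

/* A reply arrived: stop tracking. Returns 1 if the id was pending. */
int retx_ack(retx_table_t *t, uint32_t request_id)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        pending_req_t *p = &t->pending[i];
        if (p->state == REQ_WAITING && p->request_id == request_id) {
            p->state = REQ_FREE;
            return 1;
        }
    }
    return 0;
}

/* Advance time: retry expired requests, give up when retries run out. */
void retx_tick(retx_table_t *t, uint32_t elapsed_ms)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        pending_req_t *p = &t->pending[i];
        if (p->state != REQ_WAITING) continue;
        if (p->remaining_ms > elapsed_ms) {
            p->remaining_ms -= elapsed_ms;
        } else if (p->retries_left > 0u) {
            p->retries_left--;
            p->remaining_ms = p->timeout_ms;
            t->resend_count++;
        } else {
            p->state = REQ_FREE;
            t->failed_count++;
        }
    }
}
```

Note that the sender cannot tell a lost datagram from a slow callee, so the peer may see the same request twice and would have to recognize duplicates by request id.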

So let's take this approach? Means: make it a "decision" (which is a major one). I am +1

If we create a size limitation of the WAMP packets (make it a define in the library), we can then create a buffer that is maybe 1.5x that size and use that to dynamically allocate the values of the args/kwargs.

This is an interesting angle! So you say, no matter what's inside a WAMP message, if the serialization is at most N bytes, then we can get away with 1.5 x N bytes memory on top for parsing it out into a DOM like thing?

oberstet commented 8 years ago

And further, what you suggest still allows for arbitrary payloads (modulo the size limitation) without any schemas and precompile stuff involved, right?

To round up this discussion, I just now remember this:

This is yet another approach. It does employ schemata and a generate/compile step, but it provides:

... allow you to add new fields to a schema over time, without breaking backwards-compatibility. New fields will be ignored by old binaries, and new binaries will fill in a default value when reading old data.

The gain is zero-copy. The downside is schemas + generate/compile of course.

ericchapman commented 8 years ago

@oberstet I created this library to give you an example of what I was talking about above

https://github.com/ericchapman/c_st_objects

It basically uses a static buffer and keeps allocating objects/variables from it. Then it destroys all of them in one shot. The good news is that the size of it is defined by the developer so they can adjust it accordingly for the types of payloads that they expect. It is also super fast since the malloc/free operation is just moving a pointer in the buffer.

crossbario / autobahn-c

Autobahn C Transport Layer Discussion #6
