Open ericchapman opened 8 years ago
Woah;) Quite elaborate. I can see this is a well thought out design that tries to bring as much OO to C as possible.
Going down that road ultimately leads to stuff like gobject https://en.wikipedia.org/wiki/GObject
I think this is overkill. My biggest issues:
And I have the feeling that going C OO without malloc doesn't really make sense. But maybe that's wrong.
"no malloc" is a must if we want to go to tiny devices. Everything lives on the stack, and yes, that has deep and far reaching consequences (eg hard, precompiled limits/buffers etc).
Probably we should first discuss the "no malloc" thing ..
Yeah, this design can pretty much give the developer every piece of OO that they need in C. I was curious where your thoughts were on how elaborate we should get.
Actually, the best thing to do here is probably to define what "tiny" is just so we have a baseline. I have implemented an entire TDMA RF protocol on a PIC16 with 2KB ROM and 200B of RAM and I didn't use malloc there for the reason you described above. No way that was fitting in 200B of data.
What device were you thinking? I saw in something I read "64KB" but wasn't sure if that was data or program and data.
@oberstet Only place I see where "malloc" would be useful would be for the linked lists used to implement "arrays" and "hashes" for the user code to create "args" and "kwargs". Alternatively we can make a "pool" of links that are "reserved" and then "returned" as needed and make the size of the pool a define so the user code can increase the number as their application needs it.
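The "pool of links" idea could look roughly like the sketch below. Everything in it is hypothetical and not taken from the repository: the `LINK_POOL_SIZE` define, the simplified `link_t`, and the `link_reserve` / `link_return` names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time pool size; user code could override it
 * with -DLINK_POOL_SIZE=... if the application needs more links. */
#ifndef LINK_POOL_SIZE
#define LINK_POOL_SIZE 8
#endif

/* Simplified link for this sketch (not the link_t from the repo). */
typedef struct link_s {
    struct link_s *next;   /* doubles as the free-list pointer while unused */
    void *object;
} link_t;

static link_t link_pool[LINK_POOL_SIZE];
static link_t *free_list = NULL;

/* Chain every slot onto the free list. Call once at startup. */
void link_pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < LINK_POOL_SIZE; i++) {
        link_pool[i].object = NULL;
        link_pool[i].next = free_list;
        free_list = &link_pool[i];
    }
}

/* Reserve a link; returns NULL when the pool is exhausted. */
link_t *link_reserve(void) {
    link_t *link = free_list;
    if (link != NULL) {
        free_list = link->next;
        link->next = NULL;
    }
    return link;
}

/* Return a previously reserved link to the pool. */
void link_return(link_t *link) {
    assert(link != NULL);
    link->object = NULL;
    link->next = free_list;
    free_list = link;
}

/* Count the links that are currently free. */
size_t link_pool_available(void) {
    size_t n = 0;
    for (const link_t *l = free_list; l != NULL; l = l->next) {
        n++;
    }
    return n;
}
```

The pool lives in static storage, so its worst-case footprint is known at build time, and exhaustion shows up as a NULL return that the caller has to handle.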
@oberstet I thought about it more and I think we can get away with static buffers and accomplish what we need to in order to keep this as simple as possible. I created the following sample header file to give us something to throw darts at
https://github.com/ericchapman/autobahn-c/blob/dev/lib/session.h
Do you still want the entire top layer of this thing to be like "autobahn.h" or was that more of a conceptual brain dump? What would you like the top layer to be? As you can see in "session.h" I think we can make this similar to the other libraries from an API perspective which I believe is desirable.
What device were you thinking? I saw in something I read "64KB" but wasn't sure if that was data or program and data.
Not set in stone, just a rough number. Cortex M4 devices seem to start at 48KB RAM.
Do you still want the entire top layer of this thing to be like "autobahn.h"
This, and the code in SPEC.md, were just braindumps, yes. I often like starting something new by writing different flavors of an API "on paper" first (no working code yet).
I do think it would be nice if the user facing API would just be a single header (autobahn.h).
Regarding https://github.com/ericchapman/autobahn-c/blob/dev/lib/session.h - yes, I agree. If we can make the "look & feel" similar to the other Autobahn's, users would immediately feel at home (at least, they can directly translate concepts).
I think one crucial aspect is args and kwargs, eg https://github.com/ericchapman/autobahn-c/blob/dev/lib/session.h#L57
WAMP is dynamically typed, and C is statically typed. How do we approach that? How does that interplay with CBOR (the types that RIOT has for that)?
@oberstet Quick clarification on device. Is that 48 KB of RAM for "program and data" or just data? Most embedded devices I work with are, for example 128 KB of ROM (Flash) for program and then 32 KB of RAM for data. Just an important clarification when making tradeoff decisions.
As far as the top level header, I have seen some clever "namespace" like C implementation where you can actually expose your library at the top level using a "struct". This would allow us to combine the exposed methods into one single included header file. See here http://stackoverflow.com/a/28535585/462398
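A rough sketch of that "namespace via struct" trick, to make it concrete. The names here (`ab_session_t`, `autobahn_api_t`, the `autobahn` global) are made up for illustration and are not what session.h actually defines.

```c
#include <assert.h>

/* Hypothetical session type; the real one would live in session.h. */
typedef struct {
    int id;
    int joined;
} ab_session_t;

/* Implementation functions stay file-local (static), so they do not
 * pollute the global symbol table. */
static void session_init(ab_session_t *s, int id) {
    s->id = id;
    s->joined = 0;
}

static int session_join(ab_session_t *s) {
    s->joined = 1;
    return 0;
}

/* One struct of function pointers per "namespace". */
typedef struct {
    void (*init)(ab_session_t *s, int id);
    int (*join)(ab_session_t *s);
} ab_session_api_t;

/* The top-level "namespace" groups the sub-APIs. */
typedef struct {
    ab_session_api_t session;
} autobahn_api_t;

/* In a single public header this would be declared as
 * `extern const autobahn_api_t autobahn;`. */
const autobahn_api_t autobahn = {
    .session = { .init = session_init, .join = session_join }
};

/* Usage: calls read like autobahn.session.join(&s). */
int autobahn_demo(void) {
    ab_session_t s;
    autobahn.session.init(&s, 1);
    int rc = autobahn.session.join(&s);
    assert(s.joined == 1);
    return rc;
}
```

One tradeoff to keep in mind: every call goes through a function pointer, which costs a few bytes per entry and can prevent inlining, so this may or may not be worth it on the smallest targets.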
The "args" and "kwargs" aspect is interesting. What I was going to do was use a "void *" for the object and then include a "char *type" which would allow the receiving code to do something like this
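typedef struct link_s {
    struct link_s *prev;   /* the struct tag is link_s; link_t only exists after the typedef */
    struct link_s *next;
    void *object;          /* payload, interpreted according to type */
    char *type;            /* type name, e.g. "dict" or "list" */
} link_t;
...
link_t *link = // some link
if (strcmp(link->type, "dict") == 0) {   /* needs <string.h> */
    dict_t *dict = link->object;
    // Do Something
}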
...
And then both my "list" and "dict" implementation would use those links. Let me look at CBOR and get back to you.
The other thing that keeps popping up in my head is the "no malloc". I understand the need for that requirement but I am having trouble figuring out how a JSON payload can be parsed from raw bytes and turned into "args" and "kwargs" without malloc? One thought I had was to map into the actual raw payload but for strings, there is no "null" terminator so that wouldn't work. You have to copy it somewhere to add the null terminator. The only other option is to have a pool of buffers that you could use, but that very quickly becomes your own custom "heap" with a "malloc" implementation. I looked at "Jansson" and that is using "malloc" for the exact problem I just described. I wrote a "link.h" and "link.c" here that is doing the static pool of links but it starts to feel more and more like I am re-creating "malloc". I pretty much made my own "heap" that can be resized at compile time. https://github.com/ericchapman/autobahn-c/blob/dev/lib/link.h https://github.com/ericchapman/autobahn-c/blob/dev/lib/link.c
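One possible way around the missing terminator, as a sketch only: hand user code a pointer plus a length into the raw payload, and copy into a caller-provided buffer only when a real C string is needed. The `str_slice_t` type and both helpers are hypothetical names, not part of any existing library, and this does not address escape sequences in JSON strings, which would still need decoding somewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view into the receive buffer. It is NOT NUL-terminated and is only
 * valid while the underlying buffer is alive. */
typedef struct {
    const char *data;
    size_t len;
} str_slice_t;

/* Compare a slice against a NUL-terminated literal without copying. */
int slice_equals(str_slice_t s, const char *lit) {
    size_t n = strlen(lit);
    return s.len == n && memcmp(s.data, lit, n) == 0;
}

/* Copy the slice into a caller-provided buffer and terminate it.
 * Returns 0 on success, -1 if the buffer is too small. */
int slice_to_cstr(str_slice_t s, char *out, size_t out_size) {
    assert(out != NULL);
    if (out_size == 0 || s.len >= out_size) {
        return -1;
    }
    memcpy(out, s.data, s.len);
    out[s.len] = '\0';
    return 0;
}
```

With this shape the library itself never copies a string; the user decides whether a copy is needed and where it lives (stack, static buffer, or malloc if available).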
@ericchapman I'll answer in multiple parts (there are some tricky questions here).
rgd memory
for example, this device is using this MCU.
512 KB Flash and 64 KB RAM or
256 KB Flash and 32 KB RAM with:
64K FlexNVM (MKW21D256 only) and
4k bytes of enhanced EEPROM/FlexRAM
I'm not sure how to interpret it. Can it run program code directly from Flash?
Here is the MCU fact sheet.
And here is the reference manual for that piece.
Memory is discussed on pp73 in the latter. But I still don't understand which memory can be used for running program code.
My assumption would be: program code does not have to be run from the 32/64kB SRAM necessarily, and if so, we could use the SRAM completely for data.
A proposal: lets make 32kB RAM data memory the minimal target for AutobahnC
The malloc, JSON, CBOR, args and kwargs questions are deeply interconnected.
I now think this is the most tricky one: the user facing API of the core library. In particular, how to represent args/kwargs to user code. It's harder than the system facing API.
And (if you agree about hardness), then I think we should first address this question. The hardest one first. Because depending on answers, it will have ramifications onto everything else inside the core library.
but I am having trouble figuring out how a JSON payload can be parsed from raw bytes and turned into "args" and "kwargs" without malloc
WAMP is dynamically typed, and if we want to uphold that for AutobahnC (no precompiled, statically typed user args/kwargs payloads, which would be possible too in principle), then AutobahnC must be ready to receive arbitrary args/kwargs.
Eg it must be able to receive an args = [1, 2, 3, .., N] in an event handler, where N is not known at build time (if N were fixed for an app, then we could do the precompiled, statically typed user payloads thing I hinted at above .. but I think we should avoid that).
If the user callback for the event handler is supposed to be given the complete args array and N isn't fixed, then I can't see how to do that without a malloc like thing.
However, there is another approach: a streaming API for the args/kwargs user payloads. That is, the user event handler callback has additional user callbacks attached for reading arrays/dicts in a streaming like fashion. Similar to a SAX based XML API vs a DOM based XML API.
If we could make that fly, and more so, in a way that is "nice" for the user, then AutobahnC could in principle receive args/kwargs of arbitrary size.
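To make the streaming idea concrete, here is a minimal sketch of what a SAX-style args API could look like from the user side. All names (`args_handler_t`, `stream_int_args`, `sum_args`) are hypothetical, and the "decoder" is a stand-in that walks an already decoded array of integers rather than parsing real wire bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callbacks the user attaches to receive args element by element. */
typedef struct {
    void (*on_array_start)(void *ctx);
    void (*on_int)(void *ctx, int64_t value);
    void (*on_array_end)(void *ctx);
} args_handler_t;

/* Stand-in for a real decoder: emits one event per element, so the
 * library never has to hold the whole array in memory. */
void stream_int_args(const int64_t *values, size_t n,
                     const args_handler_t *h, void *ctx) {
    if (h->on_array_start) h->on_array_start(ctx);
    for (size_t i = 0; i < n; i++) {
        if (h->on_int) h->on_int(ctx, values[i]);
    }
    if (h->on_array_end) h->on_array_end(ctx);
}

/* User side: keep only what the application needs (here, a sum). */
typedef struct {
    int64_t sum;
    size_t count;
    int done;
} sum_state_t;

static void sum_on_int(void *ctx, int64_t value) {
    sum_state_t *st = ctx;
    st->sum += value;
    st->count++;
}

static void sum_on_end(void *ctx) {
    sum_state_t *st = ctx;
    st->done = 1;
}

/* Sums args of any length N using constant memory. */
int64_t sum_args(const int64_t *values, size_t n) {
    sum_state_t st = { 0, 0, 0 };
    args_handler_t h = { NULL, sum_on_int, sum_on_end };
    stream_int_args(values, n, &h, &st);
    assert(st.done && st.count == n);
    return st.sum;
}
```

The handler state is the only storage involved, and its size does not depend on N, which is the property the DOM-style approach cannot offer without something malloc-like.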
I just stumbled across this: http://riot-os.org/api/group__sys__ubjson.html
Here:
oberstet@corei7ub1310:~/scm/3rdparty/RIOT$ find . -name "*.h" | grep cbor
./sys/include/cbor.h
oberstet@corei7ub1310:~/scm/3rdparty/RIOT$ find . -name "*.c" | grep cbor
./tests/unittests/tests-cbor/tests-cbor.c
./sys/cbor/cbor.c
oberstet@corei7ub1310:~/scm/3rdparty/RIOT$ find . -name "*.h" | grep json
./tests/unittests/tests-ubjson/tests-ubjson.h
./sys/include/ubjson.h
./sys/ubjson/ubjson-internal.h
oberstet@corei7ub1310:~/scm/3rdparty/RIOT$ find . -name "*.c" | grep json
./tests/unittests/tests-ubjson/tests-ubjson.c
./tests/unittests/tests-ubjson/test-ubjson-empty-object.c
./tests/unittests/tests-ubjson/test-ubjson-empty-array.c
./sys/ubjson/ubjson-write.c
./sys/ubjson/ubjson-read.c
Sidenote: adding UBJSON serialization to AutobahnPython (and hence Crossbar.io) would be trivial. Could be done in a few lines of code, given a proper UBJSON Python library.
From looking at ./sys/ubjson/ubjson-read.c: this does not use malloc. It's an event driven, SAX like API.
Comparing this to http://riot-os.org/api/cbor_8h.html, I don't get how the latter does it. Eg while there is cbor_deserialize_array and cbor_deserialize_array_indefinite, the former seems to require that you know the number and types of array elements beforehand (which conflicts with the dynamic typing of WAMP), and for the latter I don't get how to use it. Does this RIOT CBOR implementation support dynamically typed payloads at all?
From my view, JSON wire level support isn't that attractive for AutobahnC. CBOR is better. UBJSON might be fine too. And there is no reason to support multiple serialization formats in AutobahnC.
IMO, making AutobahnC XXX-only (where XXX is either CBOR or UBJSON) would be fine.
I'm also not sure if we really can achieve completely malloc-free implementation of AutobahnC. FWIW, this is what the RIOT docs have to say about memory.
Regarding the user API, my current view would be that we have to decide about the following:
@oberstet My answer in pieces
The MCU parameters can be interpreted as follows
You are interpreting it correctly and I agree with your 32 KB for data proposal. This is considered "small" in the ARM Cortex M4 domain. Let's capture this and close that part.
I agree this is the most difficult. I keep landing back on this every time I try to think about parsing the payload because, as you pointed out, it bubbles up through the entire library.
Let's make some assumptions about this system and the answer will hopefully be clear assuming we both agree with the assumptions
Conclusion - We need to use a SAX like parser (option (A) (C)) and place the burden on the user code to store the things that it needs where it can use malloc if it desires. I think this is the only way we are going to fit the library into the footprint and support any message type coming in. SAX style parsers are a pain BUT this seems to make more sense vs. creating a DOM like parser in such a small memory footprint when the message payload is unknown. I feel it is too limiting to try and force the user to connect to a statically specified system and make them recompile if they introduce a new message type.
I would say we use UBJSON if it is trivial router side
It looks like the array length is actually a return from the method. http://riot-os.org/api/cbor_8h.html#aed22e8f18d3fa7409326f2c71892054a
size_t cbor_deserialize_array(const cbor_stream_t *stream, size_t offset, size_t *array_length);
size_t offset = cbor_deserialize_array(&stream, 0, &array_length);
array_length is an output since they are passing a pointer to it.
After thinking about it a little more, I think I came to a conclusion that might help us close all of this. The overall goal for this library should be lightweight given it is in raw C. Any system that has enough memory to start doing DOM parsing most likely supports embedded C++ which would be much more friendly for creating true "args" and "kwargs".
I would suggest we do the following from a milestone perspective
Once we complete this, I think 1 of 2 things will happen
My gut is telling me there won't be a need for DOM in raw C given the reasoning for using raw C.
You are interpreting it correctly and I agree with your 32 KB for data proposal. This is considered "small" in the ARM Cortex M4 domain. Let's capture this and close that part.
Agreed.
So we target devices with at least 32KB RAM for data, and we target to use only 25% = 8KB (which makes sense) of that for AutobahnC.
I would say we use UBJSON if it is trivial router side
Agreed.
I will add UBJSON support to AutobahnPython and Crossbar.io.
Any system that has enough memory to start doing DOM parsing most likely supports embedded C++ which would be much more friendly for creating true "args" and "kwargs".
Agreed. That's also my view. The whole point of AutobahnC is to go where C++ and Linux can't. And no-malloc, no-TCP are 2 aspects in this.
My gut is telling me there won't be a need for DOM in raw C given the reasoning for using raw C.
Yep. Agreed.
I also agree to the milestone plan you've outlined. Makes sense.
I'll add the UBJSON to AB/CB, and try to come up with a "UDP fuzzing bridge" (randomly dropping/reordering UDP datagrams).
This is exciting;) Cool to see how our discussion is progressing ..
@oberstet Agreed on the exciting part and how the discussion is progressing :). I am always excited to get my embedded stuff out (I have a scope/logic analyzer and signal generator) and I have been wanting WAMP on those devices since I started using Crossbar a year or so ago.
I think the only thing really left other than code reviews and things that come up is the interface when making calls to the session and what that interface looks like given we are using SAX serialization/deserialization. Give me a day or two to play around but I think I can come up with something that makes sense and feels right.
@ericchapman alright, UBJSON added to Autobahn (https://github.com/crossbario/autobahn-python/pull/639). Will do Crossbar.io later ..
@ericchapman I think this might touch on the discussion here https://github.com/crossbario/autobahn-c/issues/8
@oberstet About SAX, so I realized with going with a SAX like deserializer that this also means we will most likely want a SAX like serializer. This means that "args", "kwargs", and "options" creation are going to need to be callbacks so that the user code can generate the payload. Do you agree?
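For the sake of discussion, a callback-driven serializer could be shaped roughly like this. It is only a sketch: `writer_t`, `payload_cb_t` and `build_message` are invented names, and the bytes written are raw placeholders, not real UBJSON or CBOR encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounded writer over a fixed, caller-owned buffer. */
typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t len;
    int overflow;   /* latched once a write does not fit */
} writer_t;

void writer_init(writer_t *w, uint8_t *buf, size_t cap) {
    assert(buf != NULL || cap == 0);
    w->buf = buf;
    w->cap = cap;
    w->len = 0;
    w->overflow = 0;
}

/* Append one byte, or latch the overflow flag if the buffer is full. */
void writer_put(writer_t *w, uint8_t byte) {
    if (w->len < w->cap) {
        w->buf[w->len++] = byte;
    } else {
        w->overflow = 1;
    }
}

/* User callback that generates the args/kwargs payload on demand. */
typedef void (*payload_cb_t)(writer_t *w, void *ctx);

/* Build a message by letting user code emit the payload.
 * Returns the number of bytes written, or -1 if it did not fit. */
long build_message(uint8_t *buf, size_t cap,
                   payload_cb_t emit_args, void *ctx) {
    writer_t w;
    writer_init(&w, buf, cap);
    if (emit_args) emit_args(&w, ctx);
    return w.overflow ? -1 : (long)w.len;
}

/* Example callback: emits *ctx placeholder bytes 0, 1, 2, ... */
void emit_counter(writer_t *w, void *ctx) {
    size_t n = *(const size_t *)ctx;
    for (size_t i = 0; i < n; i++) {
        writer_put(w, (uint8_t)i);
    }
}
```

The buffer capacity acts as a hard upper bound on message size, so a payload that is too large is reported as an error instead of being split across buffers.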
@ericchapman I have thought about that (serializer) also. A callback based serializer would be necessary - even sending only 1 string requires that, if the underlying stack has fixed, limited buffers. This, combined with the fact that the underlying network protocol (UDP, even with full 64k) is limited, will make things really complex IMO. Because if we'd allow a single WAMP message to cross multiple UDP datagrams, the full unreliable/unordered UDP aspects kick in - even for a single WAMP message. I think this is .. a research project on its own.
Coping with retransmission/reordering of WAMP messages (but whole WAMP messages) is already stretching things. But adding fragmentation/reassembly to that .. it's way too ambitious I'd say.
So right now I am leaning towards: on such WAMP transports, there is a (possibly negotiated) maximum size for WAMP messages. Eg 1000 octets.
Note that we have progressive call results at the WAMP level, and can extend that to progressive calls. So that would compensate for above limitations in a way ..
@ericchapman UBJSON support has landed on AutobahnPython and Crossbar.io trunk:
HTTP/1.1 101 Switching Protocols
Server: Crossbar/0.13.2
X-Powered-By: AutobahnPython/0.13.1
Upgrade: WebSocket
Connection: Upgrade
Sec-WebSocket-Protocol: wamp.2.ubjson.batched
Sec-WebSocket-Accept: m3vvAYqSx9sxQyjf1EV+7q53r5o=
Example: https://github.com/crossbario/crossbarexamples/tree/master/serializers
Given the complexities we will face with creating a WAMP stack in C, I would like to propose an OO style of coding to use that gives us as much OO capabilities as possible so we can cleverly use inheritance throughout the stack. The goal is to support as many OO features as we can without making the code difficult to develop/maintain. Here I will use an example of a "base" class and then create a "shape" and a "circle" using inheritance (ignore minor syntax issues and coding style)
base.h
base.c
shape.h
shape.c
circle.h
circle.c
main.c
output
Using this technique will provide inheritance, polymorphism, overloading, and access to "super" methods. It does not provide encapsulation (which is not possible in C). It also provides ways to add "is_kind_of" and "is_instance_of" functionality to allow more advanced coding patterns (as shown above).
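As a rough illustration of the kind of technique described here (not the original base/shape/circle code from this post), inheritance by struct embedding plus a per-class table could look like the sketch below. All names (`class_t`, `is_kind_of`, `object_area`, and so on) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Per-class metadata: a name, a link to the superclass, and one
 * "virtual" method for the sake of the example. */
typedef struct class_s {
    const char *name;
    const struct class_s *super;
    double (*area)(const void *self);
} class_t;

/* Each "subclass" embeds its parent as the FIRST member, so a pointer
 * to the object is also a valid pointer to every ancestor. */
typedef struct { const class_t *cls; } base_t;
typedef struct { base_t base; int sides; } shape_t;
typedef struct { shape_t shape; double radius; } circle_t;

static double shape_area(const void *self) {
    (void)self;
    return 0.0;   /* a generic shape has no area */
}

const class_t base_class  = { "base",  NULL,        NULL };
const class_t shape_class = { "shape", &base_class, shape_area };

static double circle_area(const void *self) {
    const circle_t *c = self;
    /* "super" call: start from the parent's result, then add our own. */
    return shape_class.area(self) + 3.14159265358979 * c->radius * c->radius;
}

const class_t circle_class = { "circle", &shape_class, circle_area };

void circle_init(circle_t *c, double radius) {
    c->shape.base.cls = &circle_class;
    c->shape.sides = 0;
    c->radius = radius;
}

/* True if obj is an instance of cls or of any class derived from it. */
int is_kind_of(const void *obj, const class_t *cls) {
    const class_t *c = ((const base_t *)obj)->cls;
    for (; c != NULL; c = c->super) {
        if (c == cls) return 1;
    }
    return 0;
}

/* True only if obj's exact class is cls. */
int is_instance_of(const void *obj, const class_t *cls) {
    return ((const base_t *)obj)->cls == cls;
}

/* Polymorphic dispatch through the object's class table. */
double object_area(const void *obj) {
    assert(obj != NULL);
    const class_t *c = ((const base_t *)obj)->cls;
    return c->area ? c->area(obj) : 0.0;
}
```

Because all class tables are const and all objects can live on the stack or in static storage, this style does not by itself require malloc.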
The only downsides are