Using Tinq for ROS 2.0 - GitHub issues

vmayoral commented 10 years ago

@brunodebus and @mouse256, trying to use this way to reach you (e-mail didn't work so far):

This is Víctor Mayoral Vilches, working at OSRF. Esteve contacted you a while ago showing our interest in Qeo and Tinq. I am the person pushing forward the ROS 2.0 development on the embedded/deep-embedded side and we are considering using Tinq's code for this purpose.

Our goal is to use a minimal DDS implementation in plain C (ideally ANSI C11) that can run on bare metal (Cortex-M3, -M4, -M0, MSP430, ...), so we can't assume a POSIX interface.

So far we have the following architecture in mind:

The transport abstraction layer will hook up the embedded network stack (probably device dependent) with DDS.
We are hoping that the DCPS and RTPS layers can be provided by tinq-core once it has been stripped of its POSIX networking and RT dependencies.
The ROS client nano library API will give users a ROS-style ("rossified") API on top of the underlying DDS.

How does this sound to you? Would it be fine if we contributed to tinq-core to make it deep-embedded capable? If so, maybe you could clarify to me the following aspects:

As tinq-core is coded right now, it assumes it's built on top of POSIX, right? How feasible is it to remove this dependency and provide an interface that allows the networking stack to be adaptable? (POSIX could be one option)
I see that libdds.so is currently 3.5 MB. We are targeting something around 200 KB. Do you believe it's possible to strip it down to this size?
Does the real-time implementation depend on librt and, if so, could this be bypassed easily?

After having spent some time looking at the code, I admit that it's quite an amazing piece of work. I think it'll be great to use Tinq as a fundamental piece of software for embedded robotics.

bramstes commented 10 years ago

Hello, I'm going to try to put you in touch with another member of the original Qeo development team. He's the main developer/designer behind the DDS layer and will be able to give you the correct answers. He's currently enjoying some well-deserved holidays, but I'll inform him and ask him to join.

bramstes commented 10 years ago

The plain C requirement should be OK. There are dependencies on POSIX, although things like thread, lock, socket and memory creation/handling/management have their own abstraction layers inside the Tinq/Qeo DDS code (with implementations for Linux, Windows and possibly Mac). With regard to size, it will depend on which features you need; it was originally designed to run on DSL gateways and STBs. 200 KB might be ambitious, but our colleague will be able to tell you more.

vmayoral commented 10 years ago

Hey @bramstes thanks for the quick answer. We are really excited about the chances of using Tinq. Although we'd like to make it as deep-embedded as possible we are not so strict about the 200 KB. We probably can get started with a less ambitious aim. Let's say 500 KB.

It'll be important to identify the POSIX dependencies and how we could replace them with minimal implementations. It's especially important to make sure that RTPS works without these dependencies.

I look forward to your input. Meanwhile I'll keep diving into your code.

brunodebus commented 10 years ago

Guys, I'm currently on holiday with limited access to the internet (this week and next). But I like the idea of trying to get a subset of the code running on more constrained devices. I will try to look at the proposal ASAP. (Replying to the comment bramstes posted on 6 Aug 2014 at 20:35.)

— Reply to this email directly or view it on GitHub https://github.com/brunodebus/tinq-core/issues/7#issuecomment-51377322.

kriver commented 10 years ago

With the default setup on my 64-bit Linux, libdds.so measures 4.4 MB unstripped, but only 823 KB stripped. With security disabled it goes down to 3.3 MB and 634 KB respectively. So depending on what you want in or out, a goal of 500 KB may be feasible. (I haven't built ARM binaries yet.)

jvoe commented 10 years ago

Hi guys. As Bram already mentioned, we don't really have Posix-specific dependencies. The main dependencies on OS functions are all abstracted (see dds/src/co/*). Even threading/locking can be mapped to something other than pthreads. Actually, I once ported the DDS part to the Windows OS, using native Windows functions for every OS primitive that was required. An initial version of DDS didn't even use threading at all, but I think it would be difficult to get that variant running again. As to librt we only use the clock_gettime() function. The main reason why libdds is quite big is due to the fact that a lot of functionalities are taken along by default: the security framework, the X-types framework, and a lot of other stuff. Security can be stripped out completely, X-types can be replaced with the original type library, and other optimizations can be done by removing specific compile-time flags (removing a lot of tracing/debugging, fragmentation support, typecode exchange support, etc). I'll be glad to help if you have specific questions on how to do all this.
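To illustrate the point about librt, here is a minimal sketch of how a single clock_gettime() dependency could be hidden behind a porting hook. The names port_clock_fn, port_set_clock, port_now_ms and tick_clock_ms are hypothetical and are not taken from the Tinq sources; only clock_gettime() is mentioned in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical porting hook: returns a monotonic time in milliseconds. */
typedef uint64_t (*port_clock_fn)(void);

/* POSIX backend: the only place that touches clock_gettime(). */
static uint64_t posix_clock_ms(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0; /* sketch only: no real error handling */
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

static port_clock_fn current_clock = posix_clock_ms;

/* Install another backend; passing a null pointer restores the POSIX one. */
void port_set_clock(port_clock_fn fn)
{
    current_clock = fn ? fn : posix_clock_ms;
}

/* What the rest of the code would call instead of clock_gettime(). */
uint64_t port_now_ms(void)
{
    return current_clock();
}

/* Stand-in for a bare-metal backend: a counter that a timer interrupt
 * would normally advance. Here it simply advances on every read. */
static uint64_t fake_ticks;

uint64_t tick_clock_ms(void)
{
    return fake_ticks++;
}
```

On a target without librt, only the backend function would need to change, for example to read an RTOS tick counter.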

vmayoral commented 10 years ago

@jvoe @kriver @brunodebus and @bramstes, thanks a lot for your inputs.

Let me summarize the conclusions stated so far:

The size limitation should not necessarily be an issue. @kriver already reported 634 KB (without security) and we can probably strip it down further.
All OS dependencies (RT and networking) are abstracted under dds/src/co/, which makes it easier to port to a different platform.

Given these facts, moving forward with Tinq's DDS code seems a great solution. My next step will be to create a minimal prototype with FreeRTOS as the underlying system.

I've selected FreeRTOS for its popularity, its unusual relaxed GPL license, and its portability. FreeRTOS has ports for many devices and is extremely lightweight.

The prototype should include:

dds chat
tinq dds
FreeRTOS

jvoe commented 10 years ago

@vmayoral, I had a quick look at FreeRTOS and I see a few issues with porting Tinq DDS to it.

The first issue is a licensing one. Standard FreeRTOS doesn't have UDP support unless the FreeRTOS+UDP package is used. This one is freely downloadable but has a different licensing model from the vanilla FreeRTOS license. The latter (FreeRTOS) relaxes the GPL license so that other software (commercial or otherwise) can be added on top of it without having to be disclosed. The former (FreeRTOS+UDP), however, does enforce a pure GPLv2, so all software linked to it becomes GPLv2. I don't know whether this is a problem for ROS 2.0? Other networking components (TCP, TELNET, HTTP, FTP, etc.) are not free and require a hefty fee to be used. Luckily, Tinq DDS only requires UDP to function if only DDS is to be used.

The other issue I see is that the FreeRTOS+UDP software doesn't include poll() support, i.e. only select() is implemented. This would require a change to dds/src/co/sock.c, since it currently assumes that poll() is available on target systems. Not a big issue, since the rest of DDS uses the functions in sock.c and this change is quite easy to do.

Anyway, just my 2 cents.

Jan Voet
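As a rough idea of what a select()-based replacement for a poll() wait might look like, here is a minimal sketch using the POSIX select() call as a stand-in. It is not the code in sock.c, and the function name wait_readable is hypothetical; the FreeRTOS+UDP API itself is not shown.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait until fd becomes readable or timeout_ms elapses.
 * A negative timeout blocks indefinitely, as with poll().
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    struct timeval *tvp = NULL; /* NULL means no timeout */
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    n = select(fd + 1, &readable, NULL, NULL, tvp);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readable)) ? 1 : 0;
}
```

The main differences from poll() are the millisecond-to-timeval conversion and the FD_SETSIZE limit on descriptor values.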

vmayoral commented 10 years ago

Hey @jvoe! Thanks a lot for the follow up.

With regard to the licensing aspect, as you point out, at the OSRF we can't accept a GPL license. FreeRTOS offers a relaxed GPL and after reviewing it we consider that it should be sufficient for our users. Regarding the networking stack, you're right. We won't be using the default FreeRTOS+UDP but lwIP, which has a nice BSD license and is easily integrated in FreeRTOS.

I haven't fully dived into lwIP yet, but I saw that it has a udp_recv function that lets you register a callback. I'd say that this function is implemented through polling, so this might satisfy the polling requirement that you mentioned.

Anyhow, it's really great to hear that the interfaces are hackable :). I'm coding a first prototype at https://github.com/vmayoral/ros2_embedded. Hopefully next week I'll come back with feedback.

vmayoral commented 10 years ago

@jvoe @kriver @brunodebus and @bramstes, after having spent the last week working on FreeRTOS (which provides a minimal layer of pseudo-threads called tasks) and a minimal UDP implementation on top of it (coded by us; we discarded lwIP), we are ready to start porting Tinq's interfaces to our embedded target (STM32F4).

Since it's a non-POSIX environment (C99), I've spent some time trying to figure out (besides dds/src/co, which @jvoe already mentioned) what should be ported. So far I've concluded that:

Everything under dds/src/co should be coded for the embedded target, as pointed out by @jvoe
dds/src/trans should also be specialized for the target
dds/src/dynip

A few questions, if you'll allow me:

Does Tinq rely heavily on the SQL code (dds/src/sql), or can we just drop it from the compilation? Will this break Tinq?
How relevant is the dds/src/co/sock.c code? It seems to me like a scheduler interface.
Approximately how many threads does Tinq's DDS implementation require?

jvoe commented 10 years ago

@vmayoral As to your questions:

The src/sql directory contains the bytecode compiler/interpreter for SQL-like filter expressions as used in DDS FilteredTopics as well as DDS QueryConditions. If you don't use filters it could be removed from the source code, I guess. But I would check first with the size tool to see how big this piece of code really is, before trying to remove it and all the code around it. It might lead to a lot of work for a relatively small gain.
sock.c is a small layer on top of the file descriptor and socket code of an OS, abstracting the POSIX poll() or select() functionality. The rest of the code uses this layer to dynamically add/remove file handles and sockets to/from the poll() (or select()) set of file descriptors. The poll()/select() functionality is checked from the core thread of DDS (see src/dds/dds.c), where the real scheduling occurs. In there, timeouts are checked and the rest of the internal DDS tasks are scheduled. When there is nothing left to do, the sock poll()/select() functionality is called with an argument (derived from the list of active timers) specifying how long this function may block while waiting for file/socket events. If there are ready file descriptors (read/write/connect/accept), this is indicated as an internal DDS event bit. If DDS events are detected, these are then processed by calling the relevant internal DDS functions.
The current DDS implementation uses only 2 threads. The core thread (as described before) and the application thread (when the process started).
The DynIP layer is responsible for determining IP address changes and IP interfaces going down/up. On Linux machines, this abstracts the netlink functionality. On Windows/OSX/iOS/BSD, other mechanisms can be used.
For src/trans, I would simply not enable the non-relevant blocks by not specifying compile-time options. Examples: DDS_IPV6 determines support for IPv6, DDS_SECURITY and DDS_NATIVE_SECURITY determine whether you want the software security enabled (DTLS or TLS). DDS_TCP for support of TCP, etc. Check the Makefile of the DDS library, or even the apps/* Makefiles, where for the latter various sets of compile time options are used. It would be a lot of work to rewrite the whole IP interfacing infrastructure, I'm afraid. Making Posix-like read/write() variants on top of your IP networking code shouldn't be too hard.
A lot of functionality in dds/src/co is generic and doesn't need to be ported. Some are very system-specific however, such as sys.[ch], sock.[ch], thread.[ch], ipc.[ch]. Easiest approach would be adding code surrounded by #ifdefs, as already done to differentiate between Linux/OSX/iOS/Android/FreeBSD/NetBSD/Windows versions.
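The core-thread behaviour described above (derive a blocking time from the active timers, wait on the descriptors, then raise event bits) can be sketched roughly as follows. This is an illustration only, not the code in src/dds/dds.c; the names next_timeout_ms, loop_once, EV_SOCKET and EV_TIMER are hypothetical.

```c
#include <limits.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>

#define EV_SOCKET 0x1u /* at least one descriptor is ready */
#define EV_TIMER  0x2u /* at least one timer deadline has passed */

/* How long may the wait block? -1 means "no timers, block indefinitely",
 * 0 means "a timer is already due", otherwise milliseconds until the
 * earliest deadline. Deadlines and now share one millisecond time base. */
int next_timeout_ms(const uint64_t *deadlines, size_t n, uint64_t now)
{
    uint64_t best, delta;
    size_t i;

    if (n == 0)
        return -1;
    best = deadlines[0];
    for (i = 1; i < n; i++)
        if (deadlines[i] < best)
            best = deadlines[i];
    if (best <= now)
        return 0;
    delta = best - now;
    return delta > (uint64_t)INT_MAX ? INT_MAX : (int)delta;
}

/* One iteration of the loop: wait for descriptor events for at most the
 * time allowed by the timers, and report what happened as event bits. */
unsigned loop_once(struct pollfd *fds, nfds_t nfds,
                   const uint64_t *deadlines, size_t n, uint64_t now)
{
    unsigned events = 0;
    int timeout = next_timeout_ms(deadlines, n, now);
    int ready = poll(fds, nfds, timeout);

    if (ready > 0)
        events |= EV_SOCKET;
    if (timeout == 0 || (timeout > 0 && ready == 0))
        events |= EV_TIMER; /* deadline was due or the wait timed out */
    return events;
}
```

A caller would then dispatch on the returned bits, running timer callbacks for EV_TIMER and reading from the ready descriptors for EV_SOCKET.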

Let me know how it's going. This is exciting work and I hope you don't encounter too many difficulties. I'll be glad to answer more questions as you proceed.

vmayoral commented 9 years ago

@jvoe @kriver @brunodebus and @bramstes,

After struggling for some time with FreeRTOS and its low-level complexities, we decided to take a step back and evaluate other RTOS options, so we spent the last few weeks putting together three prototypes:

ros2_embedded_nuttx
ros2_embedded_freertos
ros2_embedded_riot

After evaluating the three of them, we decided to select NuttX for three main reasons:

POSIX (pseudo) interface
BSD socket interface
Well documented and supported

I'm now working at full speed at https://github.com/vmayoral/ros2_embedded_nuttx, which looks quite promising (especially interesting is the NuttX shell, NSH, which can be hacked to include DDS inspection capabilities).

I have ported sys and sock so far and I'm working on ipc. Semaphores are something we will indeed need in the embedded world; however, we are not so sure about the shared memory functionality. In general-purpose operating systems we understand that several processes sharing memory can lead to better performance; however, as we conceive ROS 2.0 for embedded systems, we can't think of a scenario requiring it.

Is shared memory a big pillar within Tinq's code? Is there an alternative way of inter-process communication implemented (maybe a slower one)?

Would love to hear your opinion on this matter.

bramstes commented 9 years ago

Typically, most of the time we used Unix domain sockets to do IPC on our gateways, but jvoe and Bruno are better placed to answer what is supported in DDS (I remember shared memory is something we discussed at some point, among other things in the context of avoiding discovery info duplication, but I don't remember us actually implementing it). Since you say the BSD socket interface is supported on NuttX, Unix domain sockets could be an option. But like I said, I'd prefer jvoe and/or Bruno to answer :-).

Cheers,
Bram

brunodebus commented 9 years ago

Hi,

IPC is only used to make sure multiple DDS instances on the same system get a unique participant id. The participant id is part of the GUID (it makes the GUID unique on one system) and it is used to choose which ports will be used by this participant for communication.

You can compile the code with the define NOIPC to turn off this code (we use this on Android, where we also cannot use POSIX IPC). Basically, the code will still work, but not as efficiently: it generates a random participant id and tries to open the corresponding ports. If that fails, it will restart and try the next participant id.

So you can try with NOIPC first. If you ever run a lot of participants on the same device (not the idea, I think :-) ), we can figure something else out.
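The fallback described above (pick a participant id, try to open its ports, move on to the next id on failure) could be sketched as follows. The port constants are the default values from the RTPS specification's port mapping; the function names and the try_bind callback are hypothetical and do not come from the Tinq sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default RTPS port mapping parameters (PB, DG, PG, d1, d3). */
enum { PB = 7400, DG = 250, PG = 2, D1 = 10, D3 = 11 };

/* Unicast discovery (metatraffic) port for a participant. */
static uint16_t meta_unicast_port(unsigned domain, unsigned pid)
{
    return (uint16_t)(PB + DG * domain + D1 + PG * pid);
}

/* Unicast user-data port for a participant. */
static uint16_t user_unicast_port(unsigned domain, unsigned pid)
{
    return (uint16_t)(PB + DG * domain + D3 + PG * pid);
}

/* Hypothetical hook: returns true if the port could be opened. */
typedef bool (*try_bind_fn)(uint16_t port, void *ctx);

/* Starting from a (normally random) id, try successive participant ids
 * until both ports can be opened. Returns the id, or -1 if none is free.
 * A real implementation would also close the first port again when the
 * second one fails; that is left out of this sketch. */
int pick_participant_id(unsigned domain, unsigned start, unsigned max_ids,
                        try_bind_fn try_bind, void *ctx)
{
    unsigned i;

    for (i = 0; i < max_ids; i++) {
        unsigned pid = (start + i) % max_ids;

        if (try_bind(meta_unicast_port(domain, pid), ctx) &&
            try_bind(user_unicast_port(domain, pid), ctx))
            return (int)pid;
    }
    return -1;
}

/* Stand-in for a real bind attempt: ctx points at one port number that
 * is treated as already taken. */
bool demo_try_bind(uint16_t port, void *ctx)
{
    return port != *(const uint16_t *)ctx;
}
```

With domain 0, participant 0 maps to ports 7410 and 7411, participant 1 to 7412 and 7413, and so on, which is why a clash on one port simply shifts the participant to the next pair.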

vmayoral commented 9 years ago

@brunodebus and @bramstes,

Thanks for your answers. Trying it with -DNOIPC for now seems like a great idea, since we want to get a prototype running ASAP.

I'm fighting a bit with the security implemented in DDS. My current configuration is available here. Is there any other flag that you recommend removing (or adding)?

Since security is nicely integrated in the code and there's no official support for OpenSSL in NuttX, I've decided to omit security as far as possible for now. I'm excluding code by hand, such as:

#include "openssl/ssl.h"
...
#include "dds/dds_security.h"

Let me know if you have any comments on this matter please.

vmayoral commented 9 years ago

Instead of continuing down that path, which was a bit senseless because it was breaking all of your work, I just dropped sec_CSRCS and splug_CSRCS from the Makefile (https://github.com/vmayoral/ros2_embedded_nuttx/commit/4d65f25711904cf9ddd0a0e00667cc4da669045e).

vmayoral commented 9 years ago

@jvoe @kriver @brunodebus and @bramstes,

It seems we are getting way closer :+1:. The current prototype with NuttX is providing pretty good results on the integration side. Let me share with you my two current blockers:

As discussed by @jvoe before, in NuttX there's also the problem that the poll()/select() primitives are not implemented for UDP. This is the point in the code to be addressed. When I spoke with @nuttx, he suggested these two options:

Implement UDP packet buffering, or

Create your own UDP listen thread that waits on recvfrom() all of the time, buffers UDP packets in memory where they can be read, and informs some other thread that the data is available (perhaps via a signal).

I'd say it shouldn't be too hard to code something here that uses recvfrom() calls, stores the data in some buffer and returns as soon as there's data available. Could you advise whether this is the best way to fix this issue?
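A rough sketch of the second option follows: a listener thread blocks in recvfrom() and pushes packets into a small queue that another thread can drain. It uses a pthread condition variable rather than a signal for the notification, which is a substitution on my part. All names and sizes (pkt_queue, PKT_MAX, QUEUE_LEN, ...) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PKT_MAX   1500 /* assumed largest datagram we keep */
#define QUEUE_LEN 8    /* assumed number of buffered packets */

struct packet { size_t len; unsigned char data[PKT_MAX]; };

struct pkt_queue {
    struct packet slots[QUEUE_LEN];
    unsigned head;  /* index of the oldest packet */
    unsigned count; /* number of packets stored */
    int fd;         /* UDP socket the listener reads from */
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void queue_init(struct pkt_queue *q, int fd)
{
    q->head = q->count = 0;
    q->fd = fd;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Store one packet. Returns 0, or -1 if the queue is full (packet dropped). */
int queue_push(struct pkt_queue *q, const void *buf, size_t len)
{
    int rc = -1;

    if (len > PKT_MAX)
        len = PKT_MAX; /* truncate oversized datagrams */
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_LEN) {
        struct packet *p = &q->slots[(q->head + q->count) % QUEUE_LEN];

        memcpy(p->data, buf, len);
        p->len = len;
        q->count++;
        pthread_cond_signal(&q->nonempty); /* wake a waiting reader */
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Block until a packet is available, copy it out, return its length. */
size_t queue_pop(struct pkt_queue *q, void *buf, size_t cap)
{
    size_t n;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    n = q->slots[q->head].len;
    if (n > cap)
        n = cap;
    memcpy(buf, q->slots[q->head].data, n);
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Thread body: wait in recvfrom() all the time and buffer what arrives. */
void *udp_listener(void *arg)
{
    struct pkt_queue *q = arg;
    unsigned char buf[PKT_MAX];

    for (;;) {
        ssize_t n = recvfrom(q->fd, buf, sizeof buf, 0, NULL, NULL);

        if (n < 0)
            break;
        queue_push(q, buf, (size_t)n);
    }
    return NULL;
}
```

The listener would be started with pthread_create(&tid, NULL, udp_listener, &queue) after queue_init(). Note that eight 1500-byte slots already cost about 12 KB of RAM, so the queue length matters on a small target.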

The size of the binary is just too big. We are currently using these flags to compile Tinq's DDS.

Currently the linker is complaining (I'm trying to build the DDS chat example):

arm-none-eabi-ld: /home/victor/Dropbox/OSRF/ros2_embedded_nuttx/nuttx/nuttx section `.bss' will not fit in region `sram'
arm-none-eabi-ld: region `sram' overflowed by 61400 bytes

The board we are using has 112 KiB + 16 KiB of usable SRAM. @nuttx helped us understand that:

The error message above means that your program's static data is currently 112 KiB + 60 KiB, or about 172 KiB. That does not fit into any available memory region. So you are stopped until you solve that problem. But then you are going to have another big-time memory usage problem when your program starts.

We definitely need to shrink the code size quite a bit if we wish to continue moving forward (especially because we will also need to create a thin ROS layer on top of DDS). Can you please provide suggestions on what we could remove? For now a minimal DDS implementation with RTPS should do.

Once again, we are really happy and thankful for your support. Thanks everyone!

vmayoral commented 9 years ago

I removed -DEXTRA_STATS and -DCDD_USED but the size didn't decrease. Next I will try to remove -DDDS_DEBUG, but a quick test suggests that some work fixing dependencies will be needed. The same seems true for XTYPES (actually it looks pretty hard to remove XTYPES, if it is even possible).

brunodebus commented 9 years ago

About the memory issue: actually, code size is not the reason for the linking error you are getting; .bss (zero-initialized data) is. The main culprits for this are the send and receive buffers allocated in dds/src/trans/ip/rtps_ip.c (lines 157 & 172). They are each ~64 kB (MAX_RX_SIZE/MAX_TX_SIZE).

You can change this size in ./dds/src/include/ri_data.h (you will then only be able to receive smaller messages) and your program should be able to link.
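To make the .bss point concrete, here is a small sketch of two statically allocated buffers and the memory they reserve. The macro names MAX_RX_SIZE and MAX_TX_SIZE are the ones mentioned above, but the exact value of 65536 is an assumption based on the "~64 kB" figure, and the buffer and function names are hypothetical rather than copied from rtps_ip.c.

```c
#include <stddef.h>

/* Assumed values; override with -DMAX_RX_SIZE=... to experiment. */
#ifndef MAX_RX_SIZE
#define MAX_RX_SIZE 65536u
#endif
#ifndef MAX_TX_SIZE
#define MAX_TX_SIZE 65536u
#endif

/* Size of the main SRAM region reported for the board in this thread. */
#define SRAM_REGION_BYTES (112u * 1024u)

/* Zero-initialized statics like these are placed in .bss: they take no
 * space in flash but reserve RAM for the whole lifetime of the program. */
static unsigned char rx_buf[MAX_RX_SIZE];
static unsigned char tx_buf[MAX_TX_SIZE];

unsigned char *rx_buffer(void) { return rx_buf; }
unsigned char *tx_buffer(void) { return tx_buf; }

/* RAM reserved by the two buffers alone. */
size_t static_buffer_bytes(void)
{
    return sizeof rx_buf + sizeof tx_buf;
}

/* Would the two buffers, ignoring everything else, fit in the region? */
int buffers_fit_in_sram(void)
{
    return static_buffer_bytes() <= SRAM_REGION_BYTES;
}
```

With the assumed 64 KiB each, the two buffers need 131072 bytes, which already exceeds the 114688 bytes of the 112 KiB region before any other static data is counted.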

@jvoe: Does the buffer need to have a specific size/alignment? I cannot remember.

vmayoral commented 9 years ago

@brunodebus thanks for the suggestion. That did it.

@bramstes, @brunodebus, @kriver and @jvoe: I'm now trying to run the chat on the embedded board, but it seems it might need quite a bit of work. Right now, I'm trying to figure out the right configuration for the ParVal_t parameters (https://github.com/vmayoral/ros2_embedded_nuttx/blob/old-code/dds/src/co/config.c#L71) in NuttX. Usually they are loaded from a file, but in this case NuttX has no FS (not one that we are using, at least).

My idea is to set them by default to a reasonable value in the NuttX case. Could you provide some reference on what values these parameters should take? (dds/doc/tdds-env.doc seems empty.) Or maybe a set of tips on what values these params should take to be able to run in an embedded system?

gregory-nutt commented 9 years ago

(In reply to Víctor's message of September 24, 2014, 5:19 PM.)

I would suggest that you use the configdata interface to hide the implementation of where configuration parameters are stored (or even whether they are hardcoded). See for example apps/examples/configdata.

NuttX, of course, does support a variety of file systems. The STM32F4DIS-BB board has a microSD slot on board that is fully supported by NuttX; you just have to enable it. It is already enabled in the configs/stm32f4discovery/netnsh configuration.

Traditionally, embedded systems keep configuration data in tiny serial FLASH or EEPROM. Most network-enabled embedded systems need some configuration storage for, at a minimum, the MAC address of the device. A small serial storage device is usually all that is required.

Hardcoding anything is generally a bad idea. But hiding the source of configuration data behind the configdata interface at least makes what you are planning portable.

Greg

jvoe commented 9 years ago

@vmayoral: for documentation on DDS configuration, as well as for other specifics of Tinq DDS, check dds/doc/dita/out/user_manual.pdf.

The XTypes type system can be replaced with an older variant of type handling that only supports TSM-based type definitions. This might save some code/data. To do this, undefine XTYPESUSED in the Makefile, remove all src/xtypes/*.[ch] files from CSRCS, and add the src/typecode/*.[ch] files to CSRCS instead (pid.c, cdr.c, pl_cdr.c and typecode.c). This will probably require a number of changes to the rest of DDS though, since a few parameters have been added to the type functions since the XTypes mechanism was introduced. However, it should be possible with a bit of work/patience.

@brunodebus: I don't think there is an alignment restriction on the buffer sizes. I wouldn't go lower than 1500 though (the Ethernet frame size), and keep it a multiple of 4, just to be safe :-).
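The buffer-size advice (no smaller than 1500 bytes, and a multiple of 4) can be captured in a small helper. This is only an illustration; sane_buffer_size and MIN_BUF_SIZE are hypothetical names and nothing like this is claimed to exist in ri_data.h.

```c
#include <stddef.h>

#define MIN_BUF_SIZE 1500u /* lower bound suggested in the thread */

/* Clamp a requested buffer size to the suggested minimum and round it
 * up to the next multiple of 4. */
size_t sane_buffer_size(size_t requested)
{
    if (requested < MIN_BUF_SIZE)
        requested = MIN_BUF_SIZE;
    return (requested + 3u) & ~(size_t)3u;
}
```

For example, a request for 1501 bytes becomes 1504, while 2048 is left unchanged because it is already a multiple of 4.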

vmayoral commented 9 years ago

@jvoe thanks for your answer. I was hoping I could get a pointer to how things get configured for a sample application. There are many parameters that, although well documented, I don't feel familiar enough with the DDS implementation to choose good values for. For example:

    { G_Pool, DC_Pool_Subscribers,      "SUBSCRIBERS",   V_Range,  0, NULL, {0}},
    { G_Pool, DC_Pool_Publishers,       "PUBLISHERS",    V_Range,  0, NULL, {0}},
    { G_Pool, DC_Pool_Readers,          "READERS",       V_Range,  0, NULL, {0}},
    { G_Pool, DC_Pool_Writers,          "WRITERS",       V_Range,  0, NULL, {0}},
    { G_Pool, DC_Pool_Topics,           "TOPICS",        V_Range,  0, NULL, {0}},
...

It would be great if you could provide a pointer to how things are configured for the chat application, which is the one I'm trying to build for now.

jvoe commented 9 years ago

@vmayoral For the minimal amount of memory on startup, use -DFORCE_MALLOC. This will force all required memory to be malloc-ed from the heap instead of using static memory pools. Your application will then only use what it needs. There will be fragmentation overhead, of course, and this is not as fast as memory pools, but these disadvantages might not be that bad.

On the other hand, if you really need things to be configured up front, thereby avoiding dynamic allocations altogether, the DDS_get_default_pool_constraints() and DDS_set_pool_constraints() functions allow you to do this very easily. To find out your actual pool requirements, run the application in its worst-case usage and check the amount of memory actually used with the "spool" DDS debug command.
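The trade-off between static pools and heap allocation can be illustrated with a tiny fixed-block allocator that falls back to malloc() when a compile-time switch is set. This is a generic sketch, not Tinq's pool code; SKETCH_FORCE_MALLOC, blk_alloc and blk_free are hypothetical names chosen to mirror the idea behind -DFORCE_MALLOC.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE  64 /* assumed size of one pooled object */
#define POOL_BLOCKS 4  /* assumed number of preallocated objects */

typedef union block {
    union block *next;               /* link while on the free list */
    unsigned char bytes[BLOCK_SIZE]; /* payload while in use */
} block_t;

/* Static pool: reserved in .bss at link time whether used or not. */
static block_t pool[POOL_BLOCKS];
static block_t *free_list;
static int pool_ready;

void *blk_alloc(void)
{
#ifdef SKETCH_FORCE_MALLOC
    /* Heap mode: pay only for what is used, at the cost of fragmentation. */
    return malloc(BLOCK_SIZE);
#else
    block_t *b;

    if (!pool_ready) { /* build the free list on first use */
        size_t i;

        for (i = 0; i < POOL_BLOCKS; i++) {
            pool[i].next = free_list;
            free_list = &pool[i];
        }
        pool_ready = 1;
    }
    b = free_list;
    if (b != NULL)
        free_list = b->next; /* pop the first free block */
    return b;                /* NULL when the pool is exhausted */
#endif
}

void blk_free(void *p)
{
    if (p == NULL)
        return;
#ifdef SKETCH_FORCE_MALLOC
    free(p);
#else
    {
        block_t *b = p;

        b->next = free_list; /* push the block back on the free list */
        free_list = b;
    }
#endif
}
```

In pool mode the memory cost is fixed and allocation is constant-time, but the pool must be sized for the worst case; in heap mode startup memory is minimal, which is the property that matters on a board with little SRAM.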

vmayoral commented 9 years ago

@nuttx and @LorenzMeier, we are having some issues while running our application and we discovered that there's a portion of the FLASH memory that is just not accessible. Have you seen something similar? https://github.com/vmayoral/ros2_embedded_nuttx/issues/7.

vmayoral commented 9 years ago

The issue with the memory has been fixed. It seems that stlink had a bug that prevented debugging all of the memory (1 MB) available in the STM32F4Discovery. Refer to https://github.com/ros2/stlink/commit/d8552796f2c630b4858eaf5613e1dc99943b572b for the fix.

Keep moving forward. @jvoe will try setting up the DDS Debug shell as soon as possible.

vmayoral commented 9 years ago

@bramstes, @brunodebus, @kriver, @nuttx and @jvoe: I finally got the chat example application working. After some fixes that allowed the serial UART to use the prompt, I encountered the following issues while trying to interoperate the board with chat running on a desktop machine (Ubuntu):

A quick hack has been applied to one of the mutexes in the code. It was causing the system to hang and it didn't seem trivial to fix. Refer to https://github.com/vmayoral/ros2_embedded_nuttx/issues/8. This hack allows the application to run.
Apparently, it's not possible to validate the parameters without causing the system to crash. Refer to this issue: https://github.com/vmayoral/ros2_embedded_nuttx/issues/10
The network doesn't seem to be launched properly (no unicast interfaces on the embedded side) and the embedded board can't interoperate. Refer to this issue: https://github.com/vmayoral/ros2_embedded_nuttx/issues/9. I am expecting to see (at least) the discovery packets (RTPS) coming from the embedded board, but I'm not even seeing that.

Any comments will be appreciated.

gregory-nutt commented 9 years ago

A quick hack has been applied to one of the mutexes in the code. It was causing the system to hang and it didn't seem trivial to fix. Refer to vmayoral/ros2_embedded_nuttx#8. This hack allows the application to run.

I don't understand what the issue is

Apparently, it's not possible to validate the parameters without causing the system to crash. Refer to this issue vmayoral/ros2_embedded_nuttx#10

Looks like a stack overrun. This is very close to overflowing the stack. My guess is that it has probably overflowed the stack in the past:

sp: 10003090 stack base: 10003750 stack size: 000007e4

Definitely overran the stack:

sp: 10002f48 stack base: 10003750 stack size: 000007e4
ERROR: Stack pointer is not within the allocated stack

You will need to increase the stack size. Overrunning the stack can cause all kinds of crazy, inexplicable behaviors.
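The two dumps above can be checked by hand. The sketch below assumes that "stack base" is the highest address of the stack and that the stack grows downward, as on ARM Cortex-M; that assumption is consistent with which dump was flagged as an error, but it is not confirmed elsewhere in this thread. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Values as printed in the crash dump (all hexadecimal there). */
typedef struct {
    uint32_t sp;   /* current stack pointer */
    uint32_t base; /* assumed: highest address of the stack */
    uint32_t size; /* stack size in bytes */
} stack_info;

/* Bytes left between the stack pointer and the lowest valid address.
 * A negative result means the stack pointer is below the allocation. */
int64_t stack_headroom(stack_info s)
{
    uint32_t limit = s.base - s.size; /* lowest valid stack address */

    return (int64_t)s.sp - (int64_t)limit;
}
```

Under that assumption the lowest valid address is 0x10003750 - 0x7e4 = 0x10002f6c. The first dump (sp = 0x10003090) leaves 292 bytes free out of 2020, and the second (sp = 0x10002f48) is 36 bytes below the limit, which matches the reported error.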

Greg

vmayoral commented 9 years ago

@nuttx thanks for the suggestion. I increased the stack sizes to:

CONFIG_IDLETHREAD_STACKSIZE=8192
CONFIG_USERMAIN_STACKSIZE=8192
CONFIG_PTHREAD_STACK_MIN=2046
CONFIG_PTHREAD_STACK_DEFAULT=8192

But I still get issues (https://gist.github.com/vmayoral/6c4e51e444c084429908) when uncommenting this code (https://github.com/ros2/ros2_embedded_nuttx/blob/master/dds/src/co/config.c#L698). @jvoe, @brunodebus, surely there is a reason why these parameters are marked as invalid. Would you mind sharing some insight on this matter?

Besides the parameters issue, my current blocker is the network aspect on the DDS side. Maybe this issue provides some insight? https://github.com/ros2/ros2_embedded_nuttx/issues/9.

Thanks

gregory-nutt commented 9 years ago

(In reply to Víctor's message of October 2, 2014, 1:20 PM.)

In your reference, https://gist.github.com/vmayoral/6c4e51e444c084429908 :

Assertion failed at file:mm_free.c line: 137

Line 137 of mm_free.c:

  DEBUGASSERT(prev->blink);

which means that your heap is corrupted. You have some bug, completely unrelated to this failure, that is trouncing on memory. The code that reports the failure is the victim of the trouncer, but is not the problem itself.

Most likely you have other cases where you are overrunning a stack or writing past the end of a heap allocation. When you write past the end of your memory allocation, you destroy heap data structures, so the next time you access those heap structures you get this failure. Tough to debug.

There is a stack monitor function that can be enabled. See CONFIG_DEBUG_STACK and apps/system/stackmonitor. That might help.

Greg

ashwinvijayakumar commented 9 years ago

vmayoral, have you considered using https://mbed.org/technology/os/ ? An alpha version of the OS has been around for quite some time, but ARM is making an official release on Oct 15th. I understand you have come quite a long way with NuttX, but I would urge you to consider mbed OS, mainly because with mbed, ARM is attempting to solve the fragmentation issue in embedded systems. This could very well help us roboticists battle the fragmentation issue with respect to drivers for low-level components (sensors and actuators). Read more here: http://gigaom.com/2014/10/01/to-combat-fragmentation-arm-built-a-new-type-of-os-for-the-internet-of-things/

LorenzMeier commented 9 years ago

@ashwinvijayakumar You haven't qualified your recommendation with respect to the specs and features of the OS, and in contrast to mbed we know NuttX performs in the environment we care about (because it flies on thousands of drones).

Could you provide a few more performance indicators, other than ARM's marketing claims? Clearly our application here is quite different from normal Internet of Things requirements. ROS embedded needs a "small Linux", not (quote from ARM) "mbed™ OS is an operating system for IoT devices and is especially well-suited to run in energy constrained environments".

It doesn't mean it's not a fit, but more information and use cases similar to embedded control implemented in mbed would be appreciated.

ashwinvijayakumar commented 9 years ago

@LorenzMeier I should have mentioned that mbedOS is not really an independent RTOS. It is a combination of the mbed framework (which mainly provides peripheral & uC hardware abstraction) and CMSIS-RTOS which, to quote ARM, "is a standardized API for Real-Time Operating System (RTOS) kernels". More here: http://www.arm.com/about/newsroom/arm-extends-cmsis-with-rtos-api-and-system-view-description.php

The key here is providing an abstraction over multiple hardware and OS platforms, i.e. product manufacturers have various requirements when it comes to their choice of hardware (uC) and RTOS, especially because of licencing and BOM cost. By utilizing mbed APIs we can ensure that product manufacturers do not have to either stick with the 'officially supported' hardware (STM32) & OS (nuttx) or end up spending time and resources to custom port the software and also maintain it.

Another big requirement for an end-user/maker/kickstarter-er like me would be the ability to customize/prototype my robot by easily adding different kinds of sensors & actuators. With the mbed framework, it would be easy for me to download an existing community-developed component driver and kickstart my prototyping. See http://developer.mbed.org/components/

Having said so, @nuttx : Are there any plans of providing a CMSIS-RTOS glue layer for nuttx?

I could gather the following requirements from this conversation thread:

- POSIX (pseudo) interface - As Bram already mentioned, DDS doesn't really have POSIX-specific dependencies. Even if it's absolutely needed, to quote jvoe: "Making Posix-like read/write() variants on top of your IP networking code shouldn't be too hard"
- BSD socket interface - http://developer.mbed.org/users/donatien/notebook/bsd-sockets-api/ and a few other options for socket interface: http://developer.mbed.org/handbook/Socket, http://developer.mbed.org/users/mbed_official/code/lwip/
- Well documented and supported - I'll let http://developer.mbed.org/ speak for itself
- Shared memory - Can you use mutex for this? http://developer.mbed.org/handbook/CMSIS-RTOS#mutex

LorenzMeier commented 9 years ago

@ashwinvijayakumar I like the general concept and direction. What you will want to factor in for your assessment:

We can't rely on random driver libraries for drone or other safety critical robotic applications; every module needs thorough testing. Experience shows that our safety and reliability requirements are way above what others have, and our sensor update rates are also higher than most platforms. This means we need to soak test every single external contribution, so when you need a driver for a new chip, you have the tradeoff between having a skilled, trusted contributor write it based off an existing, working driver or pulling in a random community contribution. You will find that you will often lean towards reusing 90% of an existing driver you trust and just updating the bits for the new sensor. Saves a lot of validation testing of infrastructure.
There is a large library of existing drivers for NuttX for the robotics space (inertial sensors, laser altimeters, GPS modules - anything that's useful on a robot / drone). In contrast to IoT library drivers, we know these have flown tens of thousands of flight hours fine. If we could somehow know that they work similarly well in deployed IoT applications, that of course would be a similarly valid baseline. With additional abstraction in place we of course don't know if a driver that worked great in NuttX still works in e.g. VxWorks (even if the API says it should).

From a reuse perspective your arguments are really valid. I'm however not sure if the level of abstraction you're proposing (mbed, CMSIS OS layer) is the right fit for ROS embedded. As far as I understood the concept, the idea was to have "dumb" nodes that just shovel sensor data onto a network interface, which is close to a bare metal concept. Completely orthogonalizing the architecture (which I generally like as an approach) might add more overhead in terms of development effort than the core functionality of the node would ever have - or in other words, this sounds like a case where KISS would work well and being married to one OS (I'm not presuming NuttX here, rather, just picking one and sticking with it) might not be totally terrible.

It is frickin' hard to write really high-performance, low-latency device drivers with accurate timing that are completely OS agnostic, in particular once you have several running in parallel and need them to play nicely together (locking, interrupt handling, priorities, DMA all become immediately a concern).

But here is the key question - would you personally be willing to invest some serious development effort, including soak testing, in better abstraction? Because at the end of the day it's all just a development bandwidth question 8).

gregory-nutt commented 9 years ago

Having said so, @nuttx : Are there any plans of providing a CMSIS-RTOS glue layer for nuttx?

No, there will never be third party libraries incorporated into the NuttX source tree. That does not prevent anyone from using them, but I cannot maintain them.

ashwinvijayakumar commented 9 years ago

@LorenzMeier & @nuttx : Thanks for the immediate response

@LorenzMeier : You make a valid point with "well-tested software for safety critical robotics applications". To address the validity of mbed ports (OS and the peripheral framework), I believe ARM has set up the mbed framework such that ARM (the mbed team) will maintain the abstraction layers while the port providers are responsible for their HAL (for the peripheral framework) and RTOS glue layer. E.g. STMicroelectronics is expected to be responsible for Nucleo's HAL, and Real-Time Engineers Ltd. would be responsible for their CMSIS-RTOS glue layer.

As far as device drivers are concerned, I couldn't agree more with you about the difficulty involved in writing OS-agnostic, high-performance, low-latency drivers. Somebody definitely has to put in the development time and own these drivers. As much as I would like to step up and contribute to this, I am unable to commit to this responsibility due to my current engagement with work and other projects. :)

vmayoral commented 9 years ago

@ashwinvijayakumar, @LorenzMeier and @nuttx: thanks for pushing the discussion forward, but we have no intention of moving away from NuttX.

@bramstes, @brunodebus, @jvoe and @kriver: After addressing several issues we believe that it shouldn't take much to have a Desktop-embedded DDS chat application running. The DDS Debug shell has proved to be quite helpful, but we remain blocked on the networking side. There seems to be some issue with the "locators" on the embedded side. Please refer to https://github.com/ros2/ros2_embedded_nuttx/issues/9. Particularly, can you think of why:

The issue seems to remain at the locators (interfaces) on the embedded board. On the Desktop I get:

!!sloc UDP:239.255.0.1:7400(6)_12 UDP:239.255.0.1:7401(3)_2 UDP:172.23.1.215:7414(5)_10 UDP:192.168.0.2:7414(4)_10 UDP:172.23.1.215:7415(2)_2 UDP:192.168.0.2:7415(1)_2

While in the embedded board:

!!sloc UDP:239.255.0.1:7400(2)_12 UDP:239.255.0.1:7401(1)_2

vmayoral commented 9 years ago

To put it differently, I'm guessing that the Linux implementation that I used as a starting point for NuttX is somehow reading the network interfaces to set up the locators. I have not modified this code (nor have I found it yet), so I'm afraid that this might be the cause. I'd appreciate it if you could comment on this matter.

jvoe commented 9 years ago

@vmayoral It looks like DDS didn't get the list of configured IP addresses correctly. What is shown is only the well-known DDS multicast addresses, but these are always present.

The unicast addresses are retrieved in src/co/sys.c: sys_own_ipv4_addr(). This function should return the correct list, whether statically configured via ifconfig or ip, or populated via a DHCP-based mechanism.

Note that if you want to use DHCP to configure the networking stack, this is usually done with a system-specific mechanism. In Linux, for example, netlink is used. Other systems use other mechanisms. You'll have to add an extra NuttX mechanism if this is supported differently from other systems, I guess. The system-specific mechanism is abstracted in src/dynip/* where the di{}.c files are system-specific files. You could add a di_nuttx.c file there. Whatever the mechanism, however, sys_own_ipv4_addr() must still return the correct IPv4 address list.

A similar mechanism is used for IPv6 address configurations, btw. Here, the sys_own_ipv6_addr() function is used. Of course, IPv6 support is only used if -DDDS_IPV6 is used at compile time.

In the DDS Debug shell, you can use the 'scx' command to see the list of active DDS IP interfaces. Addresses configured via a dynamic IP address mechanism can be shown with 'sdip'.

vmayoral commented 9 years ago

Thanks @jvoe. https://github.com/ros2/ros2_embedded_nuttx/commit/d1e261d363f1ce1924ebaf58c40347f556902c30 addresses this matter and apparently we get the right locators configured:

Domain 0 (pid=95): {1}
        GUID prefix: 6003b4df:005f0004:00060000
        RTPS Protocol version: v2.1
        Vendor Id: 1.14 - Technicolor, Inc. - Qeo
        Technicolor DDS version: 4.0-0, Forward: 0
        Entity name: Technicolor Chatroom
        Flags: Enabled
        Meta Unicast: 
                UDP:192.168.0.3:7600(3) {MD,UC} H:3
        Meta Multicast: 
                UDP:239.255.0.1:7400(4) {MD,MC} H:4
        Default Unicast: 
                UDP:192.168.0.3:7601(1) {UD,UC} H:1
        Default Multicast: 
                UDP:239.255.0.1:7401(2) {UD,MC} H:2
        Manual Liveliness: 0
        Lease duration: 50.000000000s
        Endpoints: 10 entries (5 readers, 5 writers).
        Resend period: 10.000000000s
        Destination Locators: 
                UDP:239.255.0.1:7400(4) {MD,MC} H:4
        Discovered participants: <none>

@jvoe, @bramstes, @brunodebus and @kriver: I noticed that the chat application actually sends the information twice in many cases (even when tried with several instances on Linux).

The only difference that i can see now between Desktop and embedded is the IGMPv3 traffic:

Embedded

[Screenshot from 2014-10-07 15:20:30]

Desktop

[Screenshot from 2014-10-07 15:18:36]

I noticed that NuttX does not support IGMPv3, so I changed my Linux box to IGMPv2. Tinq running on the Desktop is now using IGMPv2:

[Screenshot from 2014-10-07 15:53:54]

However, it still doesn't interoperate with NuttX. Furthermore, I still don't see any IGMPv2 traffic from NuttX. There's a simple example application in NuttX that allows testing IGMPv2, and it does work.

My guess here is that the way Tinq implements multicast is not supported by the NuttX IGMP interface. @jvoe, @bramstes, @brunodebus and @kriver, could you provide a pointer in the DDS code to where the IGMP packets (multicast implementation) are being sent? I could probably hack that part to make it compatible with NuttX.

I believe we are pretty close to making them interoperate :+1:

vmayoral commented 9 years ago

I updated the comment; the Desktop and Embedded pictures were swapped. Now they should be fine.

jvoe commented 9 years ago

@vmayoral Are you sure that the poll/select() and subsequent read() implementation actually works? From the output you gave, it looks like DDS is only sending packets and not receiving any. The DDS 'scx' debug shell command on the target should give some useful statistics on packet transmits/receives. The SPDP message being regularly retransmitted is normal and part of the DDS discovery process. The 'sdisc' command output clearly shows it didn't discover any other DDS entities, which could be explained by the reception not working properly.
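For reference, the receive pattern in question (wait with poll(), then read the ready descriptor) looks roughly like the sketch below. This is a generic POSIX illustration, not code from Tinq; poll_and_receive() and the limit of 8 descriptors are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Wait up to timeout_ms for any descriptor to become readable and read one
 * datagram from the first ready one. Returns the index of the descriptor
 * that was read, or -1 on timeout or error. *len receives the datagram size. */
static int poll_and_receive(const int *fds, size_t nfds, int timeout_ms,
                            void *buf, size_t cap, ssize_t *len)
{
    struct pollfd pfds[8];

    assert(nfds <= 8);
    for (size_t i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0) {
        return -1;                  /* timeout or poll() failure */
    }
    for (size_t i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN) {
            *len = recvfrom(pfds[i].fd, buf, cap, 0, NULL, NULL);
            return (*len >= 0) ? (int)i : -1;
        }
    }
    return -1;
}
```

On a stack where poll() never reports POLLIN for UDP sockets, a loop built on this pattern keeps transmitting but never receives, which would be consistent with the statistics requested above.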

brunodebus commented 9 years ago

@vmayoral We use setsockopt IP_ADD_MEMBERSHIP to control multicast group membership (see dds/src/trans/ip/ri_udp.c around line 835). This triggers the Linux network stack to send out the initial IGMP message (and to reply to membership queries). It looks like NuttX does not support this (see nuttx/TODO around line 866). They talk about ioctl(SIOCSIPMSFILTER), and that indeed seems to force the initial send.

gregory-nutt commented 9 years ago

@vmayoral An application level interface to join or leave a multicast group is ipmsfilter(). It will manage all of the ioctl() stuff. There is an example of its use in apps/examples/igmp.

vmayoral commented 9 years ago

NuttX seems to respond nicely to IGMP when tested in isolation (as @nuttx suggests): [Screenshot from 2014-10-08 09:50:26]

but even if the chat is not receiving anything, I'd expect it to send the IGMPv2 packets when launched (which is not happening). I will review the multicast implementation to see whether the API matches the one in NuttX, following @brunodebus's comments.

@jvoe thanks for the suggestion. You are right, it seems to be something related to the UDP poll. No packets are received by the embedded system:

!!scx

# of IP receive events   = 0
Sending UDP socket:
  UDP     0.0.0.0:0     fd:8      id:0  
            r.errors:0  w.errors:0  empty:0  too_short:0  no buffers:0
            octets Tx/Rx:0/0  packets Tx/Rx:0/0
Locators:
  UDP     192.168.0.3:7601  fd:9      id:1   USER UCAST SRC_MCAST
            r.errors:0  w.errors:0  empty:0  too_short:0  no buffers:0
            octets Tx/Rx:0/0  packets Tx/Rx:0/0
  UDP     239.255.0.1:7401  fd:10      id:1   USER MCAST
            r.errors:0  w.errors:0  empty:0  too_short:0  no buffers:0
            octets Tx/Rx:0/0  packets Tx/Rx:0/0
  UDP     192.168.0.3:7600  fd:11      id:1   META UCAST SRC_MCAST
            r.errors:0  w.errors:0  empty:0  too_short:0  no buffers:0
            octets Tx/Rx:3744/0  packets Tx/Rx:12/0
  UDP     239.255.0.1:7400  fd:12      id:1   META MCAST
            r.errors:0  w.errors:0  empty:0  too_short:0  no buffers:0
            octets Tx/Rx:0/0  packets Tx/Rx:0/0

Makes sense, because there's nobody blocked in recvfrom(), so the NuttX kernel just drops the incoming packets. I was hoping I could hack the DDS implementation to somehow avoid using poll, but it seems that is no longer avoidable. I'll proceed with implementing a nuttx_udp_poll() that will query 4 different threads (one for each locator): [Image: img_20141008_164739]

gregory-nutt commented 9 years ago

Hi, Victor,

Makes sense, because there's nobody blocked in recvfrom(), so the NuttX kernel just drops the incoming packets. I was hoping I could hack the DDS implementation to somehow avoid using poll, but it seems that is no longer avoidable. I'll proceed with implementing a nuttx_udp_poll() that will query 4 different threads (one for each locator):

If there is critical OS functionality that you need and it conforms to some documented standard, I can help implement some of those pieces -- time permitting.

You have it right: the POLLIN poll() does not work on UDP packets because there is no UDP packet buffering; if you are not waiting in recvfrom() when a UDP packet is received, it is simply dropped. The POLLIN poll() does not work on TCP/IP packets either unless you enable CONFIG_NET_TCP_READAHEAD, which enables TCP packet buffering and, hence, TCP poll().

A similar thing can be said for O_NONBLOCK. Since no UDP data is buffered, you cannot read UDP data with O_NONBLOCK... all UDP reads have to block to receive the next packet.

So without additional OS support you would have no option but to create a thread that constantly waits on recvfrom() and buffers packets -- just as I see on your whiteboard. This is essentially the application space equivalent of the UDP packet buffering that could be implemented in the OS. In some ways this is a better and simpler implementation than I could do in the OS: on the OS side there are additional complexities to get a generic solution. How would you handle multicast? How do you know that incoming UDP packets will ever be read? Should there be a timeout to discard them if they are never read? etc. I would need a pretty good specification of the behavior to correctly implement the buffering in the OS.

But, within the kernel, I could implement the packet buffering with no additional threads. Threading is expensive because each thread requires a stack and, hence, eats up more of your limited RAM. That would be the biggest motivation for an OS-based solution.
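To make the application-space buffering concrete, here is a minimal sketch of a single-producer, single-consumer packet ring using C11 atomics: one receiver thread blocked in recvfrom() pushes, and the DDS side pops instead of polling. All names and sizes (pkt_ring, PKT_MAX, RING_SLOTS) are hypothetical, and this is not the implementation from the ros2_embedded_nuttx repository. With one receiver thread per locator, each locator would need its own ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PKT_MAX    256  /* hypothetical maximum datagram size */
#define RING_SLOTS 8    /* must be a power of two */

struct packet {
    size_t len;
    unsigned char data[PKT_MAX];
};

struct pkt_ring {
    struct packet slots[RING_SLOTS];
    atomic_size_t head;  /* next slot to write; advanced by the receiver */
    atomic_size_t tail;  /* next slot to read; advanced by the consumer */
};

static void ring_init(struct pkt_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Called by the receiver thread after recvfrom() returns. Returns 0 on
 * success, -1 if the ring is full or the packet is too large; in that
 * case the packet is dropped. */
static int ring_push(struct pkt_ring *r, const void *data, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    assert(data != NULL);
    if (len > PKT_MAX || head - tail == RING_SLOTS) {
        return -1;
    }
    struct packet *p = &r->slots[head % RING_SLOTS];
    memcpy(p->data, data, len);
    p->len = len;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Called by the consumer in place of a poll()/recvfrom() pair. Returns the
 * packet length, or -1 if nothing is buffered or out is too small. */
static long ring_pop(struct pkt_ring *r, void *out, size_t cap)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail) {
        return -1;
    }
    const struct packet *p = &r->slots[tail % RING_SLOTS];
    if (p->len > cap) {
        return -1;
    }
    long len = (long)p->len;
    memcpy(out, p->data, p->len);
    /* Release the slot only after the copy is complete. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return len;
}
```

The RAM cost mentioned above still applies: each receiver thread needs its own stack in addition to the RING_SLOTS * PKT_MAX bytes of buffer per ring.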

Greg

gregory-nutt commented 9 years ago

@vmayoral Another implementation option occurs to me.

Last week, I implemented the first cut at POSIX asynchronous I/O. So something like aio_read() would do the job for you: http://pubs.opengroup.org/onlinepubs/009695399/functions/aio_read.html Or perhaps better, lio_listio(): http://pubs.opengroup.org/onlinepubs/009695399/functions/lio_listio.html

The best thing about this solution is that it is portable, POSIX code. Unfortunately, that code is still immature and I still have some issues regarding sockets and asynchronous incoming I/O in general. See:

http://sourceforge.net/p/nuttx/git/ci/master/tree/nuttx/TODO
...
ASYNCHRONOUS I/O DOES NOT WORK WITH SOCKETS
ASYNCHRONOUS I/O DOES NOT WORK WITH MANY DEVICES

Those two design deficiencies would have to be corrected first, but then the whole solution would be complete for you.
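As a rough sketch of the lio_listio() approach, the code below queues one read per descriptor and waits for all of them with LIO_WAIT. It uses only the POSIX AIO calls from the pages linked above; read_all() and NREQ are hypothetical names. A real receive path would more likely use LIO_NOWAIT with a completion notification, and whether this works on sockets depends on the OS, as noted above. On older glibc versions the program may need to be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define NREQ 2  /* hypothetical: one pending read per descriptor */

/* Queue one read per descriptor with lio_listio() and block until all of
 * them complete (LIO_WAIT). Returns 0 on success and stores each byte
 * count in got[i]; returns -1 if any request failed. */
static int read_all(const int fds[NREQ], char bufs[NREQ][64], ssize_t got[NREQ])
{
    struct aiocb cbs[NREQ];
    struct aiocb *list[NREQ];

    assert(fds != NULL && bufs != NULL && got != NULL);
    memset(cbs, 0, sizeof cbs);      /* also sets aio_offset to 0 */
    for (int i = 0; i < NREQ; i++) {
        cbs[i].aio_fildes = fds[i];
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = sizeof bufs[i];
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }
    if (lio_listio(LIO_WAIT, list, NREQ, NULL) != 0) {
        return -1;
    }
    for (int i = 0; i < NREQ; i++) {
        if (aio_error(&cbs[i]) != 0) {
            return -1;
        }
        got[i] = aio_return(&cbs[i]);
    }
    return 0;
}
```

The appeal of this design is that the pending reads live in the kernel's AIO machinery rather than in extra application threads, so no per-thread stacks are needed in application space.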

Greg

gregory-nutt commented 9 years ago

@vmayoral Update: I did finish the AIO implementation for sockets and you should now be able to use lio_listio() and receive signals whenever a packet is received by any of four pending reads. Not well tested yet, of course.

vmayoral commented 9 years ago

@nuttx thanks a lot for the pointers and the implementation. I finished a ring-buffer/threads based pseudo-udp polling implementation (the one described in my whiteboard) https://github.com/ros2/ros2_embedded_nuttx/commit/37ef52ad574ffc93e25f05983eccaaf2212678cf. The system now receives properly but:

- It's quite slow. The overhead of the 4 additional threads seems to be quite big.
- Although it receives packets, the packets received are still not processed appropriately. This is probably due to some bug in the ring buffer or a bad hook into the DDS implementation. I need to look into it.

Before spending more time catching the bug I'm going to code another prototype that makes use of lio_listio(). I will post it here.

vmayoral commented 9 years ago

All right, I've finished studying the two paths:

- pseudo-poll function using threads and ring buffers [status: implemented with bugs]
- asynchronous I/O using the lio_listio function (for a quick reference on how to use it, refer to this link)

The second option (as far as I understand) will require breaking the current DDS implementation (e.g. the poll(), sock_fd_schedule(), rtps_ip_rx_fd() and recvfrom() calls need to be reimplemented in a different way). It's not clear to me whether the asynchronous implementation (which I assume has some threads internally in the kernel) will provide clear benefits given the challenge of wiring everything up together. Still, I would guess that for real-time environments (such as the one we are trying to pursue), using lio_listio sounds more reasonable.

@bramstes, @brunodebus, @kriver and @jvoe, could you share your opinion on this matter?
