eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds

Help!!! FreeRTOS + lwIP + DDS, blocked in lwip_select #956

Closed: colazhu closed this issue 2 years ago

colazhu commented 2 years ago

I am porting Cyclone DDS to FreeRTOS + lwIP + POSIX on an STM32H743.

In q_receive.c:

```c
uint32_t recv_thread (void *vrecv_thread_arg)
{
  ....
  if ((ctx = os_sockWaitsetWait (waitset)) != NULL)  /* <= this call never returns: it blocks in lwip_select */
  {
    int idx;
    ddsi_tran_conn_t conn;
    while ((idx = os_sockWaitsetNextEvent (ctx, &conn)) >= 0)
    ....
  }
}
```

On Linux:

```c
uint32_t listen_thread (struct ddsi_tran_listener *listener)
{
  ...
  os_sockWaitsetTrigger (gv->recv_threads[0].arg.u.many.ws);  /* <= after this call, os_sockWaitsetWait returns */
  ...
}
```

But I found that it does nothing when built with lwIP sockets:

```c
void os_sockWaitsetTrigger (os_sockWaitset ws)
{
#if defined(LWIP_SOCKET)
  (void)ws;  /* <= does nothing in this function */
#else
  .....
#endif
}
```

I need help: how can I resolve this problem?

k0ekk0ek commented 2 years ago

Hi @colazhu! Nice to see there's community interest in getting Cyclone DDS to run on FreeRTOS targets. The self-pipe trick cannot be used with lwIP. At least, not when I ported Cyclone DDS to the FreeRTOS targets we needed to support. That shouldn't matter too much. If I recall correctly, that only meant the waitset doesn't return directly when another participant is added. There's still the timeout that expires so there's only a slight delay when a new participant is added, which usually occurs only at the start. (@eboasson, not to take too much of your time, but does that statement still hold?) The targets worked fine for me otherwise. There's a good chance your problem lies elsewhere, but it's hard to tell without additional information.
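The self-pipe trick works like this: the waiting thread adds the read end of a pipe to the descriptor set it passes to select(), and any other thread wakes it by writing one byte to the write end. Below is a minimal POSIX sketch with hypothetical names (struct waker, waker_trigger, waker_wait); it is not Cyclone DDS's actual os_sockWaitset code. lwip_select presumably can only wait on lwIP sockets, and lwIP offers no pipe, which would explain why the LWIP_SOCKET branch of os_sockWaitsetTrigger is empty.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical minimal waker: a pipe used only to interrupt select(). */
struct waker {
  int rd; /* read end, added to every select() set */
  int wr; /* write end, written by the trigger */
};

static int waker_init (struct waker *w)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;
  /* Non-blocking: draining never stalls, and a full pipe never blocks the trigger. */
  for (int i = 0; i < 2; i++)
    fcntl (fds[i], F_SETFL, fcntl (fds[i], F_GETFL) | O_NONBLOCK);
  w->rd = fds[0];
  w->wr = fds[1];
  return 0;
}

/* Called from another thread, e.g. after accepting a new connection. */
static void waker_trigger (struct waker *w)
{
  char c = 0;
  ssize_t r = write (w->wr, &c, 1); /* EAGAIN is harmless: a wake-up is already pending */
  (void) r;
}

/* Returns 1 if woken by the trigger, 0 on timeout, -1 on error.
   A real waitset would also add its data sockets to rset here. */
static int waker_wait (struct waker *w, long timeout_ms)
{
  fd_set rset;
  FD_ZERO (&rset);
  FD_SET (w->rd, &rset);
  struct timeval tv = { .tv_sec = timeout_ms / 1000, .tv_usec = (timeout_ms % 1000) * 1000 };
  int n = select (w->rd + 1, &rset, NULL, NULL, &tv);
  if (n <= 0)
    return n;
  if (!FD_ISSET (w->rd, &rset))
    return 0;
  char buf[16];
  while (read (w->rd, buf, sizeof (buf)) > 0)
    ; /* drain every pending wake-up byte */
  return 1;
}
```

Without a trigger, the wait still returns after timeout_ms, which corresponds to the "slight delay" described above when the trigger is a no-op.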

On a side note, I'm currently working on replacing the os_sockWaitset, which is really the only operating-system-specific part left in DDSI. For FreeRTOS specifically, that means it becomes easier to use an alternate mechanism. I believe FreeRTOS has its own, non-socket-based, event mechanism that hopefully increases performance and makes the whole thing less reliant on lwIP (so FreeRTOS+TCP can be used more easily).

eboasson commented 2 years ago

With

```c
uint32_t listen_thread (struct ddsi_tran_listener *listener) {
...
os_sockWaitsetTrigger (gv->recv_threads[0].arg.u.many.ws); <= after this function will made os_sockWaitsetWait pass through
```

do you mean just os_sockWaitsetTrigger, or specifically when called from listen_thread? If it is the second, then I assume you are using TCP. Maybe that affects things.

The pragmatic solution is probably to create a pair of sockets instead of a pipe, but I am not sure if lwIP supports the socketpair function. If it does, it should be straightforward to use that.
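If socketpair turns out not to be available, one way to get a "pair of sockets" is a single UDP socket bound to the loopback address and connected to itself: the trigger send()s a one-byte datagram, which makes the socket readable and wakes select(). The sketch below uses plain POSIX sockets and hypothetical names (wakesock_create, wakesock_trigger, wakesock_wait). It assumes the lwIP configuration has a loopback interface enabled and that these calls map onto their lwip_* equivalents; neither has been verified on this target.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical wake-up socket: UDP, bound to 127.0.0.1 and connected to its own address. */
static int wakesock_create (void)
{
  int s = socket (AF_INET, SOCK_DGRAM, 0);
  if (s < 0)
    return -1;
  struct sockaddr_in addr = { 0 };
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = 0; /* let the stack choose a free port */
  socklen_t len = sizeof (addr);
  if (bind (s, (struct sockaddr *) &addr, sizeof (addr)) != 0 ||
      getsockname (s, (struct sockaddr *) &addr, &len) != 0 ||
      connect (s, (struct sockaddr *) &addr, sizeof (addr)) != 0) {
    close (s);
    return -1;
  }
  fcntl (s, F_SETFL, fcntl (s, F_GETFL) | O_NONBLOCK);
  return s;
}

/* Called from another thread to wake the waiter. */
static void wakesock_trigger (int s)
{
  char c = 0;
  ssize_t r = send (s, &c, 1, 0); /* a failed send only means one wake-up fewer */
  (void) r;
}

/* Returns 1 if triggered, 0 on timeout, -1 on error. */
static int wakesock_wait (int s, long timeout_ms)
{
  fd_set rset;
  FD_ZERO (&rset);
  FD_SET (s, &rset);
  struct timeval tv = { .tv_sec = timeout_ms / 1000, .tv_usec = (timeout_ms % 1000) * 1000 };
  int n = select (s + 1, &rset, NULL, NULL, &tv);
  if (n <= 0)
    return n;
  char buf[16];
  while (recv (s, buf, sizeof (buf), 0) > 0)
    ; /* drain queued wake-up datagrams */
  return 1;
}
```

Because the wake-up descriptor is an ordinary socket, it can sit in the same select() set as the data sockets, which is what the pipe cannot do under lwIP.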

colazhu commented 2 years ago

Thanks for the reply.

Yes, I am using TCP: I set transport_selector to DDSI_TRANS_TCP.

I set a 200 ms timeout on lwip_select so that os_sockWaitsetWait in recv_thread returns periodically, and let os_sockWaitsetNextEvent check whether data is available. I hope it works well.
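A timeout only helps if each pass rebuilds the descriptor set from the current list of connections, so that a TCP connection accepted by listen_thread is picked up on the next pass, at most one timeout later. The sketch below shows that pattern with a hypothetical helper, wait_current_sockets, over plain POSIX select(). It is not the actual os_sockWaitsetWait implementation, and a real waitset must also protect the socket list against concurrent modification.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define WAIT_TIMEOUT_MS 200 /* upper bound on how long a newly added socket goes unnoticed */

/* Hypothetical helper: waits on the sockets currently in socks[0..n-1].
   The fd_set is rebuilt on every call, so a socket added between calls
   is included on the next pass even though nothing triggered a wake-up.
   Returns the number of ready sockets stored in ready[], 0 on timeout,
   or -1 on error. */
static int wait_current_sockets (const int *socks, size_t n, int *ready, size_t cap)
{
  fd_set rset;
  int maxfd = -1;
  FD_ZERO (&rset);
  for (size_t i = 0; i < n; i++) {
    FD_SET (socks[i], &rset);
    if (socks[i] > maxfd)
      maxfd = socks[i];
  }
  struct timeval tv = { .tv_sec = 0, .tv_usec = WAIT_TIMEOUT_MS * 1000 };
  int r = select (maxfd + 1, &rset, NULL, NULL, &tv);
  if (r <= 0)
    return r;
  size_t k = 0;
  for (size_t i = 0; i < n && k < cap; i++)
    if (FD_ISSET (socks[i], &rset))
      ready[k++] = socks[i];
  return (int) k;
}
```

Called in a loop, the worst-case latency for noticing a new connection is WAIT_TIMEOUT_MS, paid for with one wake-up every 200 ms while idle.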

xevent_thread does not work yet; its call stack is not the same as on Linux. The call stack is as follows:

```c
static uint32_t xevent_thread (struct xeventq *xevq)
{
  ...
  nn_xpack_send (xp, false);  /* <= when xp->niov == 4 */
  ...
}
....
size_t addrset_forall_count (struct addrset *as, addrset_forall_fun_t f, void *arg)
{
  ....
  ddsrt_avl_cwalk (&addrset_treedef, &as->ucaddrs, addrset_forall_helper, &arg1);  /* <= enters here */
}

void ddsrt_avl_walk (const ddsrt_avl_treedef_t *td, ddsrt_avl_tree_t *tree, ddsrt_avl_walk_t f, void *a)
{
  .........
  todop = tree->root;  /* <= root is NULL on Linux, but some other value on FreeRTOS.
                          I guess the root value is bad, but I have not found who sets it yet. */
  while (todop) {
    .....
  }
}

static void ddsi_tcp_conn_connect (ddsi_tcp_conn_t conn, const ddsrt_msghdr_t *msg)
  /* <= blocked in lwip_connect, at:
        err = netconn_connect(sock->conn, &remote_addr, remote_port); */
```

colazhu commented 2 years ago

“Hello world” works between FreeRTOS (publisher) and Linux (subscriber).

1. Domain config on Linux:

```c
config_raw.transport_selector = DDSI_TRANS_TCP;
config_raw.tcp_port = LINUX_TCP_PORT;
struct ddsi_config_peer_listelem peer_local;
char *local_addr = RTOS_TCP_ADDR; // "tcp/192.168.1.30:8000"
peer_local.next = NULL;
peer_local.peer = local_addr;
config_raw.peers = &peer_local;
config_raw.many_sockets_mode = DDSI_MSM_MANY_UNICAST;
```

Domain config on FreeRTOS:

```c
config_raw.transport_selector = DDSI_TRANS_TCP;
config_raw.tcp_port = RTOS_TCP_PORT; // 8000
```

2. ddsrt_select(): set a timeout to prevent os_sockWaitsetWait from blocking when LWIP_SOCKET is defined.

3. ddsrt_sendmsg(... flags ...):

```c
if ((n = sendmsg (sock, msg, flags)) != -1) {
```

The flags value passed here does not work on lwIP; use MSG_DONTWAIT or MSG_MORE instead.

4. Enable the macro in sockets/posix.h: change `#define DDSRT_HAVE_SSM` from 0 to 1 for LWIP_SOCKET, and disable the code in ddsi_udp.c that then fails to compile in joinleave_ssm_mcgroup.

5. Remove the macros `#define DDS_HAS_TYPE_DISCOVERY 1` and `#define DDS_HAS_TOPIC_DISCOVERY 1` from features.h. I don't know why there is a QoS mismatch in handle_sedp, at:

```c
if (!isb0 && !topickind_qos_match_p_lock (gv, &prd->e, prd->c.xqos, &wr->e, wr->xqos, &reason, &prd->c.type_id, &wr->c.type_id))
```

eboasson commented 2 years ago

That's good news @colazhu!

Re 2: I know I don't have the time right now to properly fix the "blocking in os_sockWaitsetWait" problem, so even though the timeout you added is only a workaround, I think it may make sense to merge that change (assuming you have properly limited it to lwIP only with the preprocessor).
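A sketch of how such a workaround can be confined to lwIP builds with the preprocessor, so that other platforms keep their current behaviour. The helper name waitset_select_timeout and the 200 ms value are hypothetical (the value comes from the earlier comment), and returning NULL (wait indefinitely) in the non-lwIP branch is an assumption for illustration, not a statement about what Cyclone DDS does on Linux.

```c
#include <stddef.h>
#include <sys/time.h>

#if defined(LWIP_SOCKET)
#define WAITSET_POLL_MS 200 /* lwIP only: os_sockWaitsetTrigger cannot wake select() */
#endif

/* Hypothetical helper returning the timeout argument for select():
   a finite timeout on lwIP builds, NULL (block until readable) elsewhere. */
static struct timeval *waitset_select_timeout (struct timeval *storage)
{
#if defined(LWIP_SOCKET)
  storage->tv_sec = WAITSET_POLL_MS / 1000;
  storage->tv_usec = (WAITSET_POLL_MS % 1000) * 1000;
  return storage;
#else
  (void) storage;
  return NULL; /* rely on the self-pipe trigger for wake-ups */
#endif
}
```

Keeping the #if inside one small helper means the call site in the waitset code stays identical on every platform.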

Re 3: The same goes for the flags. I don't know what MSG_MORE does (I think I can guess what MSG_DONTWAIT does), and so I can't say which change would be best. I do think that this particular problem is specific to the use of TCP, or else @k0ekk0ek would have run into it before. I'm sure there is a good way of setting the flags correctly when using TCP without breaking UDP (even if it involves checking the type of socket ...)
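Checking the socket type is possible with getsockopt(SO_TYPE). The sketch below uses a hypothetical helper, send_flags_for, that picks between two caller-supplied flag values, so a TCP-specific change cannot leak into the UDP path. Which flags are correct for TCP on lwIP is exactly the open question above, so the sketch leaves the values to the caller.

```c
#include <sys/socket.h>

/* Hypothetical helper: stores tcp_flags in *flags for stream (TCP) sockets
   and udp_flags for everything else. Returns 0 on success, -1 if the
   socket type cannot be determined. */
static int send_flags_for (int sock, int tcp_flags, int udp_flags, int *flags)
{
  int type = 0;
  socklen_t len = sizeof (type);
  if (getsockopt (sock, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
    return -1;
  *flags = (type == SOCK_STREAM) ? tcp_flags : udp_flags;
  return 0;
}
```

For example, `send_flags_for (sock, MSG_DONTWAIT, 0, &flags)` would apply MSG_DONTWAIT to TCP sockets only; whether that is the right flag is not settled in this thread. Since the type never changes, it could also be looked up once when the connection is created instead of on every send.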

Re 4: This I don't understand. Firstly, if there is a build problem with SSM support excluded, then that should be fixed: the code is supposed to be compatible with platforms that don't have SSM support. All "normal" platforms we have access to support SSM, so it doesn't automatically get built in that mode and errors may have crept in, but if so, those should be fixed. Perhaps you can look at what actually fails when SSM is excluded?

Secondly, since you're using TCP, you shouldn't be using any of the code touching SSM (or any source multicast, for that matter). So it can't be that you had to enable it because you needed the functionality!

Re 5: The features.h file is written by CMake during configuration. DDS_HAS_TYPE_DISCOVERY and DDS_HAS_TOPIC_DISCOVERY are controlled by:

```cmake
option(ENABLE_TYPE_DISCOVERY "Enable Type Discovery support" OFF)
option(ENABLE_TOPIC_DISCOVERY "Enable Topic Discovery support" OFF)
```

in src/CMakeLists.txt and as you can see they default to off. I don't understand why you had to edit features.h ...

Regarding the QoS mismatch specifically: is this mismatch there only when you use FreeRTOS? Because then it could be something low-level that is dependent on the port (different alignment rules, or other scary stuff). If it is also there when you run the FreeRTOS application on Linux, then it is probably simply a typo in the source code.

You can check programmatically, because when there is a QoS mismatch it is returned via the dds_get_offered_incompatible_qos_status and dds_get_requested_incompatible_qos_status calls. (Actually, if there are multiple mismatching QoS's, it'll only give you one; and if there are multiple reader/writer matches failing because of QoS mismatches before you read it, you'll only see the most recent one. I'm sure those limitations are manageable for you in this test setup.)

I do realize that there should be user-friendly tooling for this, but that's all work in progress. The beginnings exist in https://github.com/eclipse-cyclonedds/cyclonedds-python (the easiest may be to simply install the Python binding on Linux and use ddsls to look at the readers/writers and their QoS settings). Alternatively, you're of course free to hook up a debugger (on Linux, I highly recommend https://rr-project.org as a tool!), add printfs, or write a trace file and look for lines with READER, WRITER, or SEDP [...] NEW.

sgf201 commented 1 year ago

> (Quoting @colazhu's “Hello world” comment above, items 1 to 5.)

Could you give us a description of your toolchain and the main steps for building the project? Thanks so much!