Posix Sockets - UDP not going out while using GNRC

jmgraeffe commented 3 years ago

Description

The posix_sockets example built with gnrc_ipv6_router_default only works in one direction:

- The UDP server receives messages correctly.
- The UDP sender/client reports that it sent a UDP packet, but the packet never arrives at the other end.

Steps to reproduce the issue

Board: native
Network: using tap0 with link-local addresses only

Steps to reproduce:

1. cd RIOT/examples/posix_sockets
2. Add the following to the Makefile of examples/posix_sockets (so the network stack supports NDP/ICMPv6):
   ```
   USEMODULE += gnrc_ipv6_router_default
   USEMODULE += gnrc_udp
   ```
3. make
4. sudo make term
5. ip link set tap0 up
6. Start a UDP server on the host: nc -u6 -l 56830
7. On the RIOT shell: udp send <link-local IPv6 address of tap0 on host> 56830 test

Expected results

UDP packet with payload "test" arriving at host interface tap0

Actual results

no UDP packet arrives on host, even though the shell outputs success (also checked with Wireshark)
ICMPv6 seems to work (RIOT does ask for the MAC address of the IPv6 address, checked with Wireshark)
sending UDP packets to UDP server on RIOT started with udp server start 56830 does work fine
examples/gnrc_networking also does work fine with both UDP server & client, with the same steps

Versions

GCC was used.

Operating System Environment
----------------------------
         Operating System: "Ubuntu" "20.04.1 LTS (Focal Fossa)"
                   Kernel: Linux 5.4.0-58-generic x86_64 x86_64
             System shell: /usr/bin/dash (probably dash)
             make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]
                  avr-gcc: missing
         mips-mti-elf-gcc: missing
           msp430-elf-gcc: missing
       riscv-none-elf-gcc: missing
  riscv64-unknown-elf-gcc: missing
     riscv-none-embed-gcc: missing
     xtensa-esp32-elf-gcc: missing
   xtensa-esp8266-elf-gcc: missing
                    clang: clang version 10.0.0-4ubuntu1

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.3.0"
      mips-mti-elf-newlib: missing
        msp430-elf-newlib: missing
    riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
  riscv-none-embed-newlib: missing
  xtensa-esp32-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
                 avr-libc: missing (missing)

Installed development tools
---------------------------
                   ccache: missing
                    cmake: cmake version 3.16.3
                 cppcheck: missing
                  doxygen: 1.8.17
                      git: git version 2.25.1
                     make: GNU Make 4.2.1
                  openocd: Open On-Chip Debugger 0.10.0
                   python: missing
                  python2: missing
                  python3: Python 3.8.5
                   flake8: error: /usr/bin/python3: No module named flake8
               coccinelle: missing

Maybe it's expected behaviour, but if I understood right, the Posix wrapper does use the network stack-independent sock API and therefore it should work out of the box with the same modules / setup as in examples/gnrc_networking.

miri64 commented 3 years ago

Welcome to the RIOT community @jmgraeffe!

> on RIOT shell: udp send <link-local IPv6 address of tap0 on host> 56830 test

Does your link-local address include the interface identifier? You can append it using the % sign, e.g. fe80::dead:c0ff:ee%5. Also, if you used dist/tools/tapsetup/tapsetup to create your TAP interfaces (or if you have the TAP interfaces in a bridge for any other reason), make sure you use the link-local address of the bridge, not of the TAP interface, as the bridge takes away the IPv6 capabilities of its sub-interfaces.
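For code that builds the destination address itself, the % suffix corresponds to the sin6_scope_id field of struct sockaddr_in6. The following is a minimal sketch, assuming a POSIX environment such as Linux with glibc. parse_ll_addr is a hypothetical helper, not part of RIOT or POSIX, and whether a given stack honours sin6_scope_id in sendto() should be checked for that stack.

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "addr" or "addr%zone" into a sockaddr_in6.
 * The zone may be a numeric interface index ("5") or an interface
 * name ("tap0"). Returns 0 on success, -1 on malformed input. */
int parse_ll_addr(const char *text, uint16_t port, struct sockaddr_in6 *out)
{
    char buf[INET6_ADDRSTRLEN + IF_NAMESIZE + 2];
    size_t len = strlen(text);

    if (len >= sizeof(buf)) {
        return -1;                      /* too long to be valid */
    }
    memcpy(buf, text, len + 1);         /* work on a private copy */

    memset(out, 0, sizeof(*out));
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);

    char *zone = strchr(buf, '%');
    if (zone != NULL) {
        *zone++ = '\0';                 /* split address and zone */
        if (*zone == '\0') {
            return -1;                  /* "addr%" with nothing after it */
        }
        char *end;
        unsigned long idx = strtoul(zone, &end, 10);
        if (*end != '\0') {
            /* not purely numeric: treat it as an interface name */
            idx = if_nametoindex(zone);
            if (idx == 0) {
                return -1;              /* no such interface */
            }
        }
        out->sin6_scope_id = (uint32_t)idx;
    }

    /* without a zone, sin6_scope_id stays 0 (unspecified) */
    return (inet_pton(AF_INET6, buf, &out->sin6_addr) == 1) ? 0 : -1;
}
```

The filled-in structure can then be passed to sendto() as (struct sockaddr *)&addr with sizeof(addr) as the length.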

jmgraeffe commented 3 years ago

> Welcome to the RIOT community @jmgraeffe!

Thank you!

> Does your link-local address include the interface identifier? You can append it using the % sign, e.g. fe80::dead:c0ff:ee%5.

Oh, I didn't expect that I can actually specify the interface on the RIOT side too. Now it works, that's a bit embarrassing :D Just saw that in the gnrc_networking example, the send command does set the interface in the header automatically.

The reason I thought it's a problem in the first place is that my libcoap server (newest develop-branch of libcoap) gets incoming requests and sends the answer through Posix sendto, and it does not get to the tap0 interface. I guess there is something missing in their implementation so that the interface identifier is missing for link-local addresses.

Could I work around that issue by using a static or DHCPv6-assigned IPv6 address?

miri64 commented 3 years ago

Global addresses do not need the interface identifier: provided the FIB is configured properly, the outgoing interface is unambiguous for them. Link-local addresses, as their name says, are only valid on the link, i.e. the interface. I am not sure about your setup, but if you use gnrc_ipv6_default and set your host up to send router advertisements (e.g. by using radvd), you could auto-configure a global address on the node.

miri64 commented 3 years ago

Our DHCPv6 client currently only supports prefix delegation, not address assignment, so DHCPv6 would only help you to configure a whole subnet (e.g. with a border router for a 6LoWPAN).

jmgraeffe commented 3 years ago

Alright, thanks. I'll close this issue as it was a false alert. Hopefully someone finds this useful sometime.

> Our DHCPv6 client currently only supports prefix delegation, not address assignment, so DHCPv6 would only help you to configure a whole subnet (e.g. with a border router for a 6LoWPAN).

I tried the DHCP client, but somehow it did not work on my home network. I might need to try setting up my own DHCP/RA daemon in the future.

miri64 commented 3 years ago

> I tried the DHCP client but somehow it did not work for my home network.

Have you tried the gnrc_border_router example with USE_DHCPV6=1 in the environment? This at least provides you with everything you need to set up a 6LoWPAN subnet. From there you could try to get it running.

jmgraeffe commented 3 years ago

> > I tried the DHCP client but somehow it did not work for my home network.
>
> Have you tried the gnrc_border_router example with USE_DHCPV6=1 in the environment? This at least provides you with everything you need to set up a 6LoWPAN subnet. From there you could try to get it running.

What would be the benefit of opening such a network (over a DHCPv6 server on the host)?

miri64 commented 3 years ago

> What would be the benefit of opening such a network (over a DHCPv6 server on the host)?

For the time being, for you: getting an understanding of RIOT's DHCPv6 server. For everything else, again, I am not sure about your use case, so I would have to guess.

jmgraeffe commented 3 years ago

Okay. I managed to set up a DHCPv6 server and client successfully, and after adding a static route to the subnet, I can communicate back and forth without problems. However, this configuration is a bit annoying to get working the first time, and for portability reasons it would be better to support link-local communication.

If a program uses Posix socket & sendto functions, without binding to a specific interface or port, shouldn't the network stack figure out how to properly send UDP packets automatically? The only address on the interface is the link-local one, thus it's the task of the underlying network stack to choose the right zone id too, right?

miri64 commented 3 years ago

> The only address on the interface is the link-local one, thus it's the task of the underlying network stack to choose the right zone id too, right?

That's just the thing: You can't tell from a link-local address alone which zone you are in. It is e.g. allowed for a node to have two interfaces that are connected to two upstream routers and both of those routers having fe80::1 as their link-local address configured. To which router would you send this packet?

jmgraeffe commented 3 years ago

Yes, in this case it cannot be known, but even then there should be an error or something. Otherwise the program just thinks everything went fine. But maybe this is somewhat constrained by the Posix API, I don't know.

If there is only one interface on the node though, which is a common case for embedded devices, the only logical zone is the one that corresponds to that interface. In my opinion, it would only make sense for the transmission to succeed in that case. But again, I don't know if there are specifications limiting the way those things can be done.
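The policy discussed in the last few comments can be sketched as follows: leave global addresses and explicitly zoned addresses alone, fill in the zone when there is exactly one candidate interface, and fail loudly otherwise. This is only an illustration of the idea, not a description of how RIOT or any POSIX implementation behaves. resolve_zone, its parameters, and the choice of EINVAL are assumptions made for the example, and the caller is assumed to supply the list of interface indices.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy helper: make sure a link-local destination
 * carries a zone (sin6_scope_id) before it is handed to sendto().
 *
 * ifaces / n_ifaces: indices of the interfaces the node could send on
 * (supplied by the caller; how they are obtained is out of scope here).
 *
 * Returns 0 if dst is usable, -1 with errno = EINVAL if the zone is
 * missing and cannot be chosen unambiguously. */
int resolve_zone(struct sockaddr_in6 *dst,
                 const uint32_t *ifaces, size_t n_ifaces)
{
    /* Global addresses need no zone; an explicit zone is respected. */
    if (!IN6_IS_ADDR_LINKLOCAL(&dst->sin6_addr) || dst->sin6_scope_id != 0) {
        return 0;
    }

    /* Exactly one interface: it is the only possible zone. */
    if (n_ifaces == 1) {
        dst->sin6_scope_id = ifaces[0];
        return 0;
    }

    /* Zero or several interfaces: e.g. fe80::1 could live on any of
     * them, so report an error instead of silently dropping. */
    errno = EINVAL;
    return -1;
}
```

Called right before sendto(), a helper like this would turn the silent failure described above into an error the application can see.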

miri64 commented 3 years ago

With gnrc_neterr you might get that error :-).
