RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

gcoap example request on tap I/F fails with NIB issue #8199

Open kb2ma opened 6 years ago

kb2ma commented 6 years ago

I have found that a gcoap shell example request fails to send a message on the tap interface. The problem looks to be in the NIB module. I describe the problem below, and then show a workaround that hopefully will help the networking wizards understand what's amiss.

The transcript below shows that a gcoap example request to the tap interface fails. Debug output is enabled in nib.c.

coap get fe80::845e:22ff:fe47:a59f 5683 /time
gcoap_cli: sending msg ID 9629, 11 bytes
nib: get next hop link-layer address of fe80::845e:22ff:fe47:a59f%0
nib: fe80::845e:22ff:fe47:a59f is on-link or in NC, start address resolution
nib: host unreachable
> gcoap: timeout for msg ID 9629

I can get the request to work with two actions. First, in the shell, explicitly add the tap destination to the NIB neighbor cache.

> nib neigh add 6 fe80::845e:22ff:fe47:a59f 86:5e:22:47:a5:9f

Second, update gnrc_ipv6_nib_get_next_hop_l2addr() to retain the gnrc_netif_t that it finds. gcoap creates the request with the SOCK_ADDR_ANY_NETIF netif. Just below the 'on_link()' test on line 188 of nib.c, I added this:

            /* just found the interface; set the netif too */
            if (node != NULL && iface != 0 && netif == NULL) {
                DEBUG("nib: Setting interface to %u\n", iface);
                gnrc_netif_release(netif);
                netif = gnrc_netif_get_by_pid(iface);
                gnrc_netif_acquire(netif);
            }

Now the gcoap request succeeds as shown in the transcript below.

nib neigh add 6 fe80::845e:22ff:fe47:a59f 86:5e:22:47:a5:9f
> coap get fe80::845e:22ff:fe47:a59f 5683 /time
coap get fe80::845e:22ff:fe47:a59f 5683 /time
gcoap_cli: sending msg ID 52536, 11 bytes
nib: get next hop link-layer address of fe80::845e:22ff:fe47:a59f%0
nib: fe80::845e:22ff:fe47:a59f is on-link or in NC, start address resolution
nib: Setting interface to 6
nib: resolve address fe80::845e:22ff:fe47:a59f%6 from neighbor cache
> gcoap: response Success, code 2.05, 15 bytes
Dec 04 05:38:12

miri64 commented 6 years ago

Hi @kb2ma, can you specify what your set-up is beyond that? Are you using the gcoap example? Are you communicating with the host system or with another native application? If it is the latter: which application?

miri64 commented 6 years ago

Wait, as far as I can see, you aren't providing the interface along with the link-local address. This is illegal in IPv6, and the fact that it was possible was actually a bug in the old ND. It could lead to problems in a multi-interface scenario where neighbors on different links share the same link-local address, say two different border routers configured as fe80::1 on different radios.

miri64 commented 6 years ago

With the new NIB you always have to provide the interface with the link-local address (as you would do it also on e.g. Linux).
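The rule described above can be sketched in plain C. This is only an illustration under stated assumptions: `is_link_local()` and `dest_is_unambiguous()` are hypothetical helpers, not RIOT or NIB functions, and an interface value of 0 is assumed to mean "no interface given", as suggested by the `%0` in the debug output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a RIOT API: true if the 16-byte IPv6 address
 * lies in fe80::/10, the link-local unicast prefix. */
bool is_link_local(const uint8_t addr[16])
{
    return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}

/* Hypothetical helper: a destination can be resolved without guessing only
 * if it is not link-local, or if an interface accompanies it.
 * Assumption: iface == 0 stands for "no interface given". */
bool dest_is_unambiguous(const uint8_t addr[16], unsigned iface)
{
    return !is_link_local(addr) || iface != 0;
}
```

With these helpers, fe80::845e:22ff:fe47:a59f with no interface is rejected, the same address with interface 6 is accepted, and a ULA such as one under fd00:ab:ad:1e::/64 is accepted either way.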

kb2ma commented 6 years ago

Thanks for the quick response, Martine. I am using the gcoap example. The CoAP client is RIOT, and the server is libcoap on my Ubuntu workstation.

I understand what you are saying with specification of the interface. When I specify this in the sock_udp_ep_t.netif member for the destination, the request works without the other additions I mentioned.

So what is the recommended way to specify the interface? For my workstation tools, I would append a "%tapN" to the address. Does RIOT have this parsing yet in a string to address function? Is it time for a PR to this effect?

miri64 commented 6 years ago

> So what is the recommended way to specify the interface? For my workstation tools, I would append a "%tapN" to the address. Does RIOT have this parsing yet in a string to address function? Is it time for a PR to this effect?

Yes there is already a function for this in RIOT: ipv6_addr_split_iface(). But the recommended way would be to not use link-local addresses, but to use GUAs or ULAs.
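To make the "%iface" suffix concrete, here is a minimal sketch of the kind of split such a function performs. `split_iface()` is a hypothetical stand-in written in standard C; its name, signature, and return convention are assumptions for illustration and should not be read as the RIOT implementation of ipv6_addr_split_iface().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not RIOT's implementation.
 * Splits "addr%iface" in place: the '%' is overwritten with '\0' so that
 * addr_str then holds only the address, and a pointer to the interface
 * part is returned. Returns NULL if the string contains no '%'. */
char *split_iface(char *addr_str)
{
    char *sep = strchr(addr_str, '%');

    if (sep == NULL) {
        return NULL;
    }
    *sep = '\0';
    return sep + 1;
}
```

Given a writable buffer holding "fe80::845e:22ff:fe47:a59f%6", the call leaves "fe80::845e:22ff:fe47:a59f" in the buffer and returns a pointer to "6". The caller would then convert that interface string to whatever identifier its network stack expects.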

kb2ma commented 6 years ago

Thanks for the recommendations. I'll try them out. I suggest also updating the README for the gnrc_networking example with these best practices.

miri64 commented 6 years ago

> I suggest also updating the README for the gnrc_networking example with these best practices.

Could you maybe do that? I'm not really sure where to add this so that a potential newcomer would see it immediately.

kb2ma commented 6 years ago

The gcoap example README needs a similar update as well, so let me do that first. Then I'll see how that translates to gnrc_networking.

miri64 commented 5 years ago

@kb2ma can this issue be closed?

kb2ma commented 5 years ago

We solved the main problem, which was the use of link-local addresses. However, we expanded that to an update of the gcoap and gnrc_networking READMEs to recommend (and I assume demonstrate) use of ULA/GUA.

So I guess my answer is a question: How do we keep track of the suggestion to update the gcoap and gnrc_networking READMEs? Should there be some other issue that talks more generally about use of ULA/GUA and addressing in RIOT?

I suggest we close this one and wait until someone is motivated enough to propose and get consensus on a simple, standard way to set up a ULA for development/testing.

aabadie commented 5 years ago

I'd like to mention that I'm facing the same issue with a Raspberry Pi that has 802.15.4 enabled and is propagating a ULA prefix (fd00:ab:ad:1e::/64).

Then, when I try to get a CoAP resource on the board with aiocoap-client, I have the following message and the request hangs:

$ aiocoap-client coap://[fd00:ab:ad:1e:7b65:3458:ecde:6b82]/.well-known/core
WARNING:coap:Received Type.ACK from <UDP6EndpointAddress [fe80::7b65:3458:ecde:6b82%lowpan0]:5683 with local address>, but could not match it to a running exchange.

It works if I use:

$ aiocoap-client coap://[fe80::7b65:3458:ecde:6b82%lowpan0]/.well-known/core
</cli/stats>,</riot/board>

By bisecting, I found that 55adbee48808bda975dd5ee420c605bbddf4c94c is the culprit.
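The warning above suggests that the client pairs a reply with its request by the sender address as well as the message ID, so an ACK sent from the link-local address cannot match a request addressed to the ULA. A minimal sketch of that matching rule follows. The `exchange_t` type and `reply_matches()` are hypothetical names for illustration and are not aiocoap's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical client-side record of an outstanding request. */
typedef struct {
    uint8_t remote[16];   /* address the request was sent to */
    uint16_t msg_id;      /* CoAP message ID of the request */
} exchange_t;

/* Assumed matching rule: a reply belongs to an exchange only if it carries
 * the same message ID and comes from the address the request was sent to. */
bool reply_matches(const exchange_t *ex, const uint8_t src[16], uint16_t msg_id)
{
    return ex->msg_id == msg_id && memcmp(ex->remote, src, 16) == 0;
}
```

Under this rule, a request sent to fd00:ab:ad:1e:7b65:3458:ecde:6b82 is not matched by a reply from fe80::7b65:3458:ecde:6b82, even when the message IDs agree.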

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

miri64 commented 4 years ago

I think this is still an issue, right?

kb2ma commented 4 years ago

Thanks for flagging this, @miri64. Looks like it got staled just before #12192 took effect. I will review.

kb2ma commented 4 years ago

@miri64, the original issue has been handled. The open question is whether @aabadie's example above has been resolved. It seems there was recently some adjustment to how address scope is chosen among alternatives.

aabadie commented 4 years ago

Thanks for raising this, @kb2ma. I was pretty sure I had commented somewhere about this issue but couldn't remember where :)