RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.86k stars 1.97k forks source link

examples/gcoap: client broken #19379

Open maribu opened 1 year ago

maribu commented 1 year ago

Description

Sending a get request from one RIOT device to a separate device fails with an error message, something along the line of send failed. Both notes ping each other fine and tests/nanocoap_cli can successfully communicate with the server.

Steps to reproduce the issue

coap get <IP_ADDR> /.well-known/core

Expected results

A GET request is send and the reply is echoed in the shell soonish afterwards.

Actual results

An error message complaining that send failed is printed in the shell.

Versions

Current master.

krzysztof-cabaj commented 1 year ago

Do you use IPv6 address in square brackets? I test this example from current master in the native environment and when I use IPv6 without brackets I received error message gcoap_cli: msg sned failed. However when I used command coap get [2001:db8::1] /.well-known/core I see coap packet in tap interface.

I do little investigation and the _send function uses sock_udp_name2ep function which needs square brackets.

maribu commented 1 year ago

No I didn't :-/

But honestly, that is a mayor regression from the UX point of view. It would make sense to enforce square brackets if the cmd would expect an URI (e.g. coap://[IP-6-ADDR]/uri/path), as this is the way to pass IPv6 addresses in URIs. But if the cmd asks for an hostname / address as one shell cmd and a URI pass as another, the square brackets make no sense.

maribu commented 1 year ago

Also: Both the gcoap and the nanocoap shell cmd give cryptic error messages claiming send failed when a reply with an error status code came in.

The underlying issues is that the API used is broken by design: It converts CoAP status codes to errno codes (e.g. 4.XY to -ENXIO if I recall correctly). That's complete nonsense and makes it complete impossible to write correctly behaving clients. E.g. if a CoAP client would want to make use if a CoAP Option the server doesn't support (and it is a critical CoAP Option), the server would reply with Bad Option as status code. A client could then retry without that option; but only if it would actually know that the CoAP Option was the issue.

krzysztof-cabaj commented 1 year ago

Hi! I investigate sock_udp_name2ep once again and realize that it uses sock_tl_str2ep. As this function parse also port and netif, for IPv6 address brackets are mandatory. In other case there is a problem which : is part of IPv6 address and which separate IPV6 address and port.

However, I add work around (rather dirty hack - I'm not proud of it :( ) which checks if user give only IPv6 address - without port or netif. Of course if you would like for example add port - brackets are needed. This fix issue with gcoap client. Now both commands coap get [2001:db8::1] /.well-known/core and coap get 2001:db8::1 /.well-known/core works.

The code is on my branch - sock_util. If you think such solution is allowed I could do a PR.

maribu commented 1 year ago

Maybe the cleanest approach would be to change the command line syntax to expect a full URI? That way the API can be reused, it is obvious that square brackets are needed for IPv6 literals, and the number of argument change so that things at least break noticeably?

krzysztof-cabaj commented 1 year ago

As a user I would expected the simplest command. Maybe a good solution will be something similar to the CLI from tests/nanocoap_cli, for example:

coap get <addr>[%iface] [port] <path>

where <...> denotes mandatory and [...] optional. So without netif/iface and using default port I works as you - and most of the users expected:

coap get 2001:db8::1 /.well-known/core

If you agree with such solution, I could try to refactor gcoap_cli_cmd (function for more than 200 lines ... without comments ;) ) and change parameters to fix this issue.

maribu commented 1 year ago

That makes sense. The cli interface has been like this since recently, anyway.

While at it: Would you mind to move that to an a module in sys/shell? I think it could be quite useful to be able to add a shell CoAP client to a custom app be just selecting a module.

krzysztof-cabaj commented 1 year ago

Ok. I try to fix CLI interface.

I did not think before about making a dedicated sys/shell module. When I fix this issue we could think about it :D.

miri64 commented 1 year ago

Could you, if possible, rewrite it so that it expects URIs (e.g. using the uri_parser module)? This way we would not need two distinct gcoap and gcoap_dtls examples anymore.

krzysztof-cabaj commented 1 year ago

Hi! Sorry, for many questions, but I would like detect all possible problems and needed features before I start coding ;).

I looked at examples/gcoap and examples/gcoap_dtls - they share this same code, however depending on the magic define GCOAP_ENABLE_DTLS in the makefile the second one use only DTLS. The idea is to make one example (examples/gcoap ?) which dynamicaly - depending on arguments from CLI use coap or coaps?

mguetschow commented 2 months ago

We have a potentially related issue on current master for the release tests: https://github.com/RIOT-OS/RIOT/actions/runs/9524990270/job/26258574295#step:15:468

Quoting from there:

> coap get -c [fd00:bbbb::1]:5683 /time
coap get -c [fd00:bbbb::1]:5683 /time

Could not parse URI
usage: coap <get [-o|-d]|post|put> [-c] <URI> [data]
       coap ping <scheme>://<host>[:port]
       coap info
       coap proxy set <scheme>://<host>[:port]
       coap proxy unset
Options
    -c  Send confirmably (defaults to non-confirmable)
> 

I was also able to reproduce this locally with examples/gcoap. Anyone feels like taking a closer look?

mguetschow commented 2 months ago

Aha, regression from #20554, the release tests need to be adapted accordingly.

mguetschow commented 2 months ago

There you go: https://github.com/RIOT-OS/Release-Specs/pull/308

fabian18 commented 2 months ago

So the client is not broken, just the test had to be adapted?

mguetschow commented 2 months ago

I don't know what the current state of this issue is, but the failing release test is actually unrelated to it - sorry for the noise here.