Coap TCP keep alive using ping messages

chrysn / aiocoap

The Python CoAP library

Coap TCP keep alive using ping messages #138

Open lcoudeville opened 5 years ago

lcoudeville commented 5 years ago

I'm encountering issues when sending PING messages over a CoAP-over-TCP connection (coap+tcp scheme). I expected this to be quite simple, so I started from the client example.

    initial_request = Message(code=Code.PING, uri="coap+tcp://%s:%d/blbalba" % self.controller_addr)

    response = await self.coap_context.request(initial_request).response
    # This works properly; a TCP connection now exists, so send PING messages over it.

    ping = Message(code=Code.PING, uri="coap+tcp://%s:%d" % self.controller_addr)

    try:
        await self.coap_context.request(ping).response
    except Exception:
        logging.exception("CoAP ping failed.")

Traceback (most recent call last):
  File "xxx.py", line 211, in _coap_client_keep_alive
    await self.coap_context.request(ping).response
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/protocol.py", line 718, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/protocol.py", line 743, in _run
    await protocol.find_remote_and_interface(app_request)
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/protocol.py", line 293, in find_remote_and_interface
    if await ri.fill_or_recognize_remote(message):
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/tokenmanager.py", line 179, in fill_or_recognize_remote
    return await self.token_interface.fill_or_recognize_remote(message)
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/transports/tcp.py", line 469, in fill_or_recognize_remote
    if message.requested_scheme == self._scheme:
  File "/home/laurens/projects/lynx/components/lynxathome/lah-node/venv/lib/python3.5/site-packages/aiocoap/message.py", line 539, in requested_scheme
    return self.request.requested_scheme
AttributeError: 'Message' object has no attribute 'request'

Replacing Code.PING with Code.GET works, but obviously without the expected result. I don't understand why a request attribute is expected, or can be expected, on a Message object.

I'm using the current master.

What am I doing wrong?

lcoudeville commented 5 years ago

Digging into the code taught me that PING messages are signalling messages, not request/response messages. However, I did not find a legitimate way to send them other than reaching into the TCP connection itself via TCPClient._pool. Even the tests don't show a proper way to send them. I guess a built-in way to keep a TCP connection alive might be interesting for others as well.

This code did the job for me:

    ping = Message(code=Code.PING)
    for tcp_conn in self.coap_client_tokenmanager.token_interface._pool.values():
        tcp_conn._send_message(ping)

If somebody can give some insight into where this code should live, I can contribute it to the community.

I see a few options:

- add an option to the tcpclient and/or tcpserver that sends pings periodically for the entire connection pool
- add this feature to the TCPConnection class
- add it at a higher level, e.g. in the tokenmanager

chrysn commented 5 years ago

Ping messages are internal to the chosen protocol (CoAP-over-TCP), and so far not exposed in aiocoap. The request method is, as you discovered, there for using the request/response layer of CoAP (provided eg. by the token manager), where there exist neither pings nor empty messages.

Where it'd make sense to add support for sending pings depends on what they are used for -- I haven't found a case for it myself yet.

What do you use signalling pings for? Do you use/need empty messages (RFC 8323 Section 3.4) or TCP keepalives as well? Do you use the custody option? Based on that, we can try to figure out where to best put them.

lcoudeville commented 5 years ago

I really like the fact that those messages are handled internally - this makes aiocoap compatible with libraries that have this functionality - so well done. But there is a problem when using aiocoap to send such messages.

Sending empty messages is an alternative, but it doesn't tell you anything about whether the other end is alive and kicking. For example, the socket on the other end might be alive while the other end will never be able to reply (I agree this is an issue on the other end, but it would be nice to verify that the other side is OK).

I made a small change to codes.py which makes it possible to send ping requests. Consider my output of git diff below:

diff --git a/aiocoap/numbers/codes.py b/aiocoap/numbers/codes.py
index 4981839..ee6d87f 100644
--- a/aiocoap/numbers/codes.py
+++ b/aiocoap/numbers/codes.py
@@ -69,15 +69,15 @@ class Code(ExtensibleIntEnum):

     def is_request(self):
         """True if the code is in the request code range"""
-        return True if (self >= 1 and self < 32) else False
+        return True if (self >= 1 and self < 32) or self == Code.PING else False

     def is_response(self):
         """True if the code is in the response code range"""
-        return True if (self >= 64 and self < 192) else False
+        return True if (self >= 64 and self < 192) or self == Code.PONG else False

     def is_signalling(self):
-        return True if self >= 224 else False
+        return self in (Code.CSM, Code.PING, Code.RELEASE, Code.ABORT)

This allows me to initiate a PING request and await the reply (see the code below, it's a beauty). However, it doesn't feel right that PING is still in the is_signalling list. On the other hand, PING and PONG are both signalling messages and both belong in is_signalling. PING stays in the is_signalling list because otherwise ping messages, which expect pong replies, would be propagated.

This is how I can send messages to certain connections:

    for tcp_conn in self.coap_client_tokenmanager.token_interface._pool.values():
        ping = Message(code=Code.PING)
        ping.remote = tcp_conn

        logging.debug("Send ping")
        await self.coap_context.request(ping).response
        logging.debug("Pong received")

I really like this because I can now use asyncio.wait_for(timeout) to detect when I don't receive a reply within the expected interval.
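The timeout idea can be sketched with the standard library alone. In this minimal sketch, `send_ping` is a hypothetical coroutine function standing in for whatever eventually sends the ping and awaits the pong; it is not an aiocoap API, and the peers in `demo` are fakes.

```python
import asyncio


async def ping_with_timeout(send_ping, timeout):
    """Return True if send_ping() completes within `timeout` seconds.

    `send_ping` is a hypothetical coroutine function (an assumption of
    this sketch) that resolves once the peer has answered.
    """
    try:
        await asyncio.wait_for(send_ping(), timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the pending ping before raising.
        return False
    return True


async def demo():
    async def responsive_peer():
        await asyncio.sleep(0)      # answers immediately

    async def silent_peer():
        await asyncio.sleep(3600)   # never answers in time

    return (
        await ping_with_timeout(responsive_peer, 0.05),
        await ping_with_timeout(silent_peer, 0.05),
    )
```

Running `asyncio.run(demo())` exercises both the answered and the unanswered case.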

chrysn commented 5 years ago

On Mon, Jan 07, 2019 at 04:43:59PM +0000, lcoudeville wrote:

But there is a problem when using aiocoap to send such messages.

I think that this is the core misunderstanding here -- "messages" are something that happens in the serialization of CoAP. On the "upper" side of the protocol (which is what aiocoap primarily exposes), there are no messages, only requests and responses, and the .request() / .response mechanism will not cater for anything else, because it needs to be agnostic of the underlying transports.

Ping/Pong happens to have request/response semantics too, but with different characteristics (eg. a request is sent to an origin server, while a ping is sent to the directly connected endpoint). Shoehorning them into the request/response mechanism will break, at the latest, when working with proxies.

Before we jump to implementation choices, my question is still: What is the problem you are trying to solve by using Ping/Pong? That can then guide us to the adequate API.

lcoudeville commented 5 years ago

Well, actually I have a CoAP server with CoAP clients that sit behind a NAT or in a private network, so there is a firewall in front of them before they reach another network. Only the client can take the initiative to connect to the CoAP server. The server cannot reach the clients unless port forwarding is enabled in the firewall/router. So I want to use Ping and Pong messages, initiated by the client, to ensure that the transport level is reliable and that the CoAP server is healthy.

I agree that for the transport alone I can use empty messages as well. However, I have not been able to get them to work yet, because of an issue similar to the one with ping/pong: Empty has code 0.00, or integer value 0, so it's not in the is_request range either. I don't know yet how to use them, other than by hacking the TCP connection and using the _send_message function.
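To make the code ranges concrete, here is a minimal standalone sketch of how a CoAP code byte maps to the c.dd notation and to the ranges used in the diff above. The function names are illustrative and not part of aiocoap; the numeric values follow the class/detail encoding (3-bit class, 5-bit detail), which puts Ping (7.02) at 226 and Pong (7.03) at 227.

```python
def dotted(code):
    """Render a CoAP code byte in c.dd notation (3-bit class, 5-bit detail)."""
    return "%d.%02d" % (code >> 5, code & 0x1F)


def classify(code):
    """Classify a code byte using the same ranges as the unpatched codes.py."""
    if code == 0:
        return "empty"        # 0.00, neither request nor response
    if 1 <= code < 32:
        return "request"      # 0.01 .. 0.31
    if 64 <= code < 192:
        return "response"     # 2.00 .. 5.31
    if code >= 224:
        return "signalling"   # 7.00 .. 7.31 (CoAP over TCP)
    return "other"            # ranges not covered by the checks above


for name, code in [("EMPTY", 0), ("GET", 1), ("CONTENT", 69),
                   ("PING", 226), ("PONG", 227)]:
    print(name, dotted(code), classify(code))
```

This shows why neither Empty nor Ping passes the is_request check: 0 and 226 both fall outside the 1..31 range.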

So what I actually want to do is build a mechanism where I send ping messages and, if I don't get a reply within X pings, consider the connection and/or server endpoint dead. When a connection is marked as "dead", I'll try to re-create it.
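The mechanism described here could look roughly like the following sketch. It is independent of aiocoap: `probe` is a hypothetical coroutine function that resolves when the peer answers (a ping, or any other liveness check), and `interval`, `timeout` and `max_missed` are illustrative parameter names.

```python
import asyncio


async def wait_until_dead(probe, interval, timeout, max_missed):
    """Probe the peer until `max_missed` consecutive probes go unanswered.

    Returns the total number of probes sent. `probe` is a hypothetical
    coroutine function (an assumption of this sketch), not an aiocoap API.
    """
    missed = 0
    probes = 0
    while missed < max_missed:
        probes += 1
        try:
            await asyncio.wait_for(probe(), timeout)
        except (asyncio.TimeoutError, OSError):
            missed += 1          # no reply in time, or the transport failed
        else:
            missed = 0           # any reply resets the counter
        if missed < max_missed:
            await asyncio.sleep(interval)
    return probes


async def demo():
    replies_left = 2

    async def flaky_peer():
        # Fake peer: answers twice, then goes silent.
        nonlocal replies_left
        if replies_left > 0:
            replies_left -= 1
            return
        await asyncio.sleep(3600)

    return await wait_until_dead(flaky_peer, interval=0, timeout=0.01, max_missed=3)
```

Once `wait_until_dead` returns, the caller would tear down and re-create the connection. Counting consecutive misses rather than total misses keeps a single lost reply from triggering a reconnect.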

My primary goal is to ensure that the transport is reliable without sending "real requests". That's something that is mentioned in the signalling section of RFC 8323: https://tools.ietf.org/html/rfc8323#section-5.4 .

Do you have any proposals on how to implement them without breaking the request/response semantics?

chrysn commented 5 years ago

So just to make sure I understand your particular setup:

Server --- Internet --- Local router --- Client
                          with NAT

where the local router closes connections even though they are perfectly alive.

Do you do role reversal? (I.e. does your server send requests to the client?) Do you have observations running? Do you have requests that take so long to respond to that the connection is closed between request and response? If none of these: Why does it matter whether the connection stays on -- for latency? How do you determine how often to send pings?

As long as aiocoap has no support for pings (which I'm fully for adding and appreciative of patches, but would first like to understand the problem to solve), what I'd suggest is a GET to a short or non-existing resource (eg. ctx.request(Message(code=GET, uri=hostname), handle_blockwise=False)), if you need to ship something fast; but I'm confident we can get to the point of what we really need here with a few exchanges.

lcoudeville commented 5 years ago

Well, actually I make a distinction between TCP and "CoAP" terms. I have a TCP client which acts as CoAP server and client over the same "coap context". I use a server context on both sides. (I've already described my setup in https://github.com/chrysn/aiocoap/pull/132)

The TCP client, which is a CoAP client and server, connects to a TCP server which also acts as a CoAP client and server. So the connection that is initiated by the client can be used in both directions: the TCP "server" can make requests to the TCP "client" and vice versa.

Without a persistent connection, the server can't make a request when the client is not connected, unless port forwarding is configured on the router and the possible client address(es) are known on the server.

In tcp terms:
Server -- Internet -- NAT -- Client

Coap:

Server/Client -- Internet -- NAT -- Client/Server

I'm aware that I can send GET requests to an "empty" resource, but I don't want to generate unnecessary load on the CoAP server.