chrysn / aiocoap

The Python CoAP library

Canceling Scheduled Retransmissions After Timeout in Asynchronous CoAP Requests #363

Open euartur opened 2 months ago

euartur commented 2 months ago

First of all, thank you very much for this awesome library. We have been using it in a production environment for quite some time, and so far we have been very happy users!

Issue: Canceling Scheduled Retransmissions After Timeout in Asynchronous CoAP Requests

We have encountered an issue with canceling scheduled retransmissions when a timeout occurs during a CoAP request for CON-type messages.

Context

In our code, we maintain a CoAP context using create_client_context() and send requests asynchronously with a message created via Message(). The requests are wrapped with asyncio.wait_for() to enforce an absolute timeout. Here is a simplified version of the code:

import asyncio
from aiocoap import Code, Context, Message

# Creating the context (inside an async function in the real code).
_protocol = await Context.create_client_context()

# Send the first request with a short timeout.
request1 = Message(
    code=Code.PUT,
    uri="coap://localhost/ping",
    payload=b"1",
)
future_response = _protocol.request(request1).response

## Given: the CoAP server on localhost will not receive the first 3 messages.

try:
    _ = await asyncio.wait_for(future_response, timeout=1.0)
except asyncio.TimeoutError as e:
    print(f"Request timed out: {e}")

# Schedule a 2nd request using the same context to avoid MID and token reuse.
request2 = Message(
    code=Code.PUT,
    uri="coap://localhost/another_endpoint",
    payload=b"2",
)
future_response = _protocol.request(request2).response
response = await asyncio.wait_for(future_response, timeout=30.0)

## Unexpected: request1 retransmissions are still active and block request2 from proceeding.
## Only once request1 is satisfied (late response or ConRetransmitsExceeded) does request2 proceed.
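For completeness, here is the cancellation behavior we rely on, demonstrated with plain asyncio (no aiocoap involved): wait_for() cancels the future it awaits, but an independent background task — here a hypothetical stand-in for a library-internal retransmission loop — keeps running unless the library explicitly ties it to that future.

```python
import asyncio

async def main():
    # Stand-in for a library's internal retransmission loop: it runs as an
    # independent task, not as the future handed back to the caller.
    async def background_retransmit():
        await asyncio.sleep(10)  # pretend: waiting to send the next retransmission

    bg_task = asyncio.create_task(background_retransmit())
    response_future = asyncio.get_running_loop().create_future()

    future_cancelled = False
    try:
        await asyncio.wait_for(response_future, timeout=0.05)
    except asyncio.TimeoutError:
        # wait_for cancelled the future it was awaiting ...
        future_cancelled = response_future.cancelled()

    # ... but the independent background task was not cancelled with it.
    background_survived = not bg_task.done()

    bg_task.cancel()
    try:
        await bg_task
    except asyncio.CancelledError:
        pass
    return future_cancelled, background_survived

future_cancelled, background_survived = asyncio.run(main())
print(future_cancelled, background_survived)  # -> True True
```

This matches what we observe: the response future is cancelled on timeout, yet retransmissions continue, suggesting they are not wired to that future's cancellation.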

The Issue

When the asyncio.wait_for() exceeds the specified timeout, we expect all resources tied to the message transmission and retransmissions to be properly cleaned up. However, we’ve noticed that the context continues to attempt retransmissions of the timed-out message. These retransmissions occur even when a new CoAP request is made, which results in the old message being sent first before the new request is processed.

Wireshark logs show a retransmission of an old message (MID: 53910) about 2.6 seconds after the original send (which had a timeout of 1 second, as in the example above), even though a new message (MID: 53911) is queued for transmission. The old message is sent and acknowledged before the new message is processed.
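That gap is consistent with CoAP's default transmission parameters from RFC 7252 (which aiocoap uses by default): the first retransmission fires 2–3 s after the original send, and retransmissions can continue for up to 45 s — far past our 1 s asyncio timeout. A quick sanity check of those numbers:

```python
# Default CoAP transmission parameters (RFC 7252, section 4.8).
ACK_TIMEOUT = 2.0
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4

# The initial retransmission timeout is drawn uniformly from
# [ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR] and doubles on each
# further retransmission.
first_rto_min = ACK_TIMEOUT                      # 2.0 s
first_rto_max = ACK_TIMEOUT * ACK_RANDOM_FACTOR  # 3.0 s

# Worst-case time from the first transmission to the last retransmission
# (MAX_TRANSMIT_SPAN, RFC 7252 section 4.8.2):
max_transmit_span = ACK_TIMEOUT * (2**MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR
print(first_rto_min, first_rto_max, max_transmit_span)  # -> 2.0 3.0 45.0
```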

No. Time        src dst Proto   Len Info
8   2.387441446 ::1 ::1 CoAP    84  CON, MID:53910, POST, TKN:c7 d4, /d/loopback
17  5.015088905 ::1 ::1 CoAP    84  CON, MID:53910, POST, TKN:c7 d4, /d/loopback [Retransmission]
22  5.019683191 ::1 ::1 CoAP    73  ACK, MID:53910, 2.05 Content, TKN:c7 d4, /d/loopback
23  5.020051706 ::1 ::1 CoAP    89  CON, MID:53911, POST, TKN:c7 d5, /d/loopback2
26  5.022766242 ::1 ::1 CoAP    77  ACK, MID:53911, 2.05 Content, TKN:c7 d5, /d/loopback2

Investigation and Expected Behavior

We have investigated potential control options like ACK_TIMEOUT and ACK_RANDOM_FACTOR via TransportTuning. However, the expected behavior (canceling retransmissions upon timeout) would ideally be controlled either by a mutable REQUEST_TIMEOUT (from aiocoap.numbers.constants) or by having the message future propagate cancellation to its active transactions.
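As a stopgap under the current behavior, the TransportTuning knobs we looked at can at least bound how long retransmissions keep firing: choose ACK_TIMEOUT and MAX_RETRANSMIT so that MAX_TRANSMIT_SPAN (RFC 7252, section 4.8.2) fits within the absolute deadline. The helper below is our own hypothetical sketch of that computation (`fit_tuning` and its 0.5 s floor are not part of aiocoap); the resulting values would then be applied via the TransportTuning mechanism. Note this only stops packets from being sent after the deadline — it does not clean up the transaction, which is the actual feature request.

```python
# Hypothetical helper (not part of aiocoap): pick ACK_TIMEOUT and
# MAX_RETRANSMIT so that the worst-case retransmission window,
#   MAX_TRANSMIT_SPAN = ACK_TIMEOUT * (2**MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR
# (RFC 7252, section 4.8.2), fits within an absolute deadline.
def fit_tuning(deadline_s, ack_random_factor=1.5, min_ack_timeout=0.5):
    for max_retransmit in range(4, 0, -1):  # prefer keeping more retries
        span_units = (2**max_retransmit - 1) * ack_random_factor
        ack_timeout = deadline_s / span_units
        if ack_timeout >= min_ack_timeout:
            return ack_timeout, max_retransmit
    # Deadline too tight for even one retransmission: disable them.
    return min_ack_timeout, 0

print(fit_tuning(30.0))  # (1.333..., 4): all 4 retries fit within 30 s
print(fit_tuning(1.0))   # (0.666..., 1): only one retry fits within 1 s
```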

What we are looking for: We need a way to ensure that when a request times out, all scheduled retransmissions are canceled, while the context remains alive to preserve aspects like the MID counter. This would allow the next request to start fresh without resending old messages.

Is there a built-in mechanism to handle absolute timeouts, or is there a suggested way to extend aiocoap to achieve this behavior?