eclipse-leshan / leshan

Java Library for LWM2M
https://www.eclipse.org/leshan/
BSD 3-Clause "New" or "Revised" License

Which default DTLS retransmission value ? #1002

Closed sbernard31 closed 3 years ago

sbernard31 commented 3 years ago

This issue is about the DTLS retransmission timeout value.

Currently, the default is 1 second.

But for some embedded use cases, this may be too short.

So the questions are:

  1. Should we increase the default DTLS retransmission timeout in Leshan?
  2. If yes, which value should be used?

My first thought (1s):
This value mainly depends on the use case.
Increasing the value could reduce retransmissions, but it would also increase latency.
As we have no clear reason to favor one use case over another, we should keep the value recommended by DTLS RFC 6347 (1s).

Second thought (9s): Leshan is mainly about IoT, so maybe the DTLS Profiles for IoT (RFC 7925) should be used. That RFC recommends 9 seconds, but 9 seconds sounds really huge and is hard to consider a good default value...

Third thought (2s): The DTLS retransmission timeout and the CoAP ACK_TIMEOUT seem to play the same role, so maybe they should have the same value?
CoAP defines 2 seconds as the default for ACK_TIMEOUT.
Californium and Leshan use 2 seconds as the default value too.
CoAP is mainly an IoT protocol, so its default value (2s) was probably chosen with IoT in mind. DTLS is a more generic protocol, so its default value (1s) was maybe not chosen with IoT in mind. In that case, maybe Leshan should use the default CoAP ACK_TIMEOUT as the default value for DTLS retransmission? (2s)

(a PR about this #1003)

Fourth thought (1s): "IoT use cases" does not mean much. There are too many different IoT use cases, so keep it simple and stay with the default value from DTLS RFC 6347 (1s).
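To make the latency side of this trade-off concrete, here is a minimal sketch of the RFC 6347 backoff rule (double the timer at each retransmission) applied to the three candidate values. It is not Leshan or Californium code: the class name, the limit of 4 retransmissions and the 60-second cap are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not part of Leshan or Californium. */
final class DtlsBackoffSketch {

    /** Assumed ceiling for the doubled timer (60 seconds). */
    static final long MAX_TIMEOUT_MS = 60_000;

    /**
     * Returns the timer used for each transmission of a flight:
     * the initial timeout first, then doubled per retransmission.
     */
    static List<Long> timeouts(long initialMs, int maxRetransmissions) {
        List<Long> result = new ArrayList<>();
        long current = initialMs;
        for (int i = 0; i <= maxRetransmissions; i++) {
            result.add(current);
            current = Math.min(current * 2, MAX_TIMEOUT_MS);
        }
        return result;
    }

    /** Total time spent waiting before the handshake is given up. */
    static long totalWaitMs(long initialMs, int maxRetransmissions) {
        long sum = 0;
        for (long t : timeouts(initialMs, maxRetransmissions)) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumption: 4 retransmissions before the handshake fails.
        for (long initial : new long[] {1_000, 2_000, 9_000}) {
            System.out.println(initial + " ms -> " + timeouts(initial, 4)
                    + ", total " + totalWaitMs(initial, 4) + " ms");
        }
    }
}
```

Under these assumptions, an unreachable peer is detected after 31 s with a 1 s initial timeout, 62 s with 2 s, and 183 s with 9 s.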

Those questions were triggered by https://github.com/eclipse/leshan/issues/998#issuecomment-819229271.

sbernard31 commented 3 years ago

This change will not concern Leshan 1.x (stable release), but maybe Leshan 2.x (in development).

sbernard31 commented 3 years ago

@boaks (from californium) @sbertin-telular , @rettichschnidi, @qleisan , @tuve, @mlasch (from Wakaama) @hannestschofenig, @dnav (from OMA)

Any thoughts or experiences to share about this?

(Do not hesitate to tell me, here or privately, if you would prefer that I not ping you for this kind of question.)

boaks commented 3 years ago

My favorite:

In my experience, message loss under normal conditions is not too high (2-4%), so larger values do not have much of a negative impact. Values that are too small may trigger more retransmissions; if that is not an issue, a small value is also possible.

As a default, I prefer an initial CoAP/DTLS timeout of 2s, since that value reflects a good trade-off: fast communication without too many retransmissions. But that's just my experience. With Californium 3.0, I am considering changing the DTLS default to 2s and using an "additional timeout for ECC" of 2s as default.

boaks commented 3 years ago

@thomas-fossati (RFC 7925)

Maybe you can add some information about the 9s.

FMPOV, that is mainly caused by ECC. Therefore I tried an experimental extension in Californium to enlarge the DTLS retransmission timeout if ECC operations are involved.

tuve commented 3 years ago

Is there any way at all to negotiate the timeout, or does the server set a fixed value for all clients? The reason for asking is that, as @boaks mentioned, in normal operation (reasonably decent connectivity) 2s should be fine, but looking at the protocol stack [1], there are some potentially slower protocols beneath both DTLS and CoAP. In our use case, we are also tunneling the UDP packets over a BLE serial link, and we are looking into LoRaWAN (which does not depend on DTLS, but does depend on CoAP timeouts).

  1. https://www.openmobilealliance.org/release/LightweightM2M/Lightweight_Machine_to_Machine-v1_1-OMASpecworks.pdf

boaks commented 3 years ago

The nasty thing about such a negotiation is that it comes with "state". For CoAP, CoCoA was an attempt to keep that more "stateless". For DTLS, although retransmission is only used in the handshake, it would not be that easy to negotiate a timeout there. Maybe a hello extension could do it, but for now, none is defined.

For such cases, I use different server endpoints with different communication parameters.
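As a rough illustration of the "one endpoint per profile" idea, the sketch below maps a local server port to the retransmission timeout used on that endpoint. Everything here is hypothetical: the class, the method names and the second port (5694) are invented for the example, and only 5684 is the registered coaps port. It does not use the Californium or Leshan endpoint API.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a per-endpoint timeout table, not a real endpoint API. */
final class EndpointProfilesSketch {

    /** Builds the (hypothetical) table of local port -> initial timeout. */
    static Map<Integer, Duration> profiles() {
        Map<Integer, Duration> byPort = new LinkedHashMap<>();
        byPort.put(5684, Duration.ofSeconds(2)); // standard coaps port, decent links
        byPort.put(5694, Duration.ofSeconds(9)); // assumed extra port for slow links
        return byPort;
    }

    /** Looks up the timeout for an endpoint, falling back to a default. */
    static Duration timeoutFor(int localPort, Duration fallback) {
        return profiles().getOrDefault(localPort, fallback);
    }

    public static void main(String[] args) {
        Duration fallback = Duration.ofSeconds(1);
        System.out.println("5684 -> " + timeoutFor(5684, fallback));
        System.out.println("5694 -> " + timeoutFor(5694, fallback));
        System.out.println("9999 -> " + timeoutFor(9999, fallback));
    }
}
```

Devices on slow transports would then be configured to connect to the endpoint with the longer timeout, so no per-client negotiation state is needed.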

thomas-fossati commented 3 years ago

Hi Achim,

Maybe you add some information about the 9s.

The 9s is a recommendation, not a MUST, so you are allowed to override the value (in both directions) if you know what you are doing. We set it pretty high mainly for three reasons:

  1. to avoid spurious retransmits when the public key operation is taking long on a constrained node;
  2. to avoid congestion on LLNs when the loss is genuine and the flight to retransmit is bulky (e.g., certificate) and maybe fragmented;
  3. to cater for very high latency variance in certain access technologies (e.g., GSM-SMS).

If none of the above apply to your environment, I guess you can set your retransmission value lower than 9s.

boaks commented 3 years ago

Thanks!

For the first two points, I tried to mitigate them by making it possible to add some extra time to the timeout if ECC calculations are expected.
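The idea of adding extra time when ECC is expected can be sketched as follows. This is an assumption-laden illustration, not Californium's implementation: the class and method names are made up, and the compute time and round-trip figures in `main` are invented numbers.

```java
import java.time.Duration;

/** Illustrative only: not the Californium implementation. */
final class EccTimeoutSketch {

    /** Base timeout, plus the extra allowance when ECC operations are expected. */
    static Duration initialTimeout(Duration base, Duration eccExtra, boolean eccExpected) {
        return eccExpected ? base.plus(eccExtra) : base;
    }

    /**
     * True when the timer fires while the peer is still computing,
     * i.e. the retransmission is spurious rather than caused by loss.
     */
    static boolean firesWhilePeerComputes(Duration timeout, Duration peerCompute, Duration roundTrip) {
        return timeout.compareTo(peerCompute.plus(roundTrip)) < 0;
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(2);
        Duration extra = Duration.ofSeconds(2);
        // Assumed figures: a constrained node needing 3 s for its public key operation.
        Duration compute = Duration.ofSeconds(3);
        Duration rtt = Duration.ofMillis(200);

        Duration plain = initialTimeout(base, extra, false);
        Duration withEcc = initialTimeout(base, extra, true);
        System.out.println(plain + " spurious: " + firesWhilePeerComputes(plain, compute, rtt));
        System.out.println(withEcc + " spurious: " + firesWhilePeerComputes(withEcc, compute, rtt));
    }
}
```

With these numbers, the plain 2 s timer fires before the peer can answer, while the 4 s timer (2 s plus 2 s for ECC) does not, and flights without ECC keep the shorter timeout.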

thomas-fossati commented 3 years ago

<dunce-hat-on> I don't know Leshan's API, but I guess one way to handle this would be to define a sensible default value and expose the RTO setter/getter to the user. Maybe with some prefab constants too -- e.g., RTO_RFC7925 (9s), RTO_RFC2988 (3s), TIMEOUT_RFC6347 (1s) -- to make it easier to pick. </dunce-hat-on>
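A minimal sketch of what such constants and a setter/getter could look like. The constant names and values come from the comment above; the class itself, its method names and the choice of 1 s as the initial default are hypothetical and are not the Leshan or Californium API.

```java
import java.time.Duration;
import java.util.Objects;

/** Illustrative only: a hypothetical config holder, not a real Leshan class. */
final class RetransmissionConfigSketch {

    static final Duration RTO_RFC7925 = Duration.ofSeconds(9);
    static final Duration RTO_RFC2988 = Duration.ofSeconds(3);
    static final Duration TIMEOUT_RFC6347 = Duration.ofSeconds(1);

    // Assumed default for this sketch: the RFC 6347 value.
    private Duration retransmissionTimeout = TIMEOUT_RFC6347;

    Duration getRetransmissionTimeout() {
        return retransmissionTimeout;
    }

    /** Accepts a prefab constant or any positive custom value. */
    RetransmissionConfigSketch setRetransmissionTimeout(Duration timeout) {
        Objects.requireNonNull(timeout, "timeout");
        if (timeout.isZero() || timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.retransmissionTimeout = timeout;
        return this;
    }

    public static void main(String[] args) {
        RetransmissionConfigSketch config = new RetransmissionConfigSketch();
        System.out.println("default: " + config.getRetransmissionTimeout());
        config.setRetransmissionTimeout(RTO_RFC7925);
        System.out.println("constrained profile: " + config.getRetransmissionTimeout());
    }
}
```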

boaks commented 3 years ago

@thomas-fossati

I don't know Leshan's API

AFAIK, it's the API of Eclipse/Californium. PR1611 in Eclipse/Californium addresses your idea with the "constants" for the upcoming 3.0.0. Thanks!

sbernard31 commented 3 years ago

Californium changed its default value for the DTLS retransmission timeout to 2s, and the default CoAP ACK_TIMEOUT is still 2 seconds.

On the Leshan side, we will not change this default value for 2.0.0, so I am closing this issue.