OpenMobileAlliance / OMA_LwM2M_for_Developers

OMA LightweightM2M public resources.
http://openmobilealliance.github.io/OMA_LwM2M_for_Developers/

DTLS failure handling #160

Closed kiemlicz closed 6 years ago

kiemlicz commented 8 years ago

All questions assume DTLS usage.

Given the following information (from the specification):

Question 1: If the DTLS state is lost (e.g. after a server restart), it turns out that the LWM2M client should perform a Client Initiated Bootstrap. The Register Interface messages will be dropped (the server won't be able to decrypt them). The client will then bootstrap against the LWM2M bootstrap server, which may not be reachable at all times. Moreover, it is a serious overhead, considering that only the DTLS session is gone. Maybe the fallback action should be to simply restart the current DTLS session.

Question 2: Usually the device initiates the LWM2M connection by starting the DTLS handshake (it sends the ClientHello). Imagine that it registers with lifetime = 24h and binding mode = U, and during that time the server restarts (the in-memory DTLS state is gone). The device does not send an update (it is not yet time for one). The server wants to send requests via the Management Interface. Can it initiate the DTLS connection (can the LWM2M server send a ClientHello)? This is not stated explicitly.

boaks commented 8 years ago

Question 1: Maybe the fallback action should be to simply restart the current DTLS session.

I would also do so :-). Currently the trigger for the bootstrap is based on one "use case" (changed credentials). But in my opinion, the behaviour (timeout) would occur more frequently because of other "use cases" than because of that one. Therefore this trigger will cause a "flood" of unwanted bootstraps. I think it will not survive reality!
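As a rough illustration of the fallback discussed here, the sketch below tries a fresh DTLS handshake and a new registration first, and only falls back to Client Initiated Bootstrap when the handshake keeps failing. The `Step` names, the `recover` function, the callback signatures, and the retry count are all assumptions made for the example; none of them come from the LwM2M specification or from a particular client library.

```python
from enum import Enum, auto
from typing import Callable, List


class Step(Enum):
    REHANDSHAKE = auto()  # new DTLS handshake with the stored credentials
    REREGISTER = auto()   # fresh Register on the new session
    BOOTSTRAP = auto()    # Client Initiated Bootstrap, last resort


def recover(handshake: Callable[[], bool],
            register: Callable[[], bool],
            bootstrap: Callable[[], bool],
            max_handshakes: int = 3) -> List[Step]:
    """Run the recovery ladder and return the steps that were taken.

    The callbacks are hypothetical hooks into a DTLS/LwM2M stack and
    return True on success.
    """
    taken: List[Step] = []
    for _ in range(max_handshakes):
        taken.append(Step.REHANDSHAKE)
        if handshake():
            taken.append(Step.REREGISTER)
            if register():
                return taken
    # Only now assume the credentials themselves are stale.
    taken.append(Step.BOOTSTRAP)
    bootstrap()
    return taken


if __name__ == "__main__":
    # Server restarted but credentials are still valid: no bootstrap needed.
    print(recover(lambda: True, lambda: True, lambda: True))
```

With this ordering a plain server restart costs one handshake and one registration, and the bootstrap server is contacted only when re-handshaking does not help.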

Question 2:

The server may not even know the IP of the client :-). Or the "route through the NATs" has been closed and the client is not reachable. So, if you want to reach your client, your client has to update much more frequently than the lifetime.

(can LWM2M server send ClientHello)?

issue: #2

From my view, this mixes up a lot. But there are more people out there who believe in that approach of a "CLIENT_HELLO" from the LWM2M server.

boaks commented 8 years ago

and issue: #145

kiemlicz commented 8 years ago

The server may not even know the IP of the client :-). Or the "route through the NATs" has been closed and the client is not reachable. So, if you want to reach your client, your client has to update much more frequently than the lifetime.

It seems to me that LWM2M is also designed to fit environments where NAT is not an issue (e.g. some wireless sensor network fields). Generating update messages that often looks like real overhead (especially for highly constrained devices). In my opinion, sending the ClientHello from the LwM2M server side could solve this. Or is a DTLS HelloRequest from the LwM2M server side the way to go?

From my view, this mixes up a lot.

Could you please outline problems that it introduces?

boaks commented 8 years ago

It seems to me that LWM2M is also designed to fit environments where NAT is not an issue (e.g. some wireless sensor network fields).

Sure, if you don't have NATs and you are running in a "static address" environment, there are fewer problems and more possibilities. But then, I would at least like to have some statements about the best practice in the different situations.

Could you please outline problems that it introduces?

PSK, HANDSHAKE from device: The device sends the identity according to the security instance related to the target server. Stable and used "as intended".

PSK, HANDSHAKE from server: The server sends the identity according to the "last valid ip-address/port" of the device. More or less dynamic, and the intention is inverted.

It's like a PC telling you your name when you log in :-).
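The asymmetry between the two handshake directions can be sketched as two lookups. On the device, the PSK identity is found through a stable key (the server URI of the Security Object instance). If the LwM2M server starts the handshake, it has to pick the identity from the device's last known address, which stops working as soon as that address changes. The URI, addresses, identity, and function names below are made up for the illustration.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]

# Device side: one security instance per server, keyed by the server URI.
# (Hypothetical values.)
device_security: Dict[str, str] = {
    "coaps://lwm2m.example.org:5684": "device-42",
}

# Server side: the only key available before any identity has been exchanged
# is the address the device was last seen at. (Hypothetical values.)
server_last_seen: Dict[Address, str] = {
    ("203.0.113.7", 40001): "device-42",
}


def identity_for_server(server_uri: str) -> Optional[str]:
    """Handshake from device: identity chosen by target server (stable)."""
    return device_security.get(server_uri)


def identity_for_peer(addr: Address) -> Optional[str]:
    """Handshake from server: identity chosen by last known address (dynamic)."""
    return server_last_seen.get(addr)


if __name__ == "__main__":
    print(identity_for_server("coaps://lwm2m.example.org:5684"))
    print(identity_for_peer(("203.0.113.7", 40001)))
    # After a NAT rebinding the device shows up on another port and the
    # server no longer knows which identity to offer.
    print(identity_for_peer(("203.0.113.7", 40002)))
```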

hannestschofenig commented 7 years ago

Regarding Question #1: I think that the CoAP specification is incorrect in how it refers to the epoch values. As written it makes no sense, and it prevents you from simply establishing a new DTLS session.

However, the problem is not as bad as described, since a device does not just randomly lose its DTLS state (which would typically be in RAM). If that happens, for example when the device crashes, then it had better go through the bootstrap procedure and the connection establishment again. The DTLS CID concept wouldn't help in that case either, since you would lose all the relevant security context.

Regarding Question #2: The LwM2M server could start a server-initiated bootstrap, but most likely that's not going to be supported by the LwM2M client. Hence, if the device was programmed to always be the side that initiates communication and the server crashed, then by definition the server cannot connect to the client. Additionally, there will be the issue that the server may not necessarily know where to contact the client anyway. So, when you design your system you need to determine which features you care about. If you have a frequently crashing server, then you can either lower the registration update interval or incorporate a server-initiated bootstrapping feature into the client.

boaks commented 7 years ago

Just to mention: The numerical value of the DTLS epoch (RFC6347, section 4.1, record layer) is not usable as indicator for the "same epoch".

See https://www.ietf.org/mail-archive/web/core/current/msg08820.html and the follow-up answers.
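A small model may help to see why the number alone is not enough. As far as I read RFC 6347, the epoch is a counter that starts at 0 for a new connection and is incremented on each cipher state change, so two unrelated sessions both sit at epoch 1 after their first handshake. The `DtlsSession` class and the two comparison functions are hypothetical and only model that counter; they are not taken from any DTLS implementation.

```python
from dataclasses import dataclass


@dataclass
class DtlsSession:
    """Toy model of a DTLS connection state (hypothetical, for illustration)."""
    master_secret: bytes
    epoch: int = 0  # a new connection starts at epoch 0

    def finish_handshake(self) -> None:
        # The cipher state changes when the handshake completes,
        # so the epoch counter is incremented.
        self.epoch += 1


def same_epoch_naive(a: DtlsSession, b: DtlsSession) -> bool:
    """Compares only the numerical epoch value."""
    return a.epoch == b.epoch


def same_security_context(a: DtlsSession, b: DtlsSession) -> bool:
    """Compares the keying material as well as the epoch."""
    return a.master_secret == b.master_secret and a.epoch == b.epoch


if __name__ == "__main__":
    before_restart = DtlsSession(master_secret=b"\x01" * 48)
    before_restart.finish_handshake()
    after_restart = DtlsSession(master_secret=b"\x02" * 48)
    after_restart.finish_handshake()
    print(same_epoch_naive(before_restart, after_restart))
    print(same_security_context(before_restart, after_restart))
```

In this model the naive check reports a match for the sessions before and after a server restart, although they share no keys.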

hannestschofenig commented 7 years ago

Question 1: If the DTLS state is lost (e.g. after a server restart), it turns out that the LWM2M client should perform a Client Initiated Bootstrap. The Register Interface messages will be dropped (the server won't be able to decrypt them). The client will then bootstrap against the LWM2M bootstrap server, which may not be reachable at all times. Moreover, it is a serious overhead, considering that only the DTLS session is gone. Maybe the fallback action should be to simply restart the current DTLS session.

[Answer by Hannes]

If the server does an orderly restart then the DTLS stack sends a CloseNotify message. The LwM2M Client should therefore be aware of the server going down.

If the server crashes, then no CloseNotify message will be sent. In this case, the LwM2M client will, at the latest, send a registration update and will not receive a response back. An implementation could then use this information as a hint that something went wrong and should therefore re-establish state (new registration, new security context, and new observations).
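The two cases can be sketched as a small event handler on the client. The `Event` names, the `ClientState` fields, and the choice to keep the registration after a CloseNotify are assumptions made for this example; the thread only says that the client should notice the shutdown and, after a missing update response, re-establish registration, security context, and observations.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Event(Enum):
    CLOSE_NOTIFY = auto()    # orderly server restart
    UPDATE_TIMEOUT = auto()  # registration update got no response
    UPDATE_OK = auto()       # registration update was acknowledged


@dataclass
class ClientState:
    session_valid: bool = True
    registered: bool = True
    observations: List[str] = field(default_factory=list)


def handle(state: ClientState, event: Event) -> None:
    """Update the client's view of its state after a (hypothetical) stack event."""
    if event is Event.UPDATE_OK:
        return
    # In both failure cases the old security context must not be reused.
    state.session_valid = False
    if event is Event.UPDATE_TIMEOUT:
        # Assumption: a silent server has lost everything, so the client
        # also needs a new registration and new observations.
        state.registered = False
        state.observations.clear()


if __name__ == "__main__":
    state = ClientState(observations=["/3303/0/5700"])
    handle(state, Event.UPDATE_TIMEOUT)
    print(state)
```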

Since server crashes are not that common, I believe it makes little sense to optimize for such an error case. Instead, it is better to make sure that the server does not crash too often.

Question 2: Usually the device initiates the LWM2M connection by starting the DTLS handshake (it sends the ClientHello). Imagine that it registers with lifetime = 24h and binding mode = U, and during that time the server restarts (the in-memory DTLS state is gone). The device does not send an update (it is not yet time for one). The server wants to send requests via the Management Interface. Can it initiate the DTLS connection (can the LWM2M server send a ClientHello)? This is not stated explicitly.

[Answer by Hannes]

The server will typically not be able to contact the device for a number of reasons. For example, the device does not implement a DTLS server, the device is behind a firewall, the crashed server forgot the IP address of the client or the port number.

Note that the registration updates are not necessarily the only messages the LwM2M client may send. It could, for example, send keep alive messages to make sure that NAT/firewall state does not disappear.
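To make the timing concrete, here is a minimal sketch that places keep-alive messages between registration updates so that no gap exceeds the NAT binding timeout. The function name, the safety margin, and the example timeouts are assumptions; which message actually serves as the keep-alive is left open, as it is in the comment above.

```python
from typing import List, Tuple


def schedule(lifetime_s: int, nat_timeout_s: int,
             margin_s: int = 10) -> List[Tuple[str, int]]:
    """Return (kind, seconds after registration) for one registration lifetime.

    Keep-alives are sent every nat_timeout_s - margin_s seconds and the
    registration update is sent margin_s seconds before the lifetime expires.
    """
    interval = nat_timeout_s - margin_s
    update_at = lifetime_s - margin_s
    if interval <= 0 or update_at <= 0:
        raise ValueError("margin_s must be smaller than both timeouts")
    events = [("keepalive", t) for t in range(interval, update_at, interval)]
    # The update itself also refreshes the NAT binding.
    events.append(("update", update_at))
    return events


if __name__ == "__main__":
    # Hypothetical numbers: 300 s lifetime, NAT binding expires after 120 s.
    print(schedule(300, 120))
```

For a lifetime of 300 s and a NAT timeout of 120 s this yields keep-alives at 110 s and 220 s and the update at 290 s.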

[Summary by Hannes]

There do not seem to be any implications for the specification itself. Or are there?

hannestschofenig commented 6 years ago

LwM2M v1.1 will provide more details about error handling.