Missing observations during firmware update

qleisan commented 3 years ago

This issue was reported to me verbally. I will investigate and update the description if needed. Sharing now to allow fast feedback from others.

During a firmware update, the observed "State" resource might make quick transitions Downloading->Downloaded->Idle due to a failed checksum check. It is expected that all states are reported to the server; however, this is not the case.

1.1.1 spec §5.1.2

Notes: The Minimum Period Attribute indicates the minimum time in seconds the LwM2M Client MUST wait between two notifications. If a notification of an observed Resource is supposed to be generated but it is before pmin expiry, notification MUST be sent as soon as pmin expires. In the absence of this parameter, the Minimum Period is defined by the Default Minimum Period set in the LwM2M Server Account.

sbertin-telular commented 3 years ago

Observes don't guarantee that all changes will be received by the server.

qleisan commented 3 years ago

But are they guaranteed to be sent or is it the latest available observation value that is sent when "pmin" expires? I.e. are they supposed to be queued so that (in this case) no state transition is lost (or rather not sent) due to fast transitions...?

sbertin-telular commented 3 years ago

The LWM2M 1.1.1 specification section 6.4.2 says "This operation includes the new value of the Object Instance or Resource." My interpretation of the "new value" would be the value when the notify is sent, but I suppose depending on the resource and implementation notifies could be queued.

For a highly constrained device it may not be possible to queue values, so the latest would be sent. For a continuously changing resource, I would also expect the latest value to be sent as the queue may not drain as fast as it is filled. The latest value is generally the most relevant.

For Wakaama I would prefer to keep it simple and always report the latest value.

For the specific case of firmware update, the Update Result resource should be checked when a notification of an Idle State is received to see what happened. A timeout or polling read may also be needed in case notifications are lost.
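A minimal sketch of the server-side handling described above, written in C for illustration only. The names `fw_tracker`, `fw_on_notify`, `fw_on_tick` and `fw_result_needs_attention` are hypothetical and not part of Wakaama or any LwM2M server; the numeric values follow the Firmware Update object's State and Update Result resources as I understand them, and treating every result other than 0 and 1 as "needs attention" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State (/5/x/3) values of the Firmware Update object. */
enum fw_state { FW_IDLE = 0, FW_DOWNLOADING = 1, FW_DOWNLOADED = 2, FW_UPDATING = 3 };

/* A few Update Result (/5/x/5) values; the list is not complete. */
enum fw_result { FW_RES_INITIAL = 0, FW_RES_SUCCESS = 1, FW_RES_INTEGRITY_FAILURE = 5 };

/* What the server should do next (hypothetical). */
enum fw_action { FW_ACT_WAIT, FW_ACT_READ_RESULT, FW_ACT_POLL_STATE };

/* Hypothetical server-side bookkeeping for one device. */
struct fw_tracker {
    long last_heard; /* seconds: time of the last notification */
    long timeout;    /* seconds of silence tolerated before polling */
};

/* Called for every State notification that reaches the application. */
enum fw_action fw_on_notify(struct fw_tracker *t, enum fw_state state, long now)
{
    assert(t != NULL);
    t->last_heard = now;
    /* Idle may hide a skipped Downloading/Downloaded pair, so go and
     * read Update Result instead of inferring what happened. */
    return state == FW_IDLE ? FW_ACT_READ_RESULT : FW_ACT_WAIT;
}

/* Called periodically; falls back to a polling read when notifications stop. */
enum fw_action fw_on_tick(const struct fw_tracker *t, long now)
{
    assert(t != NULL);
    return (now - t->last_heard >= t->timeout) ? FW_ACT_POLL_STATE : FW_ACT_WAIT;
}

/* Assumption: anything other than "initial" or "success" deserves a closer look. */
bool fw_result_needs_attention(int result)
{
    return result != FW_RES_INITIAL && result != FW_RES_SUCCESS;
}
```

With this shape the server never depends on seeing Downloading or Downloaded: an Idle notification triggers a read of Update Result, and silence longer than `timeout` triggers a read of State.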

sbernard31 commented 3 years ago

If I understand the issue correctly, this is about a LWM2M firmware update implementation on the LWM2M server side that uses observe to watch the State (/5/x/3) resource.

Notification is just a best-effort way to synchronize a resource between CoAP client and server, meaning that:

- some notifications can be lost.
- notification order is not guaranteed, and the CoAP client (the LWM2M server in this case) implementation generally just drops older notifications (see rfc7641#section-3.4).
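To illustrate why older notifications get dropped, here is a small C sketch of the freshness rule from RFC 7641 section 3.4. The function name and parameter layout are my own; only the comparison itself comes from the RFC (24-bit Observe values, a 2^23 half range and a 128 second window).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Observe option values are 24-bit sequence numbers (RFC 7641, section 3.4). */
#define OBSERVE_MAX_VALUE       0xFFFFFFUL
#define OBSERVE_HALF_RANGE      (1UL << 23)
#define OBSERVE_MAX_AGE_SECONDS 128

/* Returns true when the incoming notification (Observe value v2, received at
 * t2) is fresher than the newest one seen so far (v1, received at t1).
 * Times are in seconds. The name and signature are hypothetical. */
bool notification_is_fresher(uint32_t v1, uint64_t t1, uint32_t v2, uint64_t t2)
{
    assert(v1 <= OBSERVE_MAX_VALUE && v2 <= OBSERVE_MAX_VALUE);
    if (v1 < v2 && (v2 - v1) < OBSERVE_HALF_RANGE)
        return true; /* normal increase */
    if (v1 > v2 && (v1 - v2) > OBSERVE_HALF_RANGE)
        return true; /* sequence number wrapped around */
    /* Too much time has passed to trust the numbers: accept the newcomer. */
    return t2 > t1 + OBSERVE_MAX_AGE_SECONDS;
}
```

A receiver that applies this rule discards any notification for which the function returns false, which is how a late "Downloaded" can vanish after "Idle" has already been processed.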

> It is expected that all states are reported to the server however this is not the case.

So my understanding is that you cannot rely on observe to receive all the states. Observe can only be used as a kind of optimization to reduce latency, and the implementation must be designed to work even if you never receive any notification.

In my experience observation brings a lot of complexity, so I can imagine LWM2M devices that do not support it. This is another reason not to rely (exclusively?) on observe for firmware update.

> But are they guaranteed to be sent

On the LWM2M server side, we don't care whether they are guaranteed to be sent, as they are not guaranteed to be received :)

> or is it the latest available observation value that is sent when "pmin" expires?

LWM2M pmin is inspired by draft-ietf-core-dynlink-13#section-3.2.1 and it says that :

   When present, the minimum period indicates the minimum time, in
   seconds, between two consecutive notifications (whether or not the
   resource state has changed) ...
   A (CoAP) server MAY update the resource state with the last sampled value
   that occured during the pmin interval, after the pmin interval
   expires.

The author described it like this: "Regardless of whether the value of the resource changes or not, do not send me its value any quicker than what I have specified for pmin".

And knowing that the spirit of CoAP observation is more about trying to sync the "most recent state" than caring about a "complete history" (see https://github.com/eclipse/leshan/wiki/LWM2M-Observe#the-spirit-of-observe-feature), I would say that this is the latest available observation value.

> I.e. are they supposed to be queued so that (in this case) no state transition is lost (or rather not sent) due to fast transitions...?

For the reasons given above, I think queuing does not make much sense.

I can see another problem with queuing. Imagine pmin is set to 10s and the resource state changes every 5s. You can send only 1 notification every 10s, but you will queue 2 values every 10s, so the queue grows without bound.
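The "latest value wins" behaviour can be sketched in a few lines of C. This is not the Wakaama implementation: `struct observation`, `obs_value_changed` and `obs_step` are hypothetical names chosen for illustration, and time is passed in as plain seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-observation bookkeeping; not the Wakaama data structure. */
struct observation {
    long pmin;      /* minimum seconds between two notifications */
    long last_sent; /* time the previous notification was sent */
    int  latest;    /* most recent resource value */
    bool dirty;     /* true while 'latest' has not been reported */
};

/* Record a change. Any value that was still waiting is overwritten,
 * so memory use stays constant however fast the resource changes. */
void obs_value_changed(struct observation *o, int value)
{
    assert(o != NULL);
    o->latest = value;
    o->dirty = true;
}

/* Called periodically. Returns true and stores the value to notify in *out
 * when something changed and pmin has elapsed since the last notification. */
bool obs_step(struct observation *o, long now, int *out)
{
    assert(o != NULL && out != NULL);
    if (!o->dirty || now - o->last_sent < o->pmin)
        return false;
    *out = o->latest;
    o->last_sent = now;
    o->dirty = false;
    return true;
}
```

With pmin at 10s and a previous notification at t=0, a Downloading (1), Downloaded (2), Idle (0) sequence that completes before t=10 produces a single notification carrying Idle; the two intermediate states are never sent.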

qleisan commented 3 years ago

Thanks for your input, it has been useful. A summary of the facts relevant to both wakaama and the discussion that initiated this issue is listed below. At this point it is not clear whether there is a wakaama code related problem left to solve. I will keep this open for some time to get feedback (and share with additional people).

- packets can always be lost, so the server side needs a timeout/polling mechanism to deal with missing or out-of-order notifications for key pieces of information, e.g. firmware update state info. This is (of course) being worked on for the system discussed.
- to send a notification the client application must call lwm2m_resource_value_changed() and then lwm2m_step().
- default behavior of the wakaama client/server is to send notifications immediately when lwm2m_step() is called. This can be altered by using pmin, pmax and other attributes.
- default behavior of the wakaama client/server is to send non-confirmable CoAP messages for notify.
- LwM2M 1.2 Core spec (5.1.2 Attributes Classification) allows for per-resource control of confirmable/non-confirmable notify.

sbernard31 commented 3 years ago

Just some details about CON: using CON instead of NON notifications allows devices to know that a notification was received by the server. But this does not resolve the reordering "issue" (see rfc7641#section-3.4). I don't know if everybody interprets the CoAP spec like the Californium project, but the current behavior of this CoAP stack is to ACK a too-old notification and then just drop it. By dropping I mean that it is not passed up to the application layer (see https://github.com/eclipse/californium/blob/2.6.1/californium-core/src/main/java/org/eclipse/californium/core/coap/ClientObserveRelation.java#L328).

eclipse-wakaama / wakaama

Missing observations during firmware update #548