eclipse-leshan / leshan

Java Library for LWM2M
https://www.eclipse.org/leshan/
BSD 3-Clause "New" or "Revised" License
652 stars 407 forks source link

LwM2mNodeSenMLDecoder sometimes fails trying to convert from double to long #1170

Closed sonny82 closed 2 years ago

sonny82 commented 2 years ago

Hi,

We are using Leshan 2.0.0-M5. In the log files I sometimes see that when reading currentTime resource from device object (3/0/13) decoding sometimes fails with the following exception:

08:50:47 [CoapServer(main)#27] ERROR c.n.s.d.s.s.d.ManagerImpl - Endpoint 94193A0407000000: Read request for path 3 failed with exception: {} org.eclipse.leshan.core.request.exception.InvalidResponseException: Unable to decode response payload of request [ReadRequest [path=/3 format=null]] from client [94193A0407000000] at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:458) at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.visit(LwM2mResponseBuilder.java:122) at org.eclipse.leshan.core.request.ReadRequest.accept(ReadRequest.java:187) at org.eclipse.leshan.server.californium.request.RequestSender$2.buildResponse(RequestSender.java:226) at org.eclipse.leshan.core.californium.AsyncRequestObserver$1.onResponse(AsyncRequestObserver.java:60) at org.eclipse.leshan.core.californium.CoapAsyncRequestObserver.onResponse(CoapAsyncRequestObserver.java:99) at org.eclipse.californium.core.coap.Request.setResponse(Request.java:931) at org.eclipse.californium.core.server.ServerMessageDeliverer.deliverResponse(ServerMessageDeliverer.java:258) at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.receiveResponse(BaseCoapStack.java:217) at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89) at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.receiveResponse(ExchangeCleanupLayer.java:105) at org.eclipse.californium.core.network.stack.ObserveLayer.receiveResponse(ObserveLayer.java:139) at org.eclipse.californium.core.network.stack.BlockwiseLayer.receiveResponse(BlockwiseLayer.java:776) at org.eclipse.californium.core.network.stack.ReliabilityLayer.receiveResponse(ReliabilityLayer.java:308) at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89) at org.eclipse.californium.core.network.stack.BaseCoapStack.receiveResponse(BaseCoapStack.java:146) at org.eclipse.californium.core.network.CoapEndpoint$1.receiveResponse(CoapEndpoint.java:314) at org.eclipse.californium.core.network.UdpMatcher$4.run(UdpMatcher.java:457) at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:290) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: org.eclipse.leshan.core.node.codec.CodecException: Invalid content [1.638435E9] for type TIME for path /3/0/13 at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:504) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.extractLwM2mResources(LwM2mNodeSenMLDecoder.java:426) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseRecords(LwM2mNodeSenMLDecoder.java:193) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.decode(LwM2mNodeSenMLDecoder.java:97) at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:156) at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:138) at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:450) ... 21 common frames omitted Caused by: java.lang.IllegalStateException: Can not convert 1.638435E9 to long safely : Unsupported number java.lang.Double at org.eclipse.leshan.core.util.datatype.NumberUtil.numberToLong(NumberUtil.java:59) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.numberToLong(LwM2mNodeSenMLDecoder.java:527) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:493) ... 27 common frames omitted

Best regards, Sonny

sbernard31 commented 2 years ago

Are you using senml-json or senml-cbor ?

sbernard31 commented 2 years ago

could you share the exact payload (byte for cbor / string for json) sent by the client ? (or a wireshark capture)

sonny82 commented 2 years ago

Hmm, I don't set a specific ContentFormat in ReadRequest, so I'm not sure which one is used by default. I'll try to capture the exact payload but the problem is this happens rarely (maybe once or several times per day).

Note: I only 4 devices for development/testing, we're not in production yet.

sbernard31 commented 2 years ago

Hmm, I don't set a specific ContentFormat in ReadRequest, so I'm not sure which one is used by default.

You can put in the ReadRequest the format you would like to be used for the response. If you set nothing, the client will chose its preferred format but in all case the format used to encode the response payload is in the response.

Eg. :

Frame 218: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
User Datagram Protocol, Src Port: 52954, Dst Port: 5683
Constrained Application Protocol, Acknowledgement, 2.05 Content, MID:32713
    01.. .... = Version: 1
    ..10 .... = Type: Acknowledgement (2)
    .... 1000 = Token Length: 8
    Code: 2.05 Content (69)
    Message ID: 32713
    Token: 7c515cfbdc858367
    Opt Name: #1: Content-Format: application/vnd.oma.lwm2m+tlv
        Opt Desc: Type 12, Elective, Safe
        1100 .... = Opt Delta: 12
        .... 0010 = Opt Length: 2
        Content-type: application/vnd.oma.lwm2m+tlv
    End of options marker: 255
    Payload: Payload Content-Format: application/vnd.oma.lwm2m+tlv, Length: 6
        Payload Desc: application/vnd.oma.lwm2m+tlv
        [Payload Length: 6]
    [Uri-Path: /3/0/13]
    [Request In: 217]
    [Response Time: 0.014677148 seconds]
Lightweight M2M TLV, Device (1 element)

See in my example : Content-type: application/vnd.oma.lwm2m+tlv

sbernard31 commented 2 years ago

Just speculation,

I'll try to capture the exact payload but the problem is this happens rarely (maybe once or several times per day).

It seems at some point we get a double (a floating point number) in Leshan where we expect a long (an integer). In cbor number format is pretty clear but in json this is more fuzzy. So I will bet that your application rather use senml+json ?

My guess is when the timestamp ends with several 0, your client will use the exponential format. E.g. in your stacktrace : 1.638435E9 for 1638435000 (for 2021-12-02T08:50:00.000Z)

I also guess our json library (jackson) convert a timestamp like 1.638435E9 in double and 1638435000 in long.

Why sometime it works ? because when timestamp does ends with many 0 maybe you client use the "normal notation" like 1638451878

Anyway, this is too many guess and I would like to be sure so I wait for more input from you.

sbernard31 commented 2 years ago

Any news about this ? I'm really curious to know what happens as maybe we should be more flexible here. :thinking:

sonny82 commented 2 years ago

Not yet. I tried to record the messages using Wireshark during the night to catch this error but system crashed somewhere at midnight due to a lack of memory.. I'll post Wireshark capture as soon as I'll have it.

sonny82 commented 2 years ago

Content format is "Content-Format":"application/senml+json".

sbernard31 commented 2 years ago

OK so I will try to test how Leshan behave with JSON number exponential notation.

sbernard31 commented 2 years ago

I'm able to reproduce it : https://github.com/eclipse/leshan/pull/1174.

I will try to add a safe conversion in NumberUtils.

sbernard31 commented 2 years ago

@sonny82 #1174 should fix your issue.

sonny82 commented 2 years ago

Great, thanks.

sbernard31 commented 2 years ago

This is not yet available in master If you can test the PR and tell me if it works for you, this would be great. If you can not test, let me know this way I don't wait for you before to merge. :wink:

sonny82 commented 2 years ago

I'm using senml-cbor now. In order to test this properly I would have to let it run for at least one day, but we are solving some other issues now, so I'm not sure when I'll be able to do it. Please, don't wait with the merge (regarding the fact hat you are able to reproduce this issue in a unit test). In any case, I'll do some more testing when I'll have time and I'll also test using senml-json. BR

sbernard31 commented 2 years ago

@sonny82 could I ask how you are using Leshan ? If it makes sense maybe you could consider to comment at : https://github.com/eclipse/leshan/issues/830

sonny82 commented 2 years ago

Sure. We are working as an outsourcing partner for a company that has an existing generic backend for managing different kinds of devices (electric meters, water meters, heat meters, ect). On this specific project they have to integrate and manage around 13.000 heat meters, where each meter has an attached NB-IoT module that supports LWM2M 1.1. Meter data will be sent using MQTT-SN protocol, LWM2M (Leshan 2.0) will be used for bootstrapping (LWM2M DM server + MQTT-SN server) and for management of the NB-IoT module (FW upgrades, tracking battery and signal strength information, configuration of readout and report intervals, etc.). Devices will be battery powered and will use queue mode.

In the first version we will not use Leshan cluster (not sure if it's really needed as the number of devices is not that big). We will use DTLS. Due to security reasons we are not allowed to use existing Redis or file based security stores. We will retrieve keys from a secure key store on demand when needed and then use in-memory security and configuration store. Communication with the backend will be done using the REST API. We developed a simple API using async servlets. End customer doesn't need GUI (backend has it's own GUI), however, demo-GUI will also be deployed for easier admin-management. We are not in production yet. I'll also write a comment to #830.

sonny82 commented 2 years ago

hi,

Now we found a similar problem using SENM_CBOR: Can not convert 3000.0 to long safely

[CoapServer(main)#6] ERROR c.n.s.d.s.s.d.ModuleManagerImpl - Endpoint 94193A04070000EA: Read request for path 3 failed with exception: {} org.eclipse.leshan.core.request.exception.InvalidResponseException: Unable to decode response payload of request [ReadRequest [path=/3 format=SENML_CBOR(112)]] from client [94193A04070000EA] at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:458) at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.visit(LwM2mResponseBuilder.java:122) at org.eclipse.leshan.core.request.ReadRequest.accept(ReadRequest.java:187) at org.eclipse.leshan.server.californium.request.RequestSender$2.buildResponse(RequestSender.java:226) at org.eclipse.leshan.core.californium.AsyncRequestObserver$1.onResponse(AsyncRequestObserver.java:60) at org.eclipse.leshan.core.californium.CoapAsyncRequestObserver.onResponse(CoapAsyncRequestObserver.java:99) at org.eclipse.californium.core.coap.Request.setResponse(Request.java:931) at org.eclipse.californium.core.server.ServerMessageDeliverer.deliverResponse(ServerMessageDeliverer.java:258) at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.receiveResponse(BaseCoapStack.java:217) at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89) at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.receiveResponse(ExchangeCleanupLayer.java:105) at org.eclipse.californium.core.network.stack.ObserveLayer.receiveResponse(ObserveLayer.java:139) at org.eclipse.californium.core.network.stack.BlockwiseLayer.receiveResponse(BlockwiseLayer.java:776) at org.eclipse.californium.core.network.stack.ReliabilityLayer.receiveResponse(ReliabilityLayer.java:308) at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89) at org.eclipse.californium.core.network.stack.BaseCoapStack.receiveResponse(BaseCoapStack.java:146) at org.eclipse.californium.core.network.CoapEndpoint$1.receiveResponse(CoapEndpoint.java:314) at org.eclipse.californium.core.network.UdpMatcher$4.run(UdpMatcher.java:457) at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:290) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: org.eclipse.leshan.core.node.codec.CodecException: Invalid content [3000.0] for type INTEGER for path /3/0/7 at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:504) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.extractLwM2mResources(LwM2mNodeSenMLDecoder.java:452) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseRecords(LwM2mNodeSenMLDecoder.java:193) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.decode(LwM2mNodeSenMLDecoder.java:97) at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:156) at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:138) at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:450) ... 21 common frames omitted Caused by: java.lang.IllegalStateException: Can not convert 3000.0 to long safely : Unsupported number java.lang.Double at org.eclipse.leshan.core.util.datatype.NumberUtil.numberToLong(NumberUtil.java:59) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.numberToLong(LwM2mNodeSenMLDecoder.java:527) at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:485) ... 27 common frames omitted

sbernard31 commented 2 years ago

1174 should fix this kind of problem for senML+cbor too. Which version are you using ?

Note that : even if I add this more flexible behavior which tolerates to convert Double in Long safely. I feel that your device behaves strangely ... maybe you should report it ?

For JSON, there is no clear way to know the type of a number. A common way is to consider that :

For CBOR, I feel this is even more wrong as the LWM2M specify that integer representation must be used for integer. See http://www.openmobilealliance.org/release/LightweightM2M/V1_1_1-20190617-A/HTML-Version/OMA-TS-LightweightM2M_Core-V1_1_1-20190617-A.html#Table-C-2-Data-Type-Mapping

Reading the specification again, I'm wondering if Leshan should tolerate this by default for senml+cbor. I feel it should not. (and so I will maybe change this again)

sonny82 commented 2 years ago

We don't use your fix yet. Yes, I think I will contact the manufacturer regarding this issue.

sbernard31 commented 2 years ago

I create an issue about this : https://github.com/eclipse/leshan/issues/1176

sonny82 commented 2 years ago

I contacted the manufacturer regarding the senml-cbor decoding issue. They think that the problem might be on the Leshan side in the following bits of code:

https://github.com/eclipse/leshan/blob/da6d32e5079e393e58cba6379223d7d721e6e549/leshan-core/src/main/java/org/eclipse/leshan/senml/cbor/upokecenter/SenMLCborPackSerDes.java#L83

Their comment: "Regardless of BigNum being Integer, it is set as float."

sbernard31 commented 2 years ago

I think their are wrong. :thinking:

  1. If this was true, why sometime it works and sometime it does not.
  2. record.setFloatValue method name is misleading but we don't set a float here but a Number which could be an integer. (it should probably be renamed)
  3. there is a simple way to verify. Instead of looking at Leshan code, just capture the traffic and look at the payload send by the device if this is a float number at cbor point of view. I feel this is an issue at client side. For now without more information, I bet on this.
sonny82 commented 2 years ago

Hi,

I'm attaching Wireshark captures for both senml+json and senml+cbor (I'm sorry it took so long). It fails when the integer value I'm trying to read is a thousand-round value (in this case 3000). If I change the value to 2999 decoding works. Note: this has been reproduced with the 2.0.0-M5 version (#1174 not applied yet).

If I read these captures correctly I see that also if I set content format in request to senml+cbor, I'm getting senml+json response..

wireshark_captures.zip

sbernard31 commented 2 years ago

About the float number SenML-JSON issue : Captures confirm what I explained in : https://github.com/eclipse/leshan/issues/1170#issuecomment-988627519 3e3 notation is used and so is considered as a Float by jackson. Even if I don't recommend to use exponential notation in JSON for Integer. We can not consider it as a bug at client side.

1174 should fix the problem at server side. So I think we are good here.

Not related to this issue but looking your capture, I feel there is maybe some issue at client side :

1) too many answer about DTLS retransmission ? See failure_senml-json.pcapng :

1   0.000000    2022-01-04 07:27:36,129189  10.1.0.5    95.159.202.153  5683    9180    CoAP    67  CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35   Error   2022-01-04 07:27:36,129189
2   2.030676    2022-01-04 07:27:38,159865  10.1.0.5    95.159.202.153  5683    9180    CoAP    67  CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]  Note    2022-01-04 07:27:38,159865
3   6.092597    2022-01-04 07:27:42,221786  10.1.0.5    95.159.202.153  5683    9180    CoAP    67  CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]  Note    2022-01-04 07:27:42,221786
4   14.213182   2022-01-04 07:27:50,342371  10.1.0.5    95.159.202.153  5683    9180    CoAP    67  CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]  Note    2022-01-04 07:27:50,342371
5   21.520511   2022-01-04 07:27:57,649700  95.159.202.153  10.1.0.5    9180    5683    CoAP    86  ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35      2022-01-04 07:27:57,649700
6   21.534478   2022-01-04 07:27:57,663667  95.159.202.153  10.1.0.5    9180    5683    CoAP    86  ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35      2022-01-04 07:27:57,663667
7   21.551604   2022-01-04 07:27:57,680793  95.159.202.153  10.1.0.5    9180    5683    CoAP    86  ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35      2022-01-04 07:27:57,680793
8   22.800684   2022-01-04 07:27:58,929873  95.159.202.153  10.1.0.5    9180    5683    CoAP    86  ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35      2022-01-04 07:27:58,929873

There is a lot of retransmission because it seems that device takes long time too answer. (1 request + 3 retransmission) Then we get 4 answers in a short frame time. Maybe this is a network issue but here I would expect only 1 answer from the device (maybe 2 because of network latency) but 4 sounds a bit strange. (Anyway this is not really a bug at client side but something which could maybe be improved)

2) request where senml-cbor is asked and senml-json is returned I think this behavior does not follow the LWM2M specification. The specification says :

An LwM2M Server data request MAY contain an option specifying the data formats the Server would prefer to receive for the payload; if this data format is not accepted by the LwM2M Client, the request is rejected;

And Operations Available CoAP Response Codes Reason Phrase
Read 4.06 Not Acceptable None of the preferred Content-Formats can be returned

(Table: 6.7.-3 Response Codes: Device Management and Service Enablement Interface)

So either the device should return payload using senml+cbor or it should return 4.06 error. (At least this is my understanding)

sonny82 commented 2 years ago

Thank you. I can confirm that applying #1174 solves this issue.

sbernard31 commented 2 years ago

Thx again for taking time to reporting this. That's help a lot :pray: