eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/

[Firmware Update][Blockwise] - Is it possible to transfer firmware larger than 300MB using Blockwise? #2248

Closed (mathns28 closed 3 months ago)

mathns28 commented 3 months ago

Hello, I'm new to LwM2M and still learning, so forgive me if I miss something.

I implemented a file transfer using CaliforniumServerEndpointsProvider as a base. I replaced the RootResource with another implementation that handles the files.

I took the class from here: https://github.com/eclipse-leshan/leshan/issues/70#issuecomment-2047006480

I tested locally and it works pretty well, but when I test on the server I get a delay every time:

logPart.txt

This occurs randomly, sometimes after about 20 transmissions, sometimes after 500 transmissions.

The server configuration I'm using is attached below. It is the default configuration; I only changed MAX_RESOURCE_BODY_SIZE, PREFERRED_BLOCK_SIZE and MAX_RETRANSMIT, because the files are larger than 300MB.

config.txt

I don't know where the error is: whether I missed something, whether I need to increase or decrease some config value, or whether I can get more logs to validate the messages...

I would be grateful if someone could help me!

boaks commented 3 months ago

300MB, are you sure? And that in 256-byte blocks? The "Co" in CoAP refers to "Constrained ...".

I don't have experience with that, I would not use CoAP for 300MB files.

The leshan project already pointed to the main issue: the 16-bit MID (message ID) limits such a huge transfer.

You may experiment with it (use COAP.MID_TRACKER=NULL on the client and COAP.DEDUPLICATOR=PEERS_MARK_AND_SWEEP on the server), but I guess that, in the end, it will not work stably.

The current blockwise implementation also requires all data to be in the heap, which makes this pretty inefficient for 300MB.
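
To put numbers on the MID limit mentioned above, here is a minimal arithmetic sketch. It does not use the Californium API; the class and method names are made up for illustration, and it assumes 1MB = 1024 * 1024 bytes and exactly one fresh MID per block request (no retransmissions counted).

```java
public class BlockwiseMath {
    /** Size of the CoAP message ID space: MIDs are 16 bits wide. */
    static final long MID_SPACE = 1L << 16;

    /** Number of blocks needed to carry a body (ceiling division). */
    static long blockCount(long bodyBytes, int blockSize) {
        return (bodyBytes + blockSize - 1) / blockSize;
    }

    /** How often the MID space is cycled, assuming one new MID per block. */
    static double midCycles(long blocks) {
        return (double) blocks / MID_SPACE;
    }

    public static void main(String[] args) {
        long body = 300L * 1024 * 1024; // assumption: 1MB = 1024 * 1024 bytes
        for (int blockSize : new int[] {256, 1024}) {
            long blocks = blockCount(body, blockSize);
            System.out.printf("block size %d: %d blocks, %.2f MID cycles%n",
                    blockSize, blocks, midCycles(blocks));
        }
    }
}
```

Under these assumptions a 300MB body in 256-byte blocks needs more than a million requests and cycles through the whole MID space many times, which is why deduplication on the server becomes the limiting factor.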

mathns28 commented 3 months ago

Do you know the largest file size I can transfer stably? Or the size of common transfers?

> I don't have experience with that, I would not use CoAP for 300MB files.

I see, I got 2MB transfers at most.

boaks commented 3 months ago

> Do you know the largest file size I can transfer stably? Or the size of common transfers?

Experiences may differ a lot on that. I myself mainly use single requests (400-700 bytes) and, for firmware downloads, a blockwise transfer of 500K in 1K blocks (GET).

I haven't tested what the largest "stable transfer" would be. I guess that depends a lot on the stability of the network and the configuration used.

For the "MID overflow", it depends a lot on the implementations used. It's not only about the MIDs used during the transfer. Long ago I also found that a restart with a random MID after such a blockwise transfer has a much higher probability of colliding with that previous blockwise transfer. The rationale: on a normal start, the probability of randomly picking a MID that hasn't expired on the server side is pretty low, e.g. 1-2/65536. But if a blockwise transfer of e.g. 1000 blocks still has unexpired MIDs, the probability will be 1000/65536, and that will be hit in any deployment of 100 devices.
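
The restart-collision estimate above can be checked with a small calculation. This is only a sketch under simplifying assumptions that are not from the thread: each device restarts once, picks its initial MID uniformly at random, and devices are independent. The class and method names are hypothetical.

```java
public class MidCollision {
    /** Number of distinct 16-bit message IDs. */
    static final double MID_SPACE = 65536.0;

    /** Chance that one random start MID hits a MID not yet expired on the server. */
    static double collisionPerRestart(int unexpiredMids) {
        return Math.min(1.0, unexpiredMids / MID_SPACE);
    }

    /** Chance that at least one of several independent restarts collides. */
    static double atLeastOneCollision(double perRestart, int devices) {
        return 1.0 - Math.pow(1.0 - perRestart, devices);
    }

    public static void main(String[] args) {
        int devices = 100; // deployment size used in the comment above
        for (int unexpired : new int[] {2, 1000}) {
            double p = collisionPerRestart(unexpired);
            System.out.printf("%d unexpired MIDs: per restart %.5f, any of %d devices %.3f%n",
                    unexpired, p, devices, atLeastOneCollision(p, devices));
        }
    }
}
```

With 1000 unexpired MIDs the chance that at least one of 100 devices collides comes out well above one half under these assumptions, while with only one or two unexpired MIDs it stays below one percent.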

For all these reasons, we implemented an alternative deduplication algorithm on the server side, COAP.DEDUPLICATOR=PEERS_MARK_AND_SWEEP. It's not covered by the IETF CoRE group. It limits the client in using "larger NSTART" values. If that works for you, it helps to reduce the "MID overflows" a lot (and also the heap used for deduplication). I use it in most of my own deployments.

> I see, I got 2MB transfers at most.

Using blocks of 256 bytes (about 8000 requests)? Or 1024 bytes? I'm not sure why you get "limited" at that size. As I wrote, I don't use large transfers in the wild, but tests with 8MB are usually successful.

By the way, which implementation are you using on the client side?

boaks commented 3 months ago

The issue with random MID collisions on restart after a blockwise transfer depends on the "endpoint identity" used. E.g. if the IP address/port is used, it depends on the client port (fixed or ephemeral), DHCP or a NAT. If the DTLS principal is used, the issue may show up more frequently. Sometimes it's also possible to set up a second CoAP server for downloads, which also mitigates that collision.

boaks commented 3 months ago

I assume that 300MB transfers are not used very often, and without experience it's hard to be helpful on this topic. Therefore I would prefer to close this issue.