eclipse-leshan / leshan

Java Library for LWM2M
https://www.eclipse.org/leshan/
BSD 3-Clause "New" or "Revised" License

Is Leshan server ready for production in industry ? #1614

Open EmbGangsta opened 6 months ago

EmbGangsta commented 6 months ago

Question

Hi,

Our company (www.cls.fr) is very interested in using LwM2M, and we would like to add an instance of the Leshan server to our infrastructure, but I wonder whether Leshan is ready today for industrial production use?

We need to support LwM2M 1.1 with queue mode, as our devices will be sleeping most of the time and will use dynamic IP addresses (devices are using cellular modems).

Also, 10,000+ beacons could connect at the same time. Is Leshan able to cope with this?

What would be the licensing model in use ?

Regards

sbernard31 commented 6 months ago

Hi,

> I wonder whether Leshan is ready today for industrial production use?

At least, we try to be. The exceptions are the demos and some experimental features, which are not production ready. The client is maybe a bit less production ready too; I think it is mainly used for testing. CoAP and CoAP over DTLS 1.2, based on Californium/Scandium, are currently the most production-ready transport layers.

I can also say that Leshan 1.x is currently used in production at Semtech (SierraWireless). I'm not totally sure, but I understand that Orange is currently using Leshan 2.0.0-Mx in production.

You can also see :

Leshan 1.x is the stable release (stable API). It implements LWM2M v1.0.x only and is based on Californium/Scandium 2.x.

Leshan 2.0.x is the version in development (API not stable: you could face API changes between two releases). It implements LWM2M v1.1.x only and is based on Californium/Scandium 3.x (or 4.x, if there is a release before the stable release?).

See more details at : https://github.com/eclipse-leshan/leshan/wiki/Roadmap

> We need to support LwM2M 1.1 with queue mode, as our devices will be sleeping most of the time and will use dynamic IP addresses (devices are using cellular modems).

This is a classic use case and should be supported by following: https://github.com/eclipse-leshan/leshan/wiki/LWM2M-Devices-with-Dynamic-IP

> Also, 10,000+ beacons could connect at the same time. Is Leshan able to cope with this?

It's very hard to answer this kind of question because it depends on so many different factors, but I guess this should be OK :thinking: Whatever answer you get to this question, I advise you to run some performance tests anyway.

> What would be the licensing model in use ?

Leshan is dual-licensed, so you can choose to use one license or the other. More details at :

If all of this information is not enough, you could directly contact license@eclipse.org

sbernard31 commented 6 months ago

@JaroslawLegierski, @jvermillard, @cyril2maq, @gcx-seb maybe you have some experience to share about your usage of Leshan ?

jvermillard commented 6 months ago

I use Leshan 2.0.x in production for some customers (I'm a freelancer), and even load-tested it way above 10k devices, so if you need 2.0.x features, it's totally doable to run it in prod; but you need to be careful when upgrading milestones.

IMO, 10k devices is doable on a single machine. If you target more than 50~100k, that's where you need more work to build a multi-node setup + Redis to manage the sessions.

boaks commented 6 months ago

> Also, 10,000+ beacons could connect at the same time. Is Leshan able to cope with this?

In my experience (CoAP/DTLS, Californium), it's not only about the number of devices; it also depends on the number of intended messages. 1,000,000 devices each sending once an hour result in less than 300 msg/s. But 1,000 webcams at 30 msg/s each will result in 30,000 msg/s. So, how frequently are your beacons expected to send messages?
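The arithmetic behind these two figures can be sketched as follows. This is only an illustration: the helper name `aggregate_rate` is hypothetical, and it assumes every device in a fleet sends at the same steady rate, which real traffic rarely does.

```python
def aggregate_rate(devices: int, msgs_per_device_per_second: float) -> float:
    """Aggregate message rate (msg/s) for a fleet of identical devices."""
    return devices * msgs_per_device_per_second


# 1,000,000 devices, each sending one message per hour (3600 s).
hourly_fleet = aggregate_rate(1_000_000, 1 / 3600)

# 1,000 webcams, each sending 30 messages per second.
webcams = aggregate_rate(1_000, 30)

print(f"hourly fleet: {hourly_fleet:.0f} msg/s")  # about 278 msg/s
print(f"webcams:      {webcams:.0f} msg/s")       # 30000 msg/s
```

The point of the comparison is that the per-device send rate, not the device count alone, dominates the load on the server.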

I have also frequently found that the performance bottleneck is not the CoAP device frontend; quite often, it's the application backend. So please also verify that your backend is able to process the load.

EmbGangsta commented 6 months ago

OK, actually our beacons are not very verbose: we generally send around 16 bytes every minute as a data push (SEND method, counting only the useful data, without SenML CBOR overhead). Exceptionally, a beacon could output more data if it was out of cellular coverage for a long time (datalogger feature); in this case the beacon may output more than 1500 bytes x 50 times in the same connection, but then that's all... So the average uplink content is really small, and we will use CBOR and opaque data as much as possible.

boaks commented 6 months ago

> (devices are using cellular modems)

> we generally send around 16 bytes every minute

For 10,000 beacons, that's less than 200 msg/s, so it should not be an issue.

Just to mention: 16 bytes of application data will require about 100 additional bytes for IP, UDP, DTLS, and CoAP. That will be about 5 MB per month per device. And it will drain the battery a lot.
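As a rough check of these numbers, here is a back-of-the-envelope sketch. It assumes a fleet of exactly 10,000 beacons (the question said 10000+), a 30-day month, one message per minute, and the approximate 100-byte protocol overhead mentioned above; all constant names are illustrative.

```python
BEACONS = 10_000         # fleet size, taken from the original question
PAYLOAD_BYTES = 16       # application data per message
OVERHEAD_BYTES = 100     # approximate IP + UDP + DTLS + CoAP overhead
MSGS_PER_MINUTE = 1      # one data push per minute per beacon
DAYS_PER_MONTH = 30      # assumption for the monthly estimate

# Aggregate message rate seen by the server, in messages per second.
fleet_rate = BEACONS * MSGS_PER_MINUTE / 60

# Traffic volume for a single beacon over one month.
msgs_per_month = MSGS_PER_MINUTE * 60 * 24 * DAYS_PER_MONTH
bytes_per_month = msgs_per_month * (PAYLOAD_BYTES + OVERHEAD_BYTES)

print(f"fleet rate: {fleet_rate:.0f} msg/s")                     # about 167 msg/s
print(f"per beacon: {bytes_per_month / 1_000_000:.1f} MB/month")  # about 5.0 MB
```

Note that the protocol overhead makes up roughly 86% of each message here, which is why batching several readings into one transmission changes the data and battery budget so much.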

EmbGangsta commented 6 months ago

We have different ways to cope with data throughput vs. battery. When we generate a lot of data (short connection period, < 5 min), it's generally in the USB-plugged case; otherwise we have configs to deliver data in chunks to limit the overhead of connections / transmissions. That's also why I am pushing to use LwM2M today instead of MQTT!!

cyril2maq commented 6 months ago

At Orange we do use Leshan 2.x in production, with heterogeneous types of devices and LwM2M SDKs connected to our servers, and I can confirm it is very robust. Even more so in your case, where it seems that you manage both the device and the server code.

And as @jvermillard and @sbernard31 mentioned, the API can still change (with breaking changes), so you need to anticipate this in your project.

On our side, given our backend application, we currently aim for about 5k devices per Leshan instance. So we performed load tests with 5k devices on 1 instance with heavy usage (bootstrap, firmware updates, observation...). As @boaks mentioned, performance limitations will probably not come from Leshan but rather from your backend application. And FYI, to handle multiple instances, you will need some work on specific implementations (using Redis to share sessions).

EmbGangsta commented 6 months ago

Thanks. I now have to present this solution to our infra architect, and I will come back with other questions.

Regards

PadmabushanReddy commented 6 months ago

Apart from Leshan, most of the "production readiness" depends on the backend integrations (mainly the patterns used to integrate) with your own frameworks, and on managing client connect, disconnect and bootstrapping cycles (load balancing and avoiding thundering herds). We have Leshan v1.x running in a clustered setup with the backend integrated with C* and Kafka (while keeping Redis for auth), with 2.5 million actively connected devices (hubs) sending us events (using observe-notify) in the range of 500-600 events/day/device. This load is served by 25 instances of Leshan (staying at 100k per server, since Californium/Scandium had a max connection value of 150k). As your scale increases (beyond, say, 1M active/live connections), you need to reinvent dynamic load balancing (bootstrap-based or otherwise) yourself. Building in observability would help a lot in understanding the application and load pattern. We still haven't tried Leshan 2.0 at scale. I'm hoping it can hold 4-5x more connections than v1.x.
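To put this deployment in the same units as the rest of the thread, the figures can be converted roughly as below. This is a sketch only: it assumes events are spread evenly over a day and devices are spread evenly over instances, neither of which is stated in the comment, and the variable names are illustrative.

```python
DEVICES = 2_500_000                 # actively connected devices (hubs)
INSTANCES = 25                      # Leshan instances in the cluster
EVENTS_PER_DAY = (500, 600)         # per-device range quoted in the comment
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_MAX_CONNECTIONS = 150_000   # Californium/Scandium default cited above

# Connections each instance holds if devices are spread evenly.
devices_per_instance = DEVICES // INSTANCES

# Cluster-wide event rate, assuming events are uniform over the day.
low, high = (DEVICES * e / SECONDS_PER_DAY for e in EVENTS_PER_DAY)

print(f"{devices_per_instance} devices per instance")
print(f"{low:.0f} to {high:.0f} events/s across the cluster")
print(f"{low / INSTANCES:.0f} to {high / INSTANCES:.0f} events/s per instance")
print(f"{devices_per_instance / DEFAULT_MAX_CONNECTIONS:.0%} of the default connection limit")
```

Under these assumptions each instance handles several hundred events per second while holding about two thirds of the default connection limit, which is consistent with the choice to stay at 100k per server.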

boaks commented 6 months ago

@PadmabushanReddy

Thanks a lot for letting us know!

> since Californium/Scandium had a max connection value of 150k

150k is the default. With Cf 3.11, you can adapt that with

Californium3.properties:

# Maximum number of active peers.
# Default: 150000
COAP.MAX_ACTIVE_PEERS=1000000

# DTLS maximum connections.
# Default: 150000
DTLS.MAX_CONNECTIONS=1000000

With Cf 2.x it depends on

Californium.properties:

MAX_ACTIVE_PEERS=1000000

and which value is passed to

DtlsConnectorConfig.Builder.setMaxConnections(int)

You may need to check here, in the Leshan project, what is supported.

sbernard31 commented 3 months ago

@EmbGangsta any news? Do you know if Leshan will be used by CLS?

EmbGangsta commented 3 months ago

I hope we can! Today it would be the best approach in terms of "price". Our developers are maybe a little lazy and would like a ready-to-use solution, but when they look at other ones (IOTEROP or Alaska) they deem the prices too high... So yes, I am pushing them to use Leshan! Maybe we could have a little talk together; I am on holiday right now.

sbernard31 commented 3 months ago

Thx for the update.

> Maybe we could have a little talk together; I am on holiday right now.

I will be unavailable for the next few days and back on the 21st. Generally, I prefer public async communication, but if you want to ask more "private" questions you can send me an email :slightly_smiling_face: