Azure / azure-iot-sdk-java

A Java SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-java/
Other
199 stars 236 forks source link

[Device Client MQTT] Getting OutOfMemoryError #463

Closed jerickcolendres closed 5 years ago

jerickcolendres commented 5 years ago

There seems to be a bottleneck in the edgeHub but even so, we expect that messages sent are scheduled for GC which wasn't case until we set the message expiry to 60 seconds and reduce the amount of messages we sent to 500 per second.

Are we doing the right thing by setting the message expiry or are we sending messages to the edgeHub too fast?

Here's a screenshot in VisualVM that shows the IOTHubTransportPacket message instances using 1.1GB of heap memory:

javamodule-oome-visualvm

Code sample exhibiting the issue:

public void publishMessage(String data) {
    logger.debug("Sending message to IOT Hub: {}", data);
   Message message = new Message(data);
   try {
        logger.debug("Sending message to Edge Hub");
        //Uncomment to free up more heap space and prevent OutOfMemoryError
        //message.setExpiryTime(60* 1000);
        client.sendEventAsync(message, eventCallback, message, OUTPUT_NAME);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Console log of the issue:

2019-03-09 00:34:00.183 [collector-thread-126] INFO  c.f.a.e.r.s.DataCollectorService - [Site 63]: Getting tag samples from '2019-03-09T10:51:20' and '2019-03-09T11:01:20'
Caused by: java.sql.SQLTransientConnectionException: rhea_datasource_pool - Connection is not available, request timed out after 3160ms.
        at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:667)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:183)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:148)
        at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
        at org.jdbi.v3.core.Jdbi.open(Jdbi.java:285)
        ... 10 more
Exception in thread "collector-thread-126" java.lang.OutOfMemoryError: Java heap space
timtay-microsoft commented 5 years ago

I suspect that the biggest bottleneck here is the MQTT library that this SDK uses, not the edgehub instance you are using. I recommend testing out AMQP instead of MQTT to see if you get a different performance. Keep in mind that 1000 messages per second with ~5 MB per message is a pretty heavy load for this SDK.

As far as message expiry time goes, you are correct to be setting that property. The message expiry time is strictly an SDK side property for your message. The purpose of that property is to keep messages from staying in the send queue for too long, and to let newer messages be sent instead. Since you are queueing messages faster than the SDK can send them, the message queue of the SDK will just keep growing until you hit that OutOfMemoryException. Even with a short message expiry time, I suspect you will still hit OutOfMemoryException eventually if you try to send messages at that pace.

@varunpuranik for some context on performance of edgehubs with such high message/sec

jerickcolendres commented 5 years ago

@timtay-microsoft Thanks for your response! Indeed, 1k message/second with ~5mb/message is quite heavy and by just reducing the sending rate by half, the heap memory graph looks much better though I still have to set the message expiry. I'll try AMQP and see if it can cope with the load.

From what you said about the message expiry, it seems unsent messages that have expired gets taken out of the sent queue to let newer ones sent first. We don't want to drop messages from the send queue because it'll leave a gap in the Time Series Insights graphs. We prefer reliable message delivery over up-to-date information.

Does the SDK provides a way to monitor the send queue from the Edge module so that we can throttle the sending rate?

jerickcolendres commented 5 years ago

Looks like changing the protocol to AMQP is the solution to our problem. I'm not getting the OutOfMemoryError anymore and I don't have to set the message expiry. The heap memory usage too is very low compared to MQTT where I still have the issue even when running the latest version 1.16.0. Thanks for the tip!

timtay-microsoft commented 5 years ago

I'm glad to hear AMQP is working better for you.

From what you said about the message expiry, it seems unsent messages that have expired gets taken out of the sent queue to let newer ones sent first. We don't want to drop messages from the send queue because it'll leave a gap in the Time Series Insights graphs. We prefer reliable message delivery over up-to-date information.

If you want reliable message delivery rather than up-to-date information, then you will need to be responsible about how many messages you queue up at a time. There are a few options you can adjust to try to speed up sending, but the onus will still be on you to add messages at a pace that the SDK can keep up with. I recommend fiddling with the send interval through the setOption API with the "SetSendInterval" option. Lower values here will make the SDK attempt to send queued messages more frequently, but at the cost of extra threading.

Does the SDK provides a way to monitor the send queue from the Edge module so that we can throttle the sending rate?

No, right now the only way to track the send queue's size is by keeping track of how many sendEventAsync(...) calls you have made. There isn't a hard-defined limit for how many messages per second that this SDK can send from its queue, so you will need to test the throughput for yourself, and create a limit from the application side to prevent adding too many messages to the queue.

jerickcolendres commented 5 years ago

I see. Thanks for your help!

az-iot-builder-01 commented 5 years ago

@jerickcolendres, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey