aws / aws-connected-device-framework

Apache License 2.0

device-monitoring: lambda concurrent execution limits #87

Open: aaronatbissell opened this issue 1 year ago

aaronatbissell commented 1 year ago

AWS Connected Device Framework Affected Module(s):

device-monitoring

Description:

The device monitoring service tracks each device's online/offline status based on the $aws/events/presence MQTT topics, using a straight IoT Core Rule -> Lambda integration. This works well during normal operation and for low-volume systems. However, in higher-volume systems (millions of devices), or when AWS performs maintenance on IoT Core and issues a SERVER_INITIATED_DISCONNECT, tens of thousands of devices may disconnect and reconnect at the same time. Because every presence event triggers its own Lambda invocation, this overwhelms Lambda and exhausts the account's maximum Lambda concurrency quota very quickly, especially because each device monitoring lambda also invokes an assetlibrary lambda.
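To make the concurrency pressure concrete, here is a rough sizing sketch. It assumes a steady event rate during the storm and uses the common sizing rule that concurrency is roughly invocations per second multiplied by average duration in seconds. The names `StormProfile`, `estimateConcurrency`, and `exceedsQuota` are hypothetical illustrations, not part of CDF, and the input numbers you plug in would need to come from your own metrics.

```typescript
// Back-of-the-envelope sketch (hypothetical helpers, not CDF code), assuming a steady
// event rate and the sizing rule: concurrency ≈ invocations per second × avg duration (s).
interface StormProfile {
  eventsPerSecond: number;         // presence events delivered by the IoT rule
  monitorDurationSec: number;      // avg device-monitoring invocation duration
  assetLibraryDurationSec: number; // avg duration of the assetlibrary invocation it triggers
}

function estimateConcurrency(p: StormProfile): number {
  // Every presence event starts one device-monitoring execution...
  const monitoring = p.eventsPerSecond * p.monitorDurationSec;
  // ...and each of those starts an assetlibrary execution, which counts
  // against the same account-level concurrency quota.
  const assetLibrary = p.eventsPerSecond * p.assetLibraryDurationSec;
  return Math.ceil(monitoring + assetLibrary);
}

// True when the storm alone needs more concurrency than the quota allows,
// leaving no headroom for the other functions in the account.
function exceedsQuota(p: StormProfile, accountQuota: number): boolean {
  return estimateConcurrency(p) > accountQuota;
}
```

With illustrative (not measured) inputs of 4,000 events/s, 0.5 s per device-monitoring invocation, and 0.25 s per assetlibrary invocation, the estimate is 2,000 + 1,000 = 3,000 concurrent executions, all competing with every other function in the same account and Region.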

Current behavior:

Lambda hits the concurrency limit, and other operations running in the system at that time are throttled.

Expected behavior:

These system events are handled in a batched, methodical way.

Steps to reproduce:

Additional Information: This could possibly be fixed by using SQS to queue and batch the records. Batching would also help with execution time, since it would require far fewer Lambda cold starts.
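A minimal sketch of the SQS idea, under stated assumptions: an IoT rule's SQS action forwards presence events to a queue, and a Lambda SQS event source mapping delivers them in batches, with ReportBatchItemFailures enabled so the function can return a partial batch response (`batchItemFailures` / `itemIdentifier`). The event and response types below are reduced to the fields this example reads; `collapseLatest`, `processBatch`, and the injected `StatusWriter` (for example, one batched asset library update) are hypothetical names, not CDF or AWS SDK APIs. A real Lambda handler would wrap `processBatch` and supply the writer.

```typescript
// Sketch only (not the CDF implementation). SQS and presence-event shapes are reduced
// to the fields this example reads.
interface SqsRecord { messageId: string; body: string }
interface SqsEvent { Records: SqsRecord[] }
interface BatchItemFailure { itemIdentifier: string }
interface BatchResponse { batchItemFailures: BatchItemFailure[] }

interface PresenceEvent {
  clientId: string;
  timestamp: number; // epoch milliseconds
  eventType: "connected" | "disconnected";
  disconnectReason?: string; // e.g. "SERVER_INITIATED_DISCONNECT"
}

// Hypothetical downstream writer, e.g. one batched status update to the asset library.
type StatusWriter = (latest: PresenceEvent[]) => Promise<void>;

// Returns undefined for bodies that are not valid presence events.
function parsePresence(body: string): PresenceEvent | undefined {
  try {
    const raw = JSON.parse(body);
    if (typeof raw?.clientId !== "string" || typeof raw?.timestamp !== "number") return undefined;
    if (raw.eventType !== "connected" && raw.eventType !== "disconnected") return undefined;
    return {
      clientId: raw.clientId,
      timestamp: raw.timestamp,
      eventType: raw.eventType,
      disconnectReason: raw.disconnectReason,
    };
  } catch {
    return undefined;
  }
}

// Keep only the newest event per device, so a disconnect/reconnect pair for the
// same device inside one batch results in a single "connected" write.
function collapseLatest(records: SqsRecord[]): { latest: PresenceEvent[]; failures: BatchItemFailure[] } {
  const byDevice = new Map<string, PresenceEvent>();
  const failures: BatchItemFailure[] = [];
  for (const record of records) {
    const event = parsePresence(record.body);
    if (!event) {
      // Reported back to SQS; after maxReceiveCount it moves to the DLQ (if one is configured).
      failures.push({ itemIdentifier: record.messageId });
      continue;
    }
    const prev = byDevice.get(event.clientId);
    if (!prev || event.timestamp >= prev.timestamp) byDevice.set(event.clientId, event);
  }
  return { latest: Array.from(byDevice.values()), failures };
}

// One invocation handles the whole batch. If `write` throws, the invocation fails
// and the entire batch becomes visible on the queue again for retry.
async function processBatch(event: SqsEvent, write: StatusWriter): Promise<BatchResponse> {
  const { latest, failures } = collapseLatest(event.Records);
  if (latest.length > 0) await write(latest);
  return { batchItemFailures: failures };
}
```

The design choice here is to spend one invocation per batch rather than one per event, and to drop superseded events before calling anything downstream. Note that a standard SQS queue does not guarantee ordering, so an older event could still arrive in a later batch; guarding against that would need something like a timestamp-conditional write in the status store, which this sketch assumes rather than implements.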

[Screenshot: Screen Shot 2022-10-20 at 1.37.21 PM]

aaronatbissell commented 1 year ago

Actually, after thinking about this a little more, the assetlibrary-history lambda might be a better candidate for batching. The device monitoring lambda tends to handle large loads like this well (although there are small spikes).

anish-kunduru commented 1 year ago

Hi Aaron,

I agree that what you're asking for is a valid use case. A couple of months back, I submitted a PFR (product feature request) for IoT Rule Lambda actions to support batching. Let me find out where that stands and add you to it as a customer of influence.