aws / aws-iot-device-sdk-python-v2

Next generation AWS IoT Client SDK for Python using the AWS Common Runtime
Apache License 2.0

Resource leak when using IPC subscription #589

Open erikfinnman opened 2 months ago

erikfinnman commented 2 months ago

Describe the bug

We have detected what appears to be a resource leak in the Greengrass nucleus related to IPC subscriptions.

The Python code below, run in a Greengrass component, repeatedly creates an IPC client, sets up a topic subscription, and then closes both the subscription and the client. The underlying resources appear not to be freed.

Expected Behavior

Resources in the Greengrass nucleus should be released when the client is closed.

Current Behavior

Heap is eventually exhausted for the Greengrass Java process.

Reproduction Steps

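    import logging
    import time
    import uuid
    from typing import Union

    from awsiot.greengrasscoreipc.clientv2 import GreengrassCoreIPCClientV2
    from awsiot.greengrasscoreipc.model import SubscriptionResponseMessage

    log = logging.getLogger(__name__)

    log.info("Mem-test of IPC client")
    count = 0
    while True:
        # A new client and a unique topic name are created on every iteration.
        ipc_client = GreengrassCoreIPCClientV2()
        request_id = uuid.uuid1()
        response_topic = f"dummy_method-response-{request_id}"
        def response_listener(message: SubscriptionResponseMessage) -> None:
            log.info("Response listener")
        def error_listener(_: Exception) -> Union[None, bool]:
            log.info("Error listener")
            return True

        _, operation = ipc_client.subscribe_to_topic(
            topic=response_topic,
            on_stream_event=response_listener,
            on_stream_error=error_listener,
        )
        # Close the subscription first, then the client itself.
        operation.close()
        ipc_client.close()
        count += 1
        if count > 10000:
            log.info("Created %s clients", count)
            count = 0
            time.sleep(1)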

If the Greengrass heap is set to something like 100 MB, memory is exhausted after about 15-20 minutes of running the above snippet in a Greengrass component. We can see this by enabling the Native Memory Tracking feature in the JVM.
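For anyone who wants to watch the growth over time, here is a minimal sketch of polling Native Memory Tracking from Python. It assumes the Greengrass JVM was started with `-XX:NativeMemoryTracking=summary`, that `jcmd` is on the PATH, and that the summary contains a line of the form `Total: reserved=...KB, committed=...KB`. The helper names `read_nmt_summary` and `nmt_committed_kb` are made up for this example and are not part of any SDK.

```python
import re
import subprocess


def nmt_committed_kb(summary_text):
    """Return the committed size in KB from the 'Total:' line, or None if absent.

    Assumes the NMT summary was produced with scale=KB.
    """
    match = re.search(r"^Total:.*?committed=(\d+)KB", summary_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_nmt_summary(pid):
    """Ask a running JVM for its NMT summary via jcmd (requires NMT to be enabled)."""
    result = subprocess.run(
        ["jcmd", str(pid), "VM.native_memory", "summary", "scale=KB"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `nmt_committed_kb(read_nmt_summary(pid))` at a fixed interval and logging the result should show whether committed memory keeps climbing while the reproduction loop runs.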

The above code snippet was the most compact way we were able to replicate the problem we have been seeing on our production devices. There, the memory leak takes several weeks to manifest, since we obviously don’t create clients as frequently as in the snippet.

Analyzing memory dumps of the JVM identifies the com.aws.greengrass.builtin.services.pubsub.PubSubIPCEventStreamAgent as the object retaining almost all memory. Digging into the references of this class reveals hundreds of thousands of objects of type java.util.concurrent.ConcurrentHashMap$Node which in turn have references to com.aws.greengrass.builtin.services.pubsub.SubscriptionTrie. It looks like this class contains the topic name of the generated subscription.
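To illustrate why one unique topic name per iteration would matter, here is a toy model in Python. It is not the nucleus code, and nothing in this thread establishes that the nucleus behaves this way. `ToySubscriptionRegistry` and `entries_after_churn` are hypothetical names. The sketch only shows that a topic-keyed map keeps one entry per topic ever seen unless empty entries are pruned on unsubscribe, which would match the shape of the heap dump described above.

```python
class ToySubscriptionRegistry:
    """Toy stand-in for a topic -> subscribers map. NOT the nucleus implementation."""

    def __init__(self, prune_empty):
        self.prune_empty = prune_empty
        self.topics = {}

    def subscribe(self, topic, callback):
        self.topics.setdefault(topic, set()).add(callback)

    def unsubscribe(self, topic, callback):
        subscribers = self.topics.get(topic)
        if subscribers is None:
            return
        subscribers.discard(callback)
        # Without this pruning step, the empty entry for the topic stays behind.
        if self.prune_empty and not subscribers:
            del self.topics[topic]


def entries_after_churn(prune_empty, iterations):
    """Subscribe and unsubscribe once per unique topic; return the entries left."""
    registry = ToySubscriptionRegistry(prune_empty)

    def listener(message):
        pass

    for i in range(iterations):
        topic = f"dummy_method-response-{i}"
        registry.subscribe(topic, listener)
        registry.unsubscribe(topic, listener)
    return len(registry.topics)
```

In this model, `entries_after_churn(False, 1000)` leaves 1000 entries behind, while `entries_after_churn(True, 1000)` leaves none.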

Studying the IPC documentation, I can’t see anything obviously wrong with our code snippet. Both the Greengrass IPC client and the subscription operation are closed, so shouldn’t this free up all resources?

I raised this issue with the Nucleus team, but they state that this must be a problem in the Python SDK, since the client is not disconnected: https://github.com/aws-greengrass/aws-greengrass-nucleus/issues/1650

Possible Solution

No response

Additional Information/Context

I've also tried a version of the above snippet where the client is created once (outside the loop), but I get the same behavior.
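A minimal sketch of what that single-client variant might look like, assuming the same `subscribe_to_topic` keyword arguments as in the reproduction steps. The function name `subscribe_and_close` is made up for this example, and the client is passed in as a parameter so the loop can also be exercised with a stand-in object.

```python
import uuid


def subscribe_and_close(ipc_client, iterations):
    """Reuse one IPC client; open and close one subscription per iteration."""
    for _ in range(iterations):
        # A unique topic name per iteration, as in the original snippet.
        response_topic = f"dummy_method-response-{uuid.uuid1()}"
        _, operation = ipc_client.subscribe_to_topic(
            topic=response_topic,
            on_stream_event=lambda message: None,
            on_stream_error=lambda error: True,  # True closes the stream on error
        )
        operation.close()
```

In a component this would be called as `subscribe_and_close(GreengrassCoreIPCClientV2(), 10000)`, with the client closed once after the loop finishes.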

SDK version used

1.19.0

Environment details (OS name and version, etc.)

Linux 5.15.61-v8+ #1579 SMP PREEMPT 2022 aarch64 GNU/Linux

jmklix commented 1 month ago

Sorry for the delay, still looking into this. I'm trying to verify that the server correctly gets the message that the channel is closed. If that isn't happening, then there is likely a problem with this SDK. Otherwise it might be a Greengrass bug.

erikfinnman commented 1 month ago

> Sorry for the delay, still looking into this. I'm trying to verify that the server correctly gets the message that the channel is closed. If that isn't happening, then there is likely a problem with this SDK. Otherwise it might be a Greengrass bug.

Ok, thanks for taking the time to update the issue.