aws-greengrass / aws-greengrass-nucleus

The Greengrass nucleus component provides device-side orchestration of deployments and lifecycle management for Greengrass components and applications. This includes starting, stopping, and monitoring components and applications, an interprocess communication server for communication between components, and component installation and configuration management.
Apache License 2.0

(greengrass): Greengrass cannot report the job status back to the cloud #1669

Open DeboBurro opened 4 days ago

DeboBurro commented 4 days ago

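**Describe the bug**
After provisioning a Greengrass device and deploying to it, Greengrass cannot report the job status back to the cloud. /greengrass/v2/logs/greengrass.log shows the warnings and errors listed under To Reproduce.

**To Reproduce**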

  1. Install greengrass via
    sudo -E java -Droot="/greengrass/v2" -Dlog.store=FILE \
    -jar ~/GreengrassInstaller/lib/Greengrass.jar \
    --aws-region us-west-2 \
    --thing-name $HOSTNAME \
    --tes-role-name GreengrassV2TokenExchangeRole \
    --tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
    --component-default-user $USER:$USER \
    --provision true \
    --setup-system-service true
  2. Make a deployment to the device. It looks like when Greengrass tries to report the completed deployment job status, it logs the following errors in /greengrass/v2/logs:
    
    2024-11-14T14:21:41.439Z [WARN] (pool-3-thread-7) com.aws.greengrass.deployment.ShadowDeploymentListener: Caught exception while subscribing to shadow topics, will retry shortly. {}
    java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
        at com.aws.greengrass.mqttclient.MqttClient.subscribe(MqttClient.java:532)
        at com.aws.greengrass.mqttclient.WrapperMqttClientConnection.subscribe(WrapperMqttClientConnection.java:77)
        at software.amazon.awssdk.iot.iotshadow.IotShadowClient.SubscribeToUpdateNamedShadowRejected(IotShadowClient.java:1233)
        at com.aws.greengrass.deployment.ShadowDeploymentListener.subscribeToShadowTopics(ShadowDeploymentListener.java:247)
        at com.aws.greengrass.deployment.ShadowDeploymentListener.lambda$setupShadowCommunications$2(ShadowDeploymentListener.java:218)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

    2024-11-14T14:21:47.772Z [WARN] (pool-3-thread-6) com.aws.greengrass.deployment.IotJobsHelper: No connection available during subscribing to Iot Jobs descriptions topic. Will retry in sometime. {ThingName=burro-8-475}
    java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
        at com.aws.greengrass.mqttclient.MqttClient.subscribe(MqttClient.java:532)
        at com.aws.greengrass.mqttclient.WrapperMqttClientConnection.subscribe(WrapperMqttClientConnection.java:77)
        at com.aws.greengrass.deployment.IotJobsClientWrapper.SubscribeToDescribeJobExecutionAccepted(IotJobsClientWrapper.java:198)
        at software.amazon.awssdk.iot.iotjobs.IotJobsClient.SubscribeToDescribeJobExecutionAccepted(IotJobsClient.java:284)
        at com.aws.greengrass.deployment.IotJobsHelper.subscribeToGetNextJobDescription(IotJobsHelper.java:533)
        at com.aws.greengrass.deployment.IotJobsHelper.subscribeToJobsTopics(IotJobsHelper.java:489)
        at com.aws.greengrass.deployment.IotJobsHelper.lambda$setupCommWithIotJobs$5(IotJobsHelper.java:347)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

    2024-11-14T14:22:37.035Z [WARN] (Thread-6) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Connection interrupted. {reason=null, clientId=burro-8-475, reasonCode=KEEP_ALIVE_TIMEOUT, error=Mqtt5 client connection interrupted by server DISCONNECT.}
    2024-11-14T14:22:37.036Z [ERROR] (Thread-6) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Error subscribing to topic. {clientId=burro-8-475, topic=$aws/things/burro-8-475/shadow/name/AWSManagedGreengrassV2Deployment/update/rejected}
    java.util.concurrent.CompletionException: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    Caused by: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)

    2024-11-14T14:22:37.036Z [ERROR] (Thread-6) com.aws.greengrass.mqttclient.MqttClient: Error subscribing. {topic=$aws/things/burro-8-475/shadow/name/AWSManagedGreengrassV2Deployment/update/rejected}
    java.util.concurrent.CompletionException: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    Caused by: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)

    2024-11-14T14:22:37.037Z [ERROR] (Thread-6) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Error subscribing to topic. {clientId=burro-8-475, topic=$aws/things/burro-8-475/jobs/$next/namespace-aws-gg-deployment/get/accepted}
    java.util.concurrent.CompletionException: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    Caused by: software.amazon.awssdk.crt.CrtRuntimeException: Mqtt5 operation failed due to a disconnection event in conjunction with the client's offline queue retention policy. AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY(5156)
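
For anyone triaging a similar log, here is a minimal Python sketch that counts how many log lines contain each of the three failure signatures seen above (the subscribe timeout, the keep-alive disconnect, and the offline queue policy error). The function name `summarize_log` and the labels in `SIGNATURES` are made up for illustration and are not part of Greengrass; only the matched strings come from the log excerpt.

```python
import re
from collections import Counter

# Strings copied from the log excerpt; the dictionary keys are
# illustrative labels, not Greengrass terminology.
SIGNATURES = {
    "subscribe_timeout": re.compile(r"java\.util\.concurrent\.TimeoutException"),
    "keep_alive_timeout": re.compile(r"reasonCode=KEEP_ALIVE_TIMEOUT"),
    "offline_queue_policy": re.compile(
        r"AWS_ERROR_MQTT5_OPERATION_FAILED_DUE_TO_OFFLINE_QUEUE_POLICY"
    ),
}


def summarize_log(text: str) -> Counter:
    """Count the lines of `text` that contain each known signature."""
    counts = Counter()
    for line in text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

To use it, read /greengrass/v2/logs/greengrass.log (usually as root) and pass the contents to `summarize_log`. A nonzero `keep_alive_timeout` count alongside subscribe timeouts would match the pattern reported here, but it does not identify the root cause on its own.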



**Expected behavior**
It should report the completed deployment job status.

**Actual behavior**
It cannot report any job status. On the cloud side, the job status remains `active`.

**Environment**
 - OS: Ubuntu 20.04
 - JDK version: openjdk version "11.0.19" 2023-04-18
 - Nucleus version: 2.13

**Additional context**
This issue started happening in the past 1-2 weeks. I am wondering whether there is a bug in a new release.

junfuchen99 commented 3 days ago

Hi,

Please look at your “AWS IoT Console -> Connect -> Domain configurations -> iot:Data-ATS” and check what’s chosen under security policy.

If it’s configured to use TLS13_1_3_2022_10, please try switching to TLS13_1_2_2022_10 and restarting Greengrass on your device. We are actively working on an issue where Nucleus v2.13.0 does not connect to IoT Core with that specific TLS policy.

If confirmed, this issue can be mitigated by downgrading Greengrass or by choosing a different TLS policy.
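
If you would rather check the policy from a script than from the console, a rough sketch with boto3 could look like the following. The policy names and the `iot:Data-ATS` configuration name come from the comment above, and the `us-west-2` default comes from the install command in the report. The helper names are hypothetical, and the sketch assumes that boto3's `describe_domain_configuration` returns the policy under `tlsConfig.securityPolicy`; please verify that against the current boto3 documentation for your account.

```python
# Policy names as given in the comment above.
AFFECTED_POLICY = "TLS13_1_3_2022_10"
SUGGESTED_POLICY = "TLS13_1_2_2022_10"
DOMAIN_CONFIG_NAME = "iot:Data-ATS"


def suggested_policy(current_policy):
    """Return the policy to switch to, or None if no change is suggested."""
    if current_policy == AFFECTED_POLICY:
        return SUGGESTED_POLICY
    return None


def read_security_policy(region="us-west-2"):
    """Fetch the current security policy of the iot:Data-ATS domain configuration.

    Needs boto3 and AWS credentials, so boto3 is imported lazily here.
    The response shape is an assumption; check it before relying on it.
    """
    import boto3

    iot = boto3.client("iot", region_name=region)
    resp = iot.describe_domain_configuration(
        domainConfigurationName=DOMAIN_CONFIG_NAME
    )
    return resp.get("tlsConfig", {}).get("securityPolicy")
```

Calling `suggested_policy(read_security_policy())` returns `TLS13_1_2_2022_10` only when the affected policy is in use, and `None` otherwise. The sketch deliberately does not change the policy; that step is left to the console or a separate, reviewed call.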