Closed Dizzzmas closed 3 months ago
Here are the logs from the thing when this happens:
2024-07-09T12:10:07.664Z [INFO] (Thread-6) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Successfully connected to AWS IoT Core. {clientId=Recorder_EMUL3, sessionPresent=false}
2024-07-09T12:10:07.763Z [INFO] (main) com.aws.greengrass.lifecyclemanager.Kernel: effective-config-dump-complete. {file=/greengrass/v2/config/effectiveConfig.yaml}
2024-07-09T12:10:07.778Z [INFO] (main) com.aws.greengrass.lifecyclemanager.KernelLifecycle: system-start. {main=services.main:NEW}
2024-07-09T12:10:07.841Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.845Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.848Z [INFO] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-set-state. {serviceName=FleetStatusService, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.843Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.853Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-set-state. {serviceName=UpdateSystemPolicyService, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.856Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=NEW, newState=INSTALLED}
2024-07-09T12:10:07.913Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:07.914Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:07.914Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:07.915Z [INFO] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-set-state. {serviceName=FleetStatusService, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:07.921Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-set-state. {serviceName=UpdateSystemPolicyService, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:07.943Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentService: Received deployment in the queue. {DeploymentId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20, GreengrassDeploymentId=edcfec89-8dbe-4653-8273-1151b2c7f109, DeploymentType=IOT_JOBS
2024-07-09T12:10:07.956Z [INFO] (UpdateSystemPolicyService-lifecycle) com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService: service-set-state. {serviceName=UpdateSystemPolicyService, currentState=STARTING, newState=RUNNING}
2024-07-09T12:10:07.948Z [INFO] (DeploymentService-lifecycle) com.aws.greengrass.deployment.DeploymentService: service-set-state. {serviceName=DeploymentService, currentState=STARTING, newState=RUNNING}
2024-07-09T12:10:07.968Z [INFO] (FleetStatusService-lifecycle) com.aws.greengrass.status.FleetStatusService: service-set-state. {serviceName=FleetStatusService, currentState=STARTING, newState=RUNNING}
2024-07-09T12:10:07.974Z [INFO] (TelemetryAgent-lifecycle) com.aws.greengrass.telemetry.TelemetryAgent: service-set-state. {serviceName=TelemetryAgent, currentState=STARTING, newState=RUNNING}
2024-07-09T12:10:07.979Z [INFO] (pool-3-thread-11) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-finished. Nothing done. {serviceName=aws.greengrass.Nucleus, currentState=STARTING}
2024-07-09T12:10:07.981Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=STARTING, newState=FINISHED}
2024-07-09T12:10:08.025Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=INSTALLED, newState=STARTING}
2024-07-09T12:10:08.030Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=FINISHED, newState=STOPPING}
2024-07-09T12:10:08.061Z [INFO] (pool-3-thread-12) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-finished. Nothing done. {serviceName=main, currentState=STARTING}
2024-07-09T12:10:08.063Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=STARTING, newState=FINISHED}
2024-07-09T12:10:08.065Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=FINISHED, newState=STOPPING}
2024-07-09T12:10:08.069Z [INFO] (pool-3-thread-12) com.aws.greengrass.lifecyclemanager.GenericExternalService: Shutdown initiated. {serviceName=main, currentState=STOPPING}
2024-07-09T12:10:08.075Z [INFO] (pool-3-thread-11) com.aws.greengrass.lifecyclemanager.GenericExternalService: Shutdown initiated. {serviceName=aws.greengrass.Nucleus, currentState=STOPPING}
2024-07-09T12:10:08.082Z [INFO] (pool-3-thread-12) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-shutdown. {serviceName=main, currentState=STOPPING}
2024-07-09T12:10:08.084Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=STOPPING, newState=FINISHED}
2024-07-09T12:10:08.086Z [INFO] (pool-3-thread-11) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-shutdown. {serviceName=aws.greengrass.Nucleus, currentState=STOPPING}
2024-07-09T12:10:08.087Z [INFO] (aws.greengrass.Nucleus-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=aws.greengrass.Nucleus, currentState=STOPPING, newState=FINISHED}
2024-07-09T12:10:08.260Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentStatusKeeper: Stored deployment status. {DeploymentId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20, GreengrassDeploymentId=edcfec89-8dbe-4653-8273-1151b2c7f109, DeploymentStatus=IN_PROGRE
2024-07-09T12:10:08.266Z [INFO] (main) com.aws.greengrass.status.FleetStatusService: fss-status-update-published. Status update published to FSS. {trigger=NUCLEUS_LAUNCH, serviceName=FleetStatusService, currentState=RUNNING}
2024-07-09T12:10:08.296Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.IotJobsHelper: Updating status of persisted deployment. {Status=IN_PROGRESS, StatusDetails={}, ThingName=Recorder_EMUL3, JobId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:08.391Z [INFO] (Thread-6) com.aws.greengrass.deployment.IotJobsHelper: Received Iot job description. {Status=IN_PROGRESS, queueAt=Tue Jul 09 12:07:33 UTC 2024, ThingName=Recorder_EMUL3, JobId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:08.393Z [INFO] (Thread-6) com.aws.greengrass.deployment.IotJobsHelper: Duplicate or outdated job notification. Ignoring.. {ThingName=Recorder_EMUL3, JobId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:08.591Z [INFO] (Thread-6) com.aws.greengrass.deployment.IotJobsHelper: Job status update was accepted. {Status=IN_PROGRESS, ThingName=Recorder_EMUL3, JobId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:08.679Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentService: Started deployment execution. {serviceName=DeploymentService, currentState=RUNNING, deployment=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:08.857Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.TokenExchangeService-v2.0.3}
2024-07-09T12:10:08.963Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.ShadowManager-v2.3.8}
2024-07-09T12:10:09.027Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=com.example.hello-v1.0.60}
2024-07-09T12:10:09.064Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.LogManager-v2.3.7}
2024-07-09T12:10:09.163Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.Cloudwatch-v3.1.0}
2024-07-09T12:10:09.325Z [INFO] (pool-3-thread-7) com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.Nucleus-v2.12.6}
2024-07-09T12:10:10.127Z [INFO] (pool-2-thread-3) com.aws.greengrass.status.FleetStatusService: fss-status-update-published. Status update published to FSS. {trigger=COMPONENT_STATUS_CHANGE, serviceName=FleetStatusService, currentState=RUNNING}
2024-07-09T12:10:23.683Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentService: Current deployment finished. {DeploymentId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20, GreengrassDeploymentId=edcfec89-8dbe-4653-8273-1151b2c7f109, serviceName=DeploymentServic
2024-07-09T12:10:23.739Z [ERROR] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentService: Deployment task failed with following errors. {DeploymentId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20, detailed-deployment-status=FAILED_ROLLBACK_COMPLETE, deployment-error-t
com.aws.greengrass.deployment.exceptions.DeploymentException: Service com.example.hello in broken state after deployment
at com.aws.greengrass.deployment.KernelUpdateDeploymentTask.getDeploymentStatusDetails(KernelUpdateDeploymentTask.java:142)
at com.aws.greengrass.deployment.KernelUpdateDeploymentTask.call(KernelUpdateDeploymentTask.java:76)
at com.aws.greengrass.deployment.KernelUpdateDeploymentTask.call(KernelUpdateDeploymentTask.java:36)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-07-09T12:10:23.869Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentStatusKeeper: Stored deployment status. {DeploymentId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20, GreengrassDeploymentId=edcfec89-8dbe-4653-8273-1151b2c7f109, DeploymentStatus=FAILED}
2024-07-09T12:10:23.878Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.IotJobsHelper: Updating status of persisted deployment. {Status=FAILED, StatusDetails={deployment-error-stack=[DEPLOYMENT_FAILURE, COMPONENT_UPDATE_ERROR, COMPONENT_BROKEN], deployment-erro
2024-07-09T12:10:23.983Z [INFO] (Thread-6) com.aws.greengrass.deployment.IotJobsHelper: Job status update was accepted. {Status=FAILED, ThingName=Recorder_EMUL3, JobId=a0dd6d9a-b9d0-4bf2-986f-11710f1e1a20}
2024-07-09T12:10:24.131Z [INFO] (pool-3-thread-14) com.aws.greengrass.status.FleetStatusService: fss-status-update-published. Status update published to FSS. {trigger=THING_GROUP_DEPLOYMENT, serviceName=FleetStatusService, currentState=RUNNING}
2024-07-09T12:10:24.143Z [INFO] (pool-3-thread-14) com.aws.greengrass.deployment.DeploymentDirectoryManager: Persist link to last deployment. {link=/greengrass/v2/deployments/previous-failure}
2024-07-09T12:10:24.499Z [INFO] (Thread-6) com.aws.greengrass.deployment.IotJobsHelper: Received empty jobs in notification . {ThingName=Recorder_EMUL3}
I guess the culprit is this line in the logs which deletes nucleus:
com.aws.greengrass.componentmanager.ComponentStore: delete-component-finish. {componentIdentifier=aws.greengrass.Nucleus-v2.12.6}
I've revised the deployment a couple times and seems as soon as I include any aws-owned component like shadow manager or log manager, nucleus gets deleted on rollback.
How did you install Greengrass? What are the commands you ran and what was the configuration file that you used?
This will happen if you do not put the correct version number of Nucleus in the initial configuration.
Thanks @MikeDombo! You rock!
That was my exact issue. The discrepancy between the installed nucleus version 2.12.6 and the one specified in config.yaml
2.12.4 .
After updating config.yaml
to have matching version, deployment rollback no longer kills nucleus.
This happened because when building my OS image that includes greengrass I was downloading the latest version:
curl -s https://d2s8p88vqu9w66.cloudfront.net/releases/greengrass-nucleus-latest.zip > greengrass-nucleus-latest.zip
But then forgot to update the code that was patching it with config.yaml
that was created quite some time ago and was assuming version 2.12.4
Will make sure to download fixed version of greengrass.
Glad that fixed it! Definitely pinning to a specific version in your image is the safest thing to do.
Describe the bug A new thing is provisioned with fleet plugin. A deployment exists for its group, so rollout starts immediately. Deployment fails for the thing. Rollback is successful. Checking
/alts/current/distro
directory and it's empty. Unless a follow-up successful deployment happens, greengrass won't be able to run on reboot because of missing/alts/current/distro/bin/loader
.To Reproduce Perform a failed first deployment for a newly provisioned thing. Check if
/alts/current/distro/bin/loader
exists after rollback.My deployment manifest:
Expected behavior After rollback greengrass is not missing essential loader for system service under
/alts/current/distro/bin/loader
which would make greengrass unsalvageable upon reboot.Actual behavior After rollback greengrass is missing essential loader for system service under
/alts/current/distro/bin/loader
and reboot would cause the system service to no longer be able to run.Environment
Looks related to: https://github.com/aws-greengrass/aws-greengrass-nucleus/issues/1342