aws-greengrass / aws-greengrass-nucleus

The Greengrass nucleus component provides functionality for device side orchestration of deployments and lifecycle management for execution of Greengrass components and applications. This includes features such as starting, stopping, and monitoring execution of components and apps, interprocess communication server for communication between components, component installation and configuration management.
Apache License 2.0
107 stars 44 forks source link

lifecycle or state: Spamming INFO level logs in greengrass.log #1649

Open shuheinakahodo-catapult opened 1 month ago

shuheinakahodo-catapult commented 1 month ago

Hi team,

I am having an issue where there are too many INFO log outputs in greengrass.log file, and the log file grows to > 50GB in size within a couple of hours. This is resulting in the entire system slowing down, deployment hangs at IN_PROGRESS status, and eventually the VM crashes because of no more disk space.

I can observe this issue particularly when deploying a component, both by local Greengrass CLI and cloud. The log lines look like below:

2024-08-07T23:57:21.910Z [INFO] (aws.greengrass.Cli-lifecycle) com.aws.greengrass.cli.CLIService: service-set-state. {serviceName=aws.greengrass.Cli, currentState=RUNNING, newState=STOPPING}
Unable to parse log message: 
2024-08-07T23:57:21.910Z [INFO] (aws.greengrass.Cli-lifecycle) com.aws.greengrass.cli.CLIService: service-set-state. {serviceName=aws.greengrass.Cli, currentState=RUNNING, newState=STOPPING}
Unable to parse log message: 
2024-08-07T23:57:21.910Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=FINISHED, newState=INSTALLED}
Unable to parse log message: 
2024-08-07T23:57:21.910Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=FINISHED, newState=INSTALLED}

as well as

2024-08-08T09:08:55.859Z [INFO] (AwsEventLoop 1) com.aws.greengrass.deployment.IotJobsHelper: Received Iot job description. {Status=IN_PROGRESS, queueAt=Wed Aug 07 22:42:06 UTC 2024, ThingName=xxx, JobId=xxx}
Unable to parse log message: 
2024-08-08T09:08:55.866Z [INFO] (AwsEventLoop 1) com.aws.greengrass.deployment.IotJobsHelper: Added the job to the queue. {ThingName=xxx, JobIdxxx}
Unable to parse log message: 
2024-08-08T09:08:58.512Z [INFO] (pool-3-thread-20) com.aws.greengrass.deployment.DeploymentService: Received deployment in the queue. {DeploymentId=xxx, GreengrassDeploymentId=null, DeploymentType=IOT_JOBS, serviceName=DeploymentService, currentState=RUNNING}
Unable to parse log message: 
2024-08-08T09:08:58.541Z [INFO] (pool-3-thread-20) com.aws.greengrass.deployment.DeploymentStatusKeeper: Stored deployment status. {DeploymentId=xxx, GreengrassDeploymentId=xxx, DeploymentStatus=IN_PROGRESS}

I can see billions of those log lines are created, by running gg-cli logs get -lf /home/vagrant/greengrass-dir/root/logs/greengrass.log --follow, soon after a deployment happens.

Is this a bug, or is there some sort of config that I can set upon installing or running Nucleus?

Environment

Additional context

So that I can verify the deployment can happen as expected, I've set Dlog.level=WARN for Nucleus installation:

sudo -E $JAVA -Droot="$GREENGRASS_ROOT" -Dlog.store=FILE -Dlog.level=WARN \
  -jar ./GreengrassInstaller/lib/Greengrass.jar \
  --aws-region us-east-1 \
  --thing-name "$THING_NAME" \
  --thing-group-name "$THING_GROUP_NAME" \
  --thing-policy-name "$THING_NAME"Policy \
  --tes-role-name "$THING_NAME"TESRole \
  --tes-role-alias-name "$THING_NAME"TESAlias \
  --component-default-user ggc_user:ggc_group \
  --provision true \
  --setup-system-service true \
  --deploy-dev-tools true

just so that I don't get tons of INFO logs above.

Then I ran deployment from the cloud, and it resulted in SUCCEEDED within a minute.

yitingb commented 4 weeks ago

Hi, for further investigation, we need to confirm the following:

  1. Did you make any log configuration changes to your previous deployment
  2. Are you using any shadow manager or interacting with mqtt very often?

If possible, could you provide us your full log under greengrass/v2/logs and config logs under greengrass/v2/config

shuheinakahodo-catapult commented 2 weeks ago

@yitingb

My apology for getting back to you this late. I have been away from keyboard for the last few weeks.

  1. I did not make any log config changes to the deployment (other than that it is now formatted in JSON as shown below).

I am unable to provide the full log for security reason, but here is the log just before the spamming happens.

{"thread":"pool-3-thread-25","level":"INFO","eventType":null,"message":"Found running component which meets the requirement and use it.","contexts":{"ComponentIdentifier":"aws.greengrass.Nucleus-v2.12.6"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746653458,"cause":null}
{"thread":"pool-3-thread-25","level":"INFO","eventType":null,"message":"Found the best local candidate that satisfies the requirement.","contexts":{"LocalCandidateId":"aws.greengrass.Nucleus-v2.12.6"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746653458,"cause":null}
{"thread":"pool-3-thread-25","level":"INFO","eventType":"resolve-component-version-end","message":"Resolved component version.","contexts":{"ResolvedComponent":"aws.greengrass.Nucleus-v2.12.6"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746655092,"cause":null}
{"thread":"pool-3-thread-25","level":"INFO","eventType":"resolve-all-group-dependencies-finish","message":"Finish resolving all groups dependencies","contexts":{"resolvedComponents":"{aws.greengrass.Cli=ComponentMetadata(componentIdentifier=aws.greengrass.Cli-v2.12.6, dependencies={aws.greengrass.Nucleus=>=2.12.0 <2.13.0}), aws.greengrass.LogManager=ComponentMetadata(componentIdentifier=aws.greengrass.LogManager-v2.3.8, dependencies={aws.greengrass.Nucleus=>=2.1.0 <2.14.0}), aws.greengrass.LocalDebugConsole=ComponentMetadata(componentIdentifier=aws.greengrass.LocalDebugConsole-v2.4.3, dependencies={aws.greengrass.Cli=>=2.10.0 <2.14.0, aws.greengrass.Nucleus=>=2.10.0 <2.14.0}), aws.greengrass.Nucleus=ComponentMetadata(componentIdentifier=aws.greengrass.Nucleus-v2.12.6, dependencies={})}","componentToVersionRequirements":"{aws.greengrass.Cli={thinggroup/ShuA0c765905CoreGroup==2.12.6, aws.greengrass.LocalDebugConsole=>=2.10.0 <2.14.0}, aws.greengrass.LogManager={LOCAL_DEPLOYMENT==2.3.8}, aws.greengrass.LocalDebugConsole={LOCAL_DEPLOYMENT==2.4.3}, aws.greengrass.Nucleus={aws.greengrass.Cli=>=2.12.0 <2.13.0, aws.greengrass.LogManager=>=2.1.0 <2.14.0, aws.greengrass.LocalDebugConsole=>=2.10.0 <2.14.0}}"},"loggerName":"com.aws.greengrass.componentmanager.DependencyResolver","timestamp":1724746655110,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"prepare-package-start","message":"","contexts":{"packageIdentifier":"aws.greengrass.Cli-v2.12.6"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746655119,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"prepare-package-start","message":"","contexts":{"packageIdentifier":"aws.greengrass.LogManager-v2.3.8"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746655165,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"prepare-package-start","message":"","contexts":{"packageIdentifier":"aws.greengrass.LocalDebugConsole-v2.4.3"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746655169,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"prepare-package-start","message":"","contexts":{"packageIdentifier":"aws.greengrass.Nucleus-v2.12.6"},"loggerName":"com.aws.greengrass.componentmanager.ComponentManager","timestamp":1724746655183,"cause":null}
{"thread":"pool-3-thread-24","level":"INFO","eventType":null,"message":"No component found with a pending bootstrap task","contexts":{},"loggerName":"com.aws.greengrass.deployment.bootstrap.BootstrapManager","timestamp":1724746655240,"cause":null}
{"thread":"pool-3-thread-24","level":"INFO","eventType":"register-service-update-action","message":"","contexts":{"action":"9c1a6853-6a8a-4815-b048-abd0fbd9ce3d","serviceName":"UpdateSystemPolicyService","currentState":"RUNNING"},"loggerName":"com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService","timestamp":1724746655242,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"service-update-start","message":"","contexts":{"serviceName":"UpdateSystemPolicyService","currentState":"RUNNING"},"loggerName":"com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService","timestamp":1724746655242,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":"merge-config","message":"Applying deployment changes, deployment cannot be cancelled now","contexts":{"deployment":"9c1a6853-6a8a-4815-b048-abd0fbd9ce3d"},"loggerName":"com.aws.greengrass.deployment.DeploymentConfigMerger","timestamp":1724746655242,"cause":null}
{"thread":"pool-3-thread-26","level":"INFO","eventType":null,"message":"Persist configuration snapshot","contexts":{"file":"/home/vagrant/greengrass-poc/greengrass/root/deployments/9c1a6853-6a8a-4815-b048-abd0fbd9ce3d/rollback_snapshot.tlog"},"loggerName":"com.aws.greengrass.deployment.DeploymentDirectoryManager","timestamp":1724746655243,"cause":null}
{"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655247,"cause":null}
{"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655247,"cause":null}
{"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655247,"cause":null}
{"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655248,"cause":null}
{"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655248,"cause":null}

After this, the log infinitely continues with this {"thread":"aws.greengrass.Cli-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"aws.greengrass.Cli","currentState":"RUNNING","newState":"STOPPING"},"loggerName":"com.aws.greengrass.cli.CLIService","timestamp":1724746655248,"cause":null} in the last lines.

When I stop the VM and restart it, the log spamming did not occur and the deployment succeeded:

{"thread":"pool-3-thread-6","level":"INFO","eventType":"merge-config","message":"Applying deployment changes, deployment cannot be cancelled now","contexts":{"deployment":"9c1a6853-6a8a-4815-b048-abd0fbd9ce3d"},"loggerName":"com.aws.greengrass.deployment.DeploymentConfigMerger","timestamp":1724782811680,"cause":null}
{"thread":"pool-3-thread-6","level":"INFO","eventType":null,"message":"Persist configuration snapshot","contexts":{"file":"/home/vagrant/greengrass-poc/greengrass/root/deployments/9c1a6853-6a8a-4815-b048-abd0fbd9ce3d/rollback_snapshot.tlog"},"loggerName":"com.aws.greengrass.deployment.DeploymentDirectoryManager","timestamp":1724782811680,"cause":null}
{"thread":"Serialized listener processor","level":"INFO","eventType":"service-config-change","message":"Requesting restart for component","contexts":{"configNode":"services.main.lifecycle","serviceName":"main","currentState":"FINISHED"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811687,"cause":null}
{"thread":"main-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"main","currentState":"FINISHED","newState":"INSTALLED"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811687,"cause":null}
{"thread":"pool-3-thread-6","level":"INFO","eventType":"merge-config","message":"Removing services","contexts":{"service-to-remove":"[]"},"loggerName":"com.aws.greengrass.deployment.DeploymentConfigMerger","timestamp":1724782811688,"cause":null}
{"thread":"main-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"main","currentState":"INSTALLED","newState":"STARTING"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-6","level":"INFO","eventType":"merge-config","message":"All services updated","contexts":{"deploymentId":"9c1a6853-6a8a-4815-b048-abd0fbd9ce3d"},"loggerName":"com.aws.greengrass.deployment.activator.DeploymentActivator","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-19","level":"INFO","eventType":"generic-service-finished","message":"Nothing done","contexts":{"serviceName":"main","currentState":"STARTING"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"main-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"main","currentState":"STARTING","newState":"FINISHED"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"main-lifecycle","level":"INFO","eventType":"service-set-state","message":"","contexts":{"serviceName":"main","currentState":"FINISHED","newState":"STOPPING"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-19","level":"INFO","eventType":null,"message":"Shutdown initiated","contexts":{"serviceName":"main","currentState":"STOPPING"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-6","level":"INFO","eventType":"service-update-finish","message":"","contexts":{"serviceName":"UpdateSystemPolicyService","currentState":"RUNNING"},"loggerName":"com.aws.greengrass.lifecyclemanager.UpdateSystemPolicyService","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-19","level":"INFO","eventType":"generic-service-shutdown","message":"","contexts":{"serviceName":"main","currentState":"STOPPING"},"loggerName":"com.aws.greengrass.lifecyclemanager.GenericExternalService","timestamp":1724782811688,"cause":null}
{"thread":"pool-3-thread-5","level":"INFO","eventType":"deployment-task-execution","message":"Finished deployment task","contexts":{"deploymentId":"9c1a6853-6a8a-4815-b048-abd0fbd9ce3d","serviceName":"DeploymentService","currentState":"RUNNING"},"loggerName":"com.aws.greengrass.deployment.DeploymentService","timestamp":1724782811688,"cause":null}
shuheinakahodo-catapult commented 2 weeks ago
  1. I don't think there is any component that interacts with MQTT often. Here are MQTT related logs from my machine just in case:
{"thread":"main","level":"INFO","eventType":null,"message":"Spooler has been configured","contexts":{"maxSizeInBytes":"2621440","storageType":"Memory","keepQos0WhenOffline":"false"},"loggerName":"com.aws.greengrass.mqttclient.spool.Spool","timestamp":1724782554334,"cause":null}
{"thread":"main","level":"INFO","eventType":null,"message":"Memory Spooler has been set up","contexts":{},"loggerName":"com.aws.greengrass.mqttclient.spool.Spool","timestamp":1724782554335,"cause":null}
{"thread":"AwsEventLoop 2","level":"ERROR","eventType":null,"message":"Failed to connect to AWS IoT Core","contexts":{"clientId":"ShuA0c765905Core","error":"A query to dns failed to resolve."},"loggerName":"com.aws.greengrass.mqttclient.AwsIotMqtt5Client","timestamp":1724782554423,"cause":null}
{"thread":"AwsEventLoop 2","level":"ERROR","eventType":null,"message":"Failed to connect to AWS IoT Core","contexts":{"clientId":"ShuA0c765905Core","error":"A query to dns failed to resolve."},"loggerName":"com.aws.greengrass.mqttclient.AwsIotMqtt5Client","timestamp":1724782555352,"cause":null}
{"thread":"AwsEventLoop 2","level":"INFO","eventType":null,"message":"Successfully connected to AWS IoT Core","contexts":{"clientId":"ShuA0c765905Core","sessionPresent":"false"},"loggerName":"com.aws.greengrass.mqttclient.AwsIotMqtt5Client","timestamp":1724782557964,"cause":null}