nickdnk opened this issue 1 year ago
I'm getting a lot of this as well. Again, it's only on the worker environment, and always only one of the two instances.
The last 100 log lines from the problematic instance are below. There are no errors in any of the other log files I can find.
2023/08/16 14:57:51.260683 [INFO] Running command /bin/sh -c systemctl stop xray.service
2023/08/16 14:57:51.274434 [INFO] Executing instruction: stop proxy
2023/08/16 14:57:51.274482 [INFO] Running command /bin/sh -c systemctl show -p PartOf httpd.service
2023/08/16 14:57:51.287472 [WARN] deregisterProcess Warning: process httpd is not registered, skipping...
2023/08/16 14:57:51.287505 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2023/08/16 14:57:51.300385 [INFO] Running command /bin/sh -c systemctl is-active nginx.service
2023/08/16 14:57:51.309818 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2023/08/16 14:57:51.320137 [INFO] Running command /bin/sh -c systemctl stop nginx.service
2023/08/16 14:57:51.467114 [INFO] Running command /bin/sh -c systemctl disable nginx.service
2023/08/16 14:57:51.704484 [INFO] Running command /bin/sh -c systemctl daemon-reload
2023/08/16 14:57:51.944630 [INFO] Running command /bin/sh -c systemctl reset-failed
2023/08/16 14:57:51.953801 [INFO] Executing instruction: stop php-fpm
2023/08/16 14:57:51.953835 [INFO] Running command /bin/sh -c systemctl show -p PartOf php-fpm.service
2023/08/16 14:57:51.966522 [INFO] Running command /bin/sh -c systemctl stop php-fpm.service
2023/08/16 14:57:52.057020 [INFO] Executing instruction: FlipApplication
2023/08/16 14:57:52.057047 [INFO] Removing /var/app/current/ if it exists
2023/08/16 14:57:52.363265 [INFO] Renaming /var/app/staging/ to /var/app/current/
2023/08/16 14:57:52.363526 [INFO] create soft link from /var/app/current/ to /var/www/html
2023/08/16 14:57:52.363573 [INFO] Executing instruction: start X-Ray
2023/08/16 14:57:52.363579 [INFO] X-Ray is not enabled.
2023/08/16 14:57:52.363582 [INFO] Executing instruction: start php-fpm
2023/08/16 14:57:52.363779 [INFO] Running command /bin/sh -c systemctl show -p PartOf php-fpm.service
2023/08/16 14:57:52.379370 [WARN] Warning: process php-fpm is already registered...
Deregistering the process ...
2023/08/16 14:57:52.379402 [INFO] Running command /bin/sh -c systemctl show -p PartOf php-fpm.service
2023/08/16 14:57:52.393227 [INFO] Running command /bin/sh -c systemctl is-active php-fpm.service
2023/08/16 14:57:52.404069 [INFO] Running command /bin/sh -c systemctl disable php-fpm.service
2023/08/16 14:57:52.614543 [INFO] Running command /bin/sh -c systemctl daemon-reload
2023/08/16 14:57:52.837774 [INFO] Running command /bin/sh -c systemctl reset-failed
2023/08/16 14:57:52.849583 [INFO] Running command /bin/sh -c systemctl daemon-reload
2023/08/16 14:57:53.050823 [INFO] Running command /bin/sh -c systemctl reset-failed
2023/08/16 14:57:53.059984 [INFO] Running command /bin/sh -c systemctl show -p PartOf php-fpm.service
2023/08/16 14:57:53.074126 [INFO] Running command /bin/sh -c systemctl is-active php-fpm.service
2023/08/16 14:57:53.083895 [INFO] Running command /bin/sh -c systemctl start php-fpm.service
2023/08/16 14:57:53.267227 [INFO] Executing instruction: start proxy with new configuration
2023/08/16 14:57:53.267276 [INFO] Running command /bin/sh -c /usr/sbin/nginx -t -c /var/proxy/staging/nginx/nginx.conf
2023/08/16 14:57:53.291203 [INFO] nginx: the configuration file /var/proxy/staging/nginx/nginx.conf syntax is ok
nginx: configuration file /var/proxy/staging/nginx/nginx.conf test is successful
2023/08/16 14:57:53.291368 [INFO] Running command /bin/sh -c cp -rp /var/proxy/staging/nginx/* /etc/nginx
2023/08/16 14:57:53.294811 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2023/08/16 14:57:53.311070 [INFO] Running command /bin/sh -c systemctl daemon-reload
2023/08/16 14:57:53.489838 [INFO] Running command /bin/sh -c systemctl reset-failed
2023/08/16 14:57:53.499430 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2023/08/16 14:57:53.513675 [INFO] Running command /bin/sh -c systemctl is-active nginx.service
2023/08/16 14:57:53.523517 [INFO] Running command /bin/sh -c systemctl start nginx.service
2023/08/16 14:57:53.647258 [INFO] Executing instruction: configureSqsd
2023/08/16 14:57:53.650422 [INFO] get sqsd conf from cfn metadata and write into sqsd conf file ...
2023/08/16 14:57:53.651915 [INFO] Executing instruction: startSqsd
2023/08/16 14:57:53.651935 [INFO] Running command /bin/sh -c systemctl show -p PartOf sqsd.service
2023/08/16 14:57:53.663928 [INFO] Running command /bin/sh -c systemctl is-active sqsd.service
2023/08/16 14:57:53.672727 [INFO] Running command /bin/sh -c systemctl start sqsd.service
2023/08/16 14:57:59.859633 [INFO] Executing instruction: Track pids in healthd
2023/08/16 14:57:59.859649 [INFO] This is an enhanced health env...
2023/08/16 14:57:59.859661 [INFO] Running command /bin/sh -c systemctl show -p ConsistsOf aws-eb.target | cut -d= -f2
2023/08/16 14:57:59.870997 [INFO] nginx.service cfn-hup.service amazon-cloudwatch-agent.service php-fpm.service sqsd.service healthd.service
2023/08/16 14:57:59.871029 [INFO] Running command /bin/sh -c systemctl show -p ConsistsOf eb-app.target | cut -d= -f2
2023/08/16 14:57:59.881275 [INFO]
2023/08/16 14:57:59.881499 [INFO] Executing instruction: RunAppDeployPostDeployHooks
2023/08/16 14:57:59.881585 [INFO] Executing platform hooks in .platform/hooks/postdeploy/
2023/08/16 14:57:59.881612 [INFO] The dir .platform/hooks/postdeploy/ does not exist
2023/08/16 14:57:59.881615 [INFO] Finished running scripts in /var/app/current/.platform/hooks/postdeploy
2023/08/16 14:57:59.881619 [INFO] Executing cleanup logic
2023/08/16 14:57:59.881705 [INFO] CommandService Response: {"status":"SUCCESS","api_version":"1.0","results":[{"status":"SUCCESS","msg":"Engine execution has succeeded.","returncode":0,"events":[{"msg":"Instance deployment: You included a 'vendor' folder in your source bundle. The deployment ignored 'composer.json' and didn't install Composer dependencies.","timestamp":1692197866757,"severity":"INFO"},{"msg":"Instance deployment completed successfully.","timestamp":1692197879881,"severity":"INFO"}]}]}
2023/08/16 14:57:59.881862 [INFO] Platform Engine finished execution on command: app-deploy
2023/08/16 14:58:05.105310 [INFO] Starting...
2023/08/16 14:58:05.105381 [INFO] Starting EBPlatform-PlatformEngine
2023/08/16 14:58:05.105401 [INFO] reading event message file
2023/08/16 14:58:05.105703 [INFO] Engine received EB command cfn-hup-exec
2023/08/16 14:58:05.174719 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-get-metadata -s arn:aws:cloudformation:eu-central-1:REDACTED:stack/awseb-e-REDACTED/REDACTED -r AWSEBAutoScalingGroup --region eu-central-1
2023/08/16 14:58:05.508542 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-get-metadata -s arn:aws:cloudformation:eu-central-1:REDACTED:stack/awseb-e-REDACTED/REDACTED -r AWSEBBeanstalkMetadata --region eu-central-1
2023/08/16 14:58:05.837325 [INFO] checking whether command app-deploy is applicable to this instance...
2023/08/16 14:58:05.837343 [INFO] this command is not applicable to the instance, thus instance shouldn't execute command
2023/08/16 14:58:05.837347 [INFO] skip command app-deploy for this instance...
2023/08/16 14:58:05.837367 [ERROR] Ignoring not applicable command.
2023/08/16 14:58:05.837370 [INFO] Executing cleanup logic
2023/08/16 14:58:05.837468 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Ignoring not applicable command.","returncode":0,"events":[]}]}
2023/08/16 14:58:05.837673 [INFO] Platform Engine finished execution on command: app-deploy
2023/08/16 15:08:34.115957 [INFO] Starting...
2023/08/16 15:08:34.116018 [INFO] Starting EBPlatform-PlatformEngine
2023/08/16 15:08:34.116038 [INFO] reading event message file
2023/08/16 15:08:34.116228 [INFO] Engine received EB command cfn-hup-exec
2023/08/16 15:08:34.204704 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-get-metadata -s arn:aws:cloudformation:eu-central-1:REDACTED:stack/awseb-e-REDACTED/REDACTED -r AWSEBAutoScalingGroup --region eu-central-1
2023/08/16 15:08:34.543080 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-get-metadata -s arn:aws:cloudformation:eu-central-1:REDACTED:stack/awseb-e-REDACTED/REDACTED -r AWSEBBeanstalkMetadata --region eu-central-1
2023/08/16 15:08:34.889230 [INFO] checking whether command tail-log is applicable to this instance...
2023/08/16 15:08:34.889243 [INFO] this command is applicable to the instance, thus instance should execute command
2023/08/16 15:08:34.889246 [INFO] Engine command: (tail-log)
2023/08/16 15:08:34.889291 [INFO] Executing instruction: GetTailLogs
2023/08/16 15:08:34.889294 [INFO] Tail Logs...
2023/08/16 15:08:34.889836 [INFO] Running command /bin/sh -c tail -n 100 /var/log/eb-engine.log
I am unable to reproduce the issue with the latest PHP 8.2 platform; please use a support plan to get support from the team.
I am unable to reproduce the issue with the latest PHP 8.2 platform; please cut a ticket to EB with more information attached.
Do you work at AWS? It’s unclear from your profile. Where exactly do I submit this, and what more information do you need?
I don't need paid support. I need you to fix what is clearly a bug in the deployment flow. This didn't happen on previous versions of EB.
I'm using a mix of spot and on-demand instances for the worker environment, so it may be related to that.
Sorry, we took a look at your log, but the information that could help us is still limited.
2023/08/08 12:26:15.230601 [INFO] checking whether command app-deploy is applicable to this instance...
2023/08/08 12:26:15.230615 [INFO] this command is not applicable to the instance, thus instance shouldn't execute command
2023/08/08 12:26:15.230618 [INFO] skip command app-deploy for this instance...
2023/08/08 12:26:15.230630 [ERROR] Ignoring not applicable command.
This log only lets us narrow things down to the fact that the command service is not sending the correct instance ID to the platform engine. However, without further information, we cannot figure out what was wrong with the CFN exec command request.
We are more than happy to help, but raising a support case through AWS Cloud Support is still the best way forward. Cloud Support can also investigate whether there is something specific in your environment setup.
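For reference, a full log bundle (not just the 100-line tail shown above) can be pulled through the Elastic Beanstalk API. A minimal sketch with the AWS CLI, where "worker-env" is a placeholder environment name:

```sh
# Ask Elastic Beanstalk to collect a full log bundle from every instance
# ("worker-env" is a placeholder environment name)
aws elasticbeanstalk request-environment-info \
    --environment-name worker-env \
    --info-type bundle

# After a minute or so, retrieve pre-signed S3 URLs for the bundles
aws elasticbeanstalk retrieve-environment-info \
    --environment-name worker-env \
    --info-type bundle
```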
I'm having the same issue, with PHP 8.0, so it seems there's a problem with the engine.
I don't have access to cloud support, so I cannot do that. What information do you need to help debug this?
I also have problems with our web environment consistently showing "Degraded" health even though no instance is in bad health and there are no problems whatsoever. This environment also includes spot instances, so it may be related to that. Have you tried including spot instances in an attempt to recreate some of these problems? As you can see from @tehmaestro's response, it's not only affecting me, and as I said, I changed nothing in our setup since it last worked.
Clicking "View causes" takes me to this:
It seems to remain in this degraded health state either indefinitely or for several hours (not entirely sure).
Edit: And just to be clear, I'm not doing anything crazy custom. Most things are set up through the EB console with mostly the default/suggested parameters. I have a few things set up with platform hooks (a prebuild hook installs the APCu extension, but it only runs once per instance and simply skips on subsequent deployments; its log also has no errors) and nginx .conf files for virtual hosts, but these have been the same for a long time, and this health issue is new and started after PHP 8.2 was released (that is, version 4.x of the EB platform). I also have ELB health checks enabled, and as evident from the reported health of the instances, they are working as expected. The fact that it's always only one of the workers, and never both, indicates that the problem is in the deployment logic and not in anything I'm doing.
Edit 2: Health is now back to normal for the web environment. I changed nothing and the health of all 3 instances has been OK the entire duration. That's 4 hours of degraded health for no obvious reason.
I updated EB to version 4.0.1, but it still hangs on deployment to our worker environment.
I've found that the web environment's health may be remaining degraded because of an unbalanced AZ distribution caused by spot instance availability, but as far as I can tell this is not displayed anywhere on the EB dashboard. It is mentioned in the emails I receive from SNS, but as I said, clicking "View causes" does not help at all in figuring out the cause of the "Degraded" health, as all instances report as healthy. You may want to look into displaying this information somewhere.
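For what it's worth, the underlying causes that the console hides do show up in the enhanced-health API. A minimal sketch with the AWS CLI, where "web-env" is a placeholder environment name:

```sh
# Environment-level enhanced health, including the "Causes" list
# ("web-env" is a placeholder environment name)
aws elasticbeanstalk describe-environment-health \
    --environment-name web-env \
    --attribute-names All
```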
EB 4.0.3 still causes 1 worker to hang on every deployment. It always seems to be the instance with Lifecycle: spot, which is configured in EB like this:
Instances min: 2
Instances max: 4
Fleet composition: Combine purchase options and instances
Maximum spot price: Default
On-demand base: 1
On-demand above base: 0%
Capacity rebalancing: On
Maybe this can help reproduce the issue. Again, it only happens for workers, never the web server env.
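In case it helps with reproduction, a rough sketch of the same setup expressed via the CLI, assuming the standard aws:ec2:instances and aws:autoscaling:asg option names ("worker-env" is a placeholder; leaving SpotMaxPrice unset keeps the default):

```sh
# Rough equivalent of the spot configuration above as EB option settings
# ("worker-env" is a placeholder environment name)
aws elasticbeanstalk update-environment \
    --environment-name worker-env \
    --option-settings \
        Namespace=aws:autoscaling:asg,OptionName=MinSize,Value=2 \
        Namespace=aws:autoscaling:asg,OptionName=MaxSize,Value=4 \
        Namespace=aws:autoscaling:asg,OptionName=EnableCapacityRebalancing,Value=true \
        Namespace=aws:ec2:instances,OptionName=EnableSpot,Value=true \
        Namespace=aws:ec2:instances,OptionName=SpotFleetOnDemandBase,Value=1 \
        Namespace=aws:ec2:instances,OptionName=SpotFleetOnDemandAboveBasePercentage,Value=0
```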
I should also add that the update comes from CodePipeline. It is, however, the very last step of the pipeline, and the pipeline itself reports the deployment as successful, even though EB remains in degraded health until I reboot the instance. Deploying again without rebooting the instance causes the same instance to report the "wrong version deployed" error.
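A minimal way to see which instance is stuck and which version each instance thinks it is running, assuming enhanced health is enabled ("worker-env" is a placeholder environment name):

```sh
# Per-instance health plus the deployment each instance reports; the stuck
# instance shows up still on the old version label after a deploy
aws elasticbeanstalk describe-instances-health \
    --environment-name worker-env \
    --attribute-names All \
    --query 'InstanceHealthList[].{Id:InstanceId,Health:HealthStatus,Version:Deployment.VersionLabel,DeployStatus:Deployment.Status}'
```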
I'm also running a worker on EB with 'Docker running on 64bit Amazon Linux 2/3.6.4'.
The initial build goes fine, and all servers are up and running as expected. However, every time I push a new update, if I have 4 active servers, only 1 will succeed and the other 3 get stuck in the "upgrading" state. If I drop to two servers, 1 will succeed and 1 will fail. If I have 8 servers, 1 will succeed and 7 will fail.
The servers are actually running and processing data, but as far as EB is aware they are stuck upgrading. After some time EB will roll back the update on the stuck servers, so the servers end up out of sync. If I terminate each failed server before it rolls back, it rebuilds with the new code every time, no issue. If I push an update, all but one server will get stuck in the loop. It happens every time without fail. All my non-worker EB servers running the same configuration are fine; this only affects my worker.
The eb-engine.log on each failed server looks like this:
2023/12/07 12:55:15.703103 [INFO] checking whether command app-deploy is applicable to this instance...
2023/12/07 12:55:15.703122 [INFO] this command is not applicable to the instance, thus instance shouldn't execute command
2023/12/07 12:55:15.703125 [INFO] skip command app-deploy for this instance...
2023/12/07 12:55:15.703133 [ERROR] Ignoring not applicable command.
2023/12/07 12:55:15.703136 [INFO] Executing cleanup logic
2023/12/07 12:55:15.703186 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Ignoring not applicable command.","returncode":0,"events":[]}]}
My instances are 'On-Demand' and not 'Spot'. My Docker containers are running PHP 8.2. My code is also being updated via CodePipeline.
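Since terminating a stuck instance before the rollback kicks in fixes it, here is a sketch of that workaround via the CLI (the instance ID is a placeholder):

```sh
# Replace a stuck worker instance without shrinking the group; the ASG
# launches a fresh instance that deploys the new version cleanly
# (instance ID is a placeholder)
aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id i-0123456789abcdef0 \
    --no-should-decrement-desired-capacity
```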
~I think this problem may have been resolved with 4.0.5. I've just deployed and did not have any hanging worker instances. Up until now it's been consistently hanging on every deployment, without exception, since I created this issue.~
Never mind. It still doesn't work. Maybe it was a freak accident that it worked just before.
I'm on Platform Docker running on 64bit Amazon Linux 2/3.7.1 and the issue persists.
The issue persists on EB 4.1.0.
Still broken on 4.1.1.
The PHP 8.3 platform is still affected by this issue. I have to reboot the spot worker instance on every deployment.
Hi nickdnk! I am running into the same problem. I have to restart the servers every day. Did you find the issue? Thanks in advance!
No, sorry.
The issue continued to persist even through the platform updates. I had one app deployed that didn't suffer from this and then suddenly it too started to have the same issue. I've moved from using Elastic Beanstalk to using ECR Fargate instead.
I'm in the process of doing the exact same thing as we speak. I've been on EB PHP for almost a decade, but its lack of flexibility has started to bother me, and I want to run our workloads on roadrunner, which isn't possible with the EB PHP stack.
I really appreciate the heads-up about Roadrunner. I’ve been with EB for about six years now, and while using PHP 7.X, I never ran into any issues. But after AWS deprecated that version, I had to upgrade to PHP 8.2. Unfortunately, ever since the migration, the servers have been hanging pretty frequently—sometimes once a day. For example, it happened this morning at 4:00 AM.
I always have to manually restart them, which is getting frustrating. I’m looking for a way to either fix this or set up automatic restarts. If I can’t find a solution soon, I might need to consider migrating.
Thanks again for your help!
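On the automatic-restart idea: it won't fix the underlying engine bug, but a cron-able sketch along these lines should work, assuming the AWS CLI and jq are available ("worker-env" is a placeholder environment name):

```sh
#!/bin/sh
# Sketch only: reboot any instance that EB enhanced health reports as
# Degraded or Severe. Assumes the AWS CLI and jq are installed;
# "worker-env" is a placeholder environment name.
ENV_NAME=worker-env

aws elasticbeanstalk describe-instances-health \
    --environment-name "$ENV_NAME" \
    --attribute-names HealthStatus \
  | jq -r '.InstanceHealthList[]
           | select(.HealthStatus == "Degraded" or .HealthStatus == "Severe")
           | .InstanceId' \
  | while read -r id; do
      aws ec2 reboot-instances --instance-ids "$id"
    done
```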
Hello
After upgrading our platform to PHP 8.2 (4.0.0), we consistently have issues with our worker environment: only one instance updates, and the other hangs indefinitely. The pipeline considers the deployment successful, but health shows this:
Terminating the instance launches a new one that works, which points to some kind of deployment command stalling. Rebooting the instance also works.
Relevant log from the hanging instance:
eb-engine.log