cloudcaptainsh / cloudcaptain

Issue Tracker for CloudCaptain

Cannot get new version of boxfuse to use -healthcheck=false anymore. The ELB never gets into a state that allows the project to run #268

Open noeljgrover opened 2 years ago

noeljgrover commented 2 years ago

Requesting system reboot =>
20:28:40.882 i-xxxxxxx -> i-xxxxxxx => [ 20.087872] reboot: Restarting system
20:29:28.689 Destroying all Instances in Auto Scaling Group xxxxxxx ...
20:29:43.497 Auto Scaling Group: i-003xxxxxxx [Terminating]
20:30:44.249 Destroying Auto Scaling Group xxxxx ...
20:30:50.990 Destroying Launch Template xxxxxxx ...
20:30:50.990 WARNING: Run failed: Time out: ELB failed to put instances in service within 300 seconds
    => check the instance logs
    => ensure your application responds with an HTTP 200 at / on port 80
    => ensure the healthcheck configuration (healthcheck.port, healthcheck.path, healthcheck.timeout) is correct
20:30:51.000 ERROR: Running xxxxxxxx failed!
com.boxfuse.base.exception.BoxfuseException: Running xxxxxxxxxxxxxxxx failed!
    at com.boxfuse.client.core.Boxfuse.run(Boxfuse.java:655)
    at com.boxfuse.client.commandline.Main.run(Main.java:325)
    at com.boxfuse.client.commandline.Main.main(Main.java:133)

noeljgrover commented 2 years ago

This really looks like a bug, because I can get the healthcheck to pass in the AWS ELB when it is configured as TCP:80.

However, boxfuse is looking for an HTTP 200 at / on port 80.

I cannot get an HTTP healthcheck at / on port 80 to pass in the AWS ELB, so boxfuse never sees the ELB go into a healthy state.

So how do I force boxfuse to use TCP:80 instead of HTTP?

Here is the supporting debug log:

13:55:47.784 Destroying Launch Template xxxxxxxx ...
13:55:49.553 WARNING: Run failed: Time out: ELB failed to put instances in service within 300 seconds
    => check the instance logs
    => ensure your application responds with an HTTP 200 at / on port 80
    => ensure the healthcheck configuration (healthcheck.port, healthcheck.path, healthcheck.timeout) is correct
13:55:49.560 ERROR: Running xxxxxxxxxx failed!
com.boxfuse.base.exception.BoxfuseException: Running xxxxxxxxxx failed!
    at com.boxfuse.client.core.Boxfuse.run(Boxfuse.java:655)
    at com.boxfuse.client.commandline.Main.run(Main.java:325)
    at com.boxfuse.client.commandline.Main.main(Main.java:133)

noeljgrover commented 2 years ago

@axelfontaine Please do respond. This is a production system that needs to be updated. Thank you.

axelfontaine commented 2 years ago

Passing -healthcheck=false does indeed set the ELB healthcheck config to TCP:80 instead of HTTP:80, but only when the ELB is initially created.

If your ELB already exists and you don't want to destroy it and have it recreated, you can safely update its healthcheck configuration in the AWS console. That update then won't be overridden by subsequent deploys.
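To make the TCP:80 versus HTTP:80 distinction concrete, here is a minimal sketch of how such a healthcheck target string is put together, assuming the ELB in question is a Classic Load Balancer (which uses the `PROTOCOL:port[/path]` notation seen in this thread). The function name `elb_health_check_target` is hypothetical and is not part of boxfuse or any AWS SDK.

```python
def elb_health_check_target(protocol: str, port: int, path: str = "") -> str:
    """Build a Classic Load Balancer style healthcheck target string.

    TCP and SSL targets only test that a connection can be opened, so they
    take no path. HTTP and HTTPS targets request a path and expect a 200.
    """
    protocol = protocol.upper()
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if protocol in ("TCP", "SSL"):
        if path:
            raise ValueError(f"{protocol} targets do not take a path")
        return f"{protocol}:{port}"
    if protocol in ("HTTP", "HTTPS"):
        if not path.startswith("/"):
            raise ValueError(f"{protocol} targets need a path starting with '/'")
        return f"{protocol}:{port}{path}"
    raise ValueError(f"unsupported protocol: {protocol}")


# The two configurations discussed in this thread:
tcp_target = elb_health_check_target("tcp", 80)         # connection-only check
http_target = elb_health_check_target("http", 80, "/")  # expects HTTP 200 at /
```

The resulting string is the kind of value that goes into the healthcheck target of the ELB, whether it is edited in the AWS console or through an API.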

noeljgrover commented 2 years ago

The ELB is set to TCP:80. It passed with the last build and is running smoothly.

The issue is with healthcheck.path: I cannot set it to tcp:80, since the healthchecks then fail during deployment, I cannot set -healthcheck.path=tcp, and I cannot pass -healthcheck=false as you suggest, since that produces an error. I have posted the debug output on GitHub.

There is a bug in your current build that does not allow -healthcheck=false. Please check your code.


-- LEGAL DISCLAIMER: The contents of this e-mail and any attachments are  strictly confidential and they may not be used or disclosed by someone who  is not a named recipient. If you have received this email in error please  notify the sender by replying to this email inserting the word "misdirected" as the message and delete this e-mail from your system.

axelfontaine commented 2 years ago

The actual healthcheck is performed by the ELB itself, and that appears to be failing. Make sure your application responds properly to the ELB request and things should work smoothly again.
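One way to see why an instance can pass a TCP:80 check yet fail an HTTP:80 check at / is to run both kinds of probe by hand. The sketch below approximates the two checks from a client machine; it is not the ELB's own implementation, and the function names `tcp_check` and `http_check` are illustrative only.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Pass if a TCP connection can be opened, roughly what a TCP target tests."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Pass only if GET <path> answers with status 200.

    Redirects (3xx) and errors (4xx, 5xx) count as failures, which is why an
    application that redirects or returns 404 at / fails an HTTP check even
    though its port is open.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

Calling `tcp_check(host, 80)` and `http_check(host, 80, "/")` against the instance's address (reachable through its security group) shows which of the two checks the application can satisfy.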

noeljgrover commented 2 years ago

Hi Axel,

I'm sorry, but you have a bug in the new build.

That is what I have been trying, so far without success, to communicate.

The ELB is fine and its healthchecks pass 100% with build 7.11.31, which has been running fine for the last 4 months.

[image: Screen Shot 2022-04-30 at 5.20.28 PM.png] [image: Screen Shot 2022-04-30 at 5.20.43 PM.png]

[image: Screen Shot 2022-04-30 at 5.20.05 PM.png]

The issue is that I updated boxfuse to the newest version:

boxfuse -version
CloudCaptain Client (previously called Boxfuse) v.1.35.2.1525
Copyright 2022 Axel Fontaine Labs GmbH. All rights reserved.

Now I cannot use the parameter -healthcheck=false anymore. The deployment always fails the healthchecks.

-- Noel Grover Founder & CEO, VoiceIt 612-423-9015 voiceit.io


noeljgrover commented 2 years ago

I have to keep the current ELB, and it is not failing its checks on TCP:80.

It is running build 7.11.31 from 4 months ago.

The issue is that I cannot use the same -healthcheck=false with the newest build, 7.11.32.

It fails every time with -healthcheck=false.

noeljgrover commented 2 years ago

boxfuse run /Users/noel/development/deploy/voiceitapi1webonly-7.11.32.war -env=voiceitprod -jvm.args="-Dfile.encoding=UTF-8 -Dnet -XX:+UseConcMarkSweepGC" -healthcheck=false -X -securitygroup=sg-fbf4d386

17:34:34.880 CloudCaptain Client (previously called Boxfuse) v.1.35.2.1525
17:34:34.883 Copyright 2022 Axel Fontaine Labs GmbH. All rights reserved.
17:34:34.883
17:34:34.886 Loading configuration from xxxxxxx/boxfuse.conf ...
17:34:34.889 secret -> Mojk****
17:34:34.889 user -> xxxxxxxx
17:34:34.890 Skipping non-existent config file: xxxx/boxfuse-voiceitprod.conf
17:34:34.890 Skipping non-existent config file: xxxxx boxfuse.conf
17:34:34.890 Skipping non-existent config file: xxxxxxx /boxfuse-voiceitprod.conf
17:34:34.890 Skipping non-existent config file: xxxxxx/./boxfuse.conf
17:34:34.890 Skipping non-existent config file: xxxxx./boxfuse-voiceitprod.conf
17:34:34.918 Found VBoxManage in /usr/local/bin
17:34:36.436 Unable to read cached account data for offline use: java.lang.ClassCastException -> java.util.LinkedHashMap cannot be cast to java.util.List
17:34:36.501 Account:xxxx
17:34:36.501
17:34:36.501 Cache Directory: /Users/noel/.boxfuse/cache
17:34:36.501 Work Directory: /Users/noel/.boxfuse/work
17:34:36.521
17:34:36.521 Using configuration:
17:34:36.521 env -> voiceitprod
17:34:36.521 healthcheck -> false
17:34:36.521 jvm.args -> -Dfile.encoding=UTF-8 -Dnet -XX:+UseConcMarkSweepGC
17:34:36.521 jvm.jmx -> true
17:34:36.521 jvm.main.class -> null
17:34:36.521 payload -> /Users/noel/development/deploy/voiceitapi1webonly-7.11.32.war
17:34:36.521 ports.jmx -> 5555/http:34.193.93.14
17:34:36.521 securitygroup -> sg-fbf4d386
17:34:36.521
17:34:36.524 Using Payload: /Users/noel/development/deploy/voiceitapi1webonly-7.11.32.war
17:34:38.967 linux 4.14.14 found in local inventory
17:34:38.980 glibc 2.25 found in local inventory
17:34:38.985 libgcc 4.9.2 found in local inventory
17:34:38.988 busybox 1.22.1.012 found in local inventory
17:34:38.992 cacerts 2020.01.14 found in local inventory
17:34:38.994 vboxsf 4.14.14 found in local inventory
17:34:38.996 libpng 1.2.52 found in local inventory
17:34:38.997 zlib 1.2.8 found in local inventory
17:34:39.000 freetype 2.6 found in local inventory
17:34:39.008 ttf-bitstream-vera 1.10 found in local inventory
17:34:39.056 tomcat 8.5.24 found in local inventory
17:34:39.056 Auto-configured http port to 80
17:34:39.202 openjdk 17.0.0.4 found in local inventory
17:34:42.914 Auto-configured payload port to http
17:34:42.914 Auto-configured healthcheck port to http
17:34:50.996 Using configured security group: sg-fbf4d386 (voiceit-api-1)
17:34:52.612 Creating Launch Template boxlt-noelgrover-voiceitprod-voiceitapi1webonly-7.11.32 ...
17:34:52.612 Creating Auto Scaling Group boxasg-noelgrover-voiceitprod-voiceitapi1webonly-7.11.32 ...
17:34:55.906 Waiting for Auto Scaling Group boxasg-noelgrover-voiceitprod-voiceitapi1webonly-7.11.32 to launch 1 t3.small Instance ...
17:35:01.065 Auto Scaling Group: i-061e92551c374a002 [Pending]
17:35:17.719 Auto Scaling Group: i-061e92551c374a002 [InService]
17:35:17.719 Waiting for ELB to put instances in service ...
17:35:21.054 ELB: i-061e92551c374a002 [OutOfService] => Instance registration is still in progress.
17:35:27.671 ELB: i-061e92551c374a002 [OutOfService] => Instance has not passed the configured HealthyThreshold number of health checks consecutively.
17:35:41.102 ELB: i-061e92551c374a002 [OutOfService] => Instance has failed at least the UnhealthyThreshold number of health checks consecutively.

Again, this fails the healthchecks when it is not supposed to be checking at all.
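For anyone comparing runs between client versions, the ELB state lines can be pulled out of the debug output mechanically. The following is a rough sketch that assumes the log line layout shown in this comment (`HH:MM:SS.mmm ELB: <instance> [<state>] => <reason>`); the names `ELB_LINE`, `elb_events` and `ever_in_service` are made up for this example.

```python
import re

# Sample lines copied from the run reported in this comment.
LOG_LINES = [
    "17:35:17.719 Auto Scaling Group: i-061e92551c374a002 [InService]",
    "17:35:17.719 Waiting for ELB to put instances in service ...",
    "17:35:21.054 ELB: i-061e92551c374a002 [OutOfService] => Instance registration is still in progress.",
    "17:35:27.671 ELB: i-061e92551c374a002 [OutOfService] => Instance has not passed the configured HealthyThreshold number of health checks consecutively.",
    "17:35:41.102 ELB: i-061e92551c374a002 [OutOfService] => Instance has failed at least the UnhealthyThreshold number of health checks consecutively.",
]

# Matches only the "ELB:" status lines, not the "Auto Scaling Group:" ones.
ELB_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) ELB: (?P<instance>i-[0-9a-f]+) "
    r"\[(?P<state>\w+)\] => (?P<reason>.+)$"
)


def elb_events(lines):
    """Return (time, instance, state, reason) tuples for each ELB status line."""
    events = []
    for line in lines:
        match = ELB_LINE.match(line.strip())
        if match:
            events.append(
                (match["time"], match["instance"], match["state"], match["reason"])
            )
    return events


def ever_in_service(lines):
    """True if the ELB reported the instance as InService at any point."""
    return any(state == "InService" for _, _, state, _ in elb_events(lines))
```

On the lines above, `elb_events(LOG_LINES)` yields three ELB events, all `OutOfService`, so `ever_in_service(LOG_LINES)` is false even though the Auto Scaling Group itself reported the instance as `InService`.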

noeljgrover commented 2 years ago

17:46:10.944 WARNING: Run failed: Time out: ELB failed to put instances in service within 300 seconds
    => check the instance logs
    => ensure your application responds with an HTTP 200 at / on port 80
    => ensure the healthcheck configuration (healthcheck.port, healthcheck.path, healthcheck.timeout) is correct
17:46:10.954 ERROR: Running noelgrover/voiceitapi1webonly:7.11.32 failed!
com.boxfuse.base.exception.BoxfuseException: Running noelgrover/voiceitapi1webonly:7.11.32 failed!
    at com.boxfuse.client.core.Boxfuse.run(Boxfuse.java:655)
    at com.boxfuse.client.commandline.Main.run(Main.java:325)
    at com.boxfuse.client.commandline.Main.main(Main.java:133)

Again, the current ELB is passing its checks on TCP:80, not HTTP 200 at / on port 80.

So there is a bug in the current boxfuse, and I am having to revert to a backup version.