Image Validator DOWN - 17th December 2019

IIIF / image-validator

Validator for the Image API

http://iiif.io/api/image/validator/

Apache License 2.0


Image Validator DOWN - 17th December 2019 #80

Closed: glenrobson closed this issue 4 years ago

glenrobson commented 4 years ago

Reported by Régis:

"The parameters seem to be OK, but the process ends up with a blank page. The URL is https://iiif.io/api/image/validator/results/?server=https%3A%2F%2Fccj-iiif.huma-num.fr&prefix=iiif%2Fimage&identifier=JP2%2F67352ccc-d1b0-11e1-89ae-279075081939.jp2&version=2.0&level=1"

Following the link results in a blank page with just the title and a "Return to Validator" link. Going to the validator page:

https://iiif.io/api/image/validator/

it doesn't list the available tests, and checking:

http://image-validator.iiif.io

you get a blank screen, so I think the image validator's Elastic Beanstalk service is down.

glenrobson commented 4 years ago

According to Elastic Beanstalk, the service went down yesterday at 17:01 UK time. It did send a notification saying it had entered a Severe state. It sent the following emails:

16:39 - Environment health has transitioned from Ok to Warning
17:01 - Environment health has transitioned from Warning to Severe
17:11 - iiif-website-proxy Ok to Warning. 1.9 % of the requests to the ELB are failing with HTTP 5xx
17:16 - iiif-website-proxy Environment health has transitioned from Warning to Ok

So what probably happened is that it went down around 16:40 and sent the warning emails, and then someone tried to use the validator around 17:10, which caused the website proxy errors. It would be useful if the validator continued to send emails while it is in a Severe state, rather than only on a change of state.
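The alerting behaviour suggested above (keep emailing while the environment stays Severe, not only when the state changes) can be sketched as a small policy function. This is a hypothetical illustration, not Elastic Beanstalk's own notification mechanism: the `HealthSample` type, the minute-based timestamps, the assumption that the environment starts in Ok, and the 60-minute repeat interval are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HealthSample:
    minute: int   # minutes since a reference time (hypothetical unit)
    status: str   # "Ok", "Warning" or "Severe"


def alerts_to_send(samples: List[HealthSample], repeat_every: int = 60) -> List[int]:
    """Return the minutes at which an alert email would be sent.

    An alert is sent on every change of status (the current behaviour) and,
    additionally, every `repeat_every` minutes while the status stays
    "Severe" (the suggested behaviour). Samples must be in time order.
    """
    sent: List[int] = []
    previous = "Ok"                    # assume the environment starts healthy
    last_alert: Optional[int] = None
    for s in samples:
        if s.status != previous:
            # Change of state: always notify.
            sent.append(s.minute)
            last_alert = s.minute
        elif (s.status == "Severe" and last_alert is not None
              and s.minute - last_alert >= repeat_every):
            # Still Severe and the last alert is old enough: remind again.
            sent.append(s.minute)
            last_alert = s.minute
        previous = s.status
    return sent
```

Counting minutes from 16:00, samples at 16:39 (Warning) and 17:01 (Severe) reproduce the two state-change emails listed above; the extra reminders while the status stays Severe are the new part.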

glenrobson commented 4 years ago

SSH and HTTP access to the instance are failing. Retrieving the logs through Elastic Beanstalk is also failing. Both status checks for the instance say it is OK. It currently has two checks:

System Status Checks - checks network and power
Instance Status Checks - checks that the operating system is receiving traffic.

and both say they are OK...
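The two EC2 checks above only look at the instance, so they stay green while the web application itself is unreachable. A minimal sketch of an application-level probe that would have caught this is below. The function name, the idea of probing a single URL, and the rule "HTTP 200 with a non-empty body counts as healthy" are assumptions for illustration, not an existing IIIF or AWS check.

```python
import http.client
import urllib.request


def app_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Application-level probe: True only if `url` answers HTTP 200 with a non-empty body.

    Unlike the EC2 System/Instance status checks, this exercises the web
    server and the application, so a hung or crashed app is reported as
    unhealthy even when the instance itself reports OK.
    """
    # Ignore proxy settings from the environment so the probe talks to the host directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            # A blank page (empty body) is treated as a failure.
            return resp.status == 200 and len(resp.read()) > 0
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError (including 4xx/5xx), refused
        # connections and timeouts; HTTPException covers malformed responses;
        # ValueError covers malformed URLs.
        return False
```

A probe like this would have to run from outside the instance (or be used as the load balancer's health check path) to be useful when SSH and HTTP access are both failing.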

glenrobson commented 4 years ago

It would be nice if Elastic Beanstalk created a new instance when the single instance is failing... It looks like this might be possible with:

https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html

Note that the description mentions:

"Amazon EC2 status checks only cover an instance's health, not the health of your application, server, or any Docker containers running on the instance. If your application crashes, but the instance that it runs on is still healthy, it may be kicked out of the load balancer, but Auto Scaling won't replace it automatically. The default behavior is good for troubleshooting. If Auto Scaling replaced the instance as soon as the application crashed, you might not realize that anything went wrong, even if it crashed quickly after starting up. "

I think this is what is happening here: the application is failing, but that isn't enough for the instance to be replaced...
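The AWS page linked above describes switching the Auto Scaling health check type from EC2 to ELB, so that an instance the load balancer considers unhealthy is also replaced. A minimal sketch of that setting as an `.ebextensions` configuration fragment is below. It assumes the validator runs in a load-balanced environment (a single-instance environment has no load balancer health check to use); the file name and the 300-second grace period are assumptions, not values from this thread.

```yaml
# .ebextensions/autoscaling.config (hypothetical file name)
# Ask Auto Scaling to use the load balancer's health check, not only the
# EC2 status checks, when deciding whether to replace an instance.
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      # Seconds after launch before health checks count (assumed value).
      HealthCheckGracePeriod: 300
```

As the quoted passage warns, this trades away some visibility: a repeatedly crashing application would be silently replaced rather than left in place for troubleshooting.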

glenrobson commented 4 years ago

To fix it for now I am going to reboot the instance...

glenrobson commented 4 years ago

The reboot didn't work, so I had to terminate the instance, though I should have tried stopping it first: terminating removes the instance and it can't be restarted. Luckily, Elastic Beanstalk created a new instance, and this seems to be working correctly.

Longer term, it would be good to look into whether Elastic Beanstalk could automatically deploy a new instance once it has been in a Severe state for a period of time (maybe 1 hour, to avoid complications during deployments). I will monitor this to see if it happens regularly...
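The replacement rule proposed above (replace the instance only after it has stayed Severe for a sustained period, e.g. 1 hour) can be sketched as a pure function over the health history. This is a hypothetical sketch: the `(timestamp, status)` input format and the function name are assumptions, and acting on the result (for example terminating the instance so that Elastic Beanstalk launches a replacement, as happened here) is not shown.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def should_replace(samples: Iterable[Tuple[datetime, str]],
                   now: datetime,
                   threshold: timedelta = timedelta(hours=1)) -> bool:
    """Return True if health has been continuously "Severe" for at least `threshold` up to `now`.

    `samples` are (timestamp, status) pairs in chronological order, all at or before `now`.
    A single non-Severe sample resets the clock, so brief Severe spells during
    a deployment do not trigger a replacement.
    """
    severe_since: Optional[datetime] = None
    for ts, status in samples:
        if status == "Severe":
            if severe_since is None:
                severe_since = ts   # start of the current Severe run
        else:
            severe_since = None     # any non-Severe sample ends the run
    return severe_since is not None and now - severe_since >= threshold
```

Requiring an unbroken Severe run, rather than counting total Severe time, is what keeps a short-lived failure during a deployment from causing a replacement.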

glenrobson commented 4 years ago

I have reported to AWS that the instance status checks seem to be failing to detect the problem: there is something seriously wrong with the instance, and this should be picked up. It's more than just an application issue if you can't get the logs or SSH into it.

If this starts happening more often, then implementing https://github.com/IIIF/image-validator/issues/83 would be a solution.