Closed glenrobson closed 4 years ago
According to the Elastic Beanstalk service, it went down yesterday at 17:01 UK time. It did send a notification to say it had entered a Severe
state. It sent the following emails:
So what has probably happened is that it went down around 16:40 and sent the warning emails, and then someone tried to use the validator around 17:10, which caused the website proxy errors. It would be useful if the validator continued to send emails while it's in the Severe state rather than only on a change of state.
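As a rough illustration of the "keep emailing while Severe" idea, here is a minimal sketch of the decision logic only. The function name `should_notify`, the 30-minute `REMINDER_INTERVAL`, and the status strings are assumptions made for the example; this is not how Elastic Beanstalk notifications are actually configured.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed reminder spacing; pick whatever is tolerable for the inbox.
REMINDER_INTERVAL = timedelta(minutes=30)


def should_notify(status: str, previous_status: str,
                  last_sent: Optional[datetime], now: datetime) -> bool:
    """Decide whether an alert email should be sent for this health sample."""
    if status != previous_status:
        # Change of state: the behaviour that exists today.
        return True
    if status == "Severe":
        # Still Severe: repeat the alert once the reminder interval has elapsed.
        return last_sent is None or now - last_sent >= REMINDER_INTERVAL
    return False
```

With this shape, a Severe environment keeps producing a reminder every interval until the state changes, instead of going quiet after the first email.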
SSH and HTTP access to the instance is failing. Trying to get the logs through Elastic Beanstalk is also failing. The status checks for the instance both say the instance is OK. It currently has two checks:
and both say they are OK...
It would be nice if Elastic Beanstalk created a new instance when its single instance is failing... It looks like this might be possible with:
Note in the description it mentions:
"Amazon EC2 status checks only cover an instance's health, not the health of your application, server, or any Docker containers running on the instance. If your application crashes, but the instance that it runs on is still healthy, it may be kicked out of the load balancer, but Auto Scaling won't replace it automatically. The default behavior is good for troubleshooting. If Auto Scaling replaced the instance as soon as the application crashed, you might not realize that anything went wrong, even if it crashed quickly after starting up."
Which I think is happening here. The application is failing but this isn't enough to reboot the application...
To fix it for now I am going to reboot the instance...
So the reboot didn't work. I had to terminate, but should have tried stop first: terminate removes the instance and it can't be restarted. Luckily Elastic Beanstalk created a new instance and this seems to be working correctly.
Longer term it would be good to look into whether Elastic Beanstalk could automatically deploy a new instance once it has been in the Severe state for a period of time (maybe 1 hour, to avoid deployment complications). I will monitor this and see if it happens regularly...
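To make the "Severe for a period of time" condition concrete, here is a minimal sketch that checks a list of health samples. The function name `severe_for_at_least`, the `(timestamp, status)` sample format, and the status strings are all assumptions for illustration. How the samples are collected, and what is done when the check fires (for example terminating the instance so a new one is created, as happened above), is left out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, str]  # (time of health check, reported status)


def severe_for_at_least(samples: List[Sample],
                        threshold: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the most recent unbroken run of Severe samples spans
    at least `threshold`. `samples` must be sorted oldest first."""
    if not samples:
        return False
    latest_time, latest_status = samples[-1]
    if latest_status != "Severe":
        return False
    streak_start = latest_time
    # Walk backwards until the first non-Severe sample ends the streak.
    for timestamp, status in reversed(samples):
        if status != "Severe":
            break
        streak_start = timestamp
    return latest_time - streak_start >= threshold
```

Requiring an unbroken run rather than a single Severe sample is what avoids replacing an instance that is only briefly unhealthy during a deployment.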
Reported to AWS that the status checks for the instance seem to be failing, as there is something seriously wrong with the instance and this should be picked up. It's more of an application issue if you can't get the logs or SSH into it.
If this starts happening more often then implementing https://github.com/IIIF/image-validator/issues/83 would be a solution.
Reported by Régis:
"The parameters seem to be ok, but the process end up with a blank page. The url is https://iiif.io/api/image/validator/results/?server=https%3A%2F%2Fccj-iiif.huma-num.fr&prefix=iiif%2Fimage&identifier=JP2%2F67352ccc-d1b0-11e1-89ae-279075081939.jp2&version=2.0&level=1"
Following the link results in a blank page with just the title and a "Return to Validator" link. Going to the validator page, https://iiif.io/api/image/validator/, it doesn't list the available tests, and checking http://image-validator.iiif.io you get a blank screen. So I think the image validator Elastic Beanstalk service is down.
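The manual check above could be turned into a small probe. This is only a sketch using the Python standard library: `fetch` and `looks_down` are hypothetical helper names, and treating an empty body as "down" is an assumption based on the blank screen seen here, not a documented health contract of the validator.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status_code, body_text); network failures are reported as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (e.g. a proxy error).
        return error.code, ""
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0, ""


def looks_down(status_code: int, body: str) -> bool:
    """Treat any non-200 response or an empty/whitespace-only page as down."""
    return status_code != 200 or not body.strip()
```

For example, `looks_down(*fetch("http://image-validator.iiif.io"))` would report the blank screen described above as down, even though the instance status checks say OK.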