After reviewing the smoke tests, I found that they received HTTP status code 502 during the downtime.
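To narrow down when the downtime started and ended, the smoke test results can be grouped into contiguous runs of 502 responses. The following is only a minimal sketch: it assumes the results are available as (timestamp, status code) pairs, and the function name `outage_windows` and the sample values are hypothetical, not taken from the actual smoke test setup.

```python
from datetime import datetime


def outage_windows(results, bad_status=502):
    """Group consecutive bad responses into (first_seen, last_seen, count) windows.

    `results` is an iterable of (timestamp, status_code) pairs (assumed format).
    """
    windows = []
    current = None  # [first_seen, last_seen, count] of the run in progress
    for ts, status in sorted(results):
        if status == bad_status:
            if current is None:
                current = [ts, ts, 1]
            else:
                current[1] = ts
                current[2] += 1
        elif current is not None:
            # A non-502 response closes the current run.
            windows.append(tuple(current))
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        (datetime(2000, 1, 1, 12, 0), 200),
        (datetime(2000, 1, 1, 12, 5), 502),
        (datetime(2000, 1, 1, 12, 10), 502),
        (datetime(2000, 1, 1, 12, 15), 200),
    ]
    for first, last, count in outage_windows(sample):
        print(f"502 from {first} to {last} ({count} checks)")
```

Comparing these windows with traffic and VM metrics from the same period might help rule individual causes in or out.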
502 Bad Gateway errors are often caused by:
Server overload: the server ran out of resources and crashed, triggering an HTTP 502 error. Possible reasons are an unexpected spike in traffic or low memory.
Browser problems: your browser version is outdated, or there are corrupted files in your browser cache.
Firewall blocks: your firewall might be detecting false threats and blocking internet providers or IP addresses.
In our case, this could be due to:
High traffic: many people accessed the site during Realfagsbiblioteks AI day, so this could be the culprit. However, we may have had even more traffic at the final presentation last semester.
AWS having problems: I looked into it, and AWS says they had no issues in that timeframe.
The VM ran out of space: there have been some memory problems, but I deleted a couple of big files, so I find this unlikely.
Undetected errors with the API key scheduler: I find this hard to believe as well. Even when the keys are not valid, the Docker container does not die, and at least on localhost, this happening several times does not kill the instance.
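The "VM had no more space" hypothesis could be checked directly on the machine rather than estimated. Below is a minimal sketch, assuming "space" refers to disk space on the VM. The function name `disk_report` and the 10% threshold are my own assumptions, not part of the existing setup.

```python
import shutil


def disk_report(path="/", min_free_fraction=0.10):
    """Return disk usage for `path` and flag it when free space is low.

    The 10% default threshold is an arbitrary assumption.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return {
        "total_gb": round(usage.total / 1e9, 1),
        "free_gb": round(usage.free / 1e9, 1),
        "free_fraction": free_fraction,
        "low_space": free_fraction < min_free_fraction,
    }


if __name__ == "__main__":
    report = disk_report("/")
    print(report)
    if report["low_space"]:
        print("Warning: disk space is low, this could be related to the 502s.")
```

Running something like this on a schedule and logging the output would show whether free space was actually low at the time of a future outage.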
So even now, the error cannot be pinpointed.