Fortify-Labs / status

📈 Uptime monitor and status page for Fortify, powered by @upptime
https://status.fortify.gg
MIT License

🛑 API is down #4

Closed ThomasK33 closed 3 years ago

ThomasK33 commented 3 years ago

In 898fbeb, API (https://api.fortify.gg/graphql?query=%7Bversion%7D) was down:

ThomasK33 commented 3 years ago

Downtime was caused by Kafka heartbeats timing out because of slow writes and write timeouts to InfluxDB.

ThomasK33 commented 3 years ago

A Kafka redeployment with adjusted timeouts should resolve this issue.
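For illustration only, a minimal sketch of what checking "adjusted timeouts" could look like. The property names follow the Java consumer configuration (other clients name them differently), and the values are hypothetical, not the ones used in this deployment. The check encodes the general Kafka guidance that the heartbeat interval should be no more than one third of the session timeout.

```python
from typing import Dict, List

# Hypothetical consumer timeout settings in milliseconds (illustrative values,
# not taken from the Fortify deployment).
consumer_timeouts: Dict[str, int] = {
    "session.timeout.ms": 60_000,
    "heartbeat.interval.ms": 15_000,
    "max.poll.interval.ms": 300_000,
}


def check_timeouts(cfg: Dict[str, int]) -> List[str]:
    """Return warnings for timeout combinations that make rebalances likely."""
    warnings: List[str] = []
    session = cfg["session.timeout.ms"]
    heartbeat = cfg["heartbeat.interval.ms"]
    # General Kafka guidance: heartbeat interval <= 1/3 of the session timeout,
    # so several heartbeats can be missed before the consumer is declared dead.
    if heartbeat * 3 > session:
        warnings.append(
            "heartbeat.interval.ms should be at most a third of session.timeout.ms"
        )
    return warnings


print(check_timeouts(consumer_timeouts))
```

Longer timeouts only hide a blocked consumer for longer; they do not remove the cause of the blocking.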

ThomasK33 commented 3 years ago

After further investigation, it turned out that this issue was caused by InfluxDB either taking a long time to complete a write or not accepting the write at all and timing out.

As a result, each service writing to InfluxDB blocked for a long time until the write timed out. Meanwhile, the Kafka broker interpreted the missing heartbeats as a consumer going stale (either livelocked or not responding at all).

After deactivating all writes to InfluxDB, the issue no longer appeared.
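As a rough sketch of the failure mode described above, the snippet below shows one possible way to keep a slow write from stalling a message handler: bound the write with a deadline and drop the metric when it expires. This is not the fix applied here (writes were simply deactivated). `slow_write`, `handle_message`, and `WRITE_DEADLINE_S` are hypothetical names, and `slow_write` is a stand-in rather than a real InfluxDB client call.

```python
import asyncio

# Hypothetical deadline in seconds; in practice it would need to stay well
# below the consumer's session timeout.
WRITE_DEADLINE_S = 0.05


async def slow_write(point: dict) -> None:
    """Stand-in for a metrics write that hangs (not a real InfluxDB call)."""
    await asyncio.sleep(1.0)


async def handle_message(message: dict, write=slow_write) -> bool:
    """Process one message; return True if the metrics write finished in time."""
    try:
        # Cancel the write if it exceeds the deadline instead of waiting it out.
        await asyncio.wait_for(write(message), timeout=WRITE_DEADLINE_S)
        return True
    except asyncio.TimeoutError:
        # Drop the metric so the consumer loop can keep polling.
        return False


if __name__ == "__main__":
    print(asyncio.run(handle_message({"measurement": "example"})))
```

The trade-off in this sketch is losing metrics under load in exchange for keeping the consumer responsive.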