HerrLevin closed this issue 3 years ago.
There were two issues, overlapping:

- Because the direkt.bahn.guru postings on Twitter and Reddit went a bit viral, there has been and still is a lot of traffic on the stations search API (`v5.db.transport.rest/locations?query=…`), so the response times are higher than usual. This is due to several factors:
  - The server runs all of the `v5.{bvg,db,hvv,vbb}.transport.rest` APIs. The CPU is being exhausted briefly every few seconds.
  - There is overhead somewhere in the `db-rest` -> Redis cache -> `db-rest` -> Caddy chain, and I haven't investigated yet what causes it and why. On my laptop, the response time (for the same query, served from Redis) is in the low milliseconds, but on the server it's usually 25-60ms. 🤔
- Until yesterday, I had the Caddy load balancer configured to do active health checks against `db-rest`, which in turn checks if HAFAS works by querying departures at some station on the next Monday. And, for a reason I don't know, the HAFAS API responds with errors every now and then, which causes the health check to fail, which causes Caddy to take the entire `v5.db.transport.rest` API offline, responding with `504`. I have turned this off temporarily, because with this setup, the benefits (better stability of `v5.db.transport.rest` itself) are not worth the disadvantages (frequently being completely down due to random HAFAS failures).
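For reference, active health checks in Caddy v2 are configured on the `reverse_proxy` directive; a hypothetical sketch of such a setup (the upstream address, health path, and intervals are illustrative, not the actual config used here):

```
v5.db.transport.rest {
    reverse_proxy localhost:3000 {
        # Actively probe the upstream on a fixed interval;
        # on repeated failures, Caddy stops routing to it.
        health_uri /health
        health_interval 10s
        health_timeout 5s
    }
}
```

With only a single upstream, a failed health check means Caddy has nowhere left to route, which is exactly how a flaky upstream check can take the whole API offline.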
`504`s should not appear often anymore, but the overall response time is still high. I'm reluctant to move the APIs to a more powerful server though, because I intend to migrate `v5.{bvg,db,hvv,vbb}.transport.rest` to a Kubernetes cluster run by @juliuste. I'll check if that improves the behaviour under load or not.

Regardless, if you want to volunteer running a `db-rest` instance, you're very much welcome to do so!
This is the status page BTW: https://stats.uptimerobot.com/57wNLs39M/784879516
Not sure if the reported 10h of `404` downtime was actually an issue. 🤔
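A status page like that one boils down to periodic HTTP probes; a minimal sketch of such a check, probed here against a throwaway local server standing in for the real API (the URL, timeout, and success criterion are illustrative):

```python
import http.server
import threading
import urllib.request


def check(url, timeout=5.0):
    """Return (ok, status): ok is True for a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return 200 <= res.status < 300, res.status
    except Exception:
        # Connection errors, timeouts, and non-2xx responses all count as down.
        return False, None


# Throwaway local server standing in for the monitored API.
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

ok, status = check(f"http://127.0.0.1:{server.server_port}/")
print(ok, status)  # → True 200
server.shutdown()
```

A real monitor would run `check` on an interval and record the transitions, which is roughly what the uptime numbers on the status page summarize.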
I'll close this. Please re-open if you have more questions related to this.
In the last 24h, the hosted version at v5.db.transport.rest keeps going offline. Every request (except to the docs) won't return anything; it won't even time out. My self-hosted version works as expected.
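When a server accepts connections but never answers, a request without a client-side timeout hangs indefinitely, which matches the "won't even time out" behaviour; a minimal sketch of setting an explicit timeout, using a local socket that never replies to stand in for the unresponsive API (the request path and timeout value are illustrative):

```python
import socket

# A stand-in for a hung API: listens and completes the TCP handshake
# (via the listen backlog), but never reads or answers requests.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

timed_out = False
try:
    # An explicit timeout bounds the wait instead of blocking forever.
    conn = socket.create_connection(("127.0.0.1", port), timeout=1.0)
    conn.sendall(b"GET /locations?query=hamburg HTTP/1.0\r\n\r\n")
    conn.recv(4096)  # the server never responds, so this times out
except socket.timeout:
    timed_out = True
finally:
    srv.close()

print("timed out:", timed_out)  # → timed out: True
```

HTTP clients expose the same knob (e.g. a `timeout` argument), so a self-hosted consumer can at least fail fast instead of hanging.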