Open MartinNowak opened 8 years ago
Do you already monitor uptime?
I get an e-mail whenever something is not accessible. Currently the only regular reason for this is a failure in DMD's exception handling code, which happens in the reverse proxy that sits in front of the registry process. Uptime is typically >99.9%; the usual CI build failures seem to be caused by GitHub failures instead.
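To put the >99.9% figure in perspective, here is a minimal Python sketch of an external probe and the matching downtime arithmetic. It is not the monitoring actually in use; the function names, the 10-second timeout, and the rule "any HTTP status below 500 counts as up" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10.0):
    """Probe a URL once; treat any HTTP status below 500 as 'up' (assumed rule)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; a 4xx still means the server answered.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def uptime_percent(results):
    """Share of successful probes, in percent, from a list of booleans."""
    if not results:
        raise ValueError("need at least one probe result")
    return 100.0 * sum(1 for ok in results if ok) / len(results)


def allowed_downtime_hours(target_percent, period_hours=365 * 24):
    """Hours of downtime per period that still meet the uptime target."""
    return period_hours * (1.0 - target_percent / 100.0)
```

For example, `allowed_downtime_hours(99.9)` comes out at roughly 8.76 hours per year, so a handful of short reverse proxy failures still fits within that budget. A cron job could call `is_up("https://code.dlang.org/")` every few minutes and send a mail when it returns `False`.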
How long would it take to move this to another server if the current one is dying?
The process would be to copy the last DB backup and to run the dub-registry process (+ set up a reverse proxy rule in the main web server), so ideally it should be a matter of minutes.
At which size will we need a failover solution and db replication?
I think replication, at least in a simple master-slave setup, should really be done now, even if it wouldn't help to combat GitHub downtime. Once we grow considerably (factor of 10?), I'd say DB level replication and running multiple dub-registry instances behind a load balancer start to make sense. Right now it would just produce administration/cost overhead with no practical benefit over a simple master/slave solution, which also has the benefit of offering the possibility for federation.
Now that dub is deployed on new servers and doesn't crash that much anymore, I think we can close this.
I think we can close this
Now that you’re a member of this repo, maybe you could look into this yourself, because otherwise it won’t happen anytime soon, I guess…
With the growing importance of code.dlang.org we should start to think about reliability. Do you already monitor uptime? How long would it take to move this to another server if the current one is dying? At which size will we need a failover solution and db replication?
Given the amount of work involved and how few server resources we need, this might be a use-case for managed hosting.