Previously, the different threads for updating statuses, updating
runners jobs, updating metrics, and running the server were started
using forkIO. This means that failures in threads other than the
server thread (which was effectively the main thread) weren't
propagated, so the app could appear to be running fine even though the
updating thread(s) had died for some reason.
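
To illustrate the problem, here is a minimal sketch rather than the app's real code. `updateStatuses` and `startApp` are hypothetical stand-ins. The worker started with forkIO throws, but the thread that started it never finds out:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (ErrorCall (..), throwIO)

-- Hypothetical worker that fails right after starting.
updateStatuses :: IO ()
updateStatuses = throwIO (ErrorCall "status update failed")

-- Starts the worker with forkIO and then carries on like the server would.
-- The worker's exception is only reported on stderr by the runtime;
-- it is not rethrown in this thread.
startApp :: IO String
startApp = do
  _ <- forkIO updateStatuses
  threadDelay 100000 -- give the worker time to die
  pure "server thread still running"

main :: IO ()
main = startApp >>= putStrLn
```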
To mitigate that, a manual thread observation was implemented that used
the health API endpoint to signal when not all threads were running
anymore. This worked, but only when running in an environment like
Kubernetes in combination with a liveness probe that restarts the pod
when the liveness probe fails.
I don't like that the implementation was expecting certain things from
its environment to make sure that everything works.
The new approach makes use of the async package. All threads are wrapped
in a Concurrently newtype, which we can compose using the Applicative
instance and then run in one go. If one of the threads throws, the
failure will be propagated and the app will crash. This way, the app
will stop working whenever it can't do its job properly. Also, it's way
less code now and way less self-made stuff in there.
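
A minimal sketch of this pattern, assuming hypothetical workers (`updateStatuses`, `updateMetrics`, `runServer`) in place of the real ones. Only `Concurrently` and `runConcurrently` come from the async package; everything else is made up for illustration:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Concurrently (..))
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import Control.Monad (forever)

-- Hypothetical workers standing in for the real threads.
updateStatuses :: IO ()
updateStatuses = forever (threadDelay 100000)

-- This one fails after a short while to show the propagation.
updateMetrics :: IO ()
updateMetrics = threadDelay 50000 >> throwIO (ErrorCall "metrics update failed")

runServer :: IO ()
runServer = forever (threadDelay 100000)

-- Compose all workers with the Applicative instance and run them in one go.
-- When one of them throws, the others are cancelled and the exception
-- is rethrown to the caller of runConcurrently.
runApp :: IO ()
runApp =
  runConcurrently $
    Concurrently updateStatuses
      *> Concurrently updateMetrics
      *> Concurrently runServer

main :: IO ()
main = do
  -- The exception is caught here only to make the propagation visible;
  -- without the try, the program would exit with the failure.
  result <- try runApp :: IO (Either SomeException ())
  putStrLn (either (\e -> "app stopped: " ++ show e) (const "all threads finished") result)
```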
There's one little downside: If one or more threads just end without an
exception while at least one other thread remains running, we won't
notice. This is not too bad because I'm pretty sure that all threads
will run forever unless there's an exception (which also shouldn't
happen). It's even an advantage for when the runners jobs view is
disabled: We don't need any extra handling for that, we just use pure ()
and we're good to go.
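
The disabled case can be sketched like this. The `enabled` flag and the names `runnersJobsWorker`, `updateRunnersJobs` and `runServer` are assumptions made for the example, not the app's actual identifiers:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Concurrently (..))
import Control.Monad (forever)

-- Hypothetical worker for the runners jobs view.
updateRunnersJobs :: IO ()
updateRunnersJobs = forever (threadDelay 100000)

-- When the view is disabled, the worker is a no-op that finishes immediately.
runnersJobsWorker :: Bool -> Concurrently ()
runnersJobsWorker enabled
  | enabled = Concurrently updateRunnersJobs
  | otherwise = pure ()

-- Stand-in for the server; finite here only so that the example terminates.
runServer :: IO String
runServer = threadDelay 100000 >> pure "server finished"

-- The composed action waits for every part, so a part that ends early
-- without an exception neither stops the others nor gets noticed.
runApp :: Bool -> IO String
runApp enabled =
  runConcurrently (runnersJobsWorker enabled *> Concurrently runServer)

main :: IO ()
main = runApp False >>= putStrLn
```

Calling `runApp True` in this sketch would never return, because the hypothetical worker loops forever.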