Open eric0fw opened 4 months ago
Hey @eric0fw,
Thanks for your kind words :) We are using the library in production and it is not causing any problems.
The additional supervision tree check obviously makes sense for environments running in a k8 cluster. Depending on the complexity of the application, checking just the last process (child) may not be sufficient to consider the application ready:
Supervisor.which_children/1
returns a status of :restarting
instead of pid for such a process.GenServer.init/1
returns :ignore
, initialisation of the supervision tree continues but the process is ignored:
Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling terminate/2. If used when part of a supervision tree the parent supervisor will not fail to start nor immediately try to restart the GenServer. The remainder of the supervision tree will be started and so the GenServer should not be required by other processes.
source: https://hexdocs.pm/elixir/1.17.2/GenServer.html#c:init/1
In both of these situations I would consider the supervision tree not ready, especially if all processes are required to run the app. Nevertheless, the problem is interesting and we could look into it.
@andrzej-mag I would be happy to hear your thoughts on this topic.
I am sorry for late reply. I didn't receive any notification and I wasn't aware of this discussion.
Application supervision trees are created from stable and trusted processes usually. If some supervised OTP process would crash infrequently, process would be restarted so quickly that it would be way below any orchestration platform time resolution threshold. If supervised process would crash continuously, application supervisor should terminate itself and application - no need for any additional health-checks. I don't see any practical value in health-checking application supervision tree or single tree processes. Situation is different with health-checking unsupervised external dependencies which can fail due to uncontrolled reasons (eg. network issue) for extended time.
To health-check some important OTP process you can use for example is_pid(Process.whereis(:some_important_process))
in the check callback implementation. If checked OTP process is supervised, supervisor restart intensity and period in reference to orchestration timeouts should be considered probably.
Existence
should be started as a last process in the supervision tree as all health-checked dependencies should be started first anyway. Library will act then as a "terminator" process mentioned in the @eric0fw comment.
To use Existence
as a readiness check with dependencies requiring substantial time to initialize you can set initial_delay
in the check configuration.
Hi,
First, I want to thank you for developing Existence. I have been exploring its features and it seems like an excellent tool for managing health checks in Elixir applications.
I have a suggestion for a predefined check that I believe could enhance the library: a built-in check to ensure that the entire supervision tree has fully started, utilizing a "terminator" process or similar mechanism. This would be particularly useful for readiness probes in Kubernetes deployments.
Proposed Feature:
Benefits:
Considerations:
I appreciate your time and consideration of this suggestion. Looking forward to hearing your thoughts!
Best regards,