When a misconfiguration in the static configuration prevents the portal API from starting, the UI simply shows "Ready in a minute, waiting for portal API" (or similar). It gives no hint that there is a problem at all, let alone which problem it is.
Proposal:
At startup of the portal API, record error messages until the server has started successfully.
Catch errors at startup and, instead of simply quitting portal-api, go into an "error mode" which behaves as follows:
Report 200 OK "healthy" on the /health endpoint, but with a payload which makes it clear that the API is in fact not healthy (so that the service is still routed to, e.g. in Kubernetes). Do not serve anything other than this endpoint.
Return the error information as JSON on the health endpoint and surface it in the UI.
Kill the process after 60 seconds so that it is restarted by the orchestrator.
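The error mode described above could look roughly like the following. This is a minimal sketch under assumptions, not wicked's implementation: `StartupMonitor`, `handleErrorMode` and `ERROR_MODE_EXIT_MS` are hypothetical names, the payload fields (`healthy`, `state`, `errors`) are an assumed format, and answering 503 on every path other than /health is one possible reading of "don't serve anything else". The process exit is injected as `scheduleExit` so the logic stays independent of the HTTP framework.

```typescript
// Hypothetical sketch of the proposed "error mode" for portal-api.
// StartupMonitor, handleErrorMode and ERROR_MODE_EXIT_MS are illustrative
// names, not part of wicked's actual code base.

type StartupState = "starting" | "running" | "error";

interface HttpResult {
  statusCode: number;
  body: Record<string, unknown>;
}

// Grace period from the proposal: exit after 60 s so the orchestrator restarts us.
const ERROR_MODE_EXIT_MS = 60_000;

class StartupMonitor {
  private state: StartupState = "starting";
  private readonly errors: string[] = [];

  // Record error messages only until the server has started successfully.
  recordError(message: string): void {
    if (this.state !== "running") {
      this.errors.push(message);
    }
  }

  markRunning(): void {
    this.state = "running";
  }

  // Called from the startup catch block instead of quitting right away.
  // scheduleExit is injected; in Node it could be
  // (ms) => setTimeout(() => process.exit(1), ms).
  enterErrorMode(err: unknown, scheduleExit: (ms: number) => void): void {
    this.recordError(err instanceof Error ? err.message : String(err));
    this.state = "error";
    scheduleExit(ERROR_MODE_EXIT_MS);
  }

  getState(): StartupState {
    return this.state;
  }

  getErrors(): string[] {
    return [...this.errors];
  }
}

// In error mode only /health is served. It answers 200 so that the service is
// still routed to (e.g. by Kubernetes), but the payload says it is unhealthy
// and carries the startup errors for the UI. Returns null outside error mode,
// meaning the normal routes apply.
function handleErrorMode(monitor: StartupMonitor, path: string): HttpResult | null {
  if (monitor.getState() !== "error") {
    return null;
  }
  if (path === "/health") {
    return {
      statusCode: 200,
      body: { healthy: false, state: "error", errors: monitor.getErrors() },
    };
  }
  return {
    statusCode: 503,
    body: { message: "portal-api failed to start; see /health for details" },
  };
}
```

The UI would then poll /health as before, but it could display the `errors` array whenever `healthy` is false instead of showing the generic "waiting for portal API" message.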
Advantages:
The UI (wicked.portal) will show that the portal API is not up and running, and possibly also why.
This has been a source of confusion for many: usually it is not the system itself that is broken, but a faulty configuration, a missing environment variable or something similar.
There has been a related change, although the behavior proposed above is still not implemented: the portal UI now also restarts if it cannot reach the API within a certain time (30 seconds).
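The UI-side timeout can be pictured as a polling loop with a deadline. This is a hedged sketch, not wicked.portal's code: `waitForApi` and `ApiProbe` are hypothetical names, the probe (for example an HTTP GET on the API's health endpoint) and the polling interval are assumptions, and the clock and sleep are injectable so the loop can be tested without waiting 30 seconds.

```typescript
// Hypothetical sketch: the portal UI polls the API and gives up after a fixed
// timeout so that the process can exit and be restarted by the orchestrator.
// Names (waitForApi, ApiProbe) are illustrative, not taken from wicked.portal.

type ApiProbe = () => Promise<boolean>;

interface WaitOptions {
  timeoutMs: number;                     // e.g. 30_000 (30 seconds)
  intervalMs: number;                    // assumed delay between probes
  now?: () => number;                    // injectable clock, defaults to Date.now
  sleep?: (ms: number) => Promise<void>; // injectable delay
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Resolves true as soon as the probe succeeds, false once timeoutMs has elapsed.
async function waitForApi(probe: ApiProbe, opts: WaitOptions): Promise<boolean> {
  const now = opts.now ?? Date.now;
  const sleep = opts.sleep ?? defaultSleep;
  const deadline = now() + opts.timeoutMs;
  while (now() < deadline) {
    try {
      if (await probe()) {
        return true;
      }
    } catch {
      // Connection refused and similar errors mean "not reachable yet": retry.
    }
    await sleep(opts.intervalMs);
  }
  return false;
}
```

A caller would treat `false` as the signal to terminate (in Node, for instance via `process.exit(1)`), leaving the actual restart to the orchestrator, just as the proposed API error mode does after 60 seconds.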