KumoCorp / kumomta

The first Open-Source high-performance MTA developed from the ground-up for high-volume email sending environments.
https://kumomta.com
Apache License 2.0

Add Support for an HTTP Endpoint that Returns the Current Status of KumoMTA #257

Open midnight-pm opened 1 month ago

midnight-pm commented 1 month ago

Hello!

As a feature request ([enhancement]), I'd like to propose a new HTTP endpoint, accessible (as an example) at http://localhost:8000/status.json:


Example Payload:

{
    "information": {
        "product": "KumoMTA"
        , "build": "2024.07.08-902a1576"
        , "uptime": "86400"
        , "status": "operational"
    },
    "environment":
    {
        "os": "Linux"
        , "kernel": "6.1.82-99.168.amzn2023.aarch64"
        , "hostname": "foo.bar.internal"
        , "architecture": "aarch64"
        , "current_time": "2024-08-22T11:53:55-0400"
    }
}

In this example, the value for ["environment"]["current_time"] is populated with an ISO 8601-compliant timestamp, as produced by echo $(date +%F)"T"$(date +%T)$(date +%z).


Alternative Example Payload:

{
    "information": {
        "product": "KumoMTA"
        , "build": "2024.07.08-902a1576"
        , "uptime": "86400"
        , "status": "operational"
    },
    "environment":
    {
        "os": "Linux"
        , "kernel": "6.1.82-99.168.amzn2023.aarch64"
        , "hostname": "foo.bar.internal"
        , "architecture": "aarch64"
        , "current_time": "1724341944"
    }
}

In this example, the value for ["environment"]["current_time"] is populated with the number of seconds since the Unix epoch, as produced by date +%s.
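For comparison, here is a minimal Python sketch of the two timestamp styles shown in the payloads above. The helper names iso8601_local and epoch_seconds are hypothetical and exist only for illustration; this is not how KumoMTA itself produces timestamps.

```python
from datetime import datetime, timezone


def iso8601_local(dt: datetime) -> str:
    """Format an aware datetime like `date +%FT%T%z`, e.g. 2024-08-22T11:53:55-0400."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S%z")


def epoch_seconds(dt: datetime) -> str:
    """Format an aware datetime like `date +%s` (whole seconds since the Unix epoch)."""
    return str(int(dt.timestamp()))


# Take the current time once, as a timezone-aware local datetime,
# so that both representations describe the same instant.
now = datetime.now(timezone.utc).astimezone()
print(iso8601_local(now))
print(epoch_seconds(now))
```

Both values are emitted as strings here only to match the example payloads; a consumer that wants to do arithmetic on the epoch form may prefer a JSON number.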


I recognize that this may appear redundant, since similar information is available through systemctl status kumomta, and other details through commands such as uname -a, hostname -s, /opt/kumomta/sbin/kumod --version, and the aforementioned date commands. The purpose is to allow status checks to be performed remotely, or by an external monitoring service, rather than by wrapping the above commands in something such as a Python script.

Additionally, checking against a dedicated endpoint allows one to apply a timeout and to verify that the process is actually responsive and not hung, a condition that systemctl status [service] may not reflect in a timely manner.

Alternatively, the entire ["environment"] object in the example JSON payloads above can be dropped:

{
    "information": {
        "product": "KumoMTA"
        , "build": "2024.07.08-902a1576"
        , "uptime": "86400"
        , "status": "operational"
    }
}

The idea, predominantly, is to determine the status of KumoMTA and verify that the endpoint is responsive (HTTP response code 200) without needing to fetch and parse the metrics.json endpoint. If the process is down or unresponsive, the client instead receives a connection failure: either a timeout (curl error code 28) or a rejected connection (curl error code 7).
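As a rough illustration of the client side of such a check, the sketch below uses only the Python standard library to separate the three outcomes described above: a 200 response, a timeout, and a refused connection. The function name probe and its return strings are hypothetical, and the URL a caller passes is whatever endpoint is being monitored.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Classify the result of a single GET request against `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "up" if resp.status == 200 else f"unexpected status {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        # HTTPError subclasses URLError, so it must be handled first.
        return f"http error {exc.code}"
    except urllib.error.URLError as exc:
        # Failures while connecting are wrapped in URLError.reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"  # comparable to curl exit code 28
        if isinstance(exc.reason, ConnectionRefusedError):
            return "connection refused"  # comparable to curl exit code 7
        return f"unreachable: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # A hung process may accept the connection but never respond;
        # that read timeout is raised directly rather than wrapped.
        return "timeout"
```

A monitoring job could call probe("http://localhost:8000/status.json") on a schedule and alert on anything other than "up"; that path is the one proposed in this request, not an existing endpoint.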

wez commented 4 weeks ago

We did recently add the /api/check-liveness/v1 endpoint, which returns a 200 status when the system is up and ready to accept messages, or 503 otherwise, with a textual description of why; e.g., performing initial spool enumeration, over disk, or over memory.
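A monitoring client could consume that endpoint roughly as follows. This is only a sketch: the function name check_liveness is hypothetical, the base URL is supplied by the caller, and the exact wording of the response body is not specified here, so the code simply passes it through as text.

```python
import urllib.error
import urllib.request


def check_liveness(base_url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (ready, description) for the /api/check-liveness/v1 endpoint.

    Connection failures and timeouts are deliberately not caught here,
    so the caller can tell "not ready" apart from "not reachable".
    """
    url = base_url.rstrip("/") + "/api/check-liveness/v1"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # A 503 arrives as HTTPError; its body carries the textual reason.
        return False, exc.read().decode("utf-8", "replace").strip()
```

For example, check_liveness("http://localhost:8000") assumes the HTTP listener is on the host and port used in the request above.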

The process start timestamp might be handy to export via the /metrics endpoint so that Prometheus can chart continuous uptime.
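To make the idea concrete, here is a small Python sketch of what exporting a start timestamp in the Prometheus text exposition format could look like, and how uptime falls out of it. The metric name process_start_time_seconds is the conventional one used by Prometheus client libraries; whether KumoMTA would use that name is an assumption, and the helper functions are hypothetical.

```python
import time

# Captured once at startup; in a real service this would be recorded
# when the process begins, not when the metrics page is rendered.
PROCESS_START = time.time()


def render_start_time_metric(start: float) -> str:
    """Render the start time as a gauge in Prometheus text exposition format."""
    return (
        "# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.\n"
        "# TYPE process_start_time_seconds gauge\n"
        f"process_start_time_seconds {start:.3f}\n"
    )


def uptime_seconds(start: float, now: float) -> float:
    """Uptime is the current time minus the exported start time."""
    return max(0.0, now - start)


print(render_start_time_metric(PROCESS_START), end="")
```

Exporting the start time rather than a running uptime counter keeps the metric constant between scrapes, and a restart shows up as a jump in the value. On the Prometheus side, uptime can then be computed as time() - process_start_time_seconds.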

I think the other information is potentially useful, but I'm cagey about exposing too much system information through random network endpoints.