Closed virtualguy closed 4 months ago
@KrishnaIyer is Basic Station even sending status messages? If not, what do we do with this?
This is expected behaviour IMO.
The LoRa Basics Station LNS protocol does not support periodic status messages.
There's an open issue on the LBS repo but there isn't much going on with that afaik.
We need to wait for the protocol to add support.
@KrishnaIyer Can you use the connection state of the websocket? In the console a gateway will show as blue and connected before uplinks come through. I'm not sure how quickly this times out to change back to disconnected but it does change state eventually.
Connected/Disconnected is enough for me, perhaps that's exposed somewhere else in ttn-lw-cli?
The way stats work in V3 is that we create a GatewayConnectionStats entry when the gateway connects and remove that entry when the gateway disconnects.
So if you check for connection stats (ttn-lw-cli get-connection-stats I think) and you get a 404, that means the gateway has disconnected. Else the gateway is connected.
I guess we can improve the UX on the console? But there is no additional need for using the WebSocket state as this is already taken care of in the way we handle gateway connections.
> So if you check for connection stats (ttn-lw-cli get-connection-stats I think) and you get a 404, that means the gateway has disconnected. Else the gateway is connected.
That's it indeed.
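For scripting this check, here is a minimal Python sketch of the rule described above: a successful stats lookup means connected, a 404 means disconnected. The function names are hypothetical, the URL and API key are placeholders supplied by the caller, and treating every other status as "unknown" is an assumption, not documented behaviour.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, api_key: str) -> int:
    """Return the HTTP status code of a connection-stats lookup.

    `url` and `api_key` are placeholders; use the values for your deployment.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we need.
        return err.code


def gateway_state(http_status: int) -> Optional[str]:
    """Map a stats lookup status code to a connection state."""
    if http_status == 200:
        return "connected"      # a stats entry exists
    if http_status == 404:
        return "disconnected"   # the entry was removed on disconnect
    return None                 # auth or server error: state unknown (assumption)
```

Returning None for other codes keeps an expired API key or a server error from being reported as a disconnected gateway.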
Right, that works. Knowing the time since disconnect would be a nice to have but I'm happy with how it is. Feel free to close the ticket
> Knowing the time since disconnect would be a nice to have
That's not a bad suggestion. But this cannot be a part of the stats itself in our design. This could be done via gateway disconnection events but that's a different discussion tracked internally. Closing this for now.
Thanks for reporting @virtualguy.
I feel we need to distinguish two things:
1) Status messages as aliveness indicator
It looks like people are interpreting the last_status_received_at as an indicator of aliveness. Coming from the connectionless world of the UDP packet forwarder, this certainly makes sense. However, with Basic Station, aliveness can be measured on the connection level using the TCP connection state. Using last_status_received_at as an aliveness indicator for Basic Station will not make much sense in the future. Status message intervals will be configurable and could potentially be configured to very long intervals (especially on bandwidth limited links).
The most efficient way to check aliveness is the TCP keepalive mechanism. This is exactly what Basic Station uses by default. Surely, respective techniques on upper layers are feasible as well (WS ping/pong, regular app-layer status messages sent by the gateway, status queries from the LNS), but will have their respective drawbacks.
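To illustrate the mechanism only, here is a short Python sketch that enables TCP keepalive on a socket. It is not Station's code (Station is written in C). The function name and the timing values are assumptions chosen for the example, and the three tuning options are platform-specific, so they are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive; timing values are illustrative, not Station's."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning knobs are not available on every platform, so set them only
    # where the socket module defines them.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With keepalive enabled, the kernel probes an idle connection and reports a dead peer as a socket error, with no application-level messages involved.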
2) Status messages for operational metrics
As for the actual status message, I think no matter what set of default metrics are going to be defined, it will never satisfy the data hunger of all gateway operators. And surely, the most important metric is going to be highly gateway platform specific and there is no way a generic gateway client, like Basic Station, is expected to integrate custom code to fetch that metric via some bus from some component.
Therefore, the way Basic Station has addressed the topic of status messages from day one is via generic event messages. Station supports the injection of arbitrary messages into the LNS protocol from the outside via named pipes (https://doc.sm.tc/station/conf.html?highlight=cmd#configuration-files). To create the named pipe (aka 'fifo'), type mkfifo cmd.fifo in station's home directory (restarting station is required after the fifo is created). Then, during runtime, any external process on the host system (with write access to the named pipe) can inject JSON messages towards the LNS, like this:
echo '{"msgtype":"event", ...}' > cmd.fifo
In practice, let's say a solar powered gateway could have a cron job firing a script which collects all the necessary metrics from the different places, then constructs a JSON-formatted event and pushes it to the LNS via:
echo '{"msgtype":"event", "type": "status", "battery": 52, "solar": 123, "temp": 4, "last_full": 4201}' > cmd.fifo
This requires that the LNS just forwards the msgtype:event message through to some event log or MQTT topic, etc. to make it available to the application.
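The cron-job script mentioned above could look roughly like the following Python sketch. The function names are hypothetical and the metric names are taken from the echo example. Building the message with a JSON library instead of string concatenation avoids quoting mistakes.

```python
import json


def build_status_event(**metrics) -> str:
    """Serialize collected metrics as a single-line msgtype:event message."""
    # The reserved keys come last so a metric cannot overwrite them.
    event = {**metrics, "msgtype": "event", "type": "status"}
    return json.dumps(event)


def inject(fifo_path: str, line: str) -> None:
    """Write one message to the named pipe.

    Opening a FIFO for writing blocks until a reader has it open, so this
    assumes Station is already running and reading from the pipe.
    """
    with open(fifo_path, "w") as fifo:
        fifo.write(line + "\n")
```

A cron entry would then call something like `inject("cmd.fifo", build_status_event(battery=52, solar=123, temp=4, last_full=4201))` from station's home directory.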
While I agree that a default set of internal metrics sent autonomously by Station makes a lot of sense, I think with the method described above most needs for status reporting are already satisfied. Wouldn't you agree? So, instead of waiting for the final status message format, @KrishnaIyer, maybe it's worth considering handling msgtype:event messages on the LNS side?
From my point of view TTS is already determining connected vs disconnected and it would be great to expose this in a consistent way across both UDP and Basic Station. I.e. just replicate the blue connected indicator from the console. I appreciate that doesn't really belong in connection-stats.
That's also really interesting info about named pipes in Basic Station. Would be great for shipping arbitrary metrics and info. Though having a standardized set of core metrics/status info would be better for consistency across vendors
(1.) Let's make sure that we properly document what the last_status and last_status_received_at fields mean in the API reference, and also clarify that a successful response on GetGatewayConnectionStats means that the gateway is connected and a NotFound error means not connected (to this cluster).
(2.) I think it would be a good idea to detect these {"msgtype":"event", ...} messages. Perhaps we can allow gateways to send JSON-encoded google.protobuf.Any messages in there. If the Gateway Server then detects a GatewayStatus message, it will update last_status and last_status_received_at with that.
{
  "msgtype": "event",
  "payload": {
    "@type": "type.googleapis.com/ttn.lorawan.v3.GatewayStatus",
    "time": "0001-01-01T00:00:00Z",
    "boot_time": "0001-01-01T00:00:00Z",
    "metrics": {
      "cpu_percentage": 67.8,
      "load_1": 2.34,
      "load_5": 1.23,
      "load_15": 0.98,
      "temp": 34.5
    }
  }
}
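As a rough illustration of the detection step proposed in (2.), here is a Python sketch. It is not the Gateway Server implementation (which is written in Go). The function name and the plain-dict stats record are hypothetical, and the payload is handled as ordinary JSON rather than decoded as a real google.protobuf.Any.

```python
import json
from datetime import datetime, timezone
from typing import Optional

GATEWAY_STATUS_TYPE = "type.googleapis.com/ttn.lorawan.v3.GatewayStatus"


def apply_event(stats: dict, raw: str, now: Optional[datetime] = None) -> bool:
    """Update stats from a msgtype:event message that carries a GatewayStatus.

    Returns True if the stats were updated; any other message leaves them
    untouched and returns False.
    """
    msg = json.loads(raw)
    if msg.get("msgtype") != "event":
        return False
    payload = msg.get("payload")
    if not isinstance(payload, dict) or payload.get("@type") != GATEWAY_STATUS_TYPE:
        return False
    received_at = now or datetime.now(timezone.utc)
    # Keep everything except the Any type marker as the status body.
    stats["last_status"] = {k: v for k, v in payload.items() if k != "@type"}
    stats["last_status_received_at"] = received_at.isoformat()
    return True
```

Events without a recognized "@type" fall through unchanged, so they could still be forwarded to an event log or MQTT topic as suggested earlier in the thread.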
Yeah thanks @beitler for the explanation. Yeah this is certainly doable.
Summary
Stats appear to be missing in ttn-lw-cli for gateways running Basic Station, in particular something to indicate 'last seen', as shown in the web console.
Steps to Reproduce
Compare the output of ttn-lw-cli for a UDP gateway and a Basic Station gateway. Note that this is a Tektelic Macro.
Environment
The Things Network Command-line Interface: ttn-lw-cli
Version: 3.10.7
Build date: 2021-01-14T12:34:23Z
Git commit: ecf52d6
Go version: go1.15.6
OS/Arch: linux/amd64
How do you propose to implement this?
...
How do you propose to test this?
I'm happy to test
Can you do this yourself and submit a Pull Request?
No