brocaar / chirpstack-application-server

ChirpStack Application Server is an open-source LoRaWAN application-server.
https://www.chirpstack.io
MIT License

Gateway status history #454

Open mmrein opened 4 years ago

mmrein commented 4 years ago

Summary

A gateway status history, which would help detect network or other issues that stop a gateway from sending data to the server.

What is the use-case?

From my point of view, the best approach is to use the gateway stats and check whether they were updated within a given timeframe. The stats interval usually defaults to 30 s for the Semtech packet-forwarder, but I'm not sure whether something similar works with Concentratord or Basic Station.

After considering and trying a few other ways to accomplish this, I came to the conclusion that it could be done quite effectively on the app server itself, as it already stores metrics in Redis.

The problem is that I did not find any way to tell whether the current rx/tx counters have been updated if they are all zero.

Implementation description

Add a new value to the metrics record, something like a stats counter, which would simply increase by one each time the statistics are updated. It could then be shown in a graph on the gateway details page or read through the API for external processing.

Reading that value would require adding it to the GatewayStats API, which has to be done in /protobuf/as/external/api/gateway.proto#L218 and /swagger/as/external/api/gateway.swagger.json#L427, I guess?

An example:

https://github.com/mmrein/chirpstack-application-server/compare/master...mmrein:gw_status

Screenshot from 2020-04-02 15-18-07

Can you implement this by yourself and make a pull request?

Mostly yes, not sure about the API part. At least I can try.

I'll wait for some feedback on whether you think this would be ok or not.


EDIT: Updated formatting, added screenshot.

darkfader commented 4 years ago

I wanted these metrics as well, so I tried the gRPC Python API and all I got was a list of timestamps. I would like to see some metrics on gateway status too. For example, if a gateway is connected via 4G and the connection resets, status messages go missing. I wrote a bash script to read the log output and can match the missing status messages with the connection log from the 4G provider. Oh, a frequency/channel histogram (up-time/down-time/retries) would be nice too.

mmrein commented 4 years ago

Yes, there are a few options to do this, I just think it could be done quite effectively in the app server itself.

> Oh, a frequency/channel histogram (up-time/down-time/retries) would be nice too.

That, I guess, would be another story, as we would first have to gather that info and store it somewhere.

sagar-patel-sls commented 3 years ago

Hi @mmrein, can we use the state interval from the gateway-profile?

mmrein commented 3 years ago

Hi @sagar-patel-sls, the state interval from the gateway profile could (and probably should) be used to decide whether the status count received in a given period was ok or not.

The gateway state that was implemented fairly recently with the dashboards wouldn't really help, because AFAIK only the current/last value is stored, not the history of the state.

Therefore I still think it would be best to store a simple status counter in Redis, incremented with every received status message. That counter would then be retrieved and compared with the interval from the gateway profile to see whether the connection was ok in a given timeframe.

Questions would be:

  1. Is it ok to implement such a counter?
  2. How to display the results with regard to the interval we would like to see? The 1 month history with 1 day resolution is ok for the packet counters currently displayed, but not ideal for status.

Real-life scenario:

We experienced some data loss from devices over the weekend. Searching for the reason, I retrieved the status history using the API. It turned out there had been no connection from the gateways for circa 3 hours two days earlier (ISP issues).

sagar-patel-sls commented 3 years ago

Hi @mmrein,

  1. Is it ok to implement such a counter? Yes, this state is useful if the gateway is connected via 4G.

  2. How to display the results with regard to the interval which we would like to see? The 1-month history with 1-day resolution is ok for the packet counters currently displayed, but not ideal for status. The current stats graph looks good to me, but we need a suggestion from @brocaar.

@brocaar, can you please give your suggestion? Thanks.

mmrein commented 3 years ago

Thinking about point 2 again, the current interval may actually be ok (or even better) for a basic status overview.

In the given example scenario, it would just show a lower counter value for that day. It would not show much detail (which would have to be searched for manually), but on the other hand it gives a better overview of where to look for such details.

Besides that, using for example an hour interval would give nice detail but not much history, because a reasonable number of values to show on a graph would only cover, let's say, 2 days. A reasonable history would have to be at least a week, for which an hour interval would return 168 values, which seems a bit too much.

The last option could be the ability to change the interval and history on the fly, but that would be much more complicated to do (at least for me).

So yes, let's keep the interval unchanged for now.

mmrein commented 3 years ago

Alright, I've had a few free days, so I was messing around with this and I believe I have a working example. The problem is that I can't test it because I just can't get the customized API to work.

To link against a local copy, I did the usual repo fork, new working branch, clone to local, edit and build, which seems to work ok as I can see the new fields in the generated Go and Swagger structs. I then copied the generated files from chirp-api/swagger/as/external/api/* to chirp-app-server/static/swagger (as seen in https://github.com/brocaar/chirpstack-application-server/issues/470#issuecomment-626698381).

Finally, I added a replace directive to chirp-app-server's go.mod file: replace github.com/brocaar/chirpstack-api/go => /path/to/local/folder/chirpstack-api/go

Yet I'm still getting an unknown field 'StatCount' in struct literal of type api.GatewayStats error, so it seems it's still sourcing the original API code instead of my customized one.

I also tried to build, upload and release in my own git repo, which a) is really inconvenient for making debug changes and b) just ends up in various other errors because of repo name changes and whatnot.