Add metrics to REST endpoints

Consensys / teku

Open-source Ethereum consensus client written in Java

Apache License 2.0

685 stars, 291 forks

It would be great to have metrics for all of Teku's REST API endpoints. One use case is monitoring Beacon Nodes that serve multiple Validator Clients. This data can also help identify methods that need a performance boost.

For every REST API method in Teku, create a datapoint with the method_name, source_ip, response_time, etc. (we can record more data if needed, e.g. URL params).

It is important to label the datapoints with the API identifier, not the URL that was used. E.g. /eth/v1/whatever?foo and /eth/v1/whatever?bar should both just be labelled /eth/v1/whatever.
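As a rough illustration of the labelling rule above, the query string can be dropped before the path is used as a label. This is a minimal sketch, not Teku's actual code: the class name `EndpointLabels` and the method `normalize` are hypothetical, and it only uses `java.net.URI` from the JDK.

```java
import java.net.URI;

// Hypothetical helper, not part of Teku: derives a metric label from a request target.
final class EndpointLabels {
  private EndpointLabels() {}

  /** Returns the path of the request target with any query string or fragment dropped. */
  static String normalize(final String requestTarget) {
    // URI#getPath excludes the "?query" and "#fragment" components.
    final String path = URI.create(requestTarget).getPath();
    return path == null || path.isEmpty() ? "/" : path;
  }

  public static void main(final String[] args) {
    // Both calls print the same label: /eth/v1/whatever
    System.out.println(normalize("/eth/v1/whatever?foo"));
    System.out.println(normalize("/eth/v1/whatever?bar"));
  }
}
```

Stripping the query string does not collapse endpoints that take path parameters. For those, labelling by the route template registered with the HTTP framework would probably be needed instead.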

We won't be able to capture the source IP in metrics because each IP would be a new time series in Prometheus, so it would get very expensive very fast.

The starting point is probably just a counter that records the number of calls to each endpoint.
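A per-endpoint call counter could look roughly like the sketch below. It deliberately uses only JDK classes so it stays self-contained. A real implementation would presumably register a labelled counter with Teku's existing metrics system instead, and the class and method names here (`EndpointCallCounter`, `recordCall`, `getCalls`) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: one monotonically increasing counter per endpoint label.
final class EndpointCallCounter {
  private final Map<String, LongAdder> callsByEndpoint = new ConcurrentHashMap<>();

  /** Increments the counter for the given endpoint label, creating it on first use. */
  void recordCall(final String endpoint) {
    callsByEndpoint.computeIfAbsent(endpoint, key -> new LongAdder()).increment();
  }

  /** Returns the number of recorded calls, or 0 if the endpoint was never called. */
  long getCalls(final String endpoint) {
    final LongAdder adder = callsByEndpoint.get(endpoint);
    return adder == null ? 0L : adder.sum();
  }

  public static void main(final String[] args) {
    final EndpointCallCounter counter = new EndpointCallCounter();
    counter.recordCall("/eth/v1/whatever");
    counter.recordCall("/eth/v1/whatever");
    System.out.println(counter.getCalls("/eth/v1/whatever"));
  }
}
```

Because the only label is the endpoint identifier, the number of time series is bounded by the number of API methods, which avoids the cardinality problem that a source IP label would cause.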

Capturing the duration is always more interesting. We could record the total processing time for each method as a counter, which would allow calculating the average and maybe spotting spikes, but it loses a lot of information. I'd be worried about using a histogram because that's about 4 or 5 time series for each API method, which adds up to quite a lot. We're probably better off doing that as an access log, particularly if we use structured logs so the duration is a dedicated field in the log message and can be easily parsed, or set up a log4j config to record the entries in a database. That would be a separate ticket to this. :)
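To make the "total processing time as a counter" idea concrete, here is a minimal sketch that keeps two counters per endpoint (call count and summed duration) and derives the average from them. It is an assumption-laden illustration using only JDK classes, and the names `EndpointTimings`, `record` and `averageMillis` are hypothetical rather than anything in Teku.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: two counters per endpoint instead of a full histogram.
final class EndpointTimings {
  private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Records one completed request and its processing time in nanoseconds. */
  void record(final String endpoint, final long durationNanos) {
    calls.computeIfAbsent(endpoint, key -> new LongAdder()).increment();
    totalNanos.computeIfAbsent(endpoint, key -> new LongAdder()).add(durationNanos);
  }

  /** Average processing time in milliseconds, or 0.0 if nothing was recorded. */
  double averageMillis(final String endpoint) {
    final LongAdder count = calls.get(endpoint);
    final LongAdder total = totalNanos.get(endpoint);
    if (count == null || total == null || count.sum() == 0) {
      return 0.0;
    }
    return (total.sum() / (double) count.sum()) / 1_000_000.0;
  }

  public static void main(final String[] args) {
    final EndpointTimings timings = new EndpointTimings();
    final long start = System.nanoTime();
    // ... handle the request here ...
    timings.record("/eth/v1/whatever", System.nanoTime() - start);
    System.out.println(timings.averageMillis("/eth/v1/whatever"));
  }
}
```

This costs two time series per API method rather than the 4 or 5 a histogram would need, at the price described above: only the mean is recoverable, so an occasional slow request is averaged away.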

Add metrics to REST endpoints #6171