Consensys / teku

Open-source Ethereum consensus client written in Java
https://consensys.io/teku
Apache License 2.0

Add metrics to REST endpoints #6171

Open lucassaldanha opened 2 years ago

lucassaldanha commented 2 years ago

It would be great to have metrics for all of Teku's REST API endpoints. One use case is to help monitor Beacon Nodes that are serving multiple Validator Clients. This data can also help any efforts to identify methods that need a performance boost.

For every call to a REST API method, Teku would create a datapoint with the method_name, source_ip, response_time, etc. (we can include more data if needed, e.g. URL params).

It is important to use the API identifier for the datapoints, not the URL that was requested. E.g. /eth/v1/whatever?foo and /eth/v1/whatever?bar should both just be labelled /eth/v1/whatever.
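A minimal sketch of that labelling rule, in plain Java. The class name `EndpointLabel`, the `resolve` method, and the list of route templates are all hypothetical and not taken from Teku's code; the sketch assumes the server knows the templates its handlers were registered under.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper (not Teku's actual code): maps a requested URL to a stable metric label.
final class EndpointLabel {
  private EndpointLabel() {}

  // Returns the first matching route template, or the query-free path if none matches.
  static String resolve(final String requestUri, final List<String> routeTemplates) {
    // URI.getPath() excludes the query string, so "?foo" and "?bar" collapse to one label.
    final String path = URI.create(requestUri).getPath();
    final String[] pathSegments = path.split("/");
    for (final String template : routeTemplates) {
      if (matches(template.split("/"), pathSegments)) {
        return template;
      }
    }
    // Fallback for unknown paths; a real implementation might bucket these under one label.
    return path;
  }

  private static boolean matches(final String[] templateSegments, final String[] pathSegments) {
    if (templateSegments.length != pathSegments.length) {
      return false;
    }
    for (int i = 0; i < templateSegments.length; i++) {
      final String segment = templateSegments[i];
      // A "{name}" segment is a path parameter and matches any value.
      final boolean isParameter = segment.startsWith("{") && segment.endsWith("}");
      if (!isParameter && !segment.equals(pathSegments[i])) {
        return false;
      }
    }
    return true;
  }
}
```

With the template `/eth/v1/whatever` registered, both `/eth/v1/whatever?foo` and `/eth/v1/whatever?bar` resolve to `/eth/v1/whatever`. Matching against templates also keeps path parameters from turning into separate labels.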

ajsutton commented 2 years ago

We won't be able to capture the source IP in metrics because each IP would be a new time series in Prometheus, so it would get very expensive very fast.

Probably the starting point is just a counter to record the number of calls to each endpoint.
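As a rough illustration of that starting point, here is a per-endpoint call counter using only the JDK. `EndpointCallCounter` and its methods are hypothetical names; in a real client the map would be replaced by a labelled counter from whatever metrics library is in use, which is an assumption here and not something this thread specifies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a labelled counter: one monotonically increasing count per endpoint.
final class EndpointCallCounter {
  private final Map<String, LongAdder> callsByEndpoint = new ConcurrentHashMap<>();

  // Called once per handled request, with the endpoint's API identifier as the label.
  void recordCall(final String endpointLabel) {
    callsByEndpoint.computeIfAbsent(endpointLabel, key -> new LongAdder()).increment();
  }

  // Returns the number of calls recorded so far, or 0 for an endpoint that was never called.
  long getCalls(final String endpointLabel) {
    final LongAdder calls = callsByEndpoint.get(endpointLabel);
    return calls == null ? 0 : calls.sum();
  }
}
```

Because the only label is the endpoint identifier, the number of time series is bounded by the number of API methods, unlike a per-IP label.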

Capturing the duration is always more interesting. We could just record the total processing time for each method as a counter, which would allow calculating the average and maybe spotting spikes, but it loses a lot of info. I'd be worried about using a histogram type of thing because that's about 4 or 5 time series for each API method, which gets to be quite a lot. We're probably better off doing that as an access-log type of thing, particularly if we use structured logs so the duration is a dedicated field in the log message and can be easily parsed, or set up a log4j config to record them in a database. That would be a separate ticket to this. :)
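To make the "total processing time as a counter" idea concrete, here is a small sketch in plain Java. `EndpointDurationTotals` and its methods are hypothetical names, and the in-memory maps stand in for two counters per endpoint (total time and call count) that a metrics backend would normally hold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: two counters per endpoint, total processing time and number of calls.
final class EndpointDurationTotals {
  private final Map<String, LongAdder> totalMillisByEndpoint = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> callsByEndpoint = new ConcurrentHashMap<>();

  // Adds one request's processing time to the endpoint's running totals.
  void record(final String endpointLabel, final long durationMillis) {
    totalMillisByEndpoint.computeIfAbsent(endpointLabel, key -> new LongAdder()).add(durationMillis);
    callsByEndpoint.computeIfAbsent(endpointLabel, key -> new LongAdder()).increment();
  }

  // Average duration is total time divided by call count; the distribution is not recoverable.
  double averageMillis(final String endpointLabel) {
    final LongAdder calls = callsByEndpoint.get(endpointLabel);
    if (calls == null || calls.sum() == 0) {
      return 0.0;
    }
    return (double) totalMillisByEndpoint.get(endpointLabel).sum() / calls.sum();
  }
}
```

The information loss shows up quickly: an endpoint with three 10 ms requests and one with requests of 1 ms, 1 ms and 28 ms both report a 10 ms average, so a single slow outlier is invisible without a histogram or per-request log entries.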