Open lucassaldanha opened 2 years ago
We won't be able to capture the source IP in metrics: each IP would be a new time series in Prometheus, so it would get very expensive very fast.
Probably the starting point is just a counter to record the number of calls to each endpoint.
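As a rough illustration of that starting point, here is a minimal sketch of a per-endpoint call counter in plain Java. `EndpointCallCounter` and its methods are hypothetical names, not existing Teku code, and a real implementation would presumably register a labelled counter with whatever metrics system Teku already uses rather than keep its own map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper (not part of Teku): counts calls per API endpoint. */
final class EndpointCallCounter {
  // One counter per endpoint label, created lazily on the first call.
  private final ConcurrentHashMap<String, LongAdder> calls = new ConcurrentHashMap<>();

  /** Records one call to the given endpoint, e.g. "/eth/v1/whatever". */
  void recordCall(final String endpoint) {
    calls.computeIfAbsent(endpoint, key -> new LongAdder()).increment();
  }

  /** Returns the number of calls recorded so far, or 0 if the endpoint was never called. */
  long getCount(final String endpoint) {
    final LongAdder counter = calls.get(endpoint);
    return counter == null ? 0 : counter.sum();
  }
}
```

The number of time series stays bounded by the number of endpoints, which is the property that the source IP label would break.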
Capturing the duration is always more interesting. We could just record the total processing time for each method as a counter, which would allow calculating the average and maybe seeing spikes, but it loses a lot of info. I'd be worried about using a histogram type of thing because that's about 4 or 5 time series for each API method, which gets to be quite a lot. We're probably better off doing that as an access log file type of thing, particularly if we use structured logs so the duration is a particular field in the log message and can be easily parsed, or set up a log4j config to record them in a database. That would be a separate ticket to this. :)
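To make the "total processing time as a counter" idea concrete, here is a minimal sketch assuming it is paired with the call counter above. `EndpointTimer` and `averageNanos` are hypothetical names, not Teku code; the average over an interval is the change in total time divided by the change in call count between two samples.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper (not part of Teku): two monotonic counters for one endpoint. */
final class EndpointTimer {
  private final LongAdder callCount = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  /** Records one completed call and its processing time in nanoseconds. */
  void record(final long durationNanos) {
    callCount.increment();
    totalNanos.add(durationNanos);
  }

  long getCallCount() {
    return callCount.sum();
  }

  long getTotalNanos() {
    return totalNanos.sum();
  }

  /** Average duration between two samples of the counters; 0 if no calls happened in between. */
  static double averageNanos(
      final long totalBefore, final long countBefore, final long totalAfter, final long countAfter) {
    final long calls = countAfter - countBefore;
    return calls == 0 ? 0.0 : (double) (totalAfter - totalBefore) / calls;
  }
}
```

This also shows what gets lost: two calls taking 100 ns and 300 ns produce the same average as two calls taking 10 ns and 390 ns, so a single slow request is only visible if it moves the average noticeably.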
It would be great to have metrics for all of Teku's REST API endpoints. One use case is to help monitor Beacon Nodes that are serving multiple Validator Clients. This data can also help any efforts to identify methods that need a performance boost.
For every call to a REST API method, Teku would create a datapoint with the method_name, source_ip, response_time, etc. (we can add more data if needed, e.g. URL params).
It is important to use the API identifier for the datapoints, not the URL that was requested. E.g. `/eth/v1/whatever?foo` and `/eth/v1/whatever?bar` should both just be labelled `/eth/v1/whatever`.
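A minimal sketch of that labelling rule, covering only the query-string case from the example. `EndpointLabels.toLabel` is a hypothetical helper, not Teku code; it does not collapse values embedded in the path itself, which would need the route definition rather than the raw URL.

```java
/** Hypothetical helper (not part of Teku): derives a metric label from a request URI. */
final class EndpointLabels {
  private EndpointLabels() {}

  /** Drops the query string so that all variants of a URL share one label. */
  static String toLabel(final String requestUri) {
    final int queryStart = requestUri.indexOf('?');
    return queryStart < 0 ? requestUri : requestUri.substring(0, queryStart);
  }
}
```

With this, `toLabel("/eth/v1/whatever?foo")` and `toLabel("/eth/v1/whatever?bar")` both yield `/eth/v1/whatever`.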