I have worked with @matthope to create a preliminary implementation of this feature. Before I take the final steps to turn this into an open pull request, I wanted to discuss whether this is a feature that would be beneficial to add to the Fastly exporter.
The initial implementation is based on the work done by @matthope in 2020 and required some effort to bring up to date with the head of the master branch.
I then took the opportunity to refine the implementation based on his feedback and our discussions.
This means the work will need to be combined into 2 commits, each attributed to the developer who added the code. As there are no contributing guidelines, clarity is needed on how this work should be submitted.
The current state of the implementation is a combination of 2 stacked branches layered on top of master.
The final diff report can be viewed here: https://github.com/fastly/fastly-exporter/compare/main...fairfaxmedia:fastly-exporter:feature/aggregate-datacenter-improve
Any feedback would be greatly appreciated.
Preliminary pull request #153 created.
This pull request is being split into multiple pull requests.
The first change is to refactor the way labels are implemented, allowing the default labels to be changed easily. See pull request #167.
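To illustrate the idea (this is only a hedged sketch; PR #167 may structure it differently, and the label names here are hypothetical), centralising the label set in one place makes it straightforward to change which labels are applied by default:

```go
package main

import "fmt"

// defaultLabels returns the label set used when registering metrics.
// Keeping this in a single function means one place controls whether the
// datacenter label is included. Hypothetical sketch only, not PR #167's code.
func defaultLabels(aggregateDatacenters bool) []string {
	labels := []string{"service_id", "service_name"}
	if !aggregateDatacenters {
		labels = append(labels, "datacenter")
	}
	return labels
}

func main() {
	fmt.Println(defaultLabels(false)) // [service_id service_name datacenter]
	fmt.Println(defaultLabels(true))  // [service_id service_name]
}
```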
Problem
Currently, when collecting stats for 201 services, after running the exporter for 13 days with 12 shards, the metrics endpoint output is approximately 424,910 KB per scrape.

With a scrape interval of 60 seconds, the bandwidth requirement becomes 7,082 KB/s. In terms of storage requirements, this is 424,910 KB × 60 mins × 24 hours = 584 GB of raw data per day.

This can have a considerable impact on Prometheus scraping performance as this is a very large payload.
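For reference, those figures follow directly from the payload size and scrape interval. A small sketch of the arithmetic, using the numbers from the example above:

```go
package main

import "fmt"

func main() {
	const (
		payloadKB      = 424_910.0 // metrics endpoint output per scrape, from the example above
		scrapeInterval = 60.0      // seconds
	)

	bandwidthKBps := payloadKB / scrapeInterval // ~7,082 KB/s
	dailyRawKB := payloadKB * 60 * 24           // one scrape per minute, all day
	dailyRawGB := dailyRawKB / (1024 * 1024)    // ~584 GB

	fmt.Printf("bandwidth: %.0f KB/s, raw data per day: %.0f GB\n", bandwidthKBps, dailyRawGB)
}
```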
Proposal
Currently, each datacenter is a label, which multiplies the number of series for each metric. When combined with a metric that has a `status_code` label, this can explode the number of metrics returned.

A possible solution to reduce the output size of the metrics endpoint would be to aggregate the datacenter metrics.
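Before looking at the numbers, here is a minimal sketch of what dropping the datacenter label means in practice. It uses the Prometheus Go client with illustrative metric and label names, not the exporter's actual code:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// Per-datacenter metric: one series per (service, datacenter, status_code).
var requestsPerDC = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "fastly_rt_requests_total", // illustrative name
		Help: "Requests received, labelled per datacenter.",
	},
	[]string{"service_id", "datacenter", "status_code"},
)

// Aggregated metric: the datacenter label is dropped, so all datacenters
// collapse into a single series per (service, status_code).
var requestsAggregated = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "fastly_rt_requests_aggregated_total", // illustrative name
		Help: "Requests received, summed across all datacenters.",
	},
	[]string{"service_id", "status_code"},
)

// recordDatacenterStats shows the idea: when aggregation is enabled, every
// datacenter's count is added to the same (service, status_code) series
// instead of fanning out one series per datacenter.
func recordDatacenterStats(serviceID string, perDC map[string]float64, statusCode string, aggregate bool) {
	for dc, n := range perDC {
		if aggregate {
			requestsAggregated.WithLabelValues(serviceID, statusCode).Add(n)
		} else {
			requestsPerDC.WithLabelValues(serviceID, dc, statusCode).Add(n)
		}
	}
}

func main() {
	prometheus.MustRegister(requestsPerDC, requestsAggregated)
	recordDatacenterStats("svcABC", map[string]float64{"SYD": 10, "IAD": 25, "LHR": 7}, "200", true)
	fmt.Println("aggregated example recorded")
}
```

The per-datacenter form produces one series per datacenter for every (service, status_code) pair, which is where the multiplication described above comes from; the aggregated form keeps a single series per pair.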
Analysis of how this might impact the output size for the earlier example is as follows:
With a scrape interval of 60 seconds, the bandwidth requirement becomes 127 KB/s. In terms of storage requirements, this is 7,646 KB × 60 mins × 24 hours = 11 GB of raw data per day.

A comparison to the results with individual datacenter metrics shows the following improvements:

| | Individual datacenters | Aggregated |
|---|---|---|
| Output size per scrape | 424,910 KB | 7,646 KB |
| Bandwidth (60 s scrape interval) | 7,082 KB/s | 127 KB/s |
| Raw data per day | 584 GB | 11 GB |
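Derived purely from the two payload sizes above, the relative improvement works out as follows:

```go
package main

import "fmt"

func main() {
	const (
		perDatacenterKB = 424_910.0 // payload per scrape with per-datacenter labels
		aggregatedKB    = 7_646.0   // payload per scrape with aggregated datacenters
	)

	reduction := 1 - aggregatedKB/perDatacenterKB // fraction of the payload eliminated
	factor := perDatacenterKB / aggregatedKB      // how many times smaller the payload is

	fmt.Printf("payload reduced by %.1f%% (~%.0fx smaller)\n", reduction*100, factor)
}
```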
A side effect of aggregated datacenter metrics is that memory consumption should also be reduced. It is hard to determine the exact impact, but there should certainly be some improvement.
Conclusion
Aggregated datacenter metrics would provide an option for users who wish to reduce the metrics endpoint output size. By providing this as an option (not the default), users can decide whether the benefits of reducing the output size outweigh the loss of individual datacenter metrics.
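Purely as an illustration of "opt-in, not default", here is a hedged sketch using the standard library flag package; the flag name `-aggregate-datacenters` is hypothetical and not the exporter's actual interface:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag name; the real exporter may expose this differently.
	aggregate := flag.Bool("aggregate-datacenters", false,
		"Sum metrics across datacenters instead of emitting one series per datacenter (reduces output size, loses per-datacenter detail).")
	flag.Parse()

	if *aggregate {
		fmt.Println("datacenter aggregation enabled: per-datacenter series will not be exported")
	} else {
		fmt.Println("default behaviour: one series per datacenter")
	}
}
```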