fastly / fastly-exporter

A Prometheus exporter for the Fastly Real-time Analytics API
Apache License 2.0

Reduce output size of metrics endpoint #152

Closed by mikelorant 5 months ago

mikelorant commented 1 year ago

Problem

Currently, when collecting stats for 201 services across 12 shards, after the exporter has been running for 13 days, the metrics endpoint output size is as follows:

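| Shard | Services | Payload (KB) |
|-------|----------|--------------|
| 1     | 18       | 30,792       |
| 2     | 25       | 55,378       |
| 3     | 16       | 40,243       |
| 4     | 15       | 29,123       |
| 5     | 21       | 34,345       |
| 6     | 22       | 40,100       |
| 7     | 10       | 19,948       |
| 8     | 19       | 47,790       |
| 9     | 11       | 20,234       |
| 10    | 15       | 40,499       |
| 11    | 19       | 37,366       |
| 12    | 19       | 29,092       |
| Total | 210      | 424,910      |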

With a scrape interval of 60 seconds the bandwidth requirement becomes 7,082 KB/s. In terms of storage requirements, this is 424,910 KB × 60 mins × 24 hours = 584 GB of raw data per day.
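
For anyone wanting to plug in their own numbers, the arithmetic above can be reproduced with a short sketch. The function name `scrape_cost` is hypothetical, and it assumes 1 GB = 1024² KB, which is the conversion that yields the 584 GB figure.

```python
def scrape_cost(payload_kb: float, interval_s: float = 60.0) -> tuple[float, float]:
    """Return (bandwidth in KB/s, raw data per day in GB) for a scrape payload.

    Assumes 1 GB = 1024**2 KB.
    """
    scrapes_per_day = 86_400 / interval_s      # 1,440 scrapes at a 60 s interval
    bandwidth_kb_s = payload_kb / interval_s   # average rate across the interval
    daily_gb = payload_kb * scrapes_per_day / 1024**2
    return bandwidth_kb_s, daily_gb


rate, daily = scrape_cost(424_910)
print(f"{rate:,.0f} KB/s, {daily:,.0f} GB/day")  # 7,082 KB/s, 584 GB/day
```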

This can cause considerable impact on Prometheus scraping performance as this is a very large payload.

Proposal

Currently, each datacenter is exposed as a label value, so every metric is multiplied by the number of datacenters. When combined with a metric that also has a status_code label, the number of time series returned can explode.
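
To illustrate the multiplication, here is a minimal sketch of the worst-case series count for a single metric. The cardinalities below are made-up numbers chosen only to show the effect, not measurements from this deployment, and `series_count` is a hypothetical helper.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for one metric: the product of
    the number of distinct values of each label."""
    return prod(label_cardinalities.values())


# Hypothetical cardinalities, for illustration only.
per_datacenter = series_count({"service": 20, "datacenter": 80, "status_code": 30})
aggregated = series_count({"service": 20, "status_code": 30})
print(per_datacenter, aggregated)  # 48000 600
```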

A possible solution to reduce the output size of the metrics endpoint would be to aggregate the metrics across datacenters.
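
Conceptually, aggregating means summing each metric over the datacenter label while keeping every other label. The sketch below shows the idea on plain dictionaries. It is not the exporter's implementation, and the `service_id` label name, the datacenter codes and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (label set, counter value).
samples = [
    ({"service_id": "abc", "datacenter": "SYD", "status_code": "200"}, 120.0),
    ({"service_id": "abc", "datacenter": "MEL", "status_code": "200"}, 80.0),
    ({"service_id": "abc", "datacenter": "SYD", "status_code": "404"}, 3.0),
    ({"service_id": "abc", "datacenter": "LHR", "status_code": "404"}, 1.0),
]


def aggregate(samples, drop="datacenter"):
    """Sum sample values across the dropped label, keeping all other labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the output series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop))
        totals[key] += value
    return dict(totals)


aggregated = aggregate(samples)
for key, value in aggregated.items():
    print(dict(key), value)
```

Four input series collapse to two here, one per status code, which is the same effect that shrinks the payload.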

Analysis of how this might impact the output size for the earlier example is as follows:

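| Shard | Services | Payload (KB) |
|-------|----------|--------------|
| 1     | 17       | 645          |
| 2     | 25       | 934          |
| 3     | 16       | 607          |
| 4     | 14       | 531          |
| 5     | 21       | 796          |
| 6     | 21       | 786          |
| 7     | 10       | 394          |
| 8     | 18       | 686          |
| 9     | 10       | 398          |
| 10    | 15       | 582          |
| 11    | 19       | 718          |
| 12    | 15       | 569          |
| Total | 201      | 7,646        |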

With a scrape interval of 60 seconds the bandwidth requirement becomes 127 KB/s. In terms of storage requirements, this is 7,646 KB × 60 mins × 24 hours = 11 GB of raw data per day.

Comparing this against the results with individual datacenter metrics shows the following improvement:

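| Datacenter | Payload (KB) | Rate (KB/s) | Storage (GB per day) | Reduction |
|------------|--------------|-------------|----------------------|-----------|
| Individual | 424,910      | 7,082       | 584                  |           |
| Aggregated | 7,646        | 127         | 11                   | 98%       |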
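
The reduction figure follows directly from the two payload totals. A quick check, using a hypothetical helper name:

```python
def reduction_percent(before_kb: float, after_kb: float) -> float:
    """Percentage by which the payload shrinks going from before_kb to after_kb."""
    return (1 - after_kb / before_kb) * 100


print(f"{reduction_percent(424_910, 7_646):.1f}%")  # 98.2%
```

Because bandwidth and daily storage both scale linearly with payload size at a fixed scrape interval, the same percentage applies to all three columns.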

A side effect of aggregating datacenter metrics is that memory consumption should also be reduced. It is hard to determine the exact impact, but there should certainly be some improvement.

Conclusion

Aggregated datacenter metrics would provide an option for users who wish to reduce the metrics endpoint output size. Providing this as an option (not the default) would allow users to decide whether the benefits of reducing the output size outweigh the loss of individual datacenter metrics.

mikelorant commented 1 year ago

I have worked with @matthope to create a preliminary implementation of this feature. Before I take the final steps to turn this into an open pull request, I wanted to discuss whether this is a feature that would be beneficial to add to the Fastly exporter.

The initial implementation is based on the work done by @matthope in 2020 and required some effort to bring forward to the head of the master branch.

I then took the opportunity to refine the implementation based on his feedback and our discussions.

This means the work will need to be combined into 2 commits, each attributed to the developer who added the code. As there are no contributing guidelines, some clarity is required on how this work should be submitted.

The current state of the implementation is:

These are a combination of 2 stacked branches layered upon master.

The final diff report can be viewed here: https://github.com/fastly/fastly-exporter/compare/main...fairfaxmedia:fastly-exporter:feature/aggregate-datacenter-improve

Any feedback would be greatly appreciated.

mikelorant commented 1 year ago

Preliminary pull request #153 created.

mikelorant commented 6 months ago

This pull request is being split into multiple pull requests.

The first change refactors the way labels are implemented, allowing the default labels to be changed easily. See pull request #167.