banzaicloud / spot-termination-exporter

Prometheus spot instance exporter to monitor AWS instance termination with Hollowtrees
Apache License 2.0

Optionally expose new metrics for AWS Spot Rebalance Recommendations #13

Open gjtempleton opened 3 years ago

gjtempleton commented 3 years ago

Is your feature request related to a problem? Please describe.

AWS now provides a new instance metadata endpoint that can give advance warning of a likely spot interruption, in the form of a rebalance recommendation. Metrics on these rebalance recommendations, along with the time they were generated, are likely to be useful for cluster operators.

Describe the solution you'd like to see

The spot termination exporter should expose a number of new metrics (potentially as an optional feature):

aws_instance_metadata_service_events_available     Metadata service events endpoint available
aws_instance_rebalance_recommended                 Instance rebalance is recommended
aws_instance_rebalance_recommended_at              Unix epoch rebalance recommendation was exposed at
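
To make the proposal concrete, here is a minimal sketch of how the three metrics could be derived from the rebalance recommendation document. It is written in Python purely for illustration and is not the exporter's actual code. It assumes the endpoint returns JSON with a `noticeTime` field in UTC, that the endpoint answers 404 when no recommendation has been issued, and that all three metrics are gauges. The names `REBALANCE_URL`, `parse_notice_time` and `render_metrics` are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Metadata path for rebalance recommendations. Shown for reference only:
# this sketch does not perform the HTTP request itself.
REBALANCE_URL = "http://169.254.169.254/latest/meta-data/events/recommendations/rebalance"


def parse_notice_time(body: str) -> float:
    """Return the noticeTime field of a rebalance document as Unix epoch seconds."""
    notice = json.loads(body)["noticeTime"]  # assumed shape: "2020-10-27T08:22:00Z"
    parsed = datetime.strptime(notice, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def render_metrics(endpoint_available: bool, body: Optional[str]) -> str:
    """Render the proposed metrics in Prometheus text exposition format.

    body is the JSON returned by the endpoint, or None when no
    recommendation has been issued (assumed to be a 404 response).
    """
    lines = [
        "# TYPE aws_instance_metadata_service_events_available gauge",
        f"aws_instance_metadata_service_events_available {int(endpoint_available)}",
        "# TYPE aws_instance_rebalance_recommended gauge",
        f"aws_instance_rebalance_recommended {int(body is not None)}",
    ]
    if body is not None:
        # Only emit the timestamp once a recommendation actually exists.
        lines += [
            "# TYPE aws_instance_rebalance_recommended_at gauge",
            f"aws_instance_rebalance_recommended_at {parse_notice_time(body):.0f}",
        ]
    return "\n".join(lines) + "\n"
```

Calling `render_metrics(True, None)` would report the endpoint as available with no recommendation, while passing the JSON body adds the `aws_instance_rebalance_recommended_at` sample. Omitting the timestamp until a recommendation exists is one possible design choice; exposing it as 0 would be another.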

Describe alternatives you've considered

A completely separate component scraping the relevant metadata endpoint (as it differs from the already scraped spot termination endpoint). However, this would mean running another daemonset alongside the existing one.

Additional context

I've already done most of the work to perform this scraping and metrics exposition on an internal fork of the project, happy to raise the PR to add this functionality to the wider project.

hcbraun commented 3 years ago

@gjtempleton Could you share your work? I need this feature too

gjtempleton commented 3 years ago

Hey @hcbraun, sorry, I somehow missed your comment before. I've finally raised #15 if you want to use this functionality. Given the seeming lack of activity on this repo, I'm prepared to maintain my fork at gjtempleton/spot-termination-exporter going forward.

hcbraun commented 3 years ago

Thanks a lot, @gjtempleton!

gjtempleton commented 3 years ago

@hcbraun Sorry, I realised on finally testing my implementation that I'd left in a couple of early returns, which would have meant the new metric wouldn't be exposed. I've updated this PR and published a Docker image, available as ghcr.io/gjtempleton/spot-termination-exporter:0.1.1