elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Prometheus formatted /metrics or similar http endpoint for monitoring integration #32129

Closed HariSekhon closed 6 years ago

HariSekhon commented 6 years ago

Feature request to add a Prometheus /metrics HTTP endpoint for monitoring integration:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E

elasticmachine commented 6 years ago

Pinging @elastic/es-core-infra

jasontedor commented 6 years ago

@HariSekhon I'll confess to having effectively zero knowledge of Prometheus so please be patient with me. 🙏

I followed your documentation and did some reading, and your feature request seems unnecessary, since the default path is /metrics but it can be configured to be another endpoint. From your documentation link:

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]
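
For reference, a minimal scrape_config using that option might look something like the sketch below (the job name, path, and target are just placeholders):

scrape_configs:
  - job_name: 'elasticsearch'            # placeholder job name
    metrics_path: /_prometheus/metrics   # overrides the default /metrics path
    static_configs:
      - targets: ['es-node-1:9200']      # placeholder Elasticsearch host:port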

Am I missing something? I'm eager to learn!

HariSekhon commented 6 years ago

Yes, the URL path is configurable, but the response format needs to follow the Prometheus exposition format rather than regular JSON:

https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md
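
For illustration, the expected plain-text exposition format looks roughly like this (the metric name, label, and value are made up for the example):

# HELP es_jvm_heap_used_bytes JVM heap currently in use, in bytes.
# TYPE es_jvm_heap_used_bytes gauge
es_jvm_heap_used_bytes{node="es-node-1"} 1.73456128e+09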

jasontedor commented 6 years ago

Sure but I do not think that is necessary. My understanding, please correct me if I am wrong, is that you can use an exporter to expose the stats. An exporter does the job of reading the metrics from something like Elasticsearch, and then translating them to the format that Prometheus expects. I see that there is a community-maintained exporter at justwatchcom/elasticsearch_exporter. Can you help me understand why that isn't sufficient here?

jasontedor commented 6 years ago

No additional feedback, closing.

HariSekhon commented 6 years ago

Sorry, I thought I'd replied to this...

It would be better if the right metrics format were exposed from within Elasticsearch itself, enabled by a single setting in the YAML config.

Running a second exporter process has the following drawbacks:

- More Processes - doubles the processes, one per elasticsearch node
- Needless Admin - more needless automation/admin work to deploy and manage it
- Security - higher security attack surface area introduced by extra running service
- Efficiency - less efficient, requires double querying from intermediate exporter service and then from Prometheus so metrics may not be as fresh
- Out of Date - that 3rd party exporter may become out of sync with main Elasticsearch project over time as Elasticsearch changes or adds things it doesn't support

It would be much better to just expose the Elasticsearch metrics in Prometheus format on a different URL endpoint, as it would avoid all of the above.
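
To make that concrete, what I have in mind is roughly the sketch below; these setting names are invented purely for illustration and do not exist in Elasticsearch today:

# elasticsearch.yml - hypothetical settings, for illustration only
monitoring.prometheus.enabled: true                       # hypothetical: expose a native Prometheus endpoint
monitoring.prometheus.metrics_path: /_prometheus/metrics  # hypothetical: path for Prometheus to scrape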

ghost commented 6 years ago

@jasontedor adding Prometheus metrics would be a great feature for us. We run 10+ ELK clusters in our company and therefore need to run 10 exporters to monitor the clusters. If this were integrated into Elasticsearch itself, we could just scrape the data using Prometheus.

tomcallahan commented 6 years ago

Reopening given additional comments.

This seems to me like something that could/should be accomplished via a plugin (community or Elastic); I'm curious how widespread demand for this would be. @jethr0null - any insight or opinion here?

jasontedor commented 6 years ago

Thanks for sharing your thoughts @HariSekhon. From a technical perspective I have a few comments:

More Processes - doubles the processes, one per elasticsearch node

I agree that there would be an additional process, but it is not a doubling (there will already be many other processes running), and it is expected that this process is lightweight. So while there is an incremental process add, it is not going to place much load on constrained resources.

Needless Admin - more needless automation/admin work to deploy and manage it

I agree.

Security - higher security attack surface area introduced by extra running service

The exporter does not need to be open for requests, or otherwise open to the world at all; I expect this risk to be de minimis.

Efficiency - less efficient, requires double querying from intermediate exporter service and then from Prometheus so metrics may not be as fresh

I am not sure if I follow this one. Some process has to query Elasticsearch to collect the stats?

Out of Date - that 3rd party exporter may become out of sync with main Elasticsearch project over time as Elasticsearch changes or adds things it doesn't support

I agree.

From our side, we would take on additional development and maintenance burden. Yet, we already have a solution in our stack for monitoring Elasticsearch: today that is the built-in monitoring exporting and the Kibana Monitoring UI, and soon it will be a Beats metricset for Elasticsearch, still with the Kibana Monitoring UI. We even provide the Elastic Stack Monitoring Service. Naturally, we would prefer to invest our development cycles in the functionality that we are building there.

I will leave this open for now to gather additional feedback from the community, and we will discuss this in an upcoming team meeting, but I want to be upfront about where we are likely to land.

ghost commented 6 years ago

Hi @jasontedor,

The exporter does not need to be open for requests, or otherwise open to the world at all; I expect this risk to be de minimis.

The exporter still needs to query Elasticsearch, though. That means an extra user account that needs the right permissions, so unless the exporter defines exactly what rights the account needs, most users will probably just make it a superuser. Otherwise, agreed.

I am not sure if I follow this one. Some process has to query Elasticsearch to collect the stats?

Elasticsearch is queried by the exporter, and the exporter is queried by Prometheus. Say the exporter queries every 30 seconds and Prometheus every minute: your metrics are at worst 1 minute and 30 seconds old, not to speak of the extra network load. Whereas if a Prometheus endpoint were integrated into Elasticsearch, Prometheus would query it directly, so metrics would never be older than the scrape interval configured in Prometheus.
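
As a rough sketch of that arithmetic, treating the intervals above as assumptions:

scrape_configs:
  - job_name: 'elasticsearch_exporter'   # hypothetical job scraping the intermediate exporter
    scrape_interval: 60s                 # Prometheus -> exporter every 60 seconds
    static_configs:
      - targets: ['exporter-host:9114']  # placeholder target; exporter assumed to poll Elasticsearch every 30 seconds
# Worst-case sample age in this chained setup: ~30s + 60s = 90s.
# Scraping a native endpoint directly would bound the worst case by the
# scrape_interval alone (~60s here).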

Yet, we already have a solution in our stack for monitoring Elasticsearch

If every database, HTTP server, and service provided its own monitoring system, we wouldn't be able to keep up with the chaos of maintaining all those systems. Just look at all the software that already exposes Prometheus metrics directly.

I understand that integrating Prometheus metrics into Elasticsearch itself might not be an option, or might simply not be wanted. But an official Elasticsearch exporter would be awesome!

HariSekhon commented 6 years ago

Efficiency - less efficient, requires double querying from intermediate exporter service and then from Prometheus so metrics may not be as fresh

Lagging stats, as mentioned by @rjrhaverkamp: two chained queries (Prometheus -> Exporter -> Elasticsearch) vs. one query (Prometheus -> Elasticsearch).

Second this:

If every database, HTTP server, and service provided its own monitoring system, we wouldn't be able to keep up with the chaos of maintaining all those systems. Just look at all the software that already exposes Prometheus metrics directly.

Every system shipping its own monitoring tools means a proliferation of different dashboards to monitor, which is far from ideal for an ops team.

The drive should be towards a single pane of glass across all tools, since most modern IT departments run a lot of different technologies.

It would be far more efficient if Elasticsearch nodes contained an option to enable an HTTP endpoint for Prometheus to scrape directly; add-on exporters feel like a retroactive hack for systems that don't yet have native Prometheus support. In the future I expect that systems which do not incorporate this endpoint natively will be the odd ones out, as there seems to be a widespread drive towards Prometheus adoption.

jasontedor commented 6 years ago

Every system shipping its own monitoring tools means a proliferation of different dashboards to monitor, which is far from ideal for an ops team.

The drive should be towards a single pane of glass across all tools, since most modern IT departments run a lot of different technologies.

Ours is intended to be a collection and monitoring system for a wide range of software, not just Elasticsearch. We have collectors and dashboards for many common stack components like Kafka, Redis, etc.

jasontedor commented 6 years ago

We discussed this in our weekly team meeting and agreed that we are not going to support Prometheus natively. We are going to continue to invest our effort in Beats/Kibana/Watcher.