@sorantis
If we want to switch to ceph-mgr, it's worth considering a Prometheus plugin. See: https://docs.ceph.com/docs/master/mgr/prometheus/
It provides pool and OSD metadata series and disk statistics. It has been supported since the Luminous release.
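For reference, here's a minimal sketch of what consuming that endpoint could look like; the host, the default port 9283, and the metric names are assumptions based on the mgr Prometheus module docs, not something verified against a cluster:

import requests

# The mgr prometheus module listens on port 9283 by default (assumption).
resp = requests.get("http://ceph-mgr-host:9283/metrics", timeout=10)
resp.raise_for_status()

# The exposition format is plain text: one "name{labels} value" line per series.
for line in resp.text.splitlines():
    if line.startswith("#"):
        continue  # skip HELP/TYPE comments
    # Pick out a few cluster-level series (metric names are assumptions).
    if line.startswith(("ceph_cluster_total_bytes", "ceph_osd_up", "ceph_pool_bytes_used")):
        print(line)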
If we agree to switch to the Prometheus endpoint, I need some guidance on deprecating the existing implementation.
At the moment I will proceed with a new metricset, cephmgr (see below), that will use the Prometheus metrics endpoint.
The existing implementation should still be valid for older versions of Ceph. Newer versions that have ceph-mgr could be handled by a separate metricset. Using Prometheus here is an attractive option, but I'd stick to native APIs wherever possible for several reasons:
My recommendation would be to use native APIs wherever possible.
It seems that we responded at the same time...
According to what we discussed offline, let's try to stick to native APIs, as the Prometheus module is not enabled by default.
After talking more with @mtojek about this, it seems that the right way would be to use the mgr's restful module instead of the prometheus one, due to the points listed above, but also due to security. Prometheus endpoints at the moment don't support secure communication, which means that, to keep communication secure when building on the prometheus module, Metricbeat has to be deployed locally and configured with TLS. With restful there's no such limitation: Metricbeat can be deployed on another node and restful can be configured to use TLS.
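To make the difference concrete, here is a rough sketch of querying the restful module over TLS from a remote node; the host, the default port 8003, the /osd resource, the credentials, and the CA bundle path are assumptions based on the restful module documentation:

import requests

# The restful module serves HTTPS (port 8003 by default, assumption) and
# authenticates with an API key created via "ceph restful create-key <user>".
MGR = "https://ceph-mgr-host:8003"

# Pointing verify at the cluster's CA bundle keeps the connection TLS-verified
# even when this runs on a different node than the manager.
resp = requests.get(MGR + "/osd",
                    auth=("demo", "api-key"),
                    verify="/etc/ceph/restful-ca.pem")
resp.raise_for_status()
print(resp.json())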
@sorantis I booted up a demo Ceph cluster to review the restful resources. To be honest, most of the data exposed via the endpoint is configuration rather than actual metrics.
I'm afraid it might be hard for the end user to conclude the cluster health state and available storage from it.
Apart from that, the one resource that gives valid (but also too deep) information is /perf:
Just updating the thread. We had a discussion with @sorantis and will go with the /request API resource, which internally calls and returns the same output as the ceph command (e.g. ceph status, ceph df).
Sample call/output:
>>> command='df'
>>> requests.post('https://host:port/request?wait=1', json={'prefix': command, 'format': 'json'}, auth=("demo", "password")).json()
{u'waiting': [], u'has_failed': False, u'state': u'success', u'is_waiting': False, u'running': [], u'failed': [], u'finished': [{u'outb': u'{"stats":{"total_bytes":10737418240,"total_avail_bytes":9621471232,"total_used_bytes":42205184,"total_used_raw_bytes":1115947008,"total_used_raw_ratio":0.10393066704273224,"num_osds":1,"num_per_pool_osds":1},"stats_by_class":{},"pools":[{"name":"rbd","id":1,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"cephfs_data","id":2,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"cephfs_metadata","id":3,"stats":{"stored":2286,"objects":22,"kb_used":512,"bytes_used":524288,"percent_used":5.7708399253897369e-05,"max_avail":9084600320}},{"name":".rgw.root","id":4,"stats":{"stored":2398,"objects":6,"kb_used":384,"bytes_used":393216,"percent_used":4.3281925172777846e-05,"max_avail":9084600320}},{"name":"default.rgw.control","id":5,"stats":{"stored":0,"objects":8,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.meta","id":6,"stats":{"stored":1173,"objects":7,"kb_used":384,"bytes_used":393216,"percent_used":4.3281925172777846e-05,"max_avail":9084600320}},{"name":"default.rgw.log","id":7,"stats":{"stored":0,"objects":176,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.buckets.index","id":8,"stats":{"stored":0,"objects":2,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.buckets.data","id":9,"stats":{"stored":37122728,"objects":21,"kb_used":36480,"bytes_used":37355520,"percent_used":0.0040951217524707317,"max_avail":9084600320}},{"name":"default.rgw.buckets.non-ec","id":10,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}}]}\n', u'outs': u'', u'command': u'df format=json'}], u'is_finished': True, u'id': u'140124650075600'}
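One thing worth noting for the implementation: the interesting payload sits in finished[0]['outb'] as a JSON-encoded string, so it needs a second decode. A small sketch based on the response above (host, port, and credentials as in the sample call):

import json
import requests

result = requests.post('https://host:port/request?wait=1',
                       json={'prefix': 'df', 'format': 'json'},
                       auth=("demo", "password")).json()

if result['state'] == 'success' and result['finished']:
    # 'outb' holds the command output as a JSON string, not a parsed object.
    df = json.loads(result['finished'][0]['outb'])
    print(df['stats']['total_bytes'], df['stats']['total_avail_bytes'])
    for pool in df['pools']:
        print(pool['name'], pool['stats']['bytes_used'])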
I'm working on the following metricsets (metricset ~ ceph command):
mgr_cluster_health ~ ceph status
mgr_cluster_disk ~ ceph df
mgr_osd_disk ~ ceph osd df
mgr_osd_pool_stats ~ ceph osd pool stats
mgr_osd_perf ~ ceph osd perf
mgr_osd_tree ~ ceph osd tree
The mgr prefix suggests that these metricsets are compatible with the Ceph Manager Daemon (https://docs.ceph.com/docs/master/mgr/).
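For context, each of these metricsets boils down to a single /request call with the corresponding command prefix. A rough sketch of that mapping (the helper and its defaults are hypothetical; the prefixes mirror the ceph commands listed above):

import requests

# Hypothetical mapping of metricset name -> ceph command prefix sent to /request.
COMMANDS = {
    "mgr_cluster_health": "status",
    "mgr_cluster_disk": "df",
    "mgr_osd_disk": "osd df",
    "mgr_osd_pool_stats": "osd pool stats",
    "mgr_osd_perf": "osd perf",
    "mgr_osd_tree": "osd tree",
}

def fetch(metricset, base="https://host:port", auth=("demo", "password")):
    prefix = COMMANDS[metricset]
    return requests.post(base + "/request?wait=1",
                         json={"prefix": prefix, "format": "json"},
                         auth=auth).json()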
Module updated to use new API. PRs merged. Resolving.
Hi @mtojek: I'm looking at the cherry-pick for #16254 and I can't find the changes for mgr_osd_disk.
/go/src/github.com/elastic/beats/metricbeat/module/ceph# ls -lrt | grep mgr
drwxr-xr-x 3 root root 137 Feb 26 14:57 mgr_cluster_disk
drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_perf
drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_cluster_health
drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_osd_pool_stats
drwxr-xr-x 3 root root 128 Feb 26 14:57 mgr_pool_disk
drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_tree
All the other metricsets are present except for mgr_osd_disk. Should we fall back to osd_df?
Hi! It's renamed to mgr_pool_disk (https://github.com/elastic/beats/pull/16254#discussion_r380244077).
Thank you @mtojek. I must have missed this comment :).
Hi folks, just so you know: in the Ceph project we're planning to soon deprecate the restful API you're relying on here.
The alternatives would be either the fine-grained Ceph Dashboard REST API (more of a management API, so probably not the best fit for you) or the Prometheus exporter (which gives you all the metrics in a single shot).
@epuertat thanks for letting us know. We did consider Prometheus exporter earlier, but decided to stick to the native API capabilities. We'll need to revisit this. Which release are you planning to remove the restful API from?
@sorantis: v17 (codenamed Quincy), to be released in the first half of 2022. Please let us know if you need any guidance on this.
@epuertat good to know. Any plans to support the Prometheus endpoint natively? AFAIK today the user has to manually enable the exporter via ceph mgr module enable prometheus.
cc @akshay-saraswat
@sorantis, no plans to change that. The Prometheus exporter is embedded inside a Ceph service. It's probably the reference 'metrics agent' for the Ceph project (others are less maintained, like influx, telegraf, zabbix, ...).
The main downside I see there is that it only supports plain-text HTTP, but if you really need HTTPS, it wouldn't be that hard to get that change in [ceph-dashboard sample HTTPS Cherrypy config].
ceph-rest-api is replaced by ceph-mgr in newer releases: http://docs.ceph.com/docs/luminous/mgr/restful/
See https://github.com/elastic/beats/pull/7661#issuecomment-406651024 for additional details.