elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Update Ceph module to support new API #7723

Closed ruflin closed 4 years ago

ruflin commented 6 years ago

ceph-rest-api has been replaced by ceph-mgr in newer releases (http://docs.ceph.com/docs/luminous/mgr/restful/#). See https://github.com/elastic/beats/pull/7661#issuecomment-406651024 for additional details.

mtojek commented 4 years ago

@sorantis

If we want to switch to ceph-mgr, it's worth considering the Prometheus plugin (see https://docs.ceph.com/docs/master/mgr/prometheus/). It provides pool and OSD metadata series as well as disk statistics, and it has been supported since the Luminous release.

If we agree to switch to the Prometheus endpoint, I'll need some guidance on deprecating the existing implementation.
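
For illustration, a minimal sketch of scraping that endpoint, assuming the prometheus module is enabled and listening on its usual default port (9283); the host name is a placeholder:

```python
# Minimal sketch: pull the Prometheus text exposition from ceph-mgr.
# Assumes the exporter was enabled ("ceph mgr module enable prometheus")
# and listens on the default port 9283; "ceph-mgr-host" is a placeholder.
import requests

resp = requests.get("http://ceph-mgr-host:9283/metrics", timeout=10)
resp.raise_for_status()

# Each non-comment line has the form "<metric_name>{<labels>} <value>".
for line in resp.text.splitlines():
    if line and not line.startswith("#"):
        print(line)
```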

mtojek commented 4 years ago

At the moment I will proceed with a new metricset, cephmgr, that will use the Prometheus metrics endpoint (see below).

sorantis commented 4 years ago

The existing implementation should still be valid for older versions of Ceph. Newer versions that have ceph-mgr could be handled by a separate metricset. Using Prometheus here is an attractive option, but I'd stick to native APIs wherever possible for several reasons.

My recommendation would be to use native APIs wherever possible.

mtojek commented 4 years ago

It seems that we responded at the same time...

According to what we discussed offline, let's try to stick to native APIs, as the Prometheus module is not enabled by default.

sorantis commented 4 years ago

After talking more with @mtojek about this, it seems that the right way would be to use the mgr's restful module instead of Prometheus, due to the points listed above but also due to security. Prometheus endpoints currently don't support secure communication, which means that if the implementation were built on the Prometheus module, Metricbeat would have to be deployed locally and configured with TLS to keep communication secure. With restful there's no such limitation: Metricbeat can be deployed on another node, and restful can be configured to use TLS.
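
For illustration, a minimal sketch of such a remote call over TLS, assuming the restful module has been set up with a certificate and an API key (e.g. via ceph restful create-key); host, port, user, key, and CA path below are placeholders:

```python
# Minimal sketch: query the mgr restful module over HTTPS from another node.
# Assumes the module is enabled with a certificate and an API key was created;
# all concrete values below are placeholders.
import requests

resp = requests.get(
    "https://ceph-mgr-host:8003/mon",          # 8003 is the module's usual default port
    auth=("metricbeat", "generated-api-key"),  # restful uses HTTP basic auth with an API key
    verify="/etc/ceph/restful-ca.pem",         # CA bundle for the module's certificate
    timeout=10,
)
print(resp.json())
```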

mtojek commented 4 years ago

@sorantis I booted up a demo Ceph cluster to review the restful resources. To be honest, most of the data exposed via the endpoint is configuration rather than actual metrics.

Here are some of them: /mon: ``` [ { "addr": "172.30.0.2:3300/0", "in_quorum": true, "leader": true, "name": "edcb751e8aa1", "public_addr": "172.30.0.2:3300/0", "public_addrs": { "addrvec": [ { "addr": "172.30.0.2:3300", "nonce": 0, "type": "v2" } ] }, "rank": 0, "server": "edcb751e8aa1" } ] ``` /osd: ``` [ { "cluster_addr": "172.30.0.2:6803/186", "cluster_addrs": { "addrvec": [ { "addr": "172.30.0.2:6802", "nonce": 186, "type": "v2" }, { "addr": "172.30.0.2:6803", "nonce": 186, "type": "v1" } ] }, "down_at": 20, "heartbeat_back_addr": "172.30.0.2:6807/186", "heartbeat_back_addrs": { "addrvec": [ { "addr": "172.30.0.2:6806", "nonce": 186, "type": "v2" }, { "addr": "172.30.0.2:6807", "nonce": 186, "type": "v1" } ] }, "heartbeat_front_addr": "172.30.0.2:6805/186", "heartbeat_front_addrs": { "addrvec": [ { "addr": "172.30.0.2:6804", "nonce": 186, "type": "v2" }, { "addr": "172.30.0.2:6805", "nonce": 186, "type": "v1" } ] }, "in": 1, "last_clean_begin": 4, "last_clean_end": 18, "lost_at": 0, "osd": 0, "pools": [ 1, 2, 3, 4, 5, 6, 7, 8 ], "primary_affinity": 1.0, "public_addr": "172.30.0.2:6801/186", "public_addrs": { "addrvec": [ { "addr": "172.30.0.2:6800", "nonce": 186, "type": "v2" }, { "addr": "172.30.0.2:6801", "nonce": 186, "type": "v1" } ] }, "reweight": 1.0, "server": "edcb751e8aa1", "state": [ "exists", "up" ], "up": 1, "up_from": 21, "up_thru": 21, "uuid": "eb1c8d6d-70c2-4511-a1b8-e9e7e5f624aa", "valid_commands": [ "scrub", "deep-scrub", "repair" ], "weight": 1.0 } ] ``` /pool: ``` [ { "application_metadata": {}, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:09.277269", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "6", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 1, "pool_name": "rbd", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "cephfs": { "data": "cephfs" } }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:10.354727", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, 
"hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "7", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 2, "pool_name": "cephfs_data", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "cephfs": { "metadata": "cephfs" } }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:11.310873", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "8", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": { "pg_autoscale_bias": 4.0, "pg_num_min": 16, "recovery_priority": 5 }, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 3, "pool_name": "cephfs_metadata", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "rgw": {} }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:13.193509", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "10", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, 
"pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 4, "pool_name": ".rgw.root", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "rgw": {} }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:14.554436", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "12", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 5, "pool_name": "default.rgw.control", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "rgw": {} }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:16.544549", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "14", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 6, "pool_name": "default.rgw.meta", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "rgw": {} }, "auid": 0, 
"cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:18.505341", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "16", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 7, "pool_name": "default.rgw.log", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 }, { "application_metadata": { "rgw": {} }, "auid": 0, "cache_min_evict_age": 0, "cache_min_flush_age": 0, "cache_mode": "none", "cache_target_dirty_high_ratio_micro": 600000, "cache_target_dirty_ratio_micro": 400000, "cache_target_full_ratio_micro": 800000, "create_time": "2020-02-05 17:34:20.965857", "crush_rule": 0, "erasure_code_profile": "", "expected_num_objects": 0, "fast_read": false, "flags": 1, "flags_names": "hashpspool", "grade_table": [], "hit_set_count": 0, "hit_set_grade_decay_rate": 0, "hit_set_params": { "type": "none" }, "hit_set_period": 0, "hit_set_search_last_n": 0, "last_change": "18", "last_force_op_resend": "0", "last_force_op_resend_preluminous": "0", "last_force_op_resend_prenautilus": "0", "last_pg_merge_meta": { "last_epoch_clean": 0, "last_epoch_started": 0, "ready_epoch": 0, "source_pgid": "0.0", "source_version": "0'0", "target_version": "0'0" }, "min_read_recency_for_promote": 0, "min_size": 1, "min_write_recency_for_promote": 0, "object_hash": 2, "options": {}, "pg_autoscale_mode": "warn", "pg_num": 8, "pg_num_pending": 8, "pg_num_target": 8, "pg_placement_num_target": 8, "pgp_num": 8, "pool": 8, "pool_name": "default.rgw.buckets.index", "pool_snaps": [], "quota_max_bytes": 0, "quota_max_objects": 0, "read_tier": -1, "removed_snaps": "[]", "size": 1, "snap_epoch": 0, "snap_mode": "selfmanaged", "snap_seq": 0, "stripe_width": 0, "target_max_bytes": 0, "target_max_objects": 0, "tier_of": -1, "tiers": [], "type": 1, "use_gmt_hitset": true, "write_tier": -1 } ] ``` /server: ``` [ { "ceph_version": "ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)", "hostname": "", "services": [ { "id": "14116", "type": "rbd-mirror" } ] }, { "ceph_version": "ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)", "hostname": "edcb751e8aa1", "services": [ { "id": "demo", "type": "mds" }, { "id": "edcb751e8aa1", "type": "mgr" }, { "id": "edcb751e8aa1", "type": "mon" }, { "id": "0", "type": "osd" }, { "id": "edcb751e8aa1", "type": "rgw" }, { "id": "edcb751e8aa1", "type": "rgw-nfs" } ] } ] ```

I'm afraid it might be hard for the end user to determine the cluster health state and available storage from this data.

Apart from that, there is one resource that does give valid (but also overly detailed) information: /perf.

Here is a sample: ``` { "mds.demo": { "mds.caps": { "description": "Capabilities", "nick": "caps", "priority": 8, "type": 2, "units": 1, "value": 0 }, "mds.dir_commit": { "description": "Directory commit", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds.dir_fetch": { "description": "Directory fetch", "priority": 5, "type": 10, "units": 1, "value": 12 }, "mds.dir_merge": { "description": "Directory merge", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds.dir_split": { "description": "Directory split", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds.exported_inodes": { "description": "Exported inodes", "nick": "exi", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds.forward": { "description": "Forwarding request", "nick": "fwd", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds.imported_inodes": { "description": "Imported inodes", "nick": "imi", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds.inode_max": { "description": "Max inodes, cache size", "priority": 5, "type": 2, "units": 1, "value": 2147483647 }, "mds.inodes": { "description": "Inodes", "nick": "inos", "priority": 10, "type": 2, "units": 1, "value": 10 }, "mds.inodes_expired": { "description": "Inodes expired", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.inodes_pinned": { "description": "Inodes pinned", "priority": 5, "type": 2, "units": 1, "value": 10 }, "mds.inodes_with_caps": { "description": "Inodes with capabilities", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.load_cent": { "description": "Load per cent", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.openino_dir_fetch": { "description": "OpenIno incomplete directory fetchings", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds.reply_latency": { "count": 0, "description": "Reply latency", "nick": "rlat", "priority": 10, "type": 5, "units": 1, "value": 0 }, "mds.request": { "description": "Requests", "nick": "req", "priority": 10, "type": 10, "units": 1, "value": 0 }, "mds.root_rbytes": { "description": "root inode rbytes", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.root_rfiles": { "description": "root inode rfiles", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.root_rsnaps": { "description": "root inode rsnaps", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds.subtrees": { "description": "Subtrees", "priority": 5, "type": 2, "units": 1, "value": 2 }, "mds_cache.ireq_enqueue_scrub": { "description": "Internal Request type enqueue scrub", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.ireq_exportdir": { "description": "Internal Request type export dir", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.ireq_flush": { "description": "Internal Request type flush", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.ireq_fragmentdir": { "description": "Internal Request type fragmentdir", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.ireq_fragstats": { "description": "Internal Request type frag stats", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.ireq_inodestats": { "description": "Internal Request type inode stats", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.num_recovering_enqueued": { "description": "Files waiting for recovery", "nick": "recy", "priority": 8, "type": 2, "units": 1, "value": 0 }, "mds_cache.num_recovering_prioritized": { "description": "Files waiting for recovery with elevated priority", "priority": 5, "type": 2, "units": 1, "value": 0 
}, "mds_cache.num_recovering_processing": { "description": "Files currently being recovered", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_cache.num_strays": { "description": "Stray dentries", "nick": "stry", "priority": 8, "type": 2, "units": 1, "value": 0 }, "mds_cache.num_strays_delayed": { "description": "Stray dentries delayed", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_cache.num_strays_enqueuing": { "description": "Stray dentries enqueuing for purge", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_cache.recovery_completed": { "description": "File recoveries completed", "nick": "recd", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds_cache.recovery_started": { "description": "File recoveries started", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.strays_created": { "description": "Stray dentries created", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.strays_enqueued": { "description": "Stray dentries enqueued for purge", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.strays_migrated": { "description": "Stray dentries migrated", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_cache.strays_reintegrated": { "description": "Stray dentries reintegrated", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_log.ev": { "description": "Events", "nick": "evts", "priority": 8, "type": 2, "units": 1, "value": 0 }, "mds_log.evadd": { "description": "Events submitted", "nick": "subm", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds_log.evex": { "description": "Total expired events", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_log.evexd": { "description": "Current expired events", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_log.evexg": { "description": "Expiring events", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_log.evtrm": { "description": "Trimmed events", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_log.jlat": { "count": 0, "description": "Journaler flush latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_log.replayed": { "description": "Events replayed", "nick": "repl", "priority": 8, "type": 10, "units": 1, "value": 1 }, "mds_log.seg": { "description": "Segments", "nick": "segs", "priority": 8, "type": 2, "units": 1, "value": 1 }, "mds_log.segadd": { "description": "Segments added", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_log.segex": { "description": "Total expired segments", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_log.segexd": { "description": "Current expired segments", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_log.segexg": { "description": "Expiring segments", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_log.segtrm": { "description": "Trimmed segments", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_mem.cap": { "description": "Capabilities", "priority": 5, "type": 2, "units": 1, "value": 0 }, "mds_mem.cap+": { "description": "Capabilities added", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_mem.cap-": { "description": "Capabilities removed", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_mem.dir": { "description": "Directories", "priority": 5, "type": 2, "units": 1, "value": 12 }, "mds_mem.dir+": { "description": "Directories opened", "priority": 5, "type": 10, "units": 1, "value": 12 }, "mds_mem.dir-": { "description": "Directories closed", "priority": 5, "type": 10, "units": 1, "value": 0 }, 
"mds_mem.dn": { "description": "Dentries", "nick": "dn", "priority": 8, "type": 2, "units": 1, "value": 10 }, "mds_mem.dn+": { "description": "Dentries opened", "priority": 5, "type": 10, "units": 1, "value": 10 }, "mds_mem.dn-": { "description": "Dentries closed", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_mem.heap": { "description": "Heap size", "priority": 5, "type": 2, "units": 1, "value": 332028 }, "mds_mem.ino": { "description": "Inodes", "nick": "ino", "priority": 8, "type": 2, "units": 1, "value": 13 }, "mds_mem.ino+": { "description": "Inodes opened", "priority": 5, "type": 10, "units": 1, "value": 13 }, "mds_mem.ino-": { "description": "Inodes closed", "priority": 5, "type": 10, "units": 1, "value": 0 }, "mds_server.cap_revoke_eviction": { "description": "Cap Revoke Client Eviction", "nick": "cre", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds_server.handle_client_request": { "description": "Client requests", "nick": "hcr", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds_server.handle_client_session": { "description": "Client session messages", "nick": "hcs", "priority": 8, "type": 10, "units": 1, "value": 40 }, "mds_server.handle_slave_request": { "description": "Slave requests", "nick": "hsr", "priority": 8, "type": 10, "units": 1, "value": 0 }, "mds_server.req_create_latency": { "count": 0, "description": "Request type create latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_getattr_latency": { "count": 0, "description": "Request type get attribute latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_getfilelock_latency": { "count": 0, "description": "Request type get file lock latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_link_latency": { "count": 0, "description": "Request type link latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookup_latency": { "count": 0, "description": "Request type lookup latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookuphash_latency": { "count": 0, "description": "Request type lookup hash of inode latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookupino_latency": { "count": 0, "description": "Request type lookup inode latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookupname_latency": { "count": 0, "description": "Request type lookup name latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookupparent_latency": { "count": 0, "description": "Request type lookup parent latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lookupsnap_latency": { "count": 0, "description": "Request type lookup snapshot latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_lssnap_latency": { "count": 0, "description": "Request type list snapshot latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_mkdir_latency": { "count": 0, "description": "Request type make directory latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_mknod_latency": { "count": 0, "description": "Request type make node latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, "mds_server.req_mksnap_latency": { "count": 0, "description": "Request type make snapshot latency", "priority": 5, "type": 5, "units": 1, "value": 0 }, ... ```
mtojek commented 4 years ago

Just updating the thread. We had a discussion with @sorantis and will go with the /request API resource, which internally calls and returns the same output as the corresponding ceph command (e.g. ceph status, ceph df).

Sample call/output:

>>> import requests
>>> command = 'df'
>>> requests.post('https://host:port/request?wait=1', json={'prefix': command, 'format': 'json'}, auth=("demo", "password")).json()
{u'waiting': [], u'has_failed': False, u'state': u'success', u'is_waiting': False, u'running': [], u'failed': [], u'finished': [{u'outb': u'{"stats":{"total_bytes":10737418240,"total_avail_bytes":9621471232,"total_used_bytes":42205184,"total_used_raw_bytes":1115947008,"total_used_raw_ratio":0.10393066704273224,"num_osds":1,"num_per_pool_osds":1},"stats_by_class":{},"pools":[{"name":"rbd","id":1,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"cephfs_data","id":2,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"cephfs_metadata","id":3,"stats":{"stored":2286,"objects":22,"kb_used":512,"bytes_used":524288,"percent_used":5.7708399253897369e-05,"max_avail":9084600320}},{"name":".rgw.root","id":4,"stats":{"stored":2398,"objects":6,"kb_used":384,"bytes_used":393216,"percent_used":4.3281925172777846e-05,"max_avail":9084600320}},{"name":"default.rgw.control","id":5,"stats":{"stored":0,"objects":8,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.meta","id":6,"stats":{"stored":1173,"objects":7,"kb_used":384,"bytes_used":393216,"percent_used":4.3281925172777846e-05,"max_avail":9084600320}},{"name":"default.rgw.log","id":7,"stats":{"stored":0,"objects":176,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.buckets.index","id":8,"stats":{"stored":0,"objects":2,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}},{"name":"default.rgw.buckets.data","id":9,"stats":{"stored":37122728,"objects":21,"kb_used":36480,"bytes_used":37355520,"percent_used":0.0040951217524707317,"max_avail":9084600320}},{"name":"default.rgw.buckets.non-ec","id":10,"stats":{"stored":0,"objects":0,"kb_used":0,"bytes_used":0,"percent_used":0,"max_avail":9084600320}}]}\n', u'outs': u'', u'command': u'df format=json'}], u'is_finished': True, u'id': u'140124650075600'}
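
For illustration, a minimal sketch of how such a response could be unpacked, reusing the demo credentials above (the helper name ceph_command is hypothetical); note that finished[0].outb carries the command output as a JSON-encoded string, so it needs a second parse:

```python
import json
import requests

def ceph_command(base_url, auth, prefix):
    """Run a Ceph command (e.g. 'df', 'status') via the mgr restful /request endpoint."""
    resp = requests.post(
        f"{base_url}/request?wait=1",
        json={"prefix": prefix, "format": "json"},
        auth=auth,
        verify=False,  # the demo cluster above uses a self-signed certificate
    )
    body = resp.json()
    # "outb" carries the command's output as a JSON-encoded string, so parse it again.
    return json.loads(body["finished"][0]["outb"])

# Usage: df = ceph_command("https://host:port", ("demo", "password"), "df")
```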
mtojek commented 4 years ago

I'm working on the following metricsets (metricset ~ ceph command):

mgr_cluster_health ~ ceph status
mgr_cluster_disk ~ ceph df
mgr_osd_disk ~ ceph osd df
mgr_osd_pool_stats ~ ceph osd pool stats
mgr_osd_perf ~ ceph osd perf
mgr_osd_tree ~ ceph osd tree

The mgr prefix indicates that these metricsets are compatible with the Ceph Manager Daemon (https://docs.ceph.com/docs/master/mgr/).
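
For reference, a rough sketch of that mapping as data; the prefixes mirror the list above (the ceph command without the leading "ceph") and are illustrative only:

```python
# Illustrative mapping of the planned metricsets to the "prefix" each one would
# send to the mgr restful /request endpoint (see the helper sketch above).
METRICSET_COMMANDS = {
    "mgr_cluster_health": "status",
    "mgr_cluster_disk": "df",
    "mgr_osd_disk": "osd df",
    "mgr_osd_pool_stats": "osd pool stats",
    "mgr_osd_perf": "osd perf",
    "mgr_osd_tree": "osd tree",
}
```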

mtojek commented 4 years ago

Module updated to use the new API. PRs merged. Resolving.

toha70 commented 4 years ago

Hi @mtojek: I'm looking at the cherry-pick for #16254 and I can't find the changes for mgr_osd_disk.

```
/go/src/github.com/elastic/beats/metricbeat/module/ceph# ls -lrt | grep mgr
drwxr-xr-x 3 root root 137 Feb 26 14:57 mgr_cluster_disk
drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_perf
drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_cluster_health
drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_osd_pool_stats
drwxr-xr-x 3 root root 128 Feb 26 14:57 mgr_pool_disk
drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_tree
```

All the other metricsets are present, except for mgr_osd_disk. Should we fall back to osd_df?

mtojek commented 4 years ago

> Hi @mtojek: I'm looking at the cherry-pick for #16254 and I can't find the changes for mgr_osd_disk.
>
> ```
> /go/src/github.com/elastic/beats/metricbeat/module/ceph# ls -lrt | grep mgr
> drwxr-xr-x 3 root root 137 Feb 26 14:57 mgr_cluster_disk
> drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_perf
> drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_cluster_health
> drwxr-xr-x 3 root root 143 Feb 26 14:57 mgr_osd_pool_stats
> drwxr-xr-x 3 root root 128 Feb 26 14:57 mgr_pool_disk
> drwxr-xr-x 3 root root 125 Feb 26 14:57 mgr_osd_tree
> ```
>
> All the other metricsets are present, except for mgr_osd_disk. Should we fall back to osd_df?

Hi! It was renamed to mgr_pool_disk (https://github.com/elastic/beats/pull/16254#discussion_r380244077).

toha70 commented 4 years ago

Thank you @mtojek. I must have missed this comment :).

epuertat commented 3 years ago

Hi folks, just so you know: the Ceph project is planning to deprecate the restful API you're relying on here soon.

The alternatives would be either the fine-grained Ceph Dashboard REST API (more of a management API, so probably not the best fit for you) or the Prometheus exporter (which gives you all the metrics in a single shot).

sorantis commented 3 years ago

@epuertat thanks for letting us know. We did consider the Prometheus exporter earlier, but decided to stick to the native API capabilities. We'll need to revisit this. Which release are you planning to remove the restful API from?

epuertat commented 3 years ago

@sorantis: v17 (codenamed Quincy), to be released in the first half of 2022. Please let us know if you need any guidance on this.

sorantis commented 3 years ago

@epuertat good to know. Any plans to support the Prometheus endpoint natively? AFAIK today the user has to manually enable the exporter via ceph mgr module enable prometheus.

cc @akshay-saraswat

epuertat commented 3 years ago

@sorantis, no plans to change that. The Prometheus exporter is embedded inside a Ceph service, and it's probably the reference 'metrics agent' for the Ceph project (others, like influx, telegraf, and zabbix, are less maintained).

The main downside I see there is that it only supports plain-text HTTP, but if you really need HTTPS, it wouldn't be that hard to get that change in [ceph-dashboard sample HTTPS CherryPy config].
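
For reference, a minimal sketch of the kind of CherryPy TLS settings such a change would involve; the paths are placeholders and the exact ceph-mgr configuration hook may differ:

```python
# Minimal sketch: CherryPy's built-in SSL settings, roughly what a mgr module
# would need to set in order to serve HTTPS. Certificate and key paths are
# placeholders; the real change would live inside the Ceph module's config.
import cherrypy

cherrypy.config.update({
    "server.ssl_module": "builtin",
    "server.ssl_certificate": "/etc/ceph/mgr-cert.pem",
    "server.ssl_private_key": "/etc/ceph/mgr-key.pem",
})
```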