canonical / microceph

MicroCeph is snap-deployed Ceph with built-in clustering
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

ceph mon out of quorum #435

Open alherm7 opened 1 week ago

Issue report

What version of MicroCeph are you using ?

clusteradmin@clcray ~$ sudo snap info microceph
name:      microceph
summary:   Simple clustered Ceph deployment
publisher: Canonical✓
store-url: https://snapcraft.io/microceph
contact:   https://matrix.to/#/#ubuntu-ceph:matrix.org
license:   AGPL-3.0
description: |
MicroCeph is the easiest way to get up and running with Ceph. It is focused on providing a modern deployment and management experience to Ceph administrators and storage software developers.

The below commands will set you up with a testing environment on a single machine using file-backed OSDs - you'll need about 15 GiB of available space on your root drive:

  sudo snap install microceph
  sudo snap refresh --hold microceph
  sudo microceph cluster bootstrap
  sudo microceph disk add loop,4G,3
  sudo ceph status

You're done!

You can remove everything cleanly with:

  sudo snap remove microceph

To learn more about MicroCeph see the documentation:

https://canonical-microceph.readthedocs-hosted.com
commands:

What are the steps to reproduce this issue ?

  1. After installing MicroCeph on 3 nodes, I created a CephFS:
    • sudo ceph osd pool create cephfs_metadata 32 32
    • sudo ceph osd pool create cephfs_data 64 64
    • sudo ceph fs new orion_cephfs001 cephfs_metadata cephfs_data
  2. Connect microceph to microk8s using the microk8s rook-ceph plugin, and link it as an external ceph cluster with 'microk8s connect-external-ceph'
  3. We have been running the ceph cluster for a few weeks without issues using both the microk8s ceph-rbd and the cephfs storage class. Suddenly the mon has fallen out of quorum.

What happens (observed behaviour) ?

Ceph mon is out of quorum. This causes microk8s to have problems with pods that have mounted something from the microceph cluster. I don't know how to fix the microceph cluster and get the mon back into quorum.

What were you expecting to happen ?

Relevant logs, error output, etc.

I have a microceph cluster and one of the nodes is suddenly out of quorum:

clusteradmin@clolympus ~$ sudo microceph.ceph status
[sudo] password for clusteradmin:
  cluster:
    id:     6e925488-0ac5-4f22-9961-28e68f3c12f5
    health: HEALTH_WARN
            1/3 mons down, quorum clhercules,clolympus
            Degraded data redundancy: 1831/5493 objects degraded (33.333%), 65 pgs degraded, 129 pgs undersized

  services:
    mon: 3 daemons, quorum clhercules,clolympus (age 19h), out of quorum: clcray
    mgr: clolympus(active, since 22h), standbys: clhercules, clcray
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 2 up (since 16h), 2 in (since 16h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 129 pgs
    objects: 1.83k objects, 6.4 GiB
    usage:   14 GiB used, 5.5 TiB / 5.5 TiB avail
    pgs:     1831/5493 objects degraded (33.333%)
             65 active+undersized+degraded
             64 active+undersized
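
For reference, the figures in this status output are consistent with each other. The following is a minimal Python sketch of the arithmetic, not anything Ceph itself runs. It assumes that a monitor quorum needs a strict majority of the monitors, and that every object is missing exactly one replica (one of the three OSDs is down), which would make the pools 3-way replicated. Neither assumption is stated in the output itself.

```python
# Values copied from the `ceph status` output above.
mons_total = 3
mons_in_quorum = 2        # clhercules and clolympus
object_copies = 5493      # denominator of "1831/5493 objects degraded"
degraded_copies = 1831    # numerator of the same ratio

# Assumption: a quorum needs a strict majority of the monitors.
majority = mons_total // 2 + 1
has_quorum = mons_in_quorum >= majority

# Assumption: each object is missing exactly one replica, so the ratio of
# expected copies to degraded copies gives the replica count of the pools.
replicas = object_copies // degraded_copies
degraded_pct = round(100 * degraded_copies / object_copies, 3)

print(f"majority needed: {majority}, quorum held: {has_quorum}")
print(f"inferred replicas: {replicas}, degraded: {degraded_pct}%")
```

Under these assumptions the two remaining mons still form a quorum, but losing one more mon would break it.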

How can I fix this? MicroCeph was installed via snap, and I have already tried restarting it:

clusteradmin@clcray ~$ sudo snap restart microceph
2024-10-02T09:02:37+02:00 INFO Waiting for "snap.microceph.mds.service" to stop.
Restarted.
clusteradmin@clcray ~$ sudo microceph status
MicroCeph deployment summary:
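
The mon log quoted next consists mostly of RocksDB option dumps, so the few entries that matter are easy to miss. The following is a minimal Python sketch for pulling out the noteworthy entries from such a dump, even when the entries have been pasted onto one line. The helper names `split_entries` and `suspicious`, the keyword list, and the "level below 0" rule are illustrative assumptions, not part of Ceph. The sample text is three entries copied from the log in this report.

```python
import re

# One entry of the dump looks like:
#   -12> 2024-10-02T09:02:41.530+0200 77a31de00640 3 rocksdb: <message>
ENTRY = re.compile(
    r"(?P<index>-?\d+)> "
    r"(?P<time>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4}) "
    r"(?P<thread>[0-9a-f]+) "
    r"(?P<level>-?\d+) "
)


def split_entries(text):
    """Split a (possibly flattened) dump into one dict per log entry."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs up to the start of the next entry header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entry = m.groupdict()
        entry["level"] = int(entry["level"])
        entry["message"] = " ".join(text[m.end():end].split())
        entries.append(entry)
    return entries


def suspicious(entries, keywords=("corruption", "error", "denied")):
    """Keep entries logged at a negative level or mentioning a keyword."""
    return [
        e for e in entries
        if e["level"] < 0 or any(k in e["message"].lower() for k in keywords)
    ]


SAMPLE = (
    "-14> 2024-10-02T09:02:41.494+0200 77a32070ce80 0 "
    "mon.clcray@-1(probing) e4 my rank is now 0 (was -1) "
    "-13> 2024-10-02T09:02:41.495+0200 77a31ca00640 -1 "
    "mon.clcray@0(probing) e4 handle_auth_bad_method hmm, "
    "they didn't like 2 result (13) Permission denied "
    "-12> 2024-10-02T09:02:41.530+0200 77a31de00640 3 rocksdb: "
    "[db/db_impl/db_impl_compaction_flush.cc:3496] Compaction error: "
    "Corruption: block checksum mismatch: stored = 207325769, "
    "computed = 2928106427, type = 4 in "
    "/var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222"
)

found = suspicious(split_entries(SAMPLE))
for e in found:
    print(e["index"], e["time"], e["message"])
```

To run it against the real file, the text read from /var/snap/microceph/common/logs/ceph-mon.clcray.log could be passed to `split_entries` in place of `SAMPLE`.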

Here are the mon logs from clcray:

clusteradmin@clcray ~$ tail --lines 500 /var/snap/microceph/common/logs/ceph-mon.clcray.log
-235> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kBZip2Compression supported: 0
-234> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kZSTDNotFinalCompression supported: 0
-233> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kLZ4Compression supported: 1
-232> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kZlibCompression supported: 1
-231> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kLZ4HCCompression supported: 1
-230> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: kSnappyCompression supported: 1
-229> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Fast CRC32 supported: Supported on x86
-228> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: DMutex implementation: pthread_mutex_t
-227> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:5527] Recovering from manifest file: /var/lib/ceph/mon/ceph-clcray/store.db/MANIFEST-012151

-226> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: [db/column_family.cc:630] --------------- Options for column family [default]:

-225> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.comparator: leveldb.BytewiseComparator
-224> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.merge_operator:
-223> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_filter: None
-222> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_filter_factory: None
-221> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.sst_partitioner_factory: None
-220> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.memtable_factory: SkipListFactory
-219> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.table_factory: BlockBasedTable
-218> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: table_factory options:
  flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x55cdeedca220)
  cache_index_and_filter_blocks: 1
  cache_index_and_filter_blocks_with_high_priority: 0
  pin_l0_filter_and_index_blocks_in_cache: 0
  pin_top_level_index_and_filter: 1
  index_type: 0
  data_block_index_type: 0
  index_shortening: 1
  data_block_hash_table_util_ratio: 0.750000
  checksum: 4
  no_block_cache: 0
  block_cache: 0x55cdeed7b350
  block_cache_name: BinnedLRUCache
  block_cache_options:
    capacity : 536870912
    num_shard_bits : 4
    strict_capacity_limit : 0
    high_pri_pool_ratio: 0.000
  block_cache_compressed: (nil)
  persistent_cache: (nil)
  block_size: 4096
  block_size_deviation: 10
  block_restart_interval: 16
  index_block_restart_interval: 1
  metadata_block_size: 4096
  partition_filters: 0
  use_delta_encoding: 1
  filter_policy: bloomfilter
  whole_key_filtering: 1
  verify_compression: 0
  read_amp_bytes_per_bit: 0
  format_version: 5
  enable_index_compression: 1
  block_align: 0
  max_auto_readahead_size: 262144
  prepopulate_block_cache: 0
  initial_auto_readahead_size: 8192
  num_file_reads_for_auto_readahead: 2

-217> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.write_buffer_size: 33554432
-216> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_write_buffer_number: 2
-215> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression: NoCompression
-214> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression: Disabled
-213> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.prefix_extractor: nullptr
-212> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.memtable_insert_with_hint_prefix_extractor: nullptr
-211> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.num_levels: 7
-210> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.min_write_buffer_number_to_merge: 1
-209> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_write_buffer_number_to_maintain: 0
-208> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_write_buffer_size_to_maintain: 0
-207> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.window_bits: -14
-206> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.level: 32767
-205> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.strategy: 0
-204> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.max_dict_bytes: 0
-203> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.zstd_max_train_bytes: 0
-202> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.parallel_threads: 1
-201> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.enabled: false
-200> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.max_dict_buffer_bytes: 0
-199> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bottommost_compression_opts.use_zstd_dict_trainer: true
-198> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.window_bits: -14
-197> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.level: 32767
-196> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.strategy: 0
-195> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.max_dict_bytes: 0
-194> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.zstd_max_train_bytes: 0
-193> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.use_zstd_dict_trainer: true
-192> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.parallel_threads: 1
-191> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.enabled: false
-190> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compression_opts.max_dict_buffer_bytes: 0
-189> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.level0_file_num_compaction_trigger: 4
-188> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.level0_slowdown_writes_trigger: 20
-187> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.level0_stop_writes_trigger: 36
-186> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.target_file_size_base: 67108864
-185> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.target_file_size_multiplier: 1
-184> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_base: 268435456
-183> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.level_compaction_dynamic_level_bytes: 1
-182> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier: 10.000000
-181> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[0]: 1
-180> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[1]: 1
-179> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[2]: 1
-178> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[3]: 1
-177> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[4]: 1
-176> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[5]: 1
-175> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[6]: 1
-174> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_sequential_skip_in_iterations: 8
-173> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_compaction_bytes: 1677721600
-172> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.ignore_max_compaction_bytes_for_input: true
-171> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.arena_block_size: 1048576
-170> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.soft_pending_compaction_bytes_limit: 68719476736
-169> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.hard_pending_compaction_bytes_limit: 274877906944
-168> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.disable_auto_compactions: 0
-167> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_style: kCompactionStyleLevel
-166> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_pri: kMinOverlappingRatio
-165> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.size_ratio: 1
-164> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.min_merge_width: 2
-163> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.max_merge_width: 4294967295
-162> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.max_size_amplification_percent: 200
-161> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.compression_size_percent: -1
-160> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_universal.stop_style: kCompactionStopStyleTotalSize
-159> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_fifo.max_table_files_size: 1073741824
-158> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.compaction_options_fifo.allow_compaction: 0
-157> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.table_properties_collectors:
-156> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.inplace_update_support: 0
-155> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.inplace_update_num_locks: 10000
-154> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.memtable_prefix_bloom_size_ratio: 0.000000
-153> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.memtable_whole_key_filtering: 0
-152> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.memtable_huge_page_size: 0
-151> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.bloom_locality: 0
-150> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.max_successive_merges: 0
-149> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.optimize_filters_for_hits: 0
-148> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.paranoid_file_checks: 0
-147> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.force_consistency_checks: 1
-146> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.report_bg_io_stats: 0
-145> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.ttl: 2592000
-144> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.periodic_compaction_seconds: 0
-143> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.preclude_last_level_data_seconds: 0
-142> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.preserve_internal_time_seconds: 0
-141> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.enable_blob_files: false
-140> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.min_blob_size: 0
-139> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_file_size: 268435456
-138> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_compression_type: NoCompression
-137> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.enable_blob_garbage_collection: false
-136> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_garbage_collection_age_cutoff: 0.250000
-135> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_garbage_collection_force_threshold: 1.000000
-134> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_compaction_readahead_size: 0
-133> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.blob_file_starting_level: 0
-132> 2024-10-02T09:02:41.462+0200 77a32070ce80 4 rocksdb: Options.experimental_mempurge_threshold: 0.000000
-131> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:4390] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
-130> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:5566] Recovered from manifest file:/var/lib/ceph/mon/ceph-clcray/store.db/MANIFEST-012151 succeeded,manifest_file_number is 12151, next_file_number is 12153, last_sequence is 4956585, log_number is 12146,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 12106

-129> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:5581] Column family [default] (ID 0), log number is 12146

-128> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:539] DB ID: 3e4b0cb8-1208-4487-8a51-04cfcb9d6fe5

-127> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1727852561464992, "job": 1, "event": "recovery_started", "wal_files": [12106, 12110, 12115, 12120, 12125, 12130, 12135, 12140, 12145, 12150]}
-126> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12106 since it is older than min log to keep #12146
-125> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12110 since it is older than min log to keep #12146
-124> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12115 since it is older than min log to keep #12146
-123> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12120 since it is older than min log to keep #12146
-122> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12125 since it is older than min log to keep #12146
-121> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12130 since it is older than min log to keep #12146
-120> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12135 since it is older than min log to keep #12146
-119> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12140 since it is older than min log to keep #12146
-118> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1029] Skipping log #12145 since it is older than min log to keep #12146
-117> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1043] Recovering log #12150 mode 2
-116> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1727852561465074, "job": 1, "event": "recovery_finished"}
-115> 2024-10-02T09:02:41.463+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:5047] Creating manifest 12156

-114> 2024-10-02T09:02:41.464+0200 77a32070ce80 4 rocksdb: [db/version_set.cc:4390] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
-113> 2024-10-02T09:02:41.483+0200 77a32070ce80 4 rocksdb: [db/db_impl/db_impl_open.cc:1987] SstFileManager instance 0x55cdeed0ee00
-112> 2024-10-02T09:02:41.483+0200 77a32070ce80 4 rocksdb: DB pointer 0x55cdeee12000
-111> 2024-10-02T09:02:41.483+0200 77a31de00640 4 rocksdb: [db/compaction/compaction_job.cc:1995] [default] [JOB 3] Compacting 4@0 + 1@6 files to L6, score 1.00
-110> 2024-10-02T09:02:41.483+0200 77a31de00640 4 rocksdb: [db/compaction/compaction_job.cc:2001] [default]: Compaction start summary: Base version 2 Base level 0, inputs: [12107(632KB) 12099(879KB) 12091(1203KB) 12083(208KB)], [12081(14MB)]

-109> 2024-10-02T09:02:41.483+0200 77a31de00640 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1727852561484928, "job": 3, "event": "compaction_started", "compaction_reason": "LevelL0FilesNum", "files_L0": [12107, 12099, 12091, 12083], "files_L6": [12081], "score": 1, "input_data_size": 18307128, "oldest_snapshot_seqno": -1}
-108> 2024-10-02T09:02:41.483+0200 77a313e00640 4 rocksdb: [db/db_impl/db_impl.cc:1109] ------- DUMPING STATS -------
-107> 2024-10-02T09:02:41.483+0200 77a313e00640 4 rocksdb: [db/db_impl/db_impl.cc:1111]
DB Stats
Uptime(secs): 0.0 total, 0.0 interval
Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

Compaction Stats [default]
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
L0 4/4 2.86 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
L6 1/1 14.60 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 5/5 17.46 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0

Compaction Stats [default]
Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)

Blob file count: 0, total size: 0.0 GB, garbage size: 0.0 GB, space amp: 0.0

Uptime(secs): 0.0 total, 0.0 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache BinnedLRUCache@0x55cdeed7b350#1266880 capacity: 512.00 MB usage: 21.44 KB table_size: 0 occupancy: 18446744073709551615 collections: 1 last_copies: 0 last_secs: 3.9e-05 secs_since: 0
Block cache entry stats(count,size,portion): FilterBlock(4,7.50 KB,0.00143051%) IndexBlock(4,13.94 KB,0.00265837%) Misc(1,0.00 KB,0%)

File Read Latency Histogram By Level [default]

-106> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding auth protocol: cephx
-105> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding auth protocol: cephx
-104> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding auth protocol: cephx
-103> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding auth protocol: none
-102> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-101> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-100> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-99> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-98> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-97> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-96> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-95> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-94> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-93> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-92> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: crc
-91> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa63c38) adding con mode: secure
-90> 2024-10-02T09:02:41.484+0200 77a32070ce80 2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-clcray/keyring
-89> 2024-10-02T09:02:41.484+0200 77a32070ce80 0 starting mon.clcray rank 0 at public addrs [v2:10.1.5.26:3300/0,v1:10.1.5.26:6789/0] at bind addrs [v2:10.1.5.26:3300/0,v1:10.1.5.26:6789/0] mon_data /var/lib/ceph/mon/ceph-clcray fsid 6e925488-0ac5-4f22-9961-28e68f3c12f5
-88> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding auth protocol: cephx
-87> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding auth protocol: cephx
-86> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding auth protocol: cephx
-85> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding auth protocol: none
-84> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-83> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-82> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-81> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-80> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-79> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-78> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-77> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-76> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-75> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-74> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: crc
-73> 2024-10-02T09:02:41.484+0200 77a32070ce80 5 AuthRegistry(0x55cdefa64538) adding con mode: secure
-72> 2024-10-02T09:02:41.485+0200 77a32070ce80 2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-clcray/keyring
-71> 2024-10-02T09:02:41.485+0200 77a32070ce80 5 adding auth protocol: cephx
-70> 2024-10-02T09:02:41.485+0200 77a32070ce80 5 adding auth protocol: cephx
-69> 2024-10-02T09:02:41.485+0200 77a32070ce80 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility: prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
-68> 2024-10-02T09:02:41.485+0200 77a32070ce80 10 log_channel(audit) update_config to_monitors: true to_syslog: false syslog_facility: prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
-67> 2024-10-02T09:02:41.486+0200 77a32070ce80 1 mon.clcray@-1(???) e4 preinit fsid 6e925488-0ac5-4f22-9961-28e68f3c12f5
-66> 2024-10-02T09:02:41.486+0200 77a32070ce80 0 mon.clcray@-1(???).mds e23 new map
-65> 2024-10-02T09:02:41.486+0200 77a32070ce80 0 mon.clcray@-1(???).mds e23 print_map
e23
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'orion_cephfs001' (1)
fs_name orion_cephfs001
epoch 17
flags 12 joinable allow_snaps allow_multimds_snaps
created 2024-09-25T22:53:04.082944+0200
modified 2024-09-30T14:20:18.149682+0200
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
max_xattr_size 65536
required_client_features {}
last_failure 0
last_failure_osd_epoch 60
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=213836}
failed
damaged
stopped
data_pools [3]
metadata_pool 2
inline_data disabled
balancer
bal_rank_mask -1
standby_count_wanted 1
[mds.clolympus{0:213836} state up:active seq 24 addr [v2:10.1.5.25:6800/1936689307,v1:10.1.5.25:6801/1936689307] compat {c=[1],r=[1],i=[7ff]}]

Standby daemons:

[mds.clhercules{-1:110248} state up:standby seq 1 addr [v2:10.1.5.29:6800/2437141169,v1:10.1.5.29:6801/2437141169] compat {c=[1],r=[1],i=[7ff]}]
[mds.clcray{-1:247244} state up:standby seq 1 addr [v2:10.1.5.26:6800/1408146768,v1:10.1.5.26:6801/1408146768] compat {c=[1],r=[1],i=[7ff]}]

-64> 2024-10-02T09:02:41.488+0200 77a32070ce80 0 mon.clcray@-1(???).osd e95 crush map has features 3314933000852226048, adjusting msgr requires
-63> 2024-10-02T09:02:41.488+0200 77a32070ce80 0 mon.clcray@-1(???).osd e95 crush map has features 288514051259236352, adjusting msgr requires
-62> 2024-10-02T09:02:41.488+0200 77a32070ce80 0 mon.clcray@-1(???).osd e95 crush map has features 288514051259236352, adjusting msgr requires
-61> 2024-10-02T09:02:41.488+0200 77a32070ce80 0 mon.clcray@-1(???).osd e95 crush map has features 288514051259236352, adjusting msgr requires
-60> 2024-10-02T09:02:41.489+0200 77a32070ce80 1 mon.clcray@-1(???).paxosservice(auth 1..181) refresh upgraded, format 0 -> 3
-59> 2024-10-02T09:02:41.489+0200 77a32070ce80 4 mon.clcray@-1(???).mgr e0 loading version 75
-58> 2024-10-02T09:02:41.490+0200 77a32070ce80 4 mon.clcray@-1(???).mgr e75 active server: v2:10.1.5.25:6810/187845,v1:10.1.5.25:6811/187845
-57> 2024-10-02T09:02:41.490+0200 77a32070ce80 4 mon.clcray@-1(???).mgr e75 mkfs or daemon transitioned to available, loading commands
-56> 2024-10-02T09:02:41.490+0200 77a32070ce80 4 set_mon_vals no callback set
-55> 2024-10-02T09:02:41.490+0200 77a32070ce80 4 set_mon_vals failed to set cluster_network = 10.1.5.26/24: Configuration option 'cluster_network' may not be modified at runtime
-54> 2024-10-02T09:02:41.490+0200 77a32070ce80 10 set_mon_vals osd_pool_default_crush_rule = 2
-53> 2024-10-02T09:02:41.491+0200 77a32070ce80 4 set_mon_vals no callback set
-52> 2024-10-02T09:02:41.491+0200 77a32070ce80 4 set_mon_vals failed to set cluster_network = 10.1.5.26/24: Configuration option 'cluster_network' may not be modified at runtime
-51> 2024-10-02T09:02:41.492+0200 77a32070ce80 2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-clcray/keyring
-50> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command compact hook 0x55cdeecb7f80
-49> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command smart hook 0x55cdeecb7f80
-48> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command mon_status hook 0x55cdeecb7f80
-47> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command heap hook 0x55cdeecb7f80
-46> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command connection scores dump hook 0x55cdeecb7f80
-45> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command connection scores reset hook 0x55cdeecb7f80
-44> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command sync_force hook 0x55cdeecb7f80
-43> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command add_bootstrap_peer_hint hook 0x55cdeecb7f80
-42> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command add_bootstrap_peer_hintv hook 0x55cdeecb7f80
-41> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command quorum enter hook 0x55cdeecb7f80
-40> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command quorum exit hook 0x55cdeecb7f80
-39> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command ops hook 0x55cdeecb7f80
-38> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command sessions hook 0x55cdeecb7f80
-37> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command dump_historic_ops hook 0x55cdeecb7f80
-36> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 asok(0x55cdeefb6000) register_command dump_historic_slow_ops hook 0x55cdeecb7f80
-35> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding auth protocol: cephx
-34> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding auth protocol: cephx
-33> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding auth protocol: cephx
-32> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding auth protocol: none
-31> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-30> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-29> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-28> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-27> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-26> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-25> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-24> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-23> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-22> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-21> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: crc
-20> 2024-10-02T09:02:41.492+0200 77a32070ce80 5 AuthRegistry(0x55cdeee13a20) adding con mode: secure
-19> 2024-10-02T09:02:41.492+0200 77a32070ce80 2 auth: KeyRing::load: loaded key file /var/lib/ceph/mon/ceph-clcray/keyring
-18> 2024-10-02T09:02:41.492+0200 77a32070ce80 2 mon.clcray@-1(???) e4 init
-17> 2024-10-02T09:02:41.493+0200 77a32070ce80 4 mgrc handle_mgr_map Got map version 75
-16> 2024-10-02T09:02:41.493+0200 77a32070ce80 4 mgrc handle_mgr_map Active mgr is now [v2:10.1.5.25:6810/187845,v1:10.1.5.25:6811/187845]
-15> 2024-10-02T09:02:41.493+0200 77a32070ce80 4 mgrc reconnect Starting new session with [v2:10.1.5.25:6810/187845,v1:10.1.5.25:6811/187845]
-14> 2024-10-02T09:02:41.494+0200 77a32070ce80 0 mon.clcray@-1(probing) e4 my rank is now 0 (was -1)
-13> 2024-10-02T09:02:41.495+0200 77a31ca00640 -1 mon.clcray@0(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
-12> 2024-10-02T09:02:41.530+0200 77a31de00640 3 rocksdb: [db/db_impl/db_impl_compaction_flush.cc:3496] Compaction error: Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4 in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222
-11> 2024-10-02T09:02:41.530+0200 77a31de00640 3 rocksdb: [db/error_handler.cc:397] Background IO error Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4 in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222
-10> 2024-10-02T09:02:41.530+0200 77a31de00640 4 rocksdb: [db/error_handler.cc:285] ErrorHandler: Set regular background error

-9> 2024-10-02T09:02:41.530+0200 77a31de00640  4 rocksdb: (Original Log Time 2024/10/02-09:02:41.532100) [db/compaction/compaction_job.cc:865] [default] compacted to: base level 6 level multiplier 10.00 max bytes base 268435456 files[4 0 0 0 0 0 1] max score 0.00, MB/sec: 388.4 rd, 286.5 wr, level 6, files in(4, 1) out(1 +0 blob) MB in(2.9, 14.6 +0.0 blob) out(12.9 +0.0 blob), read-write-amplify(10.6) write-amplify(4.5) Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4  in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222, records
-8> 2024-10-02T09:02:41.530+0200 77a31de00640  4 rocksdb: (Original Log Time 2024/10/02-09:02:41.532126) EVENT_LOG_v1 {"time_micros": 1727852561532117, "job": 3, "event": "compaction_finished", "compaction_time_micros": 47130, "compaction_time_cpu_micros": 47119, "output_level": 6, "num_output_files": 1, "total_output_size": 13504176, "num_input_records": 24520, "num_output_records": 21400, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [4, 0, 0, 0, 0, 0, 1]}
-7> 2024-10-02T09:02:41.530+0200 77a31de00640  2 rocksdb: [db/db_impl/db_impl_compaction_flush.cc:2986] Waiting after background compaction error: Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4  in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222, Accumulated background error counts: 1
-6> 2024-10-02T09:02:41.696+0200 77a31ca00640 -1 mon.clcray@0(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
-5> 2024-10-02T09:02:41.704+0200 77a317a00640  5 mon.clcray@0(probing) e4 _ms_dispatch setting monitor caps on this connection
-4> 2024-10-02T09:02:41.704+0200 77a317a00640  1 mon.clcray@0(synchronizing) e4 sync_obtain_latest_monmap
-3> 2024-10-02T09:02:41.704+0200 77a317a00640  1 mon.clcray@0(synchronizing) e4 sync_obtain_latest_monmap obtained monmap e4
-2> 2024-10-02T09:02:41.704+0200 77a317a00640 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4  in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222 code =  Rocksdb transaction:
PutCF( prefix = mon_sync key = 'latest_monmap' value size = 475)
PutCF( prefix = mon_sync key = 'in_sync' value size = 8)
PutCF( prefix = mon_sync key = 'last_committed_floor' value size = 8)
-1> 2024-10-02T09:02:41.706+0200 77a317a00640 -1 ./src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 77a317a00640 time 2024-10-02T09:02:41.705702+0200
./src/mon/MonitorDBStore.h: 355: ceph_abort_msg("failed to write to db")

ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
1: (ceph::__ceph_abort(char const, int, char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)+0xd7) [0x77a3218537db]
2: (MonitorDBStore::apply_transaction(std::shared_ptr)+0xa32) [0x55cdec1596f2]
3: (Monitor::sync_start(entity_addrvec_t&, bool)+0x370) [0x55cdec111390]
4: (Monitor::handle_probe_reply(boost::intrusive_ptr)+0xa1b) [0x55cdec11c77b]
5: (Monitor::handle_probe(boost::intrusive_ptr)+0x2e9) [0x55cdec11dee9]
6: (Monitor::dispatch_op(boost::intrusive_ptr)+0xc7d) [0x55cdec12a2bd]
7: (Monitor::_ms_dispatch(Message*)+0x3e6) [0x55cdec12ac66]
8: (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x4b) [0x55cdec15b0fb]
9: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr const&)+0x450) [0x77a321abac60]
10: (DispatchQueue::entry()+0x5ef) [0x77a321ab829f]
11: (DispatchQueue::DispatchThread::entry()+0x11) [0x77a321b7e041]
12: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x77a320eaeac3]
13: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x77a320f40850]

 0> 2024-10-02T09:02:41.707+0200 77a317a00640 -1 *** Caught signal (Aborted) **

in thread 77a317a00640 thread_name:ms_dispatch

ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x77a320e5c520]
2: pthread_kill()
3: raise()
4: abort()
5: (ceph::__ceph_abort(char const, int, char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)+0x190) [0x77a321853894]
6: (MonitorDBStore::apply_transaction(std::shared_ptr)+0xa32) [0x55cdec1596f2]
7: (Monitor::sync_start(entity_addrvec_t&, bool)+0x370) [0x55cdec111390]
8: (Monitor::handle_probe_reply(boost::intrusive_ptr)+0xa1b) [0x55cdec11c77b]
9: (Monitor::handle_probe(boost::intrusive_ptr)+0x2e9) [0x55cdec11dee9]
10: (Monitor::dispatch_op(boost::intrusive_ptr)+0xc7d) [0x55cdec12a2bd]
11: (Monitor::_ms_dispatch(Message*)+0x3e6) [0x55cdec12ac66]
12: (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x4b) [0x55cdec15b0fb]
13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr const&)+0x450) [0x77a321abac60]
14: (DispatchQueue::entry()+0x5ef) [0x77a321ab829f]
15: (DispatchQueue::DispatchThread::entry()+0x11) [0x77a321b7e041]
16: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x77a320eaeac3]
17: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x77a320f40850]
NOTE: a copy of the executable, or objdump -rdS is needed to interpret this.

--- logging levels ---
0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 rbd_mirror 0/ 5 rbd_replay 0/ 5 rbd_pwl 0/ 5 journaler 0/ 5 objectcacher 0/ 5 immutable_obj_cache 0/ 5 client 1/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 0 ms 1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 1 reserver 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 rgw_sync 1/ 5 rgw_datacache 1/ 5 rgw_access 1/ 5 rgw_dbstore 1/ 5 rgw_flight 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle 0/ 0 refs 1/ 5 compressor 1/ 5 bluestore 1/ 5 bluefs 1/ 3 bdev 1/ 5 kstore 4/ 5 rocksdb 4/ 5 leveldb 1/ 5 fuse 2/ 5 mgr 1/ 5 mgrc 1/ 5 dpdk 1/ 5 eventtrace 1/ 5 prioritycache 0/ 5 test 0/ 5 cephfs_mirror 0/ 5 cephsqlite 0/ 5 seastore 0/ 5 seastore_onode 0/ 5 seastore_odata 0/ 5 seastore_omap 0/ 5 seastore_tm 0/ 5 seastore_t 0/ 5 seastore_cleaner 0/ 5 seastore_epm 0/ 5 seastore_lba 0/ 5 seastore_fixedkv_tree 0/ 5 seastore_cache 0/ 5 seastore_journal 0/ 5 seastore_device 0/ 5 seastore_backref 0/ 5 alienstore 1/ 5 mclock 0/ 5 cyanstore 1/ 5 memstore
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
77a313e00640 / ceph-mon
77a317a00640 / ms_dispatch
77a31ca00640 / msgr-worker-2
77a31de00640 / rocksdb:low
77a31f200640 / admin_socket
77a32070ce80 / ceph-mon
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mon.clcray.log
--- end dump of recent events ---

Additional comments.

sabaini commented 1 week ago

Hi,

From the log, it seems node clcray encountered on-disk corruption in the RocksDB store used by the MON:

-11> 2024-10-02T09:02:41.530+0200 77a31de00640 3 rocksdb: [db/error_handler.cc:397] Background IO error Corruption: block checksum mismatch: stored = 207325769, computed = 2928106427, type = 4 in /var/lib/ceph/mon/ceph-clcray/store.db/012083.sst offset 156734 size 6222

This makes the MON service on clcray abort startup.
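
To confirm which store files are affected, the checksum errors can be pulled out of the MON log. The following is a minimal sketch, not part of MicroCeph: the function name `corrupt_sst_files` is hypothetical, and it assumes the log contains the same `block checksum mismatch` wording as the dump above and that `grep -o` is available.

```shell
#!/bin/sh
# corrupt_sst_files (hypothetical helper): read a ceph-mon log on stdin and
# print each distinct .sst path named in a RocksDB
# "block checksum mismatch" message.
corrupt_sst_files() {
    grep 'block checksum mismatch' \
        | grep -o '/[^ ]*\.sst' \
        | sort -u
}

# Usage. The path below is the one reported as "log_file" in the dump;
# with a snap install the file may live elsewhere on the host (assumption).
#   sudo cat /var/log/ceph/ceph-mon.clcray.log | corrupt_sst_files
```

If only a single file such as `012083.sst` shows up repeatedly, that points to one damaged block rather than widespread store damage, though it does not say anything about the cause.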

I'd suggest checking this node (or the underlying machine, if it runs in a VM), as DB corruption is often caused by hardware issues (e.g. memory or disk).
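
A first pass at such a hardware check could look like the sketch below. It is illustrative only: the helper name `hw_error_lines` is hypothetical, the pattern list is an assumption and far from exhaustive, and `/dev/sdX` is a placeholder for whichever device backs the MON store.

```shell
#!/bin/sh
# hw_error_lines (hypothetical helper): read kernel log text on stdin and
# keep lines that commonly accompany failing disks or memory.
# The patterns are an illustrative assumption, not a complete list.
hw_error_lines() {
    grep -i -E 'i/o error|medium error|machine check|hardware error|edac'
}

# Usage on the affected node:
#   sudo dmesg | hw_error_lines
#   sudo smartctl -H /dev/sdX    # SMART health summary (smartmontools)
```

An empty result does not prove the hardware is healthy; a memory test or a longer SMART self-test may still be worthwhile before re-adding the node.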

As for repairing the Ceph cluster: while there are tools to repair RocksDB files, the most straightforward method would be to remove the node from the cluster and re-add it (after ensuring the hardware is OK, as described above).
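
The remove and re-add sequence might be sketched as follows. This is an assumption about how the steps map onto the `microceph cluster remove`, `cluster add` and `cluster join` subcommands, not a procedure confirmed in this thread; the function names are hypothetical. It also assumes any OSDs on the node are handled separately, since purging the snap discards its local data. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default here), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on a healthy cluster member: drop the broken node from the cluster.
remove_member() {
    run sudo microceph cluster remove "$1"
}

# Step 2, on the repaired node: reinstall the snap to start from a clean state.
# Assumption: nothing on this node needs to be preserved.
reset_node() {
    run sudo snap remove microceph --purge
    run sudo snap install microceph
}

# Step 3, on a healthy member: create a join token for the node.
new_join_token() {
    run sudo microceph cluster add "$1"
}

# Step 4, on the repaired node: join with the token printed in step 3.
join_cluster() {
    run sudo microceph cluster join "$1"
}
```

For the node in this report, step 1 would be `remove_member clcray` on one of the two healthy nodes. Reviewing the printed commands first is deliberate, because steps 1 and 2 are destructive.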

alherm7 commented 1 week ago

Thanks @sabaini. I will check the hardware that MicroCeph is running on. It might be a faulty memory block or a failing disk, as you suggest.