Open inspur-wyq opened 1 year ago
Kernel crash when removing a core from a cache
open-cas-linux-21.06.5.0555.release
[ 6365.608652] [Open-CAS] Adding device /dev/disk/by-id/dm-name-ceph--8c95bf7f--b3a6--4457--89aa--0c351eabcf26-osd--block--446ee347--3361--4c97--9d8e--b5135487b9bb as core core3 to cache cache1 [ 6370.921584] VFS: Open an exclusive opened block device for write sdd. current [129139 sgdisk]. parent [5935 rook] [ 6370.953674] VFS: Open an exclusive opened block device for write dm-2. current [129143 sgdisk]. parent [5935 rook] [ 6371.009815] VFS: Open an exclusive opened block device for write sdb. current [129157 sgdisk]. parent [5935 rook] [ 6371.043087] VFS: Open an exclusive opened block device for write dm-1. current [129163 sgdisk]. parent [5935 rook] [ 6371.099131] VFS: Open an exclusive opened block device for write sde. current [129179 sgdisk]. parent [5935 rook] [ 6371.164891] VFS: Open an exclusive opened block device for write sdc. current [129192 sgdisk]. parent [5935 rook] [ 6371.198044] VFS: Open an exclusive opened block device for write dm-0. current [129201 sgdisk]. parent [5935 rook] [ 6371.252286] VFS: Open an exclusive opened block device for write sda. current [129214 sgdisk]. parent [8134 rook] [ 6449.304742] VFS: Open an exclusive opened block device for write sdd. current [130576 sgdisk]. parent [11689 rook] [ 6449.333058] VFS: Open an exclusive opened block device for write dm-2. current [130580 sgdisk]. parent [11689 rook] [ 6449.379080] VFS: Open an exclusive opened block device for write sdb. current [130593 sgdisk]. parent [11689 rook] [ 6449.408518] VFS: Open an exclusive opened block device for write dm-1. current [130600 sgdisk]. parent [8132 rook] [ 6449.461321] VFS: Open an exclusive opened block device for write sde. current [130614 sgdisk]. parent [8132 rook] [ 6449.527263] VFS: Open an exclusive opened block device for write sdc. current [130631 sgdisk]. parent [5935 rook] [ 6449.556086] VFS: Open an exclusive opened block device for write dm-0. current [130637 sgdisk]. 
parent [5935 rook] [ 6449.576621] VFS: Open an exclusive opened block device for write sda. current [130641 sgdisk]. parent [5935 rook] [ 6449.856675] VFS: Open an write opened block device exclusively cas1-3. current [130648 ceph-volume]. parent [5935 rook] [ 6471.823245] watchdog: BUG: soft lockup - CPU#44 stuck for 22s! [cas_mngt_1:110582] [ 6471.824117] Modules linked in: cas_cache(OE) cas_disk(OE) ipt_rpfilter xt_set xt_multiport iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_nat xt_addrtype ip6table_nat ip6_tables iptable_mangle xt_physdev xt_conntrack xt_comment xt_mark iptable_filter nf_conntrack_netlink nfnetlink sch_ingress iptable_nat xt_MASQUERADE ip_tables rbd ceph libceph dns_resolver overlay openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 8021q garp mrp bonding vfat fat dm_multipath ipmi_ssif aes_ce_blk aes_ce_cipher ghash_ce sha1_ce ses enclosure ngbe sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel br_netfilter bridge stp llc dm_mod fuse ext4 mbcache jbd2 sd_mod t10_pi ahci sha2_ce libahci sha256_arm64 igb libata ixgbe ast smartpqi drm_vram_helper drm_ttm_helper scsi_transport_sas ttm mdio aes_neon_bs aes_neon_blk crypto_simd cryptd [last unloaded: cas_disk] [ 6471.824255] CPU: 44 PID: 110582 Comm: cas_mngt_1 Kdump: loaded Tainted: G OE 5.10.0-136.37.0.113.oe2203sp1.aarch64 #1 [ 6471.824258] Hardware name: Inspur CS5260F /YZMB-02006-101 , BIOS 4.0.8 12/22/21 16:05:48 [ 6471.824263] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 6471.824278] pc : downgrade_write+0x2d0/0x350 [ 6471.824343] lr : ocf_hb_id_prot_unlock_wr+0x64/0x90 [cas_cache] [ 6471.824346] sp : ffff8000268dbcc0 [ 6471.824349] x29: ffff8000268dbcc0 x28: 000000000dd3c376 [ 6471.824353] x27: ffff8000282b6c80 x26: ffffbcc7d75d5ab8 [ 6471.824358] x25: 0000000000000003 x24: ffff8000283f7040 [ 6471.824362] x23: 000000000535b974 x22: 0000000000000001 [ 6471.824366] x21: 
ffff800aff489d00 x20: ffff8000282b6c80 [ 6471.824370] x19: 00000000000000e8 x18: 0000000000000000 [ 6471.824373] x17: 0000000000000000 x16: ffffbcc84677f060 [ 6471.824377] x15: 0000000000000000 x14: 0000000000000000 [ 6471.824380] x13: 0000000000000000 x12: 0000000000000000 [ 6471.824384] x11: 0000001587515ebc x10: 0000000000000000 [ 6471.824387] x9 : ffffbcc7d75baf24 x8 : 0000000000000000 [ 6471.824391] x7 : ffff1e063ff82740 x6 : ffffbcc847ce7740 [ 6471.824394] x5 : ffff8000268dbc80 x4 : 0000000000000000 [ 6471.824398] x3 : 0000000000000000 x2 : ffffffffffffff00 [ 6471.824401] x1 : 0000000000000000 x0 : ffff8000282b6d40 [ 6471.824405] Call trace: [ 6471.824410] downgrade_write+0x2d0/0x350 [ 6471.824439] cache_mngt_core_deinit_attached_meta+0x1f4/0x284 [cas_cache] [ 6471.824464] _ocf_mngt_cache_remove_core_attached+0x40/0x9c [cas_cache] [ 6471.824486] _ocf_pipeline_run_step+0x90/0x100 [cas_cache] [ 6471.824507] ocf_io_handle+0x30/0x54 [cas_cache] [ 6471.824531] ocf_queue_run_single+0x44/0x5c [cas_cache] [ 6471.824552] ocf_queue_run+0x3c/0x64 [cas_cache] [ 6471.824576] _cas_io_queue_thread+0x74/0x110 [cas_cache] [ 6471.824582] kthread+0x108/0x13c [ 6471.824588] ret_from_fork+0x10/0x18 [ 6471.824593] Kernel panic - not syncing: softlockup: hung tasks [ 6471.825279] CPU: 44 PID: 110582 Comm: cas_mngt_1 Kdump: loaded Tainted: G OEL 5.10.0-136.37.0.113.oe2203sp1.aarch64 #1 [ 6471.826392] Hardware name: Inspur CS5260F /YZMB-02006-101 , BIOS 4.0.8 12/22/21 16:05:48 [ 6471.827260] Call trace: [ 6471.827541] dump_backtrace+0x0/0x1e4 [ 6471.827912] show_stack+0x20/0x2c [ 6471.828288] dump_stack+0xd8/0x140 [ 6471.828668] panic+0x168/0x390 [ 6471.829001] watchdog_timer_fn+0x230/0x290 [ 6471.829499] __run_hrtimer+0x98/0x2a0 [ 6471.829942] __hrtimer_run_queues+0xb0/0x134 [ 6471.830415] hrtimer_interrupt+0x13c/0x3c0 [ 6471.830846] arch_timer_handler_phys+0x3c/0x50 [ 6471.831307] handle_percpu_devid_irq+0x90/0x1f4 [ 6471.831769] __handle_domain_irq+0x84/0xf0 [ 6471.832365] 
gic_handle_irq+0x90/0x2b4 [ 6471.832763] el1_irq+0xb8/0x140 [ 6471.833107] downgrade_write+0x2d0/0x350 [ 6471.833552] cache_mngt_core_deinit_attached_meta+0x1f4/0x284 [cas_cache] [ 6471.834325] _ocf_mngt_cache_remove_core_attached+0x40/0x9c [cas_cache] [ 6471.835080] _ocf_pipeline_run_step+0x90/0x100 [cas_cache] [ 6471.835715] ocf_io_handle+0x30/0x54 [cas_cache] [ 6471.836222] ocf_queue_run_single+0x44/0x5c [cas_cache] [ 6471.836770] ocf_queue_run+0x3c/0x64 [cas_cache] [ 6471.837282] _cas_io_queue_thread+0x74/0x110 [cas_cache] [ 6471.837916] kthread+0x108/0x13c [ 6471.838280] ret_from_fork+0x10/0x18 [ 6471.838678] SMP: stopping secondary CPUs [ 6471.839165] Kernel Offset: 0x3cc836650000 from 0xffff800010000000 [ 6471.840730] PHYS_OFFSET: 0xffffe355c0000000 [ 6471.841909] CPU features: 0x0000,00000002,61800008 [ 6471.843346] Memory Limit: none [ 6471.848292] Starting crashdump kernel... [ 6471.848823] Bye!
open-cas-linux-22.06.2.0723.release
[ 4860.012121] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cas_mngt_1:147513] [ 4860.013055] Modules linked in: cas_cache(OE) cas_disk(OE) ipt_rpfilter xt_set xt_multiport iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_nat xt_addrtype ip6table_nat ip6_tables iptable_mangle xt_physdev xt_conntrack xt_comment xt_mark iptable_filter nf_conntrack_netlink nfnetlink iptable_nat xt_MASQUERADE ip_tables rbd ceph libceph dns_resolver overlay sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 8021q garp mrp bonding vfat fat dm_multipath ipmi_ssif aes_ce_blk aes_ce_cipher ghash_ce sha1_ce ses enclosure sg ngbe acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel br_netfilter bridge stp llc dm_mod fuse ext4 mbcache jbd2 sd_mod t10_pi ahci libahci sha2_ce sha256_arm64 igb libata ixgbe ast smartpqi drm_vram_helper drm_ttm_helper ttm scsi_transport_sas mdio aes_neon_bs aes_neon_blk crypto_simd cryptd [last unloaded: cas_disk] [ 4860.013183] CPU: 1 PID: 147513 Comm: cas_mngt_1 Kdump: loaded Tainted: G W OE 5.10.0-136.37.0.113.oe2203sp1.aarch64 #1 [ 4860.013186] Hardware name: Inspur CS5260F /YZMB-02006-101 , BIOS 4.0.8 12/22/21 16:05:48 [ 4860.013189] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--) [ 4860.013198] pc : down_write_killable+0x198/0x270 [ 4860.013236] lr : ocf_hb_id_prot_lock_wr+0x2c/0x60 [cas_cache] [ 4860.013238] sp : ffff800025643cb0 [ 4860.013240] x29: ffff800025643cb0 x28: 000000000dd3c376 [ 4860.013244] x27: ffff8000270b6c80 x26: ffffb8da72ba2c90 [ 4860.013248] x25: 0000000000000000 x24: ffff8000271f7080 [ 4860.013252] x23: 000000000d8b2c39 x22: 0000000000000003 [ 4860.013256] x21: 00000000374f0dd6 x20: 000000000d8b2c39 [ 4860.013259] x19: ffff8000270b6c80 x18: 0000000000000000 [ 4860.013263] x17: 0000000000000000 x16: ffffb8da9f4c8f60 [ 4860.013266] x15: 0000000000000000 x14: 0000000000000000 [ 4860.013270] x13: 0000000000000000 x12: 
ffff79d03fffa2c0 [ 4860.013273] x11: 00000188a35bc70e x10: 0000000000000eb0 [ 4860.013277] x9 : ffffb8da72b8987c x8 : ffff79c459cbc8d0 [ 4860.013280] x7 : 0000000000000024 x6 : ffff800010935478 [ 4860.013283] x5 : ffffffffffffffff x4 : 0000000000000100 [ 4860.013287] x3 : ffff8000270b6c80 x2 : 000000000d8b2c39 [ 4860.013290] x1 : 0000000000000000 x0 : 0000000000000000 [ 4860.013294] Call trace: [ 4860.013298] down_write_killable+0x198/0x270 [ 4860.013322] cache_mngt_core_deinit_attached_meta+0x98/0x290 [cas_cache] [ 4860.013345] _ocf_mngt_cache_remove_core_mapping+0x40/0x90 [cas_cache] [ 4860.013369] _ocf_pipeline_run_step+0x11c/0x1b0 [cas_cache] [ 4860.013392] ocf_io_handle+0x30/0x54 [cas_cache] [ 4860.013414] ocf_queue_run_single+0x44/0x5c [cas_cache] [ 4860.013435] ocf_queue_run+0x3c/0x64 [cas_cache] [ 4860.013459] _cas_io_queue_thread+0x74/0x110 [cas_cache] [ 4860.013464] kthread+0x108/0x13c [ 4860.013469] ret_from_fork+0x10/0x18 [ 4860.013472] Kernel panic - not syncing: softlockup: hung tasks [ 4860.014127] CPU: 1 PID: 147513 Comm: cas_mngt_1 Kdump: loaded Tainted: G W OEL 5.10.0-136.37.0.113.oe2203sp1.aarch64 #1 [ 4860.015225] Hardware name: Inspur CS5260F /YZMB-02006-101 , BIOS 4.0.8 12/22/21 16:05:48 [ 4860.016098] Call trace: [ 4860.016384] dump_backtrace+0x0/0x1e4 [ 4860.016775] show_stack+0x20/0x2c [ 4860.017129] dump_stack+0xd8/0x140 [ 4860.017481] panic+0x168/0x390 [ 4860.017838] watchdog_timer_fn+0x230/0x290 [ 4860.018255] __run_hrtimer+0x98/0x2a0 [ 4860.018632] __hrtimer_run_queues+0xb0/0x134 [ 4860.019084] hrtimer_interrupt+0x13c/0x3c0 [ 4860.019520] arch_timer_handler_phys+0x3c/0x50 [ 4860.019975] handle_percpu_devid_irq+0x90/0x1f4 [ 4860.020421] __handle_domain_irq+0x84/0xf0 [ 4860.020835] gic_handle_irq+0x90/0x2b4 [ 4860.021231] el1_irq+0xb8/0x140 [ 4860.021578] down_write_killable+0x198/0x270 [ 4860.022034] cache_mngt_core_deinit_attached_meta+0x98/0x290 [cas_cache] [ 4860.022788] _ocf_mngt_cache_remove_core_mapping+0x40/0x90 [cas_cache] [ 
4860.023437] _ocf_pipeline_run_step+0x11c/0x1b0 [cas_cache] [ 4860.024051] ocf_io_handle+0x30/0x54 [cas_cache] [ 4860.024546] ocf_queue_run_single+0x44/0x5c [cas_cache] [ 4860.025098] ocf_queue_run+0x3c/0x64 [cas_cache] [ 4860.025615] _cas_io_queue_thread+0x74/0x110 [cas_cache] [ 4860.026136] kthread+0x108/0x13c [ 4860.026486] ret_from_fork+0x10/0x18 [ 4860.026867] SMP: stopping secondary CPUs [ 4860.027330] Kernel Offset: 0x38da8e7d0000 from 0xffff800010000000 [ 4860.029006] PHYS_OFFSET: 0xffff873bc0000000 [ 4860.030250] CPU features: 0x0000,00000002,61800008 [ 4860.031900] Memory Limit: none [ 4860.036446] Starting crashdump kernel... [ 4860.037002] Bye!
open-cas-linux-21.06.3.0551.release: this case is special. The cache metadata was old, so the load failed; cache_mngt_core_remove_from_cache was then called during the cache stop and crashed.
[12618.084423] cache1: Loading cache state... [12618.125080] cache1: Loading Part config WARNING, invalid checksum [12618.125298] cache1: Loading Core config WARNING, invalid checksum [12618.125366] cache1: Loading Core UUID WARNING, invalid checksum [12618.126362] cache1: ERROR: Cache device size mismatch! [12618.150001] Unable to handle kernel paging request at virtual address 00000000001420ac [12618.151013] Mem abort info: [12618.151393] ESR = 0x96000004 [12618.151769] EC = 0x25: DABT (current EL), IL = 32 bits [12618.152340] SET = 0, FnV = 0 [12618.152728] EA = 0, S1PTW = 0 [12618.153128] Data abort info: [12618.153539] ISV = 0, ISS = 0x00000004 [12618.154000] CM = 0, WnR = 0 [12618.154377] user pgtable: 4k pages, 48-bit VAs, pgdp=0000012008c38000 [12618.155015] [00000000001420ac] pgd=0000000000000000, p4d=0000000000000000 [12618.155689] Internal error: Oops: 96000004 [#1] SMP [12618.156189] Modules linked in: cas_cache(OE) cas_disk(OE) ipt_rpfilter xt_set xt_multiport iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_nat xt_addrtype ip6table_nat ip6_tables iptable_mangle xt_physdev xt_conntrack xt_comment xt_mark iptable_filter nf_conntrack_netlink nfnetlink sch_ingress iptable_nat xt_MASQUERADE ip_tables rbd ceph libceph dns_resolver overlay openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 8021q garp mrp bonding vfat fat dm_multipath ipmi_ssif aes_ce_blk aes_ce_cipher ghash_ce sha1_ce ses enclosure sg ngbe acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel br_netfilter bridge stp llc dm_mod fuse ext4 mbcache jbd2 sd_mod t10_pi ahci libahci sha2_ce sha256_arm64 igb libata ixgbe ast smartpqi drm_vram_helper drm_ttm_helper scsi_transport_sas ttm mdio aes_neon_bs aes_neon_blk crypto_simd cryptd [last unloaded: cas_disk] [12618.164112] CPU: 21 PID: 202913 Comm: casadm Kdump: loaded Tainted: G OE 5.10.0-136.37.0.113.oe2203sp1.aarch64 #1 [12618.165226] Hardware name: Inspur 
CS5260F /YZMB-02006-101 , BIOS 4.0.8 12/22/21 16:05:48 [12618.166121] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) [12618.166784] pc : cache_mngt_core_remove_from_cache+0x4c/0xa0 [cas_cache] [12618.167509] lr : cache_mngt_core_remove_from_cache+0x34/0xa0 [cas_cache] [12618.168199] sp : ffff8000262fbb60 [12618.168548] x29: ffff8000262fbb60 x28: ffffc060b6f9c7c0 [12618.169157] x27: ffff16efe06bdf88 x26: 0000000000000001 [12618.169817] x25: ffff8000262fbd10 x24: ffffc060c97c3798 [12618.170383] x23: ffff8000299a7140 x22: 0000000000000000 [12618.170931] x21: ffff8000299a5000 x20: 0000000000000000 [12618.171473] x19: ffff8000299a7140 x18: 0000000000000000 [12618.172017] x17: 0000000000000000 x16: ffffc060c8a3bc84 [12618.172571] x15: 0000000000000000 x14: 0000000000000000 [12618.173117] x13: 0000000000000000 x12: 0000000000000018 [12618.173664] x11: 00008250d3c1dd20 x10: 0000000000000eb0 [12618.174224] x9 : ffffc060b6f41f64 x8 : ffff16efcd003590 [12618.174774] x7 : 7fffffffffffffff x6 : 000000966e355c45 [12618.175324] x5 : 00000000700f6630 x4 : ffff16efe06bdfa0 [12618.175887] x3 : 0000000000000000 x2 : ffffc060c9d38340 [12618.176435] x1 : 0000000000142000 x0 : 0000000000000000 [12618.176989] Call trace: [12618.177318] cache_mngt_core_remove_from_cache+0x4c/0xa0 [cas_cache] [12618.178003] _ocf_mngt_cache_stop_remove_cores+0x80/0xc4 [cas_cache] [12618.178661] ocf_mngt_cache_stop_detached+0x38/0xf0 [cas_cache] [12618.179411] ocf_mngt_cache_stop+0x120/0x140 [cas_cache] [12618.180006] cache_mngt_init_instance+0x19c/0x370 [cas_cache] [12618.180622] cas_service_ioctl_ctrl+0x2b30/0x49d0 [cas_cache] [12618.181253] __arm64_sys_ioctl+0xb0/0xf4 [12618.181688] invoke_syscall+0x50/0x11c [12618.182094] el0_svc_common.constprop.0+0x158/0x164 [12618.182606] do_el0_svc+0x2c/0x9c [12618.183001] el0_svc+0x20/0x30 [12618.183356] el0_sync_handler+0xb0/0xb4 [12618.183781] el0_sync+0x160/0x180 [12618.184137] Code: 121d7801 39047261 37080180 91450a81 (79415820) [12618.184765] ---[ end 
trace c4448c973eb33765 ]--- [12618.185262] Kernel panic - not syncing: Oops: Fatal exception [12618.185844] SMP: stopping secondary CPUs [12618.186331] Kernel Offset: 0x4060b86a0000 from 0xffff800010000000 [12618.187788] PHYS_OFFSET: 0xffffea3040000000 [12618.189136] CPU features: 0x0000,00000002,61800008 [12618.190602] Memory Limit: none [12618.195577] Starting crashdump kernel... [12618.196063] Bye!
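A note on reading the oops above: the faulting address can be reproduced from the `Code:` line and the register dump. The sketch below decodes only the last two instruction words from that line (`91450a81` and the faulting `79415820`) by hand, using the standard A64 encodings for ADD (immediate) and LDRH (unsigned offset). The helper names are made up for illustration, and which object x20 was supposed to point to is an assumption that cannot be confirmed from the log alone.

```python
# Minimal A64 decoders for the two instruction forms seen in the "Code:" line.
# Helper names are hypothetical; only the bit layouts come from the A64 ISA.

def decode_add_imm64(word):
    """ADD Xd, Xn, #imm{, LSL #12} -> (rd, rn, imm)."""
    assert (word >> 23) == 0b100100010, "not a 64-bit ADD (immediate)"
    shift = 12 if (word >> 22) & 1 else 0
    imm = ((word >> 10) & 0xFFF) << shift
    return word & 0x1F, (word >> 5) & 0x1F, imm

def decode_ldrh_uimm(word):
    """LDRH Wt, [Xn, #imm] -> (rt, rn, imm); imm12 is scaled by 2 bytes."""
    assert (word >> 22) == 0b0111100101, "not LDRH (unsigned offset)"
    imm = ((word >> 10) & 0xFFF) * 2
    return word & 0x1F, (word >> 5) & 0x1F, imm

# Instruction words copied from the oops: "... 91450a81 (79415820)"
add_rd, add_rn, add_imm = decode_add_imm64(0x91450A81)
ld_rt, ld_rn, ld_imm = decode_ldrh_uimm(0x79415820)

# Register value copied from the dump: x20 = 0000000000000000
x20 = 0x0

# add x1, x20, #0x142, lsl #12 ; ldrh w0, [x1, #0xac]
x1 = x20 + add_imm
fault_addr = x1 + ld_imm

print(f"add  x{add_rd}, x{add_rn}, #{add_imm:#x}")
print(f"ldrh w{ld_rt}, [x{ld_rn}, #{ld_imm:#x}]")
print(f"computed fault address: {fault_addr:#018x}")
```

The computed address matches the `Unable to handle kernel paging request at virtual address 00000000001420ac` line, and x1 in the dump is 0x142000. So the crash looks like a 16-bit read at offset 0x1420ac from a NULL base pointer held in x20, which would be consistent with the removal path running on an object that was never set up because the load failed.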
Expected behavior: the core is removed.
Actual behavior: kernel crash.
Did you solve it?
I also experienced similar symptoms.
It occurred with opencas (22.6.3) on Rocky Linux 8.10.
opencas (22.6.1) works on Rocky Linux 8.10.