Open-CAS / open-cas-linux

Open CAS Linux
https://open-cas.com
BSD 3-Clause "New" or "Revised" License

Kernel BUG at cas_cache/volume/vol_blk_utils.c:146! #865

Closed suxli closed 3 years ago

suxli commented 3 years ago

Description

We are trying to use Open CAS on an ARM system. When this issue happened, the CAS devices seemed to be inaccessible.

Expected Behavior

IO requests should be processed.

Actual Behavior

IO requests sent to the Open CAS device hung (io_prep_pwrite), and the casadm CLI was not responding.

Steps to Reproduce

The issue occurs after our storage workload has been running for some time. We have not found repeatable manual steps to reproduce it.

Context

1) Write IO requests hung. For example, io_prep_pwrite does not return.
2) casadm was not responding when this happened.

Possible Fix

No conclusion yet.

Logs

The kernel reported this issue twice in two days. The workload ran normally before it happened. Here is the log from /var/log/kernel.log:

Jun 29 16:29:56 node02 kernel: [1208589.136644] -----------[ cut here ]-----------
Jun 29 16:29:56 node02 kernel: [1208589.136648] kernel BUG at /root/open-cas-linux/open-cas-linux-v20.03.1.0292/modules/cas_cache/volume/vol_blk_utils.c:146!
Jun 29 16:29:56 node02 kernel: [1208589.141824] Internal error: Oops - BUG: 0 1 SMP
Jun 29 16:29:56 node02 kernel: [1208589.144464] Modules linked in: arc4 md4 nls_utf8 nfsv3 nfs_acl cifs ccm rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache ip6table_filter ip6_tables binfmt_misc xt_nat xt_tcpudp veth qbd_tcp(OE) qbd(OE) knem(OE) mst_pciconf(OE) vsock_diag vsock sctp_diag sctp dccp_diag dccp tcp_diag udp_diag raw_diag inet_diag unix_diag netconsole xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter bridge stp llc scst_vdisk(OE) aufs iscsi_scst(OE) scst(OE) dlm overlay cuse rdma_ucm(OE) ib_ipoib(OE) ib_umad(OE) bonding nls_iso8859_1 ipmi_ssif cas_cache(OE) cas_disk(OE) joydev input_leds shpchp ipmi_si ipmi_devintf ipmi_msghandler nfit
Jun 29 16:29:56 node02 kernel: [1208589.178212] cppc_cpufreq sch_fq_codel ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_target_mod target_core_mod ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib(OE) ib_uverbs(OE) ses enclosure ib_core(OE) hibmc_drm realtek ttm hid_generic drm_kms_helper syscopyarea sysfillrect mlx5_core(OE) mlxfw(OE) psample devlink aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce mlx_compat(OE) usbhid sysimgblt hns3 nvme sha256_arm64 hisi_sas_v3_hw fb_sys_fops sha1_ce hisi_sas_main ptp drm nvme_core hclge libsas hid ahci megaraid_sas hnae3 libahci scsi_transport_sas pps_core gpio_dwapb aes_neon_bs aes_neon_blk crypto_simd
Jun 29 16:29:56 node02 kernel: [1208589.220055] cryptd aes_arm64 [last unloaded: knem]
Jun 29 16:29:56 node02 kernel: [1208589.224193] CPU: 11 PID: 1726 Comm: cas_io_cache1_1 Tainted: G OE 4.15.0-58-generic #64+pitrix2
Jun 29 16:29:56 node02 kernel: [1208589.232557] Hardware name: QingStor DS2123V2/BC82AMDGK, BIOS 1.70 01/07/2021
Jun 29 16:29:56 node02 kernel: [1208589.241058] pstate: a0c00009 (NzCv daif +PAN +UAO)
Jun 29 16:29:56 node02 kernel: [1208589.245334] pc : cas_io_iter_move+0xa4/0xa8 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.249571] lr : _cas_ctx_seek_data+0x78/0x90 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.253748] sp : ffff0000225dbd00
Jun 29 16:29:56 node02 kernel: [1208589.257839] x29: ffff0000225dbd00 x28: 0000000000000000
Jun 29 16:29:56 node02 kernel: [1208589.261876] x27: 0000000000000000 x26: ffff000008e0ba98
Jun 29 16:29:56 node02 kernel: [1208589.265814] x25: ffff0000036c6000 x24: ffffa02fb7691980
Jun 29 16:29:56 node02 kernel: [1208589.269662] x23: ffff00002228d000 x22: 0000000000000000
Jun 29 16:29:56 node02 kernel: [1208589.273416] x21: 0000000000000000 x20: ffff802fb8175040
Jun 29 16:29:56 node02 kernel: [1208589.277082] x19: 0000000000000000 x18: 0000ffff9df1fa70
Jun 29 16:29:56 node02 kernel: [1208589.280643] x17: 0000ffff9e1349e0 x16: ffff0000082e80a0
Jun 29 16:29:56 node02 kernel: [1208589.284120] x15: 00002f9ef0000000 x14: 000000000213fae6
Jun 29 16:29:56 node02 kernel: [1208589.287505] x13: 0000ffff00060000 x12: 000fffffffffffff
Jun 29 16:29:56 node02 kernel: [1208589.290798] x11: ffff000b00000000 x10: 0000000000000ad0
Jun 29 16:29:56 node02 kernel: [1208589.294010] x9 : 0000000000000000 x8 : ffff802f865fdc08
Jun 29 16:29:56 node02 kernel: [1208589.297135] x7 : 0000000000000000 x6 : 000000000000003f
Jun 29 16:29:56 node02 kernel: [1208589.300149] x5 : 0000000000000040 x4 : ffffffffffffffe0
Jun 29 16:29:56 node02 kernel: [1208589.303075] x3 : ffff0000036fff98 x2 : 0000000000000001
Jun 29 16:29:56 node02 kernel: [1208589.305911] x1 : 00000000b81671b0 x0 : 00000000b815f1df
Jun 29 16:29:56 node02 kernel: [1208589.308658] Process cas_io_cache1_1 (pid: 1726, stack limit = 0x000000002209033b)
Jun 29 16:29:56 node02 kernel: [1208589.314024] Call trace:
Jun 29 16:29:56 node02 kernel: [1208589.316620] cas_io_iter_move+0xa4/0xa8 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.319182] _cas_ctx_seek_data+0x78/0x90 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.321689] metadata_io_restart_req+0x98/0x178 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.324197] ocf_io_handle+0x4c/0x60 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.326671] ocf_queue_run_single+0x54/0x58 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.329130] ocf_queue_run+0x3c/0x68 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.331484] _cas_io_queue_thread+0x78/0x128 [cas_cache]
Jun 29 16:29:56 node02 kernel: [1208589.333779] kthread+0x134/0x138
Jun 29 16:29:56 node02 kernel: [1208589.336031] ret_from_fork+0x10/0x18
Jun 29 16:29:56 node02 kernel: [1208589.338250] Code: a94153f3 f94013f5 a8c37bfd d65f03c0 (d4210000)
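Since the issue was reported twice, it may help to compare the two occurrences. The following is a minimal sketch, not part of Open CAS, that pulls the BUG location and the call-trace frames out of a log in the format shown above. The function name `summarize_oops` and the `SAMPLE` list are hypothetical and exist only for illustration; the regular expressions assume the syslog-style prefix and arm64 frame layout seen in this report.

```python
import re

# Matches the "kernel BUG at <file>:<line>!" header line.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# Matches a call-trace frame after the "[timestamp]" prefix, for example
# "cas_io_iter_move+0xa4/0xa8 [cas_cache]"; the module part is optional.
FRAME_RE = re.compile(
    r"\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$"
)


def summarize_oops(lines):
    """Return ((file, line_number) or None, [(function, module or None), ...])."""
    bug = None
    frames = []
    in_trace = False
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            bug = (match.group(1), int(match.group(2)))
            continue
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match:
                frames.append((match.group(1), match.group(2)))
            else:
                # First non-frame line (such as "Code: ...") ends the trace.
                in_trace = False
    return bug, frames


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Jun 29 16:29:56 node02 kernel: [1208589.136648] kernel BUG at /root/open-cas-linux/open-cas-linux-v20.03.1.0292/modules/cas_cache/volume/vol_blk_utils.c:146!",
    "Jun 29 16:29:56 node02 kernel: [1208589.245334] pc : cas_io_iter_move+0xa4/0xa8 [cas_cache]",
    "Jun 29 16:29:56 node02 kernel: [1208589.314024] Call trace:",
    "Jun 29 16:29:56 node02 kernel: [1208589.316620] cas_io_iter_move+0xa4/0xa8 [cas_cache]",
    "Jun 29 16:29:56 node02 kernel: [1208589.319182] _cas_ctx_seek_data+0x78/0x90 [cas_cache]",
    "Jun 29 16:29:56 node02 kernel: [1208589.333779] kthread+0x134/0x138",
    "Jun 29 16:29:56 node02 kernel: [1208589.336031] ret_from_fork+0x10/0x18",
    "Jun 29 16:29:56 node02 kernel: [1208589.338250] Code: a94153f3 f94013f5 a8c37bfd d65f03c0 (d4210000)",
]

if __name__ == "__main__":
    bug, frames = summarize_oops(SAMPLE)
    print("BUG at:", bug)
    for function, module in frames:
        print(f"  {function} [{module or 'kernel'}]")
```

To use it on a real log, pass the lines of the file instead of `SAMPLE`, for example `summarize_oops(open("/var/log/kernel.log"))`. If a file contains several oopses, this sketch keeps only the last BUG location and concatenates the traces, so the file would need to be split per occurrence first.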

Your Environment

mmichal10 commented 3 years ago

Hi @suxli, thank you for reporting this issue. Are you able to verify whether it reproduces with CAS v21.3.2?