SCST-project / scst

SCST is a SCSI target software stack that allows exporting any block device or file via iSCSI, FC, or RDMA (SRP).
http://scst.sourceforge.net

kpanic in scst_cm_dev_unregister #99

Closed: ishioni closed this issue 1 year ago

ishioni commented 1 year ago

I use TrueNAS SCALE, which ships scst 3.6.0.8557-~truenas+2. When my Kubernetes cluster connects iSCSI targets to pods, I occasionally get the following kernel panic. I can trigger it semi-reliably by randomly killing and spawning pods that require the iSCSI targets. This happens on both the 5.10 and 5.15 kernels.

[20421.307062] [20891]: scst: Attached to virtual device pvc-227dff34-8712-4b08-a0cd-2e9ee26fe99b (id 19)
[20421.308079] [1120566]: dev_vdisk: T10 device id for device pvc-227dff34-8712-4b08-a0cd-2e9ee26fe99b changed to 974a63b6449af8f
[20421.308116] [1120566]: dev_vdisk: USN for device pvc-227dff34-8712-4b08-a0cd-2e9ee26fe99b changed to 974a63b6449af8f
[20421.308121] list_del corruption. next->prev should be ffff955cb1ea2540, but was ffff955c54a32440
[20421.308128] ------------[ cut here ]------------
[20421.308130] kernel BUG at lib/list_debug.c:54!
[20421.308134] invalid opcode: 0000 [#1] SMP PTI
[20421.308137] CPU: 4 PID: 93251 Comm: kworker/4:1 Tainted: P           OE     5.15.62+truenas #1
[20421.308140] Hardware name: Default string Default string/SKYBAY, BIOS QZ01AR12 09/17/2017
[20421.308143] Workqueue: events vdev_inq_changed_fn [scst_vdisk]
[20421.308151] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[20421.308168] Code: c7 c7 e0 fe 15 bb e8 b4 f3 fe ff 0f 0b 48 89 fe 48 c7 c7 70 ff 15 bb e8 a3 f3 fe ff 0f 0b 48 c7 c7 20 00 16 bb e8 95 f3 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 e0 ff 15 bb e8 81 f3 fe ff 0f 0b
[20421.308172] RSP: 0018:ffffb889443afe38 EFLAGS: 00010246
[20421.308175] RAX: 0000000000000054 RBX: ffffffffc1d15220 RCX: ffff9563ddd20448
[20421.308177] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9563ddd20440
[20421.308179] RBP: ffff955cb1ea2540 R08: 0000000000000000 R09: ffffb889443afc68
[20421.308181] R10: ffffb889443afc60 R11: ffffffffbb6d3268 R12: 0000000000000000
[20421.308183] R13: dead000000000122 R14: dead000000000100 R15: ffff955e1c6ab100
[20421.308186] FS:  0000000000000000(0000) GS:ffff9563ddd00000(0000) knlGS:0000000000000000
[20421.308188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20421.308190] CR2: 000055e2327427f8 CR3: 000000037fcc4005 CR4: 00000000003706e0
[20421.308193] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20421.308195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20421.308197] Call Trace:
[20421.308199]  <TASK>
[20421.308201]  scst_cm_dev_unregister+0x66/0xd0 [scst]
[20421.308217]  scst_cm_update_dev+0x41/0xc0 [scst]
[20421.308231]  process_one_work+0x1ee/0x390
[20421.308234]  worker_thread+0x53/0x3e0
[20421.308237]  ? process_one_work+0x390/0x390
[20421.308239]  kthread+0x124/0x150
[20421.308241]  ? set_kthread_struct+0x50/0x50
[20421.308244]  ret_from_fork+0x1f/0x30
[20421.308248]  </TASK>
[20421.308249] Modules linked in: scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) dlm(E) rpcsec_gss_krb5(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) ip6_udp_tunnel(E) udp_tunnel(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) msr(E) binfmt_misc(E) essiv(E) authenc(E) dm_crypt(E) dm_mod(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E) bonding(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) snd_hda_codec_hdmi(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) mei_wdt(E) mei_hdcp(E) ledtrig_audio(E)
[20421.308285]  kvm(E) irqbypass(E) rapl(E) intel_cstate(E) evdev(E) intel_uncore(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) mei_me(E) snd_hda_codec(E) wdat_wdt(E) i915(E) ir_rc6_decoder(E) pcspkr(E) watchdog(E) snd_hda_core(E) intel_wmi_thunderbolt(E) snd_hwdep(E) ttm(E) snd_pcm(E) snd_timer(E) drm_kms_helper(E) snd(E) soundcore(E) ee1004(E) rc_rc6_mce(E) mei(E) cec(E) sg(E) intel_pch_thermal(E) ite_cir(E) rc_core(E) intel_pmc_core(E) button(E) acpi_pad(E) nfsd(E) auth_rpcgss(E) fuse(E) nfs_acl(E) configfs(E) lockd(E) drm(E) grace(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) ses(E) enclosure(E) scsi_transport_sas(E) sd_mod(E) crc32_pclmul(E) crc32c_intel(E)
[20421.308337]  ghash_clmulni_intel(E) nvme(E) igb(E) ahci(E) ahciem(E) i2c_algo_bit(E) xhci_pci(E) dca(E) nvme_core(E) t10_pi(E) crc_t10dif(E) e1000e(E) libahci(E) aesni_intel(E) crypto_simd(E) xhci_hcd(E) ptp(E) i2c_i801(E) intel_lpss_pci(E) cryptd(E) crct10dif_generic(E) libata(E) i2c_smbus(E) crct10dif_pclmul(E) crct10dif_common(E) pps_core(E) scsi_mod(E) intel_lpss(E) scsi_common(E) idma64(E) usbcore(E) usb_common(E) fan(E) wmi(E) video(E)
[20421.308373] ---[ end trace e89eb550d12b0ed7 ]---
lnocturno commented 1 year ago

Hi,

Thank you for the report!

I have created PR https://github.com/SCST-project/scst/pull/100. Could you retest the bug with these four patches?

Thanks, Gleb

lnocturno commented 1 year ago

Hi,

The fix candidate has been merged into the master branch. Let me know if you need these patches ported to the SCST 3.6 stable branch. If you reproduce this problem again, feel free to reopen the issue.

Gleb.

ishioni commented 1 year ago

Sorry for the lack of reply. Unfortunately, I won't be able to test this patch directly, because for me SCST is part of a specialized distribution that makes it hard to replace individual pieces. I've alerted the developers to this patch, and hopefully they'll pull it in. Looking at their sources, they seem to be pulling in 3.7 for the newest version, so a fix for 3.7 should suffice.

Thank you for a very speedy fix :)