SCST-project / scst

SCST is a SCSI target software stack that allows exporting any block device or file via iSCSI, FC, or RDMA (SRP).
http://scst.sourceforge.net

Kernel panic on 3.6.0 #105

Open MJAsadi72 opened 1 year ago

MJAsadi72 commented 1 year ago

Hi

I got this error on our storage server, and it caused a kernel panic:

[10292670.543240] sqatgt(18/0): Registering initiator: pwwn=50:01:43:80:74:ab:7a:aa
[10292670.543256] [20014]: scst: Using security group "51:40:2e:c0:01:c2:c9:02" for initiator "50:01:43:80:74:ab:7a:aa" (target 51:40:2e:c0:01:c2:c9:02)
[10292670.543897] qla2xxx [0000:04:00.1]-d034:18: qla24xx_do_nack_work create sess success 000000000284bc05
[10292693.209939] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
[10292693.210011] igb 0000:08:00.0 eno2: Link Speed was downgraded by SmartSpeed
[10292694.238548] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Down
[10292719.191912] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
[10292719.191985] igb 0000:08:00.0 eno2: Link Speed was downgraded by SmartSpeed
[10292720.221548] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Down
[10292745.142919] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
[10292745.142993] igb 0000:08:00.0 eno2: Link Speed was downgraded by SmartSpeed
[10292746.174590] igb 0000:08:00.0 eno2: igb: eno2 NIC Link is Down
[10292749.710836] ------------[ cut here ]------------
[10292749.710840] kernel BUG at /usr/src/packages/BUILD/scst-3.6.0.4/qla2x00t-32gbit/qla_target.c:2478!
[10292749.719996] invalid opcode: 0000 [#1] SMP PTI
[10292749.724612] CPU: 13 PID: 14160 Comm: kworker/13:2 Kdump: loaded Tainted: G        W  OE     5.4.52-HPDS #1
[10292749.734519] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 2.0a 08/25/2016
[10292749.742270] Workqueue: qla_tgt_wq qlt_do_work [qla2xxx_scst]
[10292749.748189] RIP: 0010:qlt_pci_map_calc_cnt+0xbb/0xf0 [qla2xxx_scst]
[10292749.754710] Code: 02 00 00 02 31 c9 8b 43 28 83 f8 01 7e 16 8d 70 03 ba 67 66 66 66 89 f0 c1 fe 1f f7 ea d1 fa 29 f2 01 53 2c 89 c8 5b c3 0f 0b <0f> 0b 0f 0b 48 8b b0 f8 01 00 00 44 8b 88 a0 02 00 00 48 c7 c1 58
[10292749.773707] RSP: 0018:ffffa513c5ccfc68 EFLAGS: 00010246
[10292749.779183] RAX: ffff8ad10de0b220 RBX: ffffa513c5ccfc90 RCX: 0000000000000000
[10292749.786565] RDX: 0000000000000000 RSI: ffff8acd66af11f4 RDI: ffffa513c5ccfc90
[10292749.793950] RBP: ffff8acd5b139800 R08: 0000000000000009 R09: 0000000000000200
[10292749.801335] R10: 000000000000008b R11: 00000000007e7581 R12: ffffa513c5ccfc90
[10292749.808719] R13: ffff8ad1664e67b8 R14: ffff8ad10de0b428 R15: 0000000000000000
[10292749.816103] FS:  0000000000000000(0000) GS:ffff8ad16fb40000(0000) knlGS:0000000000000000
[10292749.824440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10292749.830437] CR2: 00007fd8dc00b348 CR3: 000000086800a005 CR4: 00000000003606e0
[10292749.837820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10292749.845204] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10292749.852587] Call Trace:
[10292749.855308]  qlt_rdy_to_xfer+0x65/0x280 [qla2xxx_scst]
[10292749.860711]  ? sgv_pool_alloc+0xb8/0x8e0 [scst]
[10292749.865498]  sqa_rdy_to_xfer+0xb2/0x290 [qla2x00tgt]
[10292749.870726]  ? scst_adjust_sg+0x42/0xf0 [scst]
[10292749.875431]  scst_process_active_cmd+0x237/0x1630 [scst]
[10292749.881005]  ? scst_cmd_init_done+0xcd/0x5a0 [scst]
[10292749.886146]  ? scst_alloc_cmd+0x43/0xb0 [scst]
[10292749.890849]  ? __switch_to_asm+0x40/0x70
[10292749.895036]  ? __scst_rx_cmd.isra.35+0x40/0x80 [scst]
[10292749.900344]  sqa_qla2xxx_handle_cmd+0x20b/0x290 [qla2x00tgt]
[10292749.906261]  qlt_do_work+0x16b/0x320 [qla2xxx_scst]
[10292749.911398]  process_one_work+0x165/0x370
[10292749.915667]  worker_thread+0x49/0x3e0
[10292749.919583]  kthread+0xf8/0x130
[10292749.922980]  ? rescuer_thread+0x330/0x330
[10292749.927243]  ? kthread_bind+0x10/0x10
[10292749.931163]  ret_from_fork+0x35/0x40
[10292749.934991] Modules linked in: dm_mirror dm_region_hash dm_log ib_iser libiscsi scsi_transport_iscsi ib_srpt(OE) qla2x00tgt(OE) scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) dlm rdma_cm iw_cm ib_cm ib_core nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ebtable_filter ebtables ip6table_nat ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_security iptable_raw iptable_filter rapidstor_rand(OE) rapidstor_lru(OE) rapidstor_fifo(OE) rapidstor(OE) dm_mod intel_rapl_msr intel_rapl_common sb_edac iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cas_cache(OE) cas_disk(OE) ast drm_vram_helper ttm aesni_intel drm_kms_helper syscopyarea glue_helper sysfillrect crypto_simd
[10292749.935031]  sysimgblt cryptd fb_sys_fops pcspkr drm i2c_i801 lpc_ich mfd_core ses mei_me enclosure scsi_transport_sas mei sg ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc32c_intel igb ahci qla2xxx_scst(OE) ptp libahci pps_core nvme_fc dca nvme_fabrics i2c_algo_bit libata i2c_core nvme_core megaraid_sas scsi_transport_fc

Do you have any idea what caused this bug? Could this be a hardware problem?
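When comparing a trace like the one above against a later retest, it can help to reduce it to the `kernel BUG at` location, the `RIP` symbol, and the call-trace frames not marked with `?` (those are the ones the unwinder considers reliable). The following is a minimal sketch, assuming the x86 oops layout seen in this log; `summarize_oops` and `SAMPLE` are hypothetical names used for illustration and are not part of SCST or any kernel tooling.

```python
import re

# Hypothetical helper (not part of SCST): summarise a kernel oops from dmesg text.
# The patterns assume the line layout of the 5.4 x86 trace quoted in this issue.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP: \w+:(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)


def summarize_oops(text):
    """Return the BUG site, the RIP symbol, and the reliable stack frames."""
    summary = {"bug": None, "rip": None, "frames": []}
    in_trace = False
    for line in text.splitlines():
        m = BUG_RE.search(line)
        if m:
            summary["bug"] = (m["file"], int(m["line"]))
            continue
        m = RIP_RE.search(line)
        if m and summary["rip"] is None:
            summary["rip"] = (m["sym"], m["mod"])
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            if m is None:
                # First non-frame line (e.g. "Modules linked in:") ends the trace.
                in_trace = False
            elif not m["guess"]:
                # Skip "? symbol" entries: stale addresses found on the stack.
                summary["frames"].append((m["sym"], m["mod"]))
    return summary


# A few lines copied from the log in this issue, used as sample input.
SAMPLE = """\
[10292749.710840] kernel BUG at /usr/src/packages/BUILD/scst-3.6.0.4/qla2x00t-32gbit/qla_target.c:2478!
[10292749.748189] RIP: 0010:qlt_pci_map_calc_cnt+0xbb/0xf0 [qla2xxx_scst]
[10292749.852587] Call Trace:
[10292749.855308]  qlt_rdy_to_xfer+0x65/0x280 [qla2xxx_scst]
[10292749.860711]  ? sgv_pool_alloc+0xb8/0x8e0 [scst]
[10292749.865498]  sqa_rdy_to_xfer+0xb2/0x290 [qla2x00tgt]
[10292749.931163]  ret_from_fork+0x35/0x40
[10292749.934991] Modules linked in: dm_mirror
"""

if __name__ == "__main__":
    result = summarize_oops(SAMPLE)
    print(result["bug"])
    print(result["rip"])
    for sym, mod in result["frames"]:
        print(sym, mod)
```

Run against the full log, this kind of summary makes it easy to see whether a retest on a newer SCST build fails at the same `qla_target.c` line and along the same `qlt_rdy_to_xfer` path.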

lnocturno commented 1 year ago

Hi,

Thank you for the report.

Could you recheck this problem with the SCST master branch?

Thanks, Gleb

MJAsadi72 commented 1 year ago

Hi Gleb,

Unfortunately not. We use this server in production and can't schedule downtime for maintenance anytime soon. Besides, we are waiting for the next stable SCST release (version 3.7), which we consider the upgrade path for this server. If you need any logs from our server, I can send them to you.

lnocturno commented 1 year ago

SCST 3.7 will be released soon (later this month). It has a lot of fixes for the QLogic driver, and I think your problem might already be fixed. Could you retest this issue with the SCST 3.7 release? If you can reproduce it again, I will take a closer look.