GPUOpen-LibrariesAndSDKs / MxGPU-Virtualization

kernel panic on 4.9.0 for s7150x2 #30

Open duanwujie opened 4 years ago

duanwujie commented 4 years ago

[ 63.725792] gim: loading out-of-tree module taints kernel.
[ 63.728298] gim info:(gim_init:149) Start AMD open source GIM initialization
[ 63.728299] gim info:(gim_init:152) GPU IOV MODULE - version 1.1.4
[ 63.728299] gim info:(gim_init:154) Copyright (c) 2014-2017 Advanced Micro Devices, Inc. All rights reserved.
[ 63.728305] gim info:(parse_config_file:219) AMD GIM fb_option = 0
[ 63.728305] gim info:(parse_config_file:219) AMD GIM sched_option = 0
[ 63.728306] gim info:(parse_config_file:219) AMD GIM vf_num = 0
[ 63.728306] gim info:(parse_config_file:219) AMD GIM pf_fb = 0
[ 63.728306] gim info:(parse_config_file:219) AMD GIM vf_fb = 0
[ 63.728307] gim info:(parse_config_file:219) AMD GIM sched_interval = 0
[ 63.728307] gim info:(parse_config_file:219) AMD GIM sched_interval_us = 0
[ 63.728308] gim info:(parse_config_file:219) AMD GIM fb_clear = 0
[ 63.728308] gim info:(init_config:341) INIT CONFIG
[ 63.773658] gim info:(enumerate_all_pfs:146) pfdev :81d60000
[ 63.773659] gim info:(enumerate_all_pfs:146) pfdev :81d63000
[ 63.773660] dwj pf_count : 2
[ 63.773662] gim info:(set_new_adapter:572) curr allocated at ffffffffc0c05d80
[ 63.773662] gim info:(set_new_adapter:579) SRIOV is supported
[ 63.773665] gim info:(set_new_adapter:587) found PCI bridge device
[ 63.773667] gim info:(set_new_adapter:591) found: 02:8.0
[ 63.773691] gim info:(set_new_adapter:608) mmio_base = ffffaa7388fc0000
[ 63.773696] gim info:(set_new_adapter:610) doorbell = ffffaa7389e00000
[ 63.773697] gim error:(map_fb:369) can't iomap for BAR 0
[ 63.774281] gim info:(set_new_adapter:612) pf.fb_va = (null)
[ 63.774293] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002
[ 63.774295] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010
[ 63.774295] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
[ 63.774298] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004
[ 63.774299] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS
[ 63.774299] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004
[ 63.774300] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[ 63.774419] gim info:(gim_read_vbios:243) VBIOS starts: 0x55, 0xaa
[ 63.774420] gim info:(gim_read_vbios:246) VBios size is 0x10000
[ 63.774429] gim info:(gim_read_vbios:249) vbios allocated at ffffaa7383ac1000
[ 63.774429] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[ 63.911429] gim info:(gim_read_vbios:257) BIOS Version Major 0xF Minor 0x31
[ 63.911458] gim info:(gim_read_vbios:270) Valid video BIOS image,
[ 63.911458] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x543c00
[ 63.911464] gim info:(gim_post_vbios:302) Init Parser passed!, continue
[ 63.911467] gim info:(atom_chk_asic_status:333) ATOM_CheckAsicStatus - BIOS_SCRATCH_7 = 0x00000000
[ 63.911467] gim info:(atom_chk_asic_status:336) Isolate ATOM_S7_ASIC_INIT_COMPLETE_MASK bit(s) = 0x00000000
[ 63.911469] gim info:(atom_chk_asic_status:339) RLC_CNTL = 0x00000000
[ 63.911469] gim info:(atom_chk_asic_status:341) Isolate RLC_CNTL__RLC_ENABLE_F32_MASK = 0x00000000
[ 63.911469] gim info:(atom_chk_asic_status:348) ATOM_ASIC_NEED_POST
[ 63.911470] gim info:(gim_post_vbios:305) Asic needs a VBios post
[ 63.911470] gim info:(atom_post_vbios:200) ATOM_PostVBIOS: firmware_info passed
[ 63.911470] gim info:(atom_post_vbios:253) asic_init before, engine clock = 7530; memory clock =1e848
[ 64.233696] gim info:(atom_post_vbios:256) asic_init after
[ 64.233696] gim info:(atom_post_vbios:263) atom_init_fan_cntl before
[ 64.233696] gim info:(atom_post_vbios:265) atom_init_fan_cntl after
[ 64.233697] gim info:(gim_post_vbios:311) Post INIT_ASIC successfully!
[ 64.233708] gim info:(firmware_requires_update:510) SMU option ROM version 0x111700
[ 64.233708] gim info:(firmware_requires_update:511) versus patch version 0x111a00
[ 64.233720] gim info:(firmware_requires_update:521) RLCV option ROM version 113 versus patch version 129
[ 64.233720] gim info:(firmware_requires_update:526) TOC found, update it
[ 64.233721] gim info:(patch_firmware:586) Update smc_init table
[ 64.591918] BUG: unable to handle kernel paging request at 0000000000020000
[ 64.592161] IP: [] memcpy_erms+0x6/0x10
[ 64.592398] PGD 0

[ 64.592635] Oops: 0002 [#1] SMP
[ 64.592863] Modules linked in: gim(OE+) openvswitch(E) nf_conntrack_ipv6(E) nf_nat_ipv6(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_defrag_ipv6(E) nf_nat(E) nf_conntrack(E) libcrc32c(E) crc32c_generic(E) mptctl(E) mptbase(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) i915(E) drm_kms_helper(E) drm(E) intel_rapl(E) i2c_algo_bit(E) x86_pkg_temp_thermal(E) hci_uart(E) snd_hda_intel(E) intel_powerclamp(E) snd_hda_codec(E) btbcm(E) btqca(E) snd_hda_core(E) iTCO_wdt(E) snd_hwdep(E) snd_pcm(E) btintel(E) bluetooth(E) eeepc_wmi(E) asus_wmi(E) coretemp(E) iTCO_vendor_support(E) snd_timer(E) intel_lpss_acpi(E)
[ 64.594497] sparse_keymap(E) psmouse(E) mxm_wmi(E) serio_raw(E) evdev(E) joydev(E) kvm_intel(E) intel_lpss(E) mfd_core(E) efi_pstore(E) i2c_i801(E) video(E) shpchp(E) mei_me(E) mei(E) snd(E) soundcore(E) battery(E) rfkill(E) efivars(E) i2c_smbus(E) kvm(E) irqbypass(E) pcspkr(E) crct10dif_pclmul(E) crc32_pclmul(E) tpm_tis(E) acpi_als(E) ghash_clmulni_intel(E) acpi_pad(E) tpm_tis_core(E) kfifo_buf(E) industrialio(E) tpm(E) wmi(E) button(E) ipmi_watchdog(E) ipmi_poweroff(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E) autofs4(E) ext4(E) crc16(E) jbd2(E) fscrypto(E) mbcache(E) hid_generic(E) sg(E) usbhid(E) sd_mod(E) crc32c_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) ahci(E) libahci(E) xhci_pci(E) libata(E) xhci_hcd(E) r8169(E) mii(E) usbcore(E) scsi_mod(E)
[ 64.596414] usb_common(E) fan(E) thermal(E) i2c_hid(E) hid(E) fjes(E)
[ 64.597078] CPU: 7 PID: 2331 Comm: insmod Tainted: G OE 4.9.0-0.bpo.1-linx-security-amd64 #1 Linx 4.9.2-2~bpo8+1linx2
[ 64.597852] Hardware name: System manufacturer System Product Name/B365M-KYLIN, BIOS 1202 07/15/2019
[ 64.598236] task: ffff9e8680a33000 task.stack: ffffaa7389758000
[ 64.598620] RIP: 0010:[] [] memcpy_erms+0x6/0x10
[ 64.599016] RSP: 0018:ffffaa738975bb98 EFLAGS: 00010206
[ 64.599418] RAX: 0000000000020000 RBX: ffffffffc0c05d80 RCX: 0000000000020000
[ 64.599826] RDX: 0000000000020000 RSI: ffffaa7383f21000 RDI: 0000000000020000
[ 64.600240] RBP: 00000000000006c0 R08: ffff9e867fde0cc0 R09: 0000000000022000
[ 64.600651] R10: 8000000000000163 R11: 00000000000004dc R12: 0000000000000007
[ 64.601067] R13: ffffaa7383f21000 R14: ffffffffc0c05dc0 R15: ffffaa7383acb662
[ 64.601505] FS: 00007ff63fdef700(0000) GS:ffff9e86863c0000(0000) knlGS:0000000000000000
[ 64.601949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.602382] CR2: 0000000000020000 CR3: 000000084000a000 CR4: 00000000003406e0
[ 64.602824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 64.603269] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 64.603720] Stack:
[ 64.604166] ffffffffc0bb7f3d ffffaa7383acb662 f4000000c0bc3a79 ffff9e867fde1d80
[ 64.604630] ffffffffc0c05d80 0000000000000000 ffffffffc0c05d80 ffffffffc0c05d88
[ 64.605097] ffff9e86800c1980 ffffffffc0bab8bd 0000000000000001 0000000000000000
[ 64.605627] Call Trace:
[ 64.606101] [] ? patch_firmware+0x2bd/0x4e0 [gim]
[ 64.606578] [] ? gim_post_vbios+0x14d/0x200 [gim]
[ 64.607057] [] ? set_new_adapter+0x51b/0x9b0 [gim]
[ 64.607533] [] ? gim_probe+0x30/0x30 [gim]
[ 64.608001] [] ? gim_probe+0xa/0x30 [gim]
[ 64.608465] [] ? gim_init+0xbc/0x120 [gim]
[ 64.608921] [] ? do_one_initcall+0x4c/0x180
[ 64.609376] [] ? __vunmap+0x6d/0xc0
[ 64.609857] [] ? do_init_module+0x5a/0x1f1
[ 64.610302] [] ? load_module+0x23c9/0x28f0
[ 64.610750] [] ? __symbol_put+0x60/0x60
[ 64.611189] [] ? SYSC_finit_module+0x8e/0xe0
[ 64.611632] [] ? do_syscall_64+0x81/0x190
[ 64.612065] [] ? entry_SYSCALL64_slow_path+0x25/0x25
[ 64.612489] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[ 64.613416] RIP [] memcpy_erms+0x6/0x10
[ 64.613884] RSP
[ 64.614289] CR2: 0000000000020000
[ 64.614686] ---[ end trace 43f95d5155189075 ]-
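
A note for anyone reading this trace: the faulting address in CR2 equals RDI, which on x86-64 holds the destination pointer passed to memcpy, and earlier in the same log map_fb reports "can't iomap for BAR 0" followed by pf.fb_va = (null). That would be consistent with patch_firmware copying to an offset from an unmapped framebuffer, though this is only a reading of the log and not a confirmed diagnosis. Below is a minimal Python sketch for pulling those values out of a saved dmesg, assuming the line format shown above; the function name fault_summary and the sample string are made up for illustration.

```python
import re


def fault_summary(log: str) -> dict:
    """Extract CR2, RDI and any 'gim error' messages from dmesg/oops text.

    Hypothetical helper: it only assumes the textual layout seen in this
    issue (e.g. 'CR2: <16 hex digits>' and 'gim error:(func:line) message').
    """
    regs = {}
    for name in ("CR2", "RDI"):
        m = re.search(rf"\b{name}: ([0-9a-f]{{16}})", log)
        regs[name] = int(m.group(1), 16) if m else None

    # Each match is (function name, message); the message ends at the next
    # '[ timestamp]' on the same line or at end of line.
    errors = re.findall(
        r"gim error:\((\w+):\d+\) (.*?)(?= \[|$)", log, re.MULTILINE
    )

    return {
        "cr2": regs["CR2"],
        "rdi": regs["RDI"],
        # True when the fault address is exactly the memcpy destination.
        "dest_is_fault_addr": regs["CR2"] is not None
        and regs["CR2"] == regs["RDI"],
        "gim_errors": errors,
    }


# A few lines copied from the log above, joined the way they were pasted.
sample = (
    "[ 63.773697] gim error:(map_fb:369) can't iomap for BAR 0 "
    "[ 63.774281] gim info:(set_new_adapter:612) pf.fb_va = (null) "
    "[ 64.599826] RDX: 0000000000020000 RSI: ffffaa7383f21000 "
    "RDI: 0000000000020000 "
    "[ 64.602382] CR2: 0000000000020000 CR3: 000000084000a000"
)

if __name__ == "__main__":
    print(fault_summary(sample))
```

Running it against a full saved log (for example the output of dmesg redirected to a file) should work the same way, as long as the register and gim lines keep this layout.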

aracno974 commented 4 years ago

Hi, I've got exactly the same issue (same trace). Hardware is an HP DL380 Gen8 with an S7150 x2. OS is Proxmox 5.4 (kernel 4.15.18-24-pve). How can I resolve this? Thank you.

collinwebdesigns commented 3 years ago

Hi,

Same issue here on an ASUS KGPE-D16, also with an S7150 x2. I boot with these kernel parameters:

quiet reboot=cold mem=256G rcu_nocbs=0-31 amd_iommu=on iommu=pt pci=realloc enable_mtrr_cleanup=1 video=efifb:off

Linux a4d8 5.4.73-1-pve #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) x86_64 GNU/Linux

I also get these messages in dmesg:

[    3.339577] pci 0000:04:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    3.339578] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    3.339580] pci 0000:04:00.0: BAR 7: no space for [mem size 0x100000000 64bit pref]
[    3.339581] pci 0000:04:00.0: BAR 7: failed to assign [mem size 0x100000000 64bit pref]
[    3.339583] pci 0000:04:00.0: BAR 9: assigned [mem 0xb4400000-0xb83fffff 64bit pref]
[    3.339587] pci 0000:04:00.0: BAR 12: no space for [mem size 0x04000000]
[    3.339588] pci 0000:04:00.0: BAR 12: failed to assign [mem size 0x04000000]
[    3.339589] pci 0000:04:00.0: BAR 2: assigned [mem 0xb4200000-0xb43fffff 64bit pref]
[    3.339597] pci 0000:04:00.0: BAR 5: no space for [mem size 0x00040000]
[    3.339598] pci 0000:04:00.0: BAR 5: failed to assign [mem size 0x00040000]
[    3.339600] pci 0000:04:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    3.339602] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    3.339603] pci 0000:04:00.0: BAR 2: assigned [mem 0xb4200000-0xb43fffff 64bit pref]
[    3.339611] pci 0000:04:00.0: BAR 5: assigned [mem 0xb4400000-0xb443ffff]
[    3.339615] pci 0000:04:00.0: BAR 12: no space for [mem size 0x04000000]
[    3.339616] pci 0000:04:00.0: BAR 12: failed to assign [mem size 0x04000000]
[    3.339617] pci 0000:04:00.0: BAR 9: no space for [mem size 0x04000000 64bit pref]
[    3.339618] pci 0000:04:00.0: BAR 9: failed to assign [mem size 0x04000000 64bit pref]
[    3.339620] pci 0000:04:00.0: BAR 7: no space for [mem size 0x100000000 64bit pref]
[    3.339621] pci 0000:04:00.0: BAR 7: failed to assign [mem size 0x100000000 64bit pref]
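
To see which BARs end up without an address, the messages above can be summarized with a short script. This is a rough sketch, assuming the dmesg line format shown here and treating the last message per BAR as its final state, which is an assumption about how the kernel's reassignment passes are logged; the name unassigned_bars is hypothetical.

```python
import re

# Matches both "BAR N: assigned [mem 0x...]" and
# "BAR N: failed to assign [mem size 0x...]" lines.
LINE = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): "
    r"(?P<status>assigned|failed to assign) "
    r"\[mem (?:size )?(?P<val>0x[0-9a-f]+)"
)


def unassigned_bars(dmesg: str) -> dict:
    """Return {(device, bar): requested size in bytes} for BARs whose
    most recent message is 'failed to assign'."""
    state = {}
    for m in LINE.finditer(dmesg):
        key = (m["dev"], int(m["bar"]))
        if m["status"] == "assigned":
            # A later successful assignment clears an earlier failure.
            state.pop(key, None)
        else:
            state[key] = int(m["val"], 16)
    return state


# A subset of the lines above, in their original order.
sample = """\
[    3.339578] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    3.339583] pci 0000:04:00.0: BAR 9: assigned [mem 0xb4400000-0xb83fffff 64bit pref]
[    3.339598] pci 0000:04:00.0: BAR 5: failed to assign [mem size 0x00040000]
[    3.339611] pci 0000:04:00.0: BAR 5: assigned [mem 0xb4400000-0xb443ffff]
[    3.339618] pci 0000:04:00.0: BAR 9: failed to assign [mem size 0x04000000 64bit pref]
[    3.339621] pci 0000:04:00.0: BAR 7: failed to assign [mem size 0x100000000 64bit pref]
"""

if __name__ == "__main__":
    for (dev, bar), size in sorted(unassigned_bars(sample).items()):
        print(f"{dev} BAR {bar}: {size / 2**20:.0f} MiB not assigned")
```

On the full output above this should leave BAR 0 (0x10000000, 256 MiB), BAR 7 (0x100000000, 4 GiB), BAR 9 and BAR 12 unassigned. BAR 0 is also the one that map_fb fails to iomap in the original report, so the two problems may be related.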

I'm using GIM from https://github.com/kasperlewau/MxGPU-Virtualization.