GPUOpen-LibrariesAndSDKs / MxGPU-Virtualization

GIM initialization error: "Failed to INIT PF for initial register 'init-state'" #40

Open Wr0ngName opened 3 years ago

Wr0ngName commented 3 years ago

Hello,

We have been trying for months to get our AMD FirePro S7150 x2 working with Linux, which included getting a new motherboard and digging into its UEFI capabilities. I now have a better understanding of what was wrong, but we still cannot overcome the difficulties we keep encountering.

Here is the setup:

The prerequisites (BIOS setup, hardware, ...) listed in the VMware/Citrix deployment guides have been completed. The motherboard is set to use UEFI with AMD-V, IOMMU, ARI, and memory mapping above 4 GB enabled.
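For anyone wanting to double-check these prerequisites from the Linux side, here is a minimal sketch that reads the standard sysfs attributes `sriov_totalvfs`, `sriov_numvfs` and `iommu_group` for AMD devices. The function name `list_sriov_devices` and the `sysfs_root` parameter are my own (hypothetical) names, not part of GIM. It only reports what the kernel exposes and says nothing about whether GIM itself will initialize.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID, shown as [1002:692f] in the log


def list_sriov_devices(sysfs_root="/sys"):
    """Return (pci_address, total_vfs, active_vfs, iommu_group) per AMD SR-IOV device.

    sysfs_root is a hypothetical parameter so the function can be pointed
    at a copy of the tree; on a real host the default is what you want.
    """
    results = []
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not devices_dir.is_dir():
        return results
    for dev in sorted(devices_dir.iterdir()):
        total_file = dev / "sriov_totalvfs"
        if not total_file.is_file():
            continue  # no SR-IOV capability exposed for this device
        if (dev / "vendor").read_text().strip() != AMD_VENDOR_ID:
            continue  # not an AMD device
        total_vfs = int(total_file.read_text())
        numvfs_file = dev / "sriov_numvfs"
        active_vfs = int(numvfs_file.read_text()) if numvfs_file.is_file() else 0
        # iommu_group is a symlink to /sys/kernel/iommu_groups/<N> when the IOMMU is active
        group_link = dev / "iommu_group"
        group = group_link.resolve().name if group_link.exists() else None
        results.append((dev.name, total_vfs, active_vfs, group))
    return results


if __name__ == "__main__":
    for addr, total, active, group in list_sriov_devices():
        print(f"{addr}: total VFs={total}, active VFs={active}, IOMMU group={group}")
```

If a physical function shows up with a non-zero total VF count and an IOMMU group, the SR-IOV and IOMMU settings are at least visible to the kernel. An empty result or a missing IOMMU group would point back at the firmware settings.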

Both GPUs and the 32 virtual functions are visible with lspci. Despite having blacklisted other modules, GIM is not loaded at startup, probably because of the error shown in the log. Its output from dmesg:

[  423.323272] gim: loading out-of-tree module taints kernel.
[  423.323559] gim: module verification failed: signature and/or required key missing - tainting kernel
[  423.324700] gim info:(gim_init:149) Start AMD open source GIM initialization
[  423.324701] gim info:(gim_init:151) GPU IOV MODULE - version 0.0
[  423.324702] gim info:(gim_init:154) Copyright (c) 2014-2017 Advanced Micro Devices, Inc. All rights reserved.
[  423.325204] gim info:(parse_config_file:217) AMD GIM fb_option = 0
[  423.325205] gim info:(parse_config_file:217) AMD GIM sched_option = 0
[  423.325206] gim info:(parse_config_file:217) AMD GIM vf_num = 0
[  423.325207] gim info:(parse_config_file:217) AMD GIM pf_fb = 0
[  423.325208] gim info:(parse_config_file:217) AMD GIM vf_fb = 0
[  423.325209] gim info:(parse_config_file:217) AMD GIM sched_interval = 0
[  423.325210] gim info:(parse_config_file:217) AMD GIM sched_interval_us = 0
[  423.325210] gim info:(parse_config_file:217) AMD GIM fb_clear = 0
[  423.325212] gim info:(init_config:341) INIT CONFIG
[  423.332074] gim info:(set_new_adapter:572) curr allocated at 000000004a1b0a82
[  423.332076] gim info:(set_new_adapter:579) SRIOV is supported
[  423.332078] gim info:(set_new_adapter:587) found PCI bridge device
[  423.332079] gim info:(set_new_adapter:588) found: 09:8.0
[  423.332102] gim info:(set_new_adapter:608) mmio_base = 000000003eaa75fe
[  423.332145] gim info:(set_new_adapter:610) doorbell = 00000000d82889b5
[  423.332150] gim info:(set_new_adapter:612) pf.fb_va = 00000000d6a506c9
[  423.332167] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002
[  423.332168] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010
[  423.332169] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
[  423.332172] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004
[  423.332172] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS
[  423.332173] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004
[  423.332174] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[  423.332331] gim info:(gim_read_vbios:243) VBIOS starts:  0x55, 0xaa
[  423.332332] gim info:(gim_read_vbios:246) VBios size is 0x10000
[  423.332340] gim info:(gim_read_vbios:249) vbios allocated at 0000000060fb6aa2
[  423.332341] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[  423.514787] gim info:(gim_read_vbios:255) BIOS Version Major 0xF Minor 0x31
[  423.514847] gim info:(gim_read_vbios:270) Valid video BIOS image, 
[  423.514848] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x541300
[  423.514851] gim info:(set_new_adapter:661) Scheduler Time interval is per-vf from XL
[  423.514851] gim info:(set_new_adapter:662) config file
[  423.514852] gim info:(enable_sriov:298) Enable SRIOV
[  423.514853] gim info:(enable_sriov:299) Enable SRIOV vfs count = 16
[  423.620462] pci 0000:0a:02.0: [1002:692f] type 00 class 0x030000
[  423.620480] pci 0000:0a:02.0: enabling Extended Tags
[  423.620626] pci 0000:0a:02.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.620828] pci 0000:0a:02.0: Adding to iommu group 33
[  423.620903] pci 0000:0a:02.1: [1002:692f] type 00 class 0x030000
[  423.620918] pci 0000:0a:02.1: enabling Extended Tags
[  423.621029] pci 0000:0a:02.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.621228] pci 0000:0a:02.1: Adding to iommu group 34
[  423.621264] pci 0000:0a:02.2: [1002:692f] type 00 class 0x030000
[  423.621279] pci 0000:0a:02.2: enabling Extended Tags
[  423.621389] pci 0000:0a:02.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.621571] pci 0000:0a:02.2: Adding to iommu group 35
[  423.621605] pci 0000:0a:02.3: [1002:692f] type 00 class 0x030000
[  423.621619] pci 0000:0a:02.3: enabling Extended Tags
[  423.621726] pci 0000:0a:02.3: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.621921] pci 0000:0a:02.3: Adding to iommu group 36
[  423.621956] pci 0000:0a:02.4: [1002:692f] type 00 class 0x030000
[  423.621971] pci 0000:0a:02.4: enabling Extended Tags
[  423.622084] pci 0000:0a:02.4: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.622283] pci 0000:0a:02.4: Adding to iommu group 37
[  423.622321] pci 0000:0a:02.5: [1002:692f] type 00 class 0x030000
[  423.622336] pci 0000:0a:02.5: enabling Extended Tags
[  423.622441] pci 0000:0a:02.5: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.622644] pci 0000:0a:02.5: Adding to iommu group 38
[  423.622679] pci 0000:0a:02.6: [1002:692f] type 00 class 0x030000
[  423.622694] pci 0000:0a:02.6: enabling Extended Tags
[  423.622827] pci 0000:0a:02.6: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.623027] pci 0000:0a:02.6: Adding to iommu group 39
[  423.623065] pci 0000:0a:02.7: [1002:692f] type 00 class 0x030000
[  423.623080] pci 0000:0a:02.7: enabling Extended Tags
[  423.623206] pci 0000:0a:02.7: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.623423] pci 0000:0a:02.7: Adding to iommu group 40
[  423.623479] pci 0000:0a:03.0: [1002:692f] type 00 class 0x030000
[  423.623497] pci 0000:0a:03.0: enabling Extended Tags
[  423.623646] pci 0000:0a:03.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.623864] pci 0000:0a:03.0: Adding to iommu group 41
[  423.623915] pci 0000:0a:03.1: [1002:692f] type 00 class 0x030000
[  423.623933] pci 0000:0a:03.1: enabling Extended Tags
[  423.624043] pci 0000:0a:03.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.624264] pci 0000:0a:03.1: Adding to iommu group 42
[  423.624329] pci 0000:0a:03.2: [1002:692f] type 00 class 0x030000
[  423.624344] pci 0000:0a:03.2: enabling Extended Tags
[  423.624600] pci 0000:0a:03.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.624832] pci 0000:0a:03.2: Adding to iommu group 43
[  423.624887] pci 0000:0a:03.3: [1002:692f] type 00 class 0x030000
[  423.624902] pci 0000:0a:03.3: enabling Extended Tags
[  423.625085] pci 0000:0a:03.3: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.625314] pci 0000:0a:03.3: Adding to iommu group 44
[  423.625353] pci 0000:0a:03.4: [1002:692f] type 00 class 0x030000
[  423.625368] pci 0000:0a:03.4: enabling Extended Tags
[  423.625545] pci 0000:0a:03.4: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.625755] pci 0000:0a:03.4: Adding to iommu group 45
[  423.625803] pci 0000:0a:03.5: [1002:692f] type 00 class 0x030000
[  423.625819] pci 0000:0a:03.5: enabling Extended Tags
[  423.625990] pci 0000:0a:03.5: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.626204] pci 0000:0a:03.5: Adding to iommu group 46
[  423.626244] pci 0000:0a:03.6: [1002:692f] type 00 class 0x030000
[  423.626259] pci 0000:0a:03.6: enabling Extended Tags
[  423.626477] pci 0000:0a:03.6: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.626716] pci 0000:0a:03.6: Adding to iommu group 47
[  423.626783] pci 0000:0a:03.7: [1002:692f] type 00 class 0x030000
[  423.626800] pci 0000:0a:03.7: enabling Extended Tags
[  423.626983] pci 0000:0a:03.7: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  423.627204] pci 0000:0a:03.7: Adding to iommu group 48
[  423.627308] gim info:(enumerate_vfs:122) vf found: 0a:2.0
[  423.627343] gim info:(enumerate_vfs:122) vf found: 0a:2.1
[  423.627376] gim info:(enumerate_vfs:122) vf found: 0a:2.2
[  423.627408] gim info:(enumerate_vfs:122) vf found: 0a:2.3
[  423.627443] gim info:(enumerate_vfs:122) vf found: 0a:2.4
[  423.627475] gim info:(enumerate_vfs:122) vf found: 0a:2.5
[  423.627507] gim info:(enumerate_vfs:122) vf found: 0a:2.6
[  423.627540] gim info:(enumerate_vfs:122) vf found: 0a:2.7
[  423.627572] gim info:(enumerate_vfs:122) vf found: 0a:3.0
[  423.627606] gim info:(enumerate_vfs:122) vf found: 0a:3.1
[  423.627638] gim info:(enumerate_vfs:122) vf found: 0a:3.2
[  423.627671] gim info:(enumerate_vfs:122) vf found: 0a:3.3
[  423.627703] gim info:(enumerate_vfs:122) vf found: 0a:3.4
[  423.627735] gim info:(enumerate_vfs:122) vf found: 0a:3.5
[  423.627768] gim info:(enumerate_vfs:122) vf found: 0a:3.6
[  423.627803] gim info:(enumerate_vfs:122) vf found: 0a:3.7
[  423.627908] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.0
[  423.627911] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.627918] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.627991] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.1
[  423.627993] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628000] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628038] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.2
[  423.628041] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628048] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628087] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.3
[  423.628090] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628097] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628136] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.4
[  423.628139] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628145] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628188] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.5
[  423.628190] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628197] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628237] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.6
[  423.628239] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628246] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628283] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:2.7
[  423.628286] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628293] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628330] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.0
[  423.628333] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628341] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628381] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.1
[  423.628384] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628391] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628473] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.2
[  423.628481] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628488] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628536] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.3
[  423.628539] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628546] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628590] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.4
[  423.628594] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628601] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628646] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.5
[  423.628649] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628659] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628701] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.6
[  423.628704] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628713] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628755] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0a:3.7
[  423.628758] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  423.628766] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  423.628790] gim info:(pci_gpu_iov_init:87) total_fb_available = 8190

[  423.628792] gim info:(pci_gpu_iov_init:88) AMD GIM pci_gpu_iov_init pos = 400
[  423.628793] gim info:(pci_gpu_iov_init:89) AMD GIM pci_gpu_iov_init total_fb_available = 1ffe
[  423.628794] gim info:(init_frame_buffer_partition:218) PCI defined PF FB size = 16 MB
[  423.628795] gim info:(init_frame_buffer_partition:222) PCI defined VF FB size = 256 MB
[  423.628796] gim info:(init_frame_buffer_partition:226) Total FB Available = 8190 MB, CSA = 8 MB
[  423.628797] gim info:(init_frame_buffer_partition:228) Max Remaining FB Size = 8160
[  423.628798] gim info:(init_frame_buffer_partition:240) PF FB size after checking limits from config file = 16MB
[  423.628799] gim info:(init_frame_buffer_partition:243) PF rounded down to nearest 16MB boundary = 16
[  423.628800] gim info:(init_pf_fb:59) total framebuffer available = 1ffe
[  423.628801] gim info:(init_pf_fb:61) pf framebuffer = 10
[  423.628802] gim info:(init_pf_fb:62) total framebuffer consumed = 1fee
[  423.628804] gim info:(init_frame_buffer_partition:251) CSA starts at offset 16MB
[  423.628805] gim info:(init_context_save_area:41) AMD GIM init_context_save_area: base =10 size=1.
[  423.628807] gim info:(init_frame_buffer_partition:257) VF FB base = 32MB (16 + 16)
[  423.628808] gim info:(init_frame_buffer_partition:261) VF FB Size = 8144MB (8160 - 16)
[  423.628810] gim info:(init_fb_static:145) AMD GIM init_fb_static: num_vf = 16, base= 32, total_size=8144, min_size=256
[  423.628811] gim info:(init_fb_static:166) AMD GIM init_fb_static: vf_fb_size = 496, base= 32
[  423.628812] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 0 base =32,size= 496
[  423.628815] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 1 base =528,size= 496
[  423.628817] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 2 base =1024,size= 496
[  423.628819] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 3 base =1520,size= 496
[  423.628821] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 4 base =2016,size= 496
[  423.628823] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 5 base =2512,size= 496
[  423.628825] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 6 base =3008,size= 496
[  423.628827] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 7 base =3504,size= 496
[  423.628829] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 8 base =4000,size= 496
[  423.628831] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 9 base =4496,size= 496
[  423.628833] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 10 base =4992,size= 496
[  423.628835] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 11 base =5488,size= 496
[  423.628837] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 12 base =5984,size= 496
[  423.628839] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 13 base =6480,size= 496
[  423.628841] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 14 base =6976,size= 496
[  423.628843] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 15 base =7472,size= 496
[  423.628863] gim info:(set_new_adapter:707) enable MSI
[  423.628932] gim info:(ih_iv_ring_disable:383) disable iv ring successfully
[  423.628933] gim info:(alloc_iv_ring:99) ih->ivr_num_entries = 256
[  423.628934] gim info:(alloc_iv_ring:102) ih->ivr_size_in_bytes = 4096
[  423.628934] gim info:(alloc_iv_ring:107) ih->ivr_alloc_size_in_bytes = 4100
[  423.628935] gim info:(alloc_iv_ring:110) iv ring page_cnt = 2
[  423.628940] gim info:(alloc_iv_ring:141) ih->ivr_va = 0000000007ef87cd
[  423.628941] gim info:(alloc_iv_ring:147) ih->ivr_ma.quad_part = 0x17b3c16000
[  423.628941] gim info:(alloc_iv_ring:151) ih->ivr_wptr_wb = 000000002918af4a
[  423.628942] gim info:(alloc_iv_ring:157) ih->ivr_wptr_wa.quad_part = 0x17b18e5000
[  423.628943] gim info:(alloc_iv_ring:163) update rptr via doorbell
[  423.628943] gim info:(ih_iv_ring_init:291) ih->rptr_doorbell = 00000000f6d73993
[  423.628944] gim info:(ih_iv_ring_init:292) ih->rptr_doorbell_offset = 0x1e8
[  423.628947] gim info:(ih_iv_ring_hw_init:184) the physical address of ring buffer: 0x17b3c160
[  423.628959] gim info:(ih_iv_ring_setup_rptr:450) write mmBIF_DOORBELL_APER_EN: 0x1
[  423.628961] gim info:(ih_iv_ring_enable:350) ih->ivr_wptr_reg = 0x0
[  423.628961] gim info:(ih_iv_ring_enable:352) ih->ivr_wptr = 0
[  423.628962] gim info:(ih_iv_ring_enable:354) ih->ivr_rptr_reg = 0x0
[  423.628962] gim info:(ih_iv_ring_enable:356) ih->ivr_rptr = 0
[  423.628964] gim info:(ih_iv_ring_enable:358) *(ih->rptr_doorbell) = 0x0
[  423.628966] gim info:(ih_iv_ring_init:299) init iv ring successfully
[  423.628967] gim info:(set_new_adapter:720) init work
[  423.628967] gim info:(set_new_adapter:726) register interrupt
[  423.628997] gim info:(ih_irq_source_enable:583) IH: read 0x00000000 from mask_reg 0x14d1
[  423.628998] gim info:(ih_irq_source_enable:589) IH: write 0x00000001 to mask_reg 0x14d1
[  423.628999] gim info:(ih_irq_source_enable:592) irq sourceID 0x89 get enabled
[  423.629001] gim info:(ih_irq_source_enable:583) IH: read 0x00000001 from mask_reg 0x14d1
[  423.629001] gim info:(ih_irq_source_enable:589) IH: write 0x00000003 to mask_reg 0x14d1
[  423.629002] gim info:(ih_irq_source_enable:592) irq sourceID 0x88 get enabled
[  426.629156] gim error:(wait_cmd_complete:1683)  wait_cmd_complete -- time out after 3.000034937 sec
[  426.629203] gim error:(wait_cmd_complete:1692)   Cmd = 0x17, Status = 0x0
[  426.629234] gim error:(dump_gpu_status:1417) **** dump gpu status begin for struct adapter 10:00.00
[  426.629278] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[  426.629314] gim error:(dump_gpu_status:1457)  mmGRBM_STATUS = 0x3028
[  426.629344] gim error:(dump_gpu_status:1460)  mmGRBM_STATUS2 = 0x8
[  426.629372] gim error:(dump_gpu_status:1463)  mmSRBM_STATUS = 0x20000040
[  426.629403] gim error:(dump_gpu_status:1466)  mmSRBM_STATUS2 = 0x0
[  426.629432] gim error:(dump_gpu_status:1469)  mmSDMA0_STATUS_REG = 0x46dee557
[  426.629466] gim error:(dump_gpu_status:1472)  mmSDMA1_STATUS_REG = 0x46dee557
[  426.629507] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[  426.629508] gim error:(check_me_cntl:1388)   ME HALTED!
[  426.629531] gim error:(check_me_cntl:1391)   PFP HALTED!
[  426.629554] gim error:(check_me_cntl:1394)   CE HALTED!
[  426.629578] gim error:(dump_gpu_status:1588) **** dump gpu status end
[  426.629608] gim error:(init_register_init_state:3641) Failed to INIT PF for initial register 'init-state'
[  426.629616] gim info:(gim_clear_all_errors:357) PCIE cap pos 58
[  426.629661] gim info:(gim_clear_all_errors:362) AER ext cap pos 150
[  426.629663] gim info:(gim_clear_all_errors:369) DevStatus = 0x9
[  426.629667] gim info:(gim_clear_all_errors:387) PCIE unrecoverable error = 0x2000
[  426.629678] gim info:(resume_scheduler:136) No functions on the runlist.
[  426.629679] gim info:(resume_scheduler:137) Don't need to restart the scheduler
[  426.629680] gim info:(gim_probe:91) AMD GIM probe: pf_count = 1
[  426.629681] gim info:(set_new_adapter:572) curr allocated at 000000008b6e4aca
[  426.629682] gim info:(set_new_adapter:579) SRIOV is supported
[  426.629686] gim info:(set_new_adapter:587) found PCI bridge device
[  426.629687] gim info:(set_new_adapter:588) found: 09:10.0
[  426.629716] gim info:(set_new_adapter:608) mmio_base = 00000000277ee992
[  426.629724] gim info:(set_new_adapter:610) doorbell = 000000002f1729d2
[  426.629732] gim info:(set_new_adapter:612) pf.fb_va = 0000000015bb8b4b
[  426.629750] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002
[  426.629752] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010
[  426.629753] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
[  426.629756] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004
[  426.629761] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS
[  426.629768] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004
[  426.629772] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[  426.629933] gim info:(gim_read_vbios:243) VBIOS starts:  0x55, 0xaa
[  426.629934] gim info:(gim_read_vbios:246) VBios size is 0x10000
[  426.629949] gim info:(gim_read_vbios:249) vbios allocated at 0000000058607f70
[  426.629950] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[  426.812947] gim info:(gim_read_vbios:255) BIOS Version Major 0xF Minor 0x31
[  426.812977] gim info:(gim_read_vbios:270) Valid video BIOS image, 
[  426.812978] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x541300
[  426.812980] gim info:(set_new_adapter:661) Scheduler Time interval is per-vf from XL
[  426.812980] gim info:(set_new_adapter:662) config file
[  426.812981] gim info:(enable_sriov:298) Enable SRIOV
[  426.812982] gim info:(enable_sriov:299) Enable SRIOV vfs count = 16
[  426.920482] pci 0000:0c:02.0: [1002:692f] type 00 class 0x030000
[  426.920500] pci 0000:0c:02.0: enabling Extended Tags
[  426.920648] pci 0000:0c:02.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.920863] pci 0000:0c:02.0: Adding to iommu group 49
[  426.920929] pci 0000:0c:02.1: [1002:692f] type 00 class 0x030000
[  426.920944] pci 0000:0c:02.1: enabling Extended Tags
[  426.921054] pci 0000:0c:02.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.921229] pci 0000:0c:02.1: Adding to iommu group 50
[  426.921266] pci 0000:0c:02.2: [1002:692f] type 00 class 0x030000
[  426.921281] pci 0000:0c:02.2: enabling Extended Tags
[  426.921393] pci 0000:0c:02.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.921593] pci 0000:0c:02.2: Adding to iommu group 51
[  426.921630] pci 0000:0c:02.3: [1002:692f] type 00 class 0x030000
[  426.921644] pci 0000:0c:02.3: enabling Extended Tags
[  426.921757] pci 0000:0c:02.3: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.921943] pci 0000:0c:02.3: Adding to iommu group 52
[  426.921979] pci 0000:0c:02.4: [1002:692f] type 00 class 0x030000
[  426.921994] pci 0000:0c:02.4: enabling Extended Tags
[  426.922109] pci 0000:0c:02.4: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.922310] pci 0000:0c:02.4: Adding to iommu group 53
[  426.922348] pci 0000:0c:02.5: [1002:692f] type 00 class 0x030000
[  426.922363] pci 0000:0c:02.5: enabling Extended Tags
[  426.922515] pci 0000:0c:02.5: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.922708] pci 0000:0c:02.5: Adding to iommu group 54
[  426.922743] pci 0000:0c:02.6: [1002:692f] type 00 class 0x030000
[  426.922758] pci 0000:0c:02.6: enabling Extended Tags
[  426.922879] pci 0000:0c:02.6: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.923075] pci 0000:0c:02.6: Adding to iommu group 55
[  426.923116] pci 0000:0c:02.7: [1002:692f] type 00 class 0x030000
[  426.923131] pci 0000:0c:02.7: enabling Extended Tags
[  426.923240] pci 0000:0c:02.7: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.923440] pci 0000:0c:02.7: Adding to iommu group 56
[  426.923475] pci 0000:0c:03.0: [1002:692f] type 00 class 0x030000
[  426.923492] pci 0000:0c:03.0: enabling Extended Tags
[  426.923622] pci 0000:0c:03.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.923854] pci 0000:0c:03.0: Adding to iommu group 57
[  426.923889] pci 0000:0c:03.1: [1002:692f] type 00 class 0x030000
[  426.923904] pci 0000:0c:03.1: enabling Extended Tags
[  426.924026] pci 0000:0c:03.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.924223] pci 0000:0c:03.1: Adding to iommu group 58
[  426.924265] pci 0000:0c:03.2: [1002:692f] type 00 class 0x030000
[  426.924281] pci 0000:0c:03.2: enabling Extended Tags
[  426.924403] pci 0000:0c:03.2: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.924622] pci 0000:0c:03.2: Adding to iommu group 59
[  426.924663] pci 0000:0c:03.3: [1002:692f] type 00 class 0x030000
[  426.924680] pci 0000:0c:03.3: enabling Extended Tags
[  426.924808] pci 0000:0c:03.3: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.925017] pci 0000:0c:03.3: Adding to iommu group 60
[  426.925057] pci 0000:0c:03.4: [1002:692f] type 00 class 0x030000
[  426.925072] pci 0000:0c:03.4: enabling Extended Tags
[  426.925213] pci 0000:0c:03.4: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.925423] pci 0000:0c:03.4: Adding to iommu group 61
[  426.925464] pci 0000:0c:03.5: [1002:692f] type 00 class 0x030000
[  426.925479] pci 0000:0c:03.5: enabling Extended Tags
[  426.925604] pci 0000:0c:03.5: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.925807] pci 0000:0c:03.5: Adding to iommu group 62
[  426.925843] pci 0000:0c:03.6: [1002:692f] type 00 class 0x030000
[  426.925859] pci 0000:0c:03.6: enabling Extended Tags
[  426.926057] pci 0000:0c:03.6: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.926263] pci 0000:0c:03.6: Adding to iommu group 63
[  426.926305] pci 0000:0c:03.7: [1002:692f] type 00 class 0x030000
[  426.926321] pci 0000:0c:03.7: enabling Extended Tags
[  426.926474] pci 0000:0c:03.7: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  426.926680] pci 0000:0c:03.7: Adding to iommu group 64
[  426.927276] gim info:(enumerate_vfs:122) vf found: 0c:2.0
[  426.927308] gim info:(enumerate_vfs:122) vf found: 0c:2.1
[  426.927344] gim info:(enumerate_vfs:122) vf found: 0c:2.2
[  426.927375] gim info:(enumerate_vfs:122) vf found: 0c:2.3
[  426.927408] gim info:(enumerate_vfs:122) vf found: 0c:2.4
[  426.927444] gim info:(enumerate_vfs:122) vf found: 0c:2.5
[  426.927476] gim info:(enumerate_vfs:122) vf found: 0c:2.6
[  426.927507] gim info:(enumerate_vfs:122) vf found: 0c:2.7
[  426.927539] gim info:(enumerate_vfs:122) vf found: 0c:3.0
[  426.927571] gim info:(enumerate_vfs:122) vf found: 0c:3.1
[  426.927602] gim info:(enumerate_vfs:122) vf found: 0c:3.2
[  426.927634] gim info:(enumerate_vfs:122) vf found: 0c:3.3
[  426.927669] gim info:(enumerate_vfs:122) vf found: 0c:3.4
[  426.927702] gim info:(enumerate_vfs:122) vf found: 0c:3.5
[  426.927734] gim info:(enumerate_vfs:122) vf found: 0c:3.6
[  426.927766] gim info:(enumerate_vfs:122) vf found: 0c:3.7
[  426.927823] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.0
[  426.927826] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.927833] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.927871] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.1
[  426.927874] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.927880] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.927916] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.2
[  426.927919] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.927926] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.927965] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.3
[  426.927967] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.927975] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928013] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.4
[  426.928015] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928022] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928059] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.5
[  426.928062] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928069] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928105] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.6
[  426.928108] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928115] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928151] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:2.7
[  426.928155] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928162] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928201] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.0
[  426.928203] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928210] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928246] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.1
[  426.928249] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928255] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928291] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.2
[  426.928294] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928300] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928336] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.3
[  426.928339] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928346] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928381] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.4
[  426.928384] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928391] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928426] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.5
[  426.928429] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928449] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928488] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.6
[  426.928494] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928504] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928542] gim info:(pci_disable_error_reporting:731) Disable error reporting for device: 0c:3.7
[  426.928545] gim info:(pci_disable_error_reporting:738) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[  426.928552] gim info:(pci_disable_error_reporting:749) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[  426.928573] gim info:(pci_gpu_iov_init:87) total_fb_available = 8190

[  426.928574] gim info:(pci_gpu_iov_init:88) AMD GIM pci_gpu_iov_init pos = 400
[  426.928575] gim info:(pci_gpu_iov_init:89) AMD GIM pci_gpu_iov_init total_fb_available = 1ffe
[  426.928576] gim info:(init_frame_buffer_partition:218) PCI defined PF FB size = 16 MB
[  426.928577] gim info:(init_frame_buffer_partition:222) PCI defined VF FB size = 256 MB
[  426.928578] gim info:(init_frame_buffer_partition:226) Total FB Available = 8190 MB, CSA = 8 MB
[  426.928579] gim info:(init_frame_buffer_partition:228) Max Remaining FB Size = 8160
[  426.928580] gim info:(init_frame_buffer_partition:240) PF FB size after checking limits from config file = 16MB
[  426.928580] gim info:(init_frame_buffer_partition:243) PF rounded down to nearest 16MB boundary = 16
[  426.928581] gim info:(init_pf_fb:59) total framebuffer available = 1ffe
[  426.928582] gim info:(init_pf_fb:61) pf framebuffer = 10
[  426.928583] gim info:(init_pf_fb:62) total framebuffer consumed = 1fee
[  426.928584] gim info:(init_frame_buffer_partition:251) CSA starts at offset 16MB
[  426.928585] gim info:(init_context_save_area:41) AMD GIM init_context_save_area: base =10 size=1.
[  426.928587] gim info:(init_frame_buffer_partition:257) VF FB base = 32MB (16 + 16)
[  426.928588] gim info:(init_frame_buffer_partition:261) VF FB Size = 8144MB (8160 - 16)
[  426.928589] gim info:(init_fb_static:145) AMD GIM init_fb_static: num_vf = 16, base= 32, total_size=8144, min_size=256
[  426.928590] gim info:(init_fb_static:166) AMD GIM init_fb_static: vf_fb_size = 496, base= 32
[  426.928591] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 0 base =32,size= 496
[  426.928593] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 1 base =528,size= 496
[  426.928594] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 2 base =1024,size= 496
[  426.928597] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 3 base =1520,size= 496
[  426.928603] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 4 base =2016,size= 496
[  426.928613] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 5 base =2512,size= 496
[  426.928619] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 6 base =3008,size= 496
[  426.928624] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 7 base =3504,size= 496
[  426.928630] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 8 base =4000,size= 496
[  426.928635] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 9 base =4496,size= 496
[  426.928640] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 10 base =4992,size= 496
[  426.928646] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 11 base =5488,size= 496
[  426.928651] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 12 base =5984,size= 496
[  426.928653] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 13 base =6480,size= 496
[  426.928655] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 14 base =6976,size= 496
[  426.928656] gim info:(init_fb_static:176) AMD GIM init_fb_static: partition 15 base =7472,size= 496
[  426.928671] gim info:(set_new_adapter:707) enable MSI
[  426.928726] gim info:(ih_iv_ring_disable:383) disable iv ring successfully
[  426.928727] gim info:(alloc_iv_ring:99) ih->ivr_num_entries = 256
[  426.928728] gim info:(alloc_iv_ring:102) ih->ivr_size_in_bytes = 4096
[  426.928728] gim info:(alloc_iv_ring:107) ih->ivr_alloc_size_in_bytes = 4100
[  426.928729] gim info:(alloc_iv_ring:110) iv ring page_cnt = 2
[  426.928733] gim info:(alloc_iv_ring:141) ih->ivr_va = 00000000b2889390
[  426.928734] gim info:(alloc_iv_ring:147) ih->ivr_ma.quad_part = 0x17bf985000
[  426.928735] gim info:(alloc_iv_ring:151) ih->ivr_wptr_wb = 00000000dce433d1
[  426.928736] gim info:(alloc_iv_ring:157) ih->ivr_wptr_wa.quad_part = 0x17bf986000
[  426.928737] gim info:(alloc_iv_ring:163) update rptr via doorbell
[  426.928737] gim info:(ih_iv_ring_init:291) ih->rptr_doorbell = 0000000064604923
[  426.928738] gim info:(ih_iv_ring_init:292) ih->rptr_doorbell_offset = 0x1e8
[  426.928741] gim info:(ih_iv_ring_hw_init:184) the physical address of ring buffer: 0x17bf9850
[  426.928753] gim info:(ih_iv_ring_setup_rptr:450) write mmBIF_DOORBELL_APER_EN: 0x1
[  426.928755] gim info:(ih_iv_ring_enable:350) ih->ivr_wptr_reg = 0x0
[  426.928756] gim info:(ih_iv_ring_enable:352) ih->ivr_wptr = 0
[  426.928756] gim info:(ih_iv_ring_enable:354) ih->ivr_rptr_reg = 0x0
[  426.928757] gim info:(ih_iv_ring_enable:356) ih->ivr_rptr = 0
[  426.928759] gim info:(ih_iv_ring_enable:358) *(ih->rptr_doorbell) = 0x0
[  426.928761] gim info:(ih_iv_ring_init:299) init iv ring successfully
[  426.928761] gim info:(set_new_adapter:720) init work
[  426.928762] gim info:(set_new_adapter:726) register interrupt
[  426.928790] gim info:(ih_irq_source_enable:583) IH: read 0x00000000 from mask_reg 0x14d1
[  426.928790] gim info:(ih_irq_source_enable:589) IH: write 0x00000001 to mask_reg 0x14d1
[  426.928792] gim info:(ih_irq_source_enable:592) irq sourceID 0x89 get enabled
[  426.928794] gim info:(ih_irq_source_enable:583) IH: read 0x00000001 from mask_reg 0x14d1
[  426.928794] gim info:(ih_irq_source_enable:589) IH: write 0x00000003 to mask_reg 0x14d1
[  426.928795] gim info:(ih_irq_source_enable:592) irq sourceID 0x88 get enabled
[  429.928951] gim error:(wait_cmd_complete:1683)  wait_cmd_complete -- time out after 3.000036976 sec
[  429.928999] gim error:(wait_cmd_complete:1692)   Cmd = 0x17, Status = 0x0
[  429.929030] gim error:(dump_gpu_status:1417) **** dump gpu status begin for struct adapter 12:00.00
[  429.929075] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[  429.929111] gim error:(dump_gpu_status:1457)  mmGRBM_STATUS = 0x3028
[  429.929140] gim error:(dump_gpu_status:1460)  mmGRBM_STATUS2 = 0x8
[  429.929169] gim error:(dump_gpu_status:1463)  mmSRBM_STATUS = 0x20000040
[  429.929200] gim error:(dump_gpu_status:1466)  mmSRBM_STATUS2 = 0x0
[  429.929229] gim error:(dump_gpu_status:1469)  mmSDMA0_STATUS_REG = 0x46dee557
[  429.929262] gim error:(dump_gpu_status:1472)  mmSDMA1_STATUS_REG = 0x46dee557
[  429.929302] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[  429.929303] gim error:(check_me_cntl:1388)   ME HALTED!
[  429.929326] gim error:(check_me_cntl:1391)   PFP HALTED!
[  429.929350] gim error:(check_me_cntl:1394)   CE HALTED!
[  429.929374] gim error:(dump_gpu_status:1588) **** dump gpu status end
[  429.929403] gim error:(init_register_init_state:3641) Failed to INIT PF for initial register 'init-state'
[  429.929412] gim info:(gim_clear_all_errors:357) PCIE cap pos 58
[  429.929456] gim info:(gim_clear_all_errors:362) AER ext cap pos 150
[  429.929458] gim info:(gim_clear_all_errors:369) DevStatus = 0x9
[  429.929462] gim info:(gim_clear_all_errors:387) PCIE unrecoverable error = 0x2000
[  429.929474] gim info:(resume_scheduler:136) No functions on the runlist.
[  429.929474] gim info:(resume_scheduler:137) Don't need to restart the scheduler
[  429.929475] gim info:(gim_probe:91) AMD GIM probe: pf_count = 2
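
For anyone checking the `init_fb_static` lines in the log above, the partition layout can be reproduced with a few lines of arithmetic. This is only a sketch inferred from the logged numbers, not taken from the GIM source: the function name `partition_vf_framebuffer` is hypothetical, and the 16 MB rounding of each VF share is an assumption that happens to match the logged `vf_fb_size = 496` (8144 / 16 = 509, rounded down to 496).

```python
def partition_vf_framebuffer(num_vf, base_mb, total_mb, align_mb=16):
    """Split total_mb evenly across num_vf virtual functions.

    Each share is rounded down to a multiple of align_mb (assumed 16 MB,
    inferred from the log, not confirmed against the GIM source).
    Returns a list of (base_mb, size_mb) tuples, one per partition.
    """
    share = total_mb // num_vf                 # even split, in MB
    size = (share // align_mb) * align_mb      # round down to the alignment
    return [(base_mb + index * size, size) for index in range(num_vf)]


# Values taken from the log line:
# init_fb_static: num_vf = 16, base= 32, total_size=8144, min_size=256
partitions = partition_vf_framebuffer(num_vf=16, base_mb=32, total_mb=8144)
for index, (base, size) in enumerate(partitions):
    print(f"partition {index} base={base}, size={size}")
```

The computed bases (32, 528, 1024, ... 7472) agree with the logged partitions, which suggests the framebuffer split itself completed as expected and the failure happens later.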

Regarding the system, we tried both Ubuntu Server 16.04 and 20.04 (the latter without applying the kernel patches, as they have already been integrated, but using the fork for kernels 4.5+). Despite trying multiple kernels, systems, and setups, the final output remains gim error:(init_register_init_state:3641) Failed to INIT PF for initial register 'init-state'.
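
As a reading aid for the GPU status dump above, the `CP_ME_CNTL = 0x15000000` value can be decoded bit by bit. The sketch below assumes the halt bit positions used in the gfx v8 register headers of the Linux amdgpu driver (CE_HALT = bit 24, PFP_HALT = bit 26, ME_HALT = bit 28). That mapping is an assumption here, although it is consistent with the three "HALTED!" lines GIM prints. The helper name `halted_engines` is hypothetical.

```python
# Halt bit masks for CP_ME_CNTL.
# ASSUMPTION: positions follow the amdgpu gfx v8 headers; they are not
# taken from the GIM source itself.
CP_ME_CNTL_HALT_BITS = {
    "CE": 1 << 24,   # constant engine
    "PFP": 1 << 26,  # prefetch parser
    "ME": 1 << 28,   # micro engine
}


def halted_engines(cp_me_cntl):
    """Return the names of the command-processor engines whose halt bit is set."""
    return [name for name, mask in CP_ME_CNTL_HALT_BITS.items() if cp_me_cntl & mask]


# Value taken from the log line: CP_ME_CNTL = 0x15000000
print(halted_engines(0x15000000))  # prints ['CE', 'PFP', 'ME']
```

With all three engines halted, the command processor is not running at all, which would fit the `wait_cmd_complete` timeout reported just before the dump.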

The cards stay visible until one of the 32 listed virtual cards is attached to a VM and the VM is started. After that, none of the cards is visible via LSPCI anymore. I have read online that this is related to the card not initializing properly under UEFI, but when I turn on the Compatibility Support Module (CSM), the motherboard no longer boots: the boot phase hangs while initializing the display, and I haven't been able to overcome the issue...

As we don't have another GPU available, and the CPU does not have an integrated one, we are stuck with the 2060 SUPER; otherwise, the system has no video output. Swapping the slots did not improve the situation.

Power cables are correctly connected to the power supply and the card shows up correctly in the UEFI settings.

I am still really hopeful as the card is recognized and the system is up and running. But I am not able to dig further into it... I hope you have all the information, but please ask if you need anything else.

Thanks for your time and precious help.

Best regards, Florent

rcmart3q commented 1 year ago

Hello Florent, I am currently digging into this as well, and am also getting the Failed to INIT PF for initial register 'init-state' error on my server. Our system is running the following:

- Intel(R) Xeon(R) CPU E5-2650 v3
- Aspeed AST2400 graphics
- 32GB DDR4
- AMD FirePro S7150 (not S7150x2)

It has been an absolutely brutal path for me to even get to this point, as I am trying to do this on a Chinese X99 motherboard.

It does feel like I am getting somewhere, as I have found that even the smallest misconfiguration can stop this card from working correctly. If I can crack this, I will share the results so that you and others may also get your systems running.