GPUOpen-LibrariesAndSDKs / MxGPU-Virtualization

Host Crash qemu/kvm #13

Closed flumm closed 5 years ago

flumm commented 5 years ago

uname -a: Linux myhost 4.15.18-7-pve #1 SMP PVE 4.15.18-26 (Thu, 04 Oct 2018 11:03:06 +0200) x86_64 GNU/Linux

Even though I use kernel 4.15, compiling went without problems. A modprobe produces the following output, though:

[ 230.259767] gim info:(gim_init:149) Start AMD open source GIM initialization
[ 230.259770] gim info:(gim_init:152) GPU IOV MODULE - version 0.0
[ 230.259771] gim info:(gim_init:154) Copyright (c) 2014-2018 AMD Corporation.
[ 230.260618] gim info:(parse_config_file:219) AMD GIM fb_option = 0
[ 230.260619] gim info:(parse_config_file:219) AMD GIM sched_option = 1
[ 230.260621] gim info:(parse_config_file:219) AMD GIM vf_num = 1
[ 230.260622] gim info:(parse_config_file:219) AMD GIM pf_fb = 256
[ 230.260623] gim info:(parse_config_file:219) AMD GIM vf_fb = 256
[ 230.260625] gim info:(parse_config_file:219) AMD GIM sched_interval = 0
[ 230.260626] gim info:(parse_config_file:219) AMD GIM sched_interval_us = 0
[ 230.260627] gim info:(parse_config_file:219) AMD GIM fb_clear = 0
[ 230.260629] gim info:(init_config:341) INIT CONFIG
[ 230.268677] gim info:(set_new_adapter:572) curr allocated at 0000000007ecc9e8
[ 230.268678] gim info:(set_new_adapter:579) SRIOV is supported
[ 230.268683] gim info:(set_new_adapter:587) found PCI bridge device
[ 230.268684] gim info:(set_new_adapter:591) found: 00:3.0
[ 230.268718] gim info:(set_new_adapter:608) mmio_base = 0000000073a47b6a
[ 230.268726] gim info:(set_new_adapter:610) doorbell = 00000000b86d6846
[ 230.268737] gim info:(set_new_adapter:612) pf.fb_va = 000000006a5f6588
[ 230.268746] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002
[ 230.268748] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010
[ 230.268749] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
[ 230.268751] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004
[ 230.268752] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS
[ 230.268753] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004
[ 230.268754] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[ 230.268903] gim info:(gim_read_vbios:243) VBIOS starts: 0x55, 0xaa
[ 230.268904] gim info:(gim_read_vbios:246) VBios size is 0x10000
[ 230.268915] gim info:(gim_read_vbios:249) vbios allocated at 0000000011c744ba
[ 230.268915] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM
[ 230.441512] gim info:(gim_read_vbios:257) BIOS Version Major 0xF Minor 0x31
[ 230.441561] gim info:(gim_read_vbios:270) Valid video BIOS image,
[ 230.441563] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x53b100
[ 230.441565] gim info:(set_new_adapter:661) Scheduler Time interval is per-vf from XL
[ 230.441566] gim info:(set_new_adapter:662) config file
[ 230.441567] gim info:(enable_sriov:299) Enable SRIOV
[ 230.441568] gim info:(enable_sriov:300) Enable SRIOV vfs count = 2
[ 230.548564] gim info:(enumerate_vfs:123) vf found: 05:2.0
[ 230.548577] gim info:(enumerate_vfs:123) vf found: 05:2.1
[ 230.548611] gim info:(pci_disable_error_reporting:735) Disable error reporting for device: 05:2.0
[ 230.548613] gim info:(pci_disable_error_reporting:740) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[ 230.548617] gim info:(pci_disable_error_reporting:751) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[ 230.548633] gim info:(pci_disable_error_reporting:735) Disable error reporting for device: 05:2.1
[ 230.548635] gim info:(pci_disable_error_reporting:740) Mask before -> corr = 0x00000000, uncorr = 0x00000000
[ 230.548639] gim info:(pci_disable_error_reporting:751) Mask after -> corr = 0x00000000, uncorr = 0x00000000
[ 230.548648] gim info:(pci_gpu_iov_init:87) total_fb_available = 8190
[ 230.548649] gim info:(pci_gpu_iov_init:88) AMD GIM pci_gpu_iov_init pos = 400
[ 230.548650] gim info:(pci_gpu_iov_init:90) AMD GIM pci_gpu_iov_init total_fb_available = 1ffe
[ 230.548652] gim info:(init_frame_buffer_partition:218) PCI defined PF FB size = 256 MB
[ 230.548653] gim info:(init_frame_buffer_partition:222) PCI defined VF FB size = 256 MB
[ 230.548654] gim info:(init_frame_buffer_partition:227) Total FB Available = 8190 MB, CSA = 8 MB
[ 230.548655] gim info:(init_frame_buffer_partition:228) Max Remaining FB Size = 8160
[ 230.548656] gim info:(init_frame_buffer_partition:241) PF FB size after checking limits from config file = 256MB
[ 230.548657] gim info:(init_frame_buffer_partition:244) PF rounded down to nearest 16MB boundary = 256
[ 230.548658] gim info:(init_pf_fb:60) total framebuffer available = 1ffe
[ 230.548659] gim info:(init_pf_fb:61) pf framebuffer = 100
[ 230.548660] gim info:(init_pf_fb:63) total framebuffer consumed = 1efe
[ 230.548661] gim info:(init_frame_buffer_partition:251) CSA starts at offset 256MB
[ 230.548663] gim info:(init_context_save_area:42) AMD GIM init_context_save_area: base =100 size=1.
[ 230.548664] gim info:(init_frame_buffer_partition:258) VF FB base = 272MB (256 + 16)
[ 230.548666] gim info:(init_frame_buffer_partition:262) VF FB Size = 7904MB (8160 - 256)
[ 230.548667] gim info:(init_fb_static:146) AMD GIM init_fb_static: num_vf = 2, base= 272, total_size=7904, min_size=256
[ 230.548669] gim info:(init_fb_static:158) VF FB size specified as 256MB, min_size = 256
[ 230.548670] gim info:(init_fb_static:167) AMD GIM init_fb_static: vf_fb_size = 256, base= 272
[ 230.548671] gim info:(init_fb_static:178) AMD GIM init_fb_static: partition 0 base =272,size= 256
[ 230.548673] gim info:(init_fb_static:178) AMD GIM init_fb_static: partition 1 base =528,size= 256
[ 230.548694] gim info:(set_new_adapter:707) enable MSI
[ 230.548735] gim info:(ih_iv_ring_disable:383) disable iv ring successfully
[ 230.548736] gim info:(alloc_iv_ring:99) ih->ivr_num_entries = 256
[ 230.548737] gim info:(alloc_iv_ring:102) ih->ivr_size_in_bytes = 4096
[ 230.548738] gim info:(alloc_iv_ring:108) ih->ivr_alloc_size_in_bytes = 4100
[ 230.548739] gim info:(alloc_iv_ring:110) iv ring page_cnt = 2
[ 230.548742] gim info:(alloc_iv_ring:141) ih->ivr_va = 0000000006512d73
[ 230.548862] gim info:(alloc_iv_ring:147) ih->ivr_ma.quad_part = 0xfffff000
[ 230.548863] gim info:(alloc_iv_ring:151) ih->ivr_wptr_wb = 000000004c4df8ef
[ 230.548867] gim info:(alloc_iv_ring:158) ih->ivr_wptr_wa.quad_part = 0xffffe000
[ 230.548868] gim info:(alloc_iv_ring:163) update rptr via doorbell
[ 230.548869] gim info:(ih_iv_ring_init:291) ih->rptr_doorbell = 0000000052363f63
[ 230.548870] gim info:(ih_iv_ring_init:292) ih->rptr_doorbell_offset = 0x1e8
[ 230.548872] gim info:(ih_iv_ring_hw_init:185) the physical address of ring buffer: 0xfffff0
[ 230.548878] gim info:(ih_iv_ring_setup_rptr:451) write mmBIF_DOORBELL_APER_EN: 0x1
[ 230.548880] gim info:(ih_iv_ring_enable:350) ih->ivr_wptr_reg = 0x0
[ 230.548881] gim info:(ih_iv_ring_enable:352) ih->ivr_wptr = 0
[ 230.548882] gim info:(ih_iv_ring_enable:354) ih->ivr_rptr_reg = 0x0
[ 230.548882] gim info:(ih_iv_ring_enable:356) ih->ivr_rptr = 0
[ 230.548884] gim info:(ih_iv_ring_enable:358) *(ih->rptr_doorbell) = 0x0
[ 230.548885] gim info:(ih_iv_ring_init:299) init iv ring successfully
[ 230.548886] gim info:(set_new_adapter:720) init work
[ 230.548887] gim info:(set_new_adapter:726) register interrupt
[ 230.548923] gim info:(ih_irq_source_enable:584) IH: read 0x00000000 from mask_reg 0x14d1
[ 230.548925] gim info:(ih_irq_source_enable:590) IH: write 0x00000001 to mask_reg 0x14d1
[ 230.548926] gim info:(ih_irq_source_enable:593) irq sourceID 0x89 get enabled
[ 230.548928] gim info:(ih_irq_source_enable:584) IH: read 0x00000001 from mask_reg 0x14d1
[ 230.548929] gim info:(ih_irq_source_enable:590) IH: write 0x00000003 to mask_reg 0x14d1
[ 230.548930] gim info:(ih_irq_source_enable:593) irq sourceID 0x88 get enabled
[ 233.549138] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 3.000038872 sec
[ 233.549181] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0
[ 233.549210] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00
[ 233.549248] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[ 233.549271] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028
[ 233.549297] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8
[ 233.549323] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040
[ 233.549351] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0
[ 233.549377] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557
[ 233.549406] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557
[ 233.549439] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[ 233.549440] gim error:(check_me_cntl:1388) ME HALTED!
[ 233.549462] gim error:(check_me_cntl:1391) PFP HALTED!
[ 233.549484] gim error:(check_me_cntl:1394) CE HALTED!
[ 233.549507] gim error:(dump_gpu_status:1588) dump gpu status end
[ 233.549534] gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'
[ 233.549541] gim info:(gim_clear_all_errors:357) PCIE cap pos 58
[ 233.549580] gim info:(gim_clear_all_errors:362) AER ext cap pos 150
[ 233.549581] gim info:(gim_clear_all_errors:369) DevStatus = 0x9
[ 233.549584] gim info:(gim_clear_all_errors:387) PCIE unrecoverable error = 0x2000
[ 233.549589] gim info:(add_func_to_run_list:2459) Add VF0 to the runlist
[ 233.549591] gim info:(alloc_fn_list_node:3837) New Function List Node allocated at 00000000b55b812e index 0
[ 233.549592] gim info:(add_func_to_run_list:2459) Add VF1 to the runlist
[ 233.549593] gim info:(alloc_fn_list_node:3837) New Function List Node allocated at 00000000ce52204e index 1
[ 233.549595] gim warning:(resume_scheduler:155) Restart the Scheduler for 7.000msec
[ 233.549597] gim info:(gim_probe:91) AMD GIM probe: pf_count = 1

Especially this line:

[ 233.549534] gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'
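
The GPU dump that precedes this error reports CP_ME_CNTL = 0x15000000 together with ME, PFP and CE HALTED. As a minimal sketch of how that register value maps to those three messages, the snippet below assumes the halt-bit positions used in the Linux amdgpu GFX8 register headers (CE_HALT at bit 24, PFP_HALT at bit 26, ME_HALT at bit 28). Those positions are an assumption here, not taken from the GIM source, and `decode_cp_me_cntl` is a hypothetical helper name.

```python
# Assumed halt-bit positions (amdgpu GFX8 headers; not confirmed against GIM).
HALT_BITS = {"CE": 24, "PFP": 26, "ME": 28}


def decode_cp_me_cntl(value: int) -> dict:
    """Report which command-processor engines have their halt bit set."""
    return {name: bool((value >> bit) & 1) for name, bit in HALT_BITS.items()}


# Value taken from the dmesg dump in this report.
status = decode_cp_me_cntl(0x15000000)
print(status)  # {'CE': True, 'PFP': True, 'ME': True}
```

Under that assumption, 0x15000000 is exactly the three halt bits and nothing else, which is consistent with the three HALTED lines in the dump.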

After this, lspci shows the vGPUs:

lspci | grep AMD

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
05:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
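
For picking the right device to pass through, a rough sketch of separating the physical function from the virtual functions in this lspci output follows. It assumes that a trailing "V" in the bracketed model name (S7150V) marks a virtual function, which matches the "vf found: 05:2.0" and "vf found: 05:2.1" lines in the modprobe log but is only a naming heuristic; `split_functions` is a hypothetical helper.

```python
import re

# lspci output as shown in this report.
LSPCI_OUTPUT = """\
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
05:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
"""

# Capture the bus:device.function address and the last bracketed model name.
LINE_RE = re.compile(
    r"^(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) .*\[(?P<model>[^\]]+)\]$"
)


def split_functions(lspci_text: str):
    """Return (physical, virtual) address lists, using the model-name heuristic."""
    physical, virtual = [], []
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # Assumption: models ending in "V" are virtual functions.
        target = virtual if match["model"].endswith("V") else physical
        target.append(match["bdf"])
    return physical, virtual


pf, vfs = split_functions(LSPCI_OUTPUT)
print("PF:", pf)
print("VFs:", vfs)
```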

Even passing through 05:02.0 works, up to the point where I install the guest driver (Windows 10 x64). Then I am greeted with the following output in dmesg and the host crashes:

[ 696.091017] gim info:(ih_irq_process:713) AMD ISR is being invoked
[ 696.091019] gim info:(ih_iv_ring_get_pointers:482) ih_iv_ring_get_pointers
[ 696.091021] gim info:(ih_iv_ring_get_pointers:483) ih->ivr_wptr_wb = 0x000000004c4df8ef
[ 696.091023] gim info:(ih_iv_ring_get_pointers:485) write offset: *(ih->ivr_wptr_wb) = 0x10
[ 696.091024] gim info:(ih_iv_ring_get_pointers:486) read idx: ih->ivr_rptr = 0x0
[ 696.091025] gim info:(ih_iv_ring_get_pointers:488) Rx at entry 0 in the ring
[ 696.091026] gim info:(ih_iv_ring_get_pointers:490) iv_ring_entry.source_id = 137
[ 696.091027] gim info:(ih_iv_ring_get_pointers:491) iv_ring_entry.source_data = 0
[ 696.091029] gim info:(ih_irq_process:734) received 1 irqs in one ISR
[ 696.091031] gim info:(ih_irq_process:757) VF_PF_MSGBUF_VALID received(Received msg from VF)
[ 696.091033] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0
[ 696.091037] gim info:(mailbox_msg_rcv:882) read mmMAILBOX_MSGBUF_RCV_DW0: 0x1
[ 696.091038] gim info:(mailbox_msg_rcv:883) read mmMAILBOX_MSGBUF_RCV_DW1: 0
[ 696.091039] gim info:(mailbox_msg_rcv:884) read mmMAILBOX_MSGBUF_RCV_DW2: 0x0
[ 696.091040] gim info:(mailbox_msg_rcv:885) read mmMAILBOX_MSGBUF_RCV_DW3: 0x0
[ 696.091043] gim info:(mailbox_ack_receipt:1008) write mmMAILBOX_CONTROL: 0x300 to MAILBOX_INDEX 0x0
[ 696.091044] gim info:(ih_irq_process:778) GPU access flag = 0x0
[ 696.091046] gim info:(idh_queue:656) new idh: task->event = 1, task->func_id = 0
[ 696.091048] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0
[ 696.091053] gim info:(ih_iv_ring_update_rptr:546) update the new rptr: ih->ivr_rptr_reg = 0x10
[ 696.091054] gim info:(ih_iv_ring_update_rptr:554) update rptr via doorbell: 0x10
[ 696.091055] gim info:(ih_iv_ring_update_rptr:556) current wptr: 0x10
[ 696.091055] gim info:(ih_irq_process:823) AMD ISR is complete
[ 696.091079] gim info:(signal_scheduler:1561) Invoked the task scheduler thread. Process IRQ activity
[ 696.091080] gim info:(signal_scheduler:1573) Got a REQ_GPU_ACCESS task for VFindex 0
[ 696.091081] gim info:(signal_scheduler:1577) req_gpu_task --> Event = 1;FuncID = 0
[ 696.091082] gim info:(signal_scheduler:1663) IDH_REQ_GPU_INIT_ACCESS
[ 696.091084] gim info:(handle_req_gpu_init_access:1184) handle req_gpu_init_access event for VF0, reset flag = 0
[ 696.091085] gim info:(alloc_new_vf:2321) alloc_new_vf with pid 0
[ 696.091087] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff
[ 696.091089] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0
[ 696.091091] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456
[ 696.091148] gim info:(load_vbios:4048) FB VA = 00000000f4a04997
[ 696.091152] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 00000000f4a04997 for length 0x10000
[ 696.097027] gim info:(load_vbios:4055) UnMap FB VA 00000000f4a04997
[ 696.097079] gim info:(alloc_new_vf:2337) sav_res_list_size = -166651663
[ 697.030272] gim info:(alloc_new_vf:2427) Defer adding VF to runlist until Mailbox work queue
[ 697.030274] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff
[ 697.030277] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0
[ 697.030279] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456
[ 697.030315] gim info:(load_vbios:4048) FB VA = 0000000005ed60dd
[ 697.030317] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 0000000005ed60dd for length 0x10000
[ 697.036194] gim info:(load_vbios:4055) UnMap FB VA 0000000005ed60dd
[ 697.036229] gim info:(alloc_new_vf:2441) end of alloc_new_vf
[ 697.036231] gim info:(pause_scheduler:124) Stop the Scheduler
[ 697.036232] gim info:(stop_current_vf:2097) stop current vf[1]
[ 697.036233] gim warning:(stop_current_vf:2121) runlist is not scheduled, ignore stop
[ 697.036234] gim info:(handle_req_gpu_init_access:1229) init and run new VF
[ 697.036236] gim info:(run_sdma:3696) Check SDMA for bdf 0x500
[ 697.036238] gim info:(run_sdma:3699) SDMA0 is HALTED, unhalt it
[ 697.036239] gim info:(run_sdma:3709) SDMA1 is HALTED, unhalt it
[ 697.036242] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000
[ 697.036245] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000
[ 697.036246] gim info:(run_sdma:3696) Check SDMA for bdf 0x510
[ 697.036248] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted
[ 697.036249] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted
[ 697.036252] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000
[ 697.036254] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000
[ 699.036429] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000068956 sec
[ 699.036471] gim error:(wait_cmd_complete:1688) Cmd = 0x13, Status = 0x0
[ 699.036500] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00
[ 699.036538] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[ 699.036565] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028
[ 699.036592] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8
[ 699.036617] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040
[ 699.036645] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0
[ 699.036671] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557
[ 699.036701] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557
[ 699.036736] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[ 699.036737] gim error:(check_me_cntl:1388) ME HALTED!
[ 699.036759] gim error:(check_me_cntl:1391) PFP HALTED!
[ 699.036781] gim error:(check_me_cntl:1394) CE HALTED!
[ 699.036803] gim error:(dump_gpu_status:1588) dump gpu status end
[ 699.036830] gim error:(load_register_init_state:3666) Failed to LOAD register 'init-state'
[ 699.036832] gim info:(mark_func_scheduled:2614) Mark VF0 is scheduled
[ 699.036865] gim warning:(add_func_to_run_list:2488) VF0 is already on the run_list.
[ 702.037033] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 3.000063346 sec
[ 702.037108] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0
[ 702.037186] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00
[ 702.037258] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[ 702.037285] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028
[ 702.037335] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8
[ 702.037383] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040
[ 702.037435] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0
[ 702.037484] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557
[ 702.037539] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557
[ 702.038959] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[ 702.038960] gim error:(check_me_cntl:1388) ME HALTED!
[ 702.040385] gim error:(check_me_cntl:1391) PFP HALTED!
[ 702.041795] gim error:(check_me_cntl:1394) CE HALTED!
[ 702.043154] gim error:(dump_gpu_status:1588) dump gpu status end
[ 702.044508] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff
[ 702.044511] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0
[ 702.044513] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456
[ 702.044570] gim info:(load_vbios:4048) FB VA = 0000000005ed60dd
[ 702.044573] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 0000000005ed60dd for length 0x10000
[ 702.050448] gim info:(load_vbios:4055) UnMap FB VA 0000000005ed60dd
[ 702.050496] gim info:(run_sdma:3696) Check SDMA for bdf 0x500
[ 702.050498] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted
[ 702.050499] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted
[ 702.050502] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000
[ 702.050505] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000
[ 702.050507] gim info:(run_sdma:3696) Check SDMA for bdf 0x510
[ 702.050509] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted
[ 702.050511] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted
[ 702.050513] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000
[ 702.050517] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000
[ 704.050654] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000031965 sec
[ 704.051388] gim error:(wait_cmd_complete:1688) Cmd = 0x14, Status = 0x0
[ 704.052108] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00
[ 704.052853] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[ 704.052879] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028
[ 704.053639] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8
[ 704.054405] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040
[ 704.055115] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0
[ 704.055792] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557
[ 704.056450] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557
[ 704.057092] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[ 704.057093] gim error:(check_me_cntl:1388) ME HALTED!
[ 704.057731] gim error:(check_me_cntl:1391) PFP HALTED!
[ 704.058409] gim error:(check_me_cntl:1394) CE HALTED!
[ 704.059039] gim error:(dump_gpu_status:1588) dump gpu status end
[ 706.059855] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000070472 sec
[ 706.061148] gim error:(wait_cmd_complete:1688) Cmd = 0x18, Status = 0x0
[ 706.062436] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00
[ 706.063755] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000
[ 706.063782] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028
[ 706.065101] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8
[ 706.066439] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040
[ 706.067178] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0
[ 706.067845] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557
[ 706.068496] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557
[ 706.069153] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump
[ 706.069154] gim error:(check_me_cntl:1388) ME HALTED!
[ 706.069798] gim error:(check_me_cntl:1391) PFP HALTED!
[ 706.070411] gim error:(check_me_cntl:1394) CE HALTED!
[ 706.071025] gim error:(dump_gpu_status:1588) dump gpu status end
[ 706.071653] gim info:(save_rlcv_state:1970) SAVE_RLCV_STATE failed on VF0
[ 706.071655] gim info:(gim_save_vddgfx_state:116) RLCV responced SAVE_RLCV_STATE
[ 706.072015] gim info:(gim_save_vddgfx_state:123) mmRLC_GPU_IOV_SCRATCH_ADDR = 0
[ 706.072222] gim info:(gim_save_vddgfx_state:130) RLCV scratch saved
[ 706.072223] gim info:(handle_req_gpu_init_access:1277) amdgpuv_save_vddgfx_state: VF 0 vddgfx state is saved
[ 706.072226] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0
[ 706.072230] gim info:(mailbox_msg_trn:936) write mmMAILBOX_MSGBUF_TRN_DW0: 0x1 to MAILBOX_INDEX 0x0
[ 706.072232] gim info:(mailbox_msg_trn:952) write mmMAILBOX_CONTROL: 0x1
[ 706.072241] gim info:(ih_irq_process:713) AMD ISR is being invoked
[ 706.072242] gim info:(ih_iv_ring_get_pointers:482) ih_iv_ring_get_pointers
[ 706.072244] gim info:(ih_iv_ring_get_pointers:483) ih->ivr_wptr_wb = 0x000000004c4df8ef
[ 706.072245] gim info:(ih_iv_ring_get_pointers:485) write offset: *(ih->ivr_wptr_wb) = 0x20
[ 706.072246] gim info:(ih_iv_ring_get_pointers:486) read idx: ih->ivr_rptr = 0x1
[ 706.072247] gim info:(ih_iv_ring_get_pointers:488) Rx at entry 1 in the ring
[ 706.072248] gim info:(ih_iv_ring_get_pointers:490) iv_ring_entry.source_id = 136
[ 706.072249] gim info:(ih_iv_ring_get_pointers:491) iv_ring_entry.source_data = 0
[ 706.072250] gim info:(ih_irq_process:734) received 1 irqs in one ISR
[ 706.072252] gim info:(ih_irq_process:792) PF_VF_MSGBUF_ACK received(VF has ACK'd the msg)
[ 706.072254] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0
[ 706.072255] gim info:(mailbox_clear_msg_valid:1022) write mmMAILBOX_CONTROL: 0x2
[ 706.072257] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0
[ 706.072259] gim info:(ih_iv_ring_update_rptr:546) update the new rptr: ih->ivr_rptr_reg = 0x20
[ 706.072260] gim info:(ih_iv_ring_update_rptr:554) update rptr via doorbell: 0x20
[ 706.072261] gim info:(ih_iv_ring_update_rptr:556) current wptr: 0x20
[ 706.072262] gim info:(ih_irq_process:823) AMD ISR is complete
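
Since the interesting lines are buried in the info output, here is a small sketch for pulling just the timed-out commands out of a log like the one above. It relies only on the `wait_cmd_complete` line format visible in this report; `timed_out_commands` is a hypothetical helper, and the meaning of the individual Cmd values is not decoded because it is not stated in the log.

```python
import re

# Matches lines such as:
# [ 699.036471] gim error:(wait_cmd_complete:1688) Cmd = 0x13, Status = 0x0
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] gim error:\(wait_cmd_complete:\d+\) "
    r"Cmd = (?P<cmd>0x[0-9a-fA-F]+), Status = (?P<status>0x[0-9a-fA-F]+)"
)


def timed_out_commands(log_text: str):
    """Return (timestamp, cmd, status) for every timed-out command in the log."""
    return [
        (float(m["ts"]), int(m["cmd"], 16), int(m["status"], 16))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


# The four Cmd/Status lines from the crash log in this report.
sample = (
    "[ 699.036471] gim error:(wait_cmd_complete:1688) Cmd = 0x13, Status = 0x0\n"
    "[ 702.037108] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0\n"
    "[ 704.051388] gim error:(wait_cmd_complete:1688) Cmd = 0x14, Status = 0x0\n"
    "[ 706.061148] gim error:(wait_cmd_complete:1688) Cmd = 0x18, Status = 0x0\n"
)

for ts, cmd, status in timed_out_commands(sample):
    print(f"{ts:.6f}  cmd={cmd:#04x}  status={status:#x}")
```

On a live host the same function could be fed the text of `dmesg`; because it uses `finditer`, it also works when the log has been pasted as one long line.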

Any ideas?

kzytaruk commented 5 years ago

Is this consistently reproducible or random?

Thanks,

Kelly

From: flumm [mailto:notifications@github.com] Sent: Friday, October 12, 2018 6:36 AM To: GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization MxGPU-Virtualization@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization] Host Crash qemu/kvm (#13)

uname -a: Linux myhost 4.15.18-7-pve #1 https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualization/issues/1 SMP PVE 4.15.18-26 (Thu, 04 Oct 2018 11:03:06 +0200) x86_64 GNU/Linux

even though i use kernel 4.15, compiling went without problems a modprobe produces following output though:

[ 230.259767] gim info:(gim_init:149) Start AMD open source GIM initialization [ 230.259770] gim info:(gim_init:152) GPU IOV MODULE - version 0.0 [ 230.259771] gim info:(gim_init:154) Copyright (c) 2014-2018 AMD Corporation. [ 230.260618] gim info:(parse_config_file:219) AMD GIM fb_option = 0 [ 230.260619] gim info:(parse_config_file:219) AMD GIM sched_option = 1 [ 230.260621] gim info:(parse_config_file:219) AMD GIM vf_num = 1 [ 230.260622] gim info:(parse_config_file:219) AMD GIM pf_fb = 256 [ 230.260623] gim info:(parse_config_file:219) AMD GIM vf_fb = 256 [ 230.260625] gim info:(parse_config_file:219) AMD GIM sched_interval = 0 [ 230.260626] gim info:(parse_config_file:219) AMD GIM sched_interval_us = 0 [ 230.260627] gim info:(parse_config_file:219) AMD GIM fb_clear = 0 [ 230.260629] gim info:(init_config:341) INIT CONFIG [ 230.268677] gim info:(set_new_adapter:572) curr allocated at 0000000007ecc9e8 [ 230.268678] gim info:(set_new_adapter:579) SRIOV is supported [ 230.268683] gim info:(set_new_adapter:587) found PCI bridge device [ 230.268684] gim info:(set_new_adapter:591) found: 00:3.0 [ 230.268718] gim info:(set_new_adapter:608) mmio_base = 0000000073a47b6a [ 230.268726] gim info:(set_new_adapter:610) doorbell = 00000000b86d6846 [ 230.268737] gim info:(set_new_adapter:612) pf.fb_va = 000000006a5f6588 [ 230.268746] gim info:(sriov_is_ari_enabled:164) PCI_SRIOV_CAP = 0x00000002 [ 230.268748] gim info:(sriov_is_ari_enabled:174) PCI_SRIOV_CTRL = 0x00000010 [ 230.268749] gim info:(sriov_is_ari_enabled:177) PCI_SRIOV_CTRL_ARI is set --> ARI is supported [ 230.268751] gim info:(program_ari_mode:441) Read bif_strap8 = 0x00200004 [ 230.268752] gim info:(program_ari_mode:446) program_ari_mode - Set ARI_Mode = PF_BUS [ 230.268753] gim info:(program_ari_mode:456) Write bif_strap8 = 0x00000004 [ 230.268754] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM [ 230.268903] gim info:(gim_read_vbios:243) VBIOS starts: 0x55, 0xaa [ 230.268904] gim 
info:(gim_read_vbios:246) VBios size is 0x10000 [ 230.268915] gim info:(gim_read_vbios:249) vbios allocated at 0000000011c744ba [ 230.268915] gim info:(gim_read_rom_from_reg:181) Reading VBios from ROM [ 230.441512] gim info:(gim_read_vbios:257) BIOS Version Major 0xF Minor 0x31 [ 230.441561] gim info:(gim_read_vbios:270) Valid video BIOS image, [ 230.441563] gim info:(gim_read_vbios:271) size = 0x10000, check sum is 0x53b100 [ 230.441565] gim info:(set_new_adapter:661) Scheduler Time interval is per-vf from XL [ 230.441566] gim info:(set_new_adapter:662) config file [ 230.441567] gim info:(enable_sriov:299) Enable SRIOV [ 230.441568] gim info:(enable_sriov:300) Enable SRIOV vfs count = 2 [ 230.548564] gim info:(enumerate_vfs:123) vf found: 05:2.0 [ 230.548577] gim info:(enumerate_vfs:123) vf found: 05:2.1 [ 230.548611] gim info:(pci_disable_error_reporting:735) Disable error reporting for device: 05:2.0 [ 230.548613] gim info:(pci_disable_error_reporting:740) Mask before -> corr = 0x00000000, uncorr = 0x00000000 [ 230.548617] gim info:(pci_disable_error_reporting:751) Mask after -> corr = 0x00000000, uncorr = 0x00000000 [ 230.548633] gim info:(pci_disable_error_reporting:735) Disable error reporting for device: 05:2.1 [ 230.548635] gim info:(pci_disable_error_reporting:740) Mask before -> corr = 0x00000000, uncorr = 0x00000000 [ 230.548639] gim info:(pci_disable_error_reporting:751) Mask after -> corr = 0x00000000, uncorr = 0x00000000 [ 230.548648] gim info:(pci_gpu_iov_init:87) total_fb_available = 8190 [ 230.548649] gim info:(pci_gpu_iov_init:88) AMD GIM pci_gpu_iov_init pos = 400 [ 230.548650] gim info:(pci_gpu_iov_init:90) AMD GIM pci_gpu_iov_init total_fb_available = 1ffe [ 230.548652] gim info:(init_frame_buffer_partition:218) PCI defined PF FB size = 256 MB [ 230.548653] gim info:(init_frame_buffer_partition:222) PCI defined VF FB size = 256 MB [ 230.548654] gim info:(init_frame_buffer_partition:227) Total FB Available = 8190 MB, CSA = 8 MB [ 230.548655] 
gim info:(init_frame_buffer_partition:228) Max Remaining FB Size = 8160 [ 230.548656] gim info:(init_frame_buffer_partition:241) PF FB size after checking limits from config file = 256MB [ 230.548657] gim info:(init_frame_buffer_partition:244) PF rounded down to nearest 16MB boundary = 256 [ 230.548658] gim info:(init_pf_fb:60) total framebuffer available = 1ffe [ 230.548659] gim info:(init_pf_fb:61) pf framebuffer = 100 [ 230.548660] gim info:(init_pf_fb:63) total framebuffer consumed = 1efe [ 230.548661] gim info:(init_frame_buffer_partition:251) CSA starts at offset 256MB [ 230.548663] gim info:(init_context_save_area:42) AMD GIM init_context_save_area: base =100 size=1. [ 230.548664] gim info:(init_frame_buffer_partition:258) VF FB base = 272MB (256 + 16) [ 230.548666] gim info:(init_frame_buffer_partition:262) VF FB Size = 7904MB (8160 - 256) [ 230.548667] gim info:(init_fb_static:146) AMD GIM init_fb_static: num_vf = 2, base= 272, total_size=7904, min_size=256 [ 230.548669] gim info:(init_fb_static:158) VF FB size specified as 256MB, min_size = 256 [ 230.548670] gim info:(init_fb_static:167) AMD GIM init_fb_static: vf_fb_size = 256, base= 272 [ 230.548671] gim info:(init_fb_static:178) AMD GIM init_fb_static: partition 0 base =272,size= 256 [ 230.548673] gim info:(init_fb_static:178) AMD GIM init_fb_static: partition 1 base =528,size= 256 [ 230.548694] gim info:(set_new_adapter:707) enable MSI [ 230.548735] gim info:(ih_iv_ring_disable:383) disable iv ring successfully [ 230.548736] gim info:(alloc_iv_ring:99) ih->ivr_num_entries = 256 [ 230.548737] gim info:(alloc_iv_ring:102) ih->ivr_size_in_bytes = 4096 [ 230.548738] gim info:(alloc_iv_ring:108) ih->ivr_alloc_size_in_bytes = 4100 [ 230.548739] gim info:(alloc_iv_ring:110) iv ring page_cnt = 2 [ 230.548742] gim info:(alloc_iv_ring:141) ih->ivr_va = 0000000006512d73 [ 230.548862] gim info:(alloc_iv_ring:147) ih->ivr_ma.quad_part = 0xfffff000 [ 230.548863] gim info:(alloc_iv_ring:151) ih->ivr_wptr_wb = 
000000004c4df8ef [ 230.548867] gim info:(alloc_iv_ring:158) ih->ivr_wptr_wa.quad_part = 0xffffe000 [ 230.548868] gim info:(alloc_iv_ring:163) update rptr via doorbell [ 230.548869] gim info:(ih_iv_ring_init:291) ih->rptr_doorbell = 0000000052363f63 [ 230.548870] gim info:(ih_iv_ring_init:292) ih->rptr_doorbell_offset = 0x1e8 [ 230.548872] gim info:(ih_iv_ring_hw_init:185) the physical address of ring buffer: 0xfffff0 [ 230.548878] gim info:(ih_iv_ring_setup_rptr:451) write mmBIF_DOORBELL_APER_EN: 0x1 [ 230.548880] gim info:(ih_iv_ring_enable:350) ih->ivr_wptr_reg = 0x0 [ 230.548881] gim info:(ih_iv_ring_enable:352) ih->ivr_wptr = 0 [ 230.548882] gim info:(ih_iv_ring_enable:354) ih->ivr_rptr_reg = 0x0 [ 230.548882] gim info:(ih_iv_ring_enable:356) ih->ivr_rptr = 0 [ 230.548884] gim info:(ih_iv_ring_enable:358) *(ih->rptr_doorbell) = 0x0 [ 230.548885] gim info:(ih_iv_ring_init:299) init iv ring successfully [ 230.548886] gim info:(set_new_adapter:720) init work [ 230.548887] gim info:(set_new_adapter:726) register interrupt [ 230.548923] gim info:(ih_irq_source_enable:584) IH: read 0x00000000 from mask_reg 0x14d1 [ 230.548925] gim info:(ih_irq_source_enable:590) IH: write 0x00000001 to mask_reg 0x14d1 [ 230.548926] gim info:(ih_irq_source_enable:593) irq sourceID 0x89 get enabled [ 230.548928] gim info:(ih_irq_source_enable:584) IH: read 0x00000001 from mask_reg 0x14d1 [ 230.548929] gim info:(ih_irq_source_enable:590) IH: write 0x00000003 to mask_reg 0x14d1 [ 230.548930] gim info:(ih_irq_source_enable:593) irq sourceID 0x88 get enabled [ 233.549138] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 3.000038872 sec [ 233.549181] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0 [ 233.549210] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00 [ 233.549248] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000 [ 233.549271] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028 [ 233.549297] gim 
error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8 [ 233.549323] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040 [ 233.549351] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0 [ 233.549377] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557 [ 233.549406] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557 [ 233.549439] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump [ 233.549440] gim error:(check_me_cntl:1388) ME HALTED! [ 233.549462] gim error:(check_me_cntl:1391) PFP HALTED! [ 233.549484] gim error:(check_me_cntl:1394) CE HALTED! [ 233.549507] gim error:(dump_gpu_status:1588) dump gpu status end [ 233.549534] gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state' [ 233.549541] gim info:(gim_clear_all_errors:357) PCIE cap pos 58 [ 233.549580] gim info:(gim_clear_all_errors:362) AER ext cap pos 150 [ 233.549581] gim info:(gim_clear_all_errors:369) DevStatus = 0x9 [ 233.549584] gim info:(gim_clear_all_errors:387) PCIE unrecoverable error = 0x2000 [ 233.549589] gim info:(add_func_to_run_list:2459) Add VF0 to the runlist [ 233.549591] gim info:(alloc_fn_list_node:3837) New Function List Node allocated at 00000000b55b812e index 0 [ 233.549592] gim info:(add_func_to_run_list:2459) Add VF1 to the runlist [ 233.549593] gim info:(alloc_fn_list_node:3837) New Function List Node allocated at 00000000ce52204e index 1 [ 233.549595] gim warning:(resume_scheduler:155) Restart the Scheduler for 7.000msec [ 233.549597] gim info:(gim_probe:91) AMD GIM probe: pf_count = 1

especially this:

[ 233.549534] gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'
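Because the pasted dmesg output runs many entries together on one line, a small filter makes the error entries easier to spot. This is only a sketch for reading logs shaped like the ones in this thread; the regular expression and the `gim_entries` helper are assumptions based on the entry layout visible above (`[ timestamp] gim level:(function:line) message`), not part of GIM.

```python
import re

# One GIM log entry: "[ 233.549534] gim error:(func:3624) message".
# The message runs until the next "[ ts] gim " prefix or the end of the text.
ENTRY = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+gim\s+(?P<level>info|warning|error):"
    r"\((?P<func>\w+):(?P<line>\d+)\)\s*(?P<msg>.*?)"
    r"(?=\s*\[\s*\d+\.\d+\]\s+gim\s|\s*\Z)",
    re.DOTALL,
)


def gim_entries(text, level=None):
    """Yield (timestamp, level, function, message) for each GIM entry.

    If `level` is given ("info", "warning" or "error"), only entries of
    that level are yielded.
    """
    for m in ENTRY.finditer(text):
        if level is None or m.group("level") == level:
            yield (
                float(m.group("ts")),
                m.group("level"),
                m.group("func"),
                m.group("msg").strip(),
            )


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 gim_filter.py
    for ts, lvl, func, msg in gim_entries(sys.stdin.read(), level="error"):
        print(f"{ts:12.6f} {func}: {msg}")
```

Piping `dmesg` into the script prints only the `gim error` entries, one per line, which should make the first failure (here the `wait_cmd_complete` timeout) stand out from the surrounding info messages.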

after this, lspci shows the vgpu:

lspci | grep AMD

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
05:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]

Passing through 05:02.0 even works, up to the point where I install the guest driver (Win10 x64). Then I am greeted with the following output in dmesg and the host crashes:

[ 696.091017] gim info:(ih_irq_process:713) AMD ISR is being invoked [ 696.091019] gim info:(ih_iv_ring_get_pointers:482) ih_iv_ring_get_pointers [ 696.091021] gim info:(ih_iv_ring_get_pointers:483) ih->ivr_wptr_wb = 0x000000004c4df8ef [ 696.091023] gim info:(ih_iv_ring_get_pointers:485) write offset: *(ih->ivr_wptr_wb) = 0x10 [ 696.091024] gim info:(ih_iv_ring_get_pointers:486) read idx: ih->ivr_rptr = 0x0 [ 696.091025] gim info:(ih_iv_ring_get_pointers:488) Rx at entry 0 in the ring [ 696.091026] gim info:(ih_iv_ring_get_pointers:490) iv_ring_entry.source_id = 137 [ 696.091027] gim info:(ih_iv_ring_get_pointers:491) iv_ring_entry.source_data = 0 [ 696.091029] gim info:(ih_irq_process:734) received 1 irqs in one ISR [ 696.091031] gim info:(ih_irq_process:757) VF_PF_MSGBUF_VALID received(Received msg from VF) [ 696.091033] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0 [ 696.091037] gim info:(mailbox_msg_rcv:882) read mmMAILBOX_MSGBUF_RCV_DW0: 0x1 [ 696.091038] gim info:(mailbox_msg_rcv:883) read mmMAILBOX_MSGBUF_RCV_DW1: 0 [ 696.091039] gim info:(mailbox_msg_rcv:884) read mmMAILBOX_MSGBUF_RCV_DW2: 0x0 [ 696.091040] gim info:(mailbox_msg_rcv:885) read mmMAILBOX_MSGBUF_RCV_DW3: 0x0 [ 696.091043] gim info:(mailbox_ack_receipt:1008) write mmMAILBOX_CONTROL: 0x300 to MAILBOX_INDEX 0x0 [ 696.091044] gim info:(ih_irq_process:778) GPU access flag = 0x0 [ 696.091046] gim info:(idh_queue:656) new idh: task->event = 1, task->func_id = 0 [ 696.091048] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0 [ 696.091053] gim info:(ih_iv_ring_update_rptr:546) update the new rptr: ih->ivr_rptr_reg = 0x10 [ 696.091054] gim info:(ih_iv_ring_update_rptr:554) update rptr via doorbell: 0x10 [ 696.091055] gim info:(ih_iv_ring_update_rptr:556) current wptr: 0x10 [ 696.091055] gim info:(ih_irq_process:823) AMD ISR is complete [ 696.091079] gim info:(signal_scheduler:1561) Invoked the task scheduler thread. 
Process IRQ activity [ 696.091080] gim info:(signal_scheduler:1573) Got a REQ_GPU_ACCESS task for VFindex 0 [ 696.091081] gim info:(signal_scheduler:1577) req_gpu_task --> Event = 1;FuncID = 0 [ 696.091082] gim info:(signal_scheduler:1663) IDH_REQ_GPU_INIT_ACCESS [ 696.091084] gim info:(handle_req_gpu_init_access:1184) handle req_gpu_init_access event for VF0, reset flag = 0 [ 696.091085] gim info:(alloc_new_vf:2321) alloc_new_vf with pid 0 [ 696.091087] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff [ 696.091089] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0 [ 696.091091] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456 [ 696.091148] gim info:(load_vbios:4048) FB VA = 00000000f4a04997 [ 696.091152] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 00000000f4a04997 for length 0x10000 [ 696.097027] gim info:(load_vbios:4055) UnMap FB VA 00000000f4a04997 [ 696.097079] gim info:(alloc_new_vf:2337) sav_res_list_size = -166651663 [ 697.030272] gim info:(alloc_new_vf:2427) Defer adding VF to runlist until Mailbox work queue [ 697.030274] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff [ 697.030277] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0 [ 697.030279] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456 [ 697.030315] gim info:(load_vbios:4048) FB VA = 0000000005ed60dd [ 697.030317] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 0000000005ed60dd for length 0x10000 [ 697.036194] gim info:(load_vbios:4055) UnMap FB VA 0000000005ed60dd [ 697.036229] gim info:(alloc_new_vf:2441) end of alloc_new_vf [ 697.036231] gim info:(pause_scheduler:124) Stop the Scheduler [ 697.036232] gim info:(stop_current_vf:2097) stop current vf[1] [ 697.036233] gim warning:(stop_current_vf:2121) runlist is not scheduled, ignore stop [ 697.036234] gim info:(handle_req_gpu_init_access:1229) init and run new VF [ 697.036236] gim 
info:(run_sdma:3696) Check SDMA for bdf 0x500 [ 697.036238] gim info:(run_sdma:3699) SDMA0 is HALTED, unhalt it [ 697.036239] gim info:(run_sdma:3709) SDMA1 is HALTED, unhalt it [ 697.036242] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000 [ 697.036245] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000 [ 697.036246] gim info:(run_sdma:3696) Check SDMA for bdf 0x510 [ 697.036248] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted [ 697.036249] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted [ 697.036252] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000 [ 697.036254] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000 [ 699.036429] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000068956 sec [ 699.036471] gim error:(wait_cmd_complete:1688) Cmd = 0x13, Status = 0x0 [ 699.036500] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00 [ 699.036538] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000 [ 699.036565] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028 [ 699.036592] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8 [ 699.036617] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040 [ 699.036645] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0 [ 699.036671] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557 [ 699.036701] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557 [ 699.036736] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump [ 699.036737] gim error:(check_me_cntl:1388) ME HALTED! [ 699.036759] gim error:(check_me_cntl:1391) PFP HALTED! [ 699.036781] gim error:(check_me_cntl:1394) CE HALTED! 
[ 699.036803] gim error:(dump_gpu_status:1588) dump gpu status end [ 699.036830] gim error:(load_register_init_state:3666) Failed to LOAD register 'init-state' [ 699.036832] gim info:(mark_func_scheduled:2614) Mark VF0 is scheduled [ 699.036865] gim warning:(add_func_to_run_list:2488) VF0 is already on the run_list. [ 702.037033] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 3.000063346 sec [ 702.037108] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0 [ 702.037186] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00 [ 702.037258] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000 [ 702.037285] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028 [ 702.037335] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8 [ 702.037383] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040 [ 702.037435] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0 [ 702.037484] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557 [ 702.037539] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557 [ 702.038959] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump [ 702.038960] gim error:(check_me_cntl:1388) ME HALTED! [ 702.040385] gim error:(check_me_cntl:1391) PFP HALTED! [ 702.041795] gim error:(check_me_cntl:1394) CE HALTED! 
[ 702.043154] gim error:(dump_gpu_status:1588) dump gpu status end [ 702.044508] gim info:(load_vbios:4031) FB.start = 0x43ee0000000, FB.size = 0x43eefffffff [ 702.044511] gim info:(load_vbios:4045) VBios -> 55 AA 80 E9 29 3 0 0 [ 702.044513] gim info:(map_vf_fb:381) Map region 0x43ee0000000 for length 268435456 [ 702.044570] gim info:(load_vbios:4048) FB VA = 0000000005ed60dd [ 702.044573] gim info:(load_vbios:4052) Copy VBios from 0000000011c744ba to 0000000005ed60dd for length 0x10000 [ 702.050448] gim info:(load_vbios:4055) UnMap FB VA 0000000005ed60dd [ 702.050496] gim info:(run_sdma:3696) Check SDMA for bdf 0x500 [ 702.050498] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted [ 702.050499] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted [ 702.050502] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000 [ 702.050505] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000 [ 702.050507] gim info:(run_sdma:3696) Check SDMA for bdf 0x510 [ 702.050509] gim info:(run_sdma:3704) SDMA0 is already running;It doesn't need to be unhalted [ 702.050511] gim info:(run_sdma:3714) SDMA1 is already running;It doesn't need to be unhalted [ 702.050513] gim info:(run_sdma:3720) SDMA0_F32_CNTL = 0x0000, SDMA1_F32_CNTL = 0x0000 [ 702.050517] gim info:(run_sdma:3724) SDMA0 WPTR/RPTR = 0x0000 / 0x0000 [ 704.050654] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000031965 sec [ 704.051388] gim error:(wait_cmd_complete:1688) Cmd = 0x14, Status = 0x0 [ 704.052108] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00 [ 704.052853] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000 [ 704.052879] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028 [ 704.053639] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8 [ 704.054405] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040 [ 704.055115] gim 
error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0 [ 704.055792] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557 [ 704.056450] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557 [ 704.057092] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump [ 704.057093] gim error:(check_me_cntl:1388) ME HALTED! [ 704.057731] gim error:(check_me_cntl:1391) PFP HALTED! [ 704.058409] gim error:(check_me_cntl:1394) CE HALTED! [ 704.059039] gim error:(dump_gpu_status:1588) dump gpu status end [ 706.059855] gim error:(wait_cmd_complete:1681) wait_cmd_complete -- time out after 2.000070472 sec [ 706.061148] gim error:(wait_cmd_complete:1688) Cmd = 0x18, Status = 0x0 [ 706.062436] gim error:(dump_gpu_status:1420) dump gpu status begin for struct adapter 5:00.00 [ 706.063755] gim info:(check_base_addrs:1408) CP_MQD_BASE_ADDR = 0x0:00000000 [ 706.063782] gim error:(dump_gpu_status:1457) mmGRBM_STATUS = 0x3028 [ 706.065101] gim error:(dump_gpu_status:1460) mmGRBM_STATUS2 = 0x8 [ 706.066439] gim error:(dump_gpu_status:1463) mmSRBM_STATUS = 0x20000040 [ 706.067178] gim error:(dump_gpu_status:1466) mmSRBM_STATUS2 = 0x0 [ 706.067845] gim error:(dump_gpu_status:1469) mmSDMA0_STATUS_REG = 0x46dee557 [ 706.068496] gim error:(dump_gpu_status:1472) mmSDMA1_STATUS_REG = 0x46dee557 [ 706.069153] gim info:(check_me_cntl:1386) CP_ME_CNTL = 0x15000000 GPU dump [ 706.069154] gim error:(check_me_cntl:1388) ME HALTED! [ 706.069798] gim error:(check_me_cntl:1391) PFP HALTED! [ 706.070411] gim error:(check_me_cntl:1394) CE HALTED! 
[ 706.071025] gim error:(dump_gpu_status:1588) dump gpu status end [ 706.071653] gim info:(save_rlcv_state:1970) SAVE_RLCV_STATE failed on VF0 [ 706.071655] gim info:(gim_save_vddgfx_state:116) RLCV responced SAVE_RLCV_STATE [ 706.072015] gim info:(gim_save_vddgfx_state:123) mmRLC_GPU_IOV_SCRATCH_ADDR = 0 [ 706.072222] gim info:(gim_save_vddgfx_state:130) RLCV scratch saved [ 706.072223] gim info:(handle_req_gpu_init_access:1277) amdgpuv_save_vddgfx_state: VF 0 vddgfx state is saved [ 706.072226] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0 [ 706.072230] gim info:(mailbox_msg_trn:936) write mmMAILBOX_MSGBUF_TRN_DW0: 0x1 to MAILBOX_INDEX 0x0 [ 706.072232] gim info:(mailbox_msg_trn:952) write mmMAILBOX_CONTROL: 0x1 [ 706.072241] gim info:(ih_irq_process:713) AMD ISR is being invoked [ 706.072242] gim info:(ih_iv_ring_get_pointers:482) ih_iv_ring_get_pointers [ 706.072244] gim info:(ih_iv_ring_get_pointers:483) ih->ivr_wptr_wb = 0x000000004c4df8ef [ 706.072245] gim info:(ih_iv_ring_get_pointers:485) write offset: *(ih->ivr_wptr_wb) = 0x20 [ 706.072246] gim info:(ih_iv_ring_get_pointers:486) read idx: ih->ivr_rptr = 0x1 [ 706.072247] gim info:(ih_iv_ring_get_pointers:488) Rx at entry 1 in the ring [ 706.072248] gim info:(ih_iv_ring_get_pointers:490) iv_ring_entry.source_id = 136 [ 706.072249] gim info:(ih_iv_ring_get_pointers:491) iv_ring_entry.source_data = 0 [ 706.072250] gim info:(ih_irq_process:734) received 1 irqs in one ISR [ 706.072252] gim info:(ih_irq_process:792) PF_VF_MSGBUF_ACK received(VF has ACK'd the msg) [ 706.072254] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0 [ 706.072255] gim info:(mailbox_clear_msg_valid:1022) write mmMAILBOX_CONTROL: 0x2 [ 706.072257] gim info:(mailbox_update_index:836) write mmMAILBOX_INDEX: 0x0 [ 706.072259] gim info:(ih_iv_ring_update_rptr:546) update the new rptr: ih->ivr_rptr_reg = 0x20 [ 706.072260] gim info:(ih_iv_ring_update_rptr:554) update rptr via doorbell: 0x20 [ 706.072261] gim 
info:(ih_iv_ring_update_rptr:556) current wptr: 0x20 [ 706.072262] gim info:(ih_irq_process:823) AMD ISR is complete

any ideas?


flumm commented 5 years ago

This happens every time; so far I have not gotten it to work at all.

Is there any information I can give that might be helpful?

I will be at the machine again on Monday.

Thanks

flumm commented 5 years ago

Is there anything I can provide to further debug this issue? I am still not able to use this card's virtual functions.

vigchand2705 commented 5 years ago

[ 233.549181] gim error:(wait_cmd_complete:1688) Cmd = 0x17, Status = 0x0
[ 233.549534] gim error:(init_register_init_state:3624) Failed to INIT PF for initial register 'init-state'

I would expect either the status to report an error (i.e. be non-zero) or the CMD_EXECUTE bit to be cleared in cmd, at which point we would stop waiting. Neither happened before the timeout, so it looks like the FW is not active at all.
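To illustrate the wait logic described here, a rough sketch of such a polling loop follows. It is not the GIM implementation (which is C kernel code): the `CMD_EXECUTE` bit position, the reader callbacks, and the return values are all hypothetical and chosen only to show the three possible outcomes.

```python
import time

# Hypothetical bit position for illustration; not taken from the GIM source.
CMD_EXECUTE = 1 << 8


def wait_cmd_complete(read_cmd, read_status, timeout_s, poll_s=0.001,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the firmware reports an error, finishes, or we time out.

    read_cmd / read_status are caller-supplied functions standing in for
    register reads. Returns a (outcome, status) tuple where outcome is
    "error", "done" or "timeout".
    """
    deadline = clock() + timeout_s
    while True:
        status = read_status()
        if status != 0:
            # Firmware answered with an error code.
            return "error", status
        if not (read_cmd() & CMD_EXECUTE):
            # Firmware cleared the execute bit: command completed.
            return "done", 0
        if clock() >= deadline:
            # Status still 0 and execute bit still set: nobody is answering.
            return "timeout", status
        sleep(poll_s)
```

With firmware that never responds, the status stays 0 and the execute bit stays set, so the loop can only end in the timeout branch. That would match the `time out after 3.000038872 sec` followed by `Status = 0x0` in the log.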

How was the board obtained and did this also occur on previous versions of GIM? My guess right now is around some mismatch in vBIOS.

flumm commented 5 years ago

> How was the board obtained and did this also occur on previous versions of GIM? My guess right now is around some mismatch in vBIOS.

We bought the card from a local hardware shop, nothing special. Since we have had the card for just a few weeks, I did not try older versions of GIM, but I can do that this week (maybe tomorrow).

Is there maybe a firmware update for the card? I did not find any on AMD's support/download website.

flumm commented 5 years ago

Hi,

I tried the older version of GIM, and that did not work either, but I found the cause. Your comment about firmware led me to the BIOS settings: I had set it to load the EFI ROM (which the card does not have). Setting this to 'Legacy' worked.

Thanks for your help