flintcq opened 6 years ago
If I only add and change the code I mentioned above, it causes a stack problem:
[16045.295678] Thread overran stack, or stack corrupted
[16045.295683] Oops: 0000 [#1] SMP
[16045.295858] Modules linked in: gim(O) vfio_pci vfio_iommu_type1 vfio_virqfd vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_hdmi input_leds bridge stp llc snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd sb_edac edac_core ioatdma soundcore shpchp lpc_ich mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456
[16045.295948] async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mxm_wmi lrw video gf128mul glue_helper ttm ablk_helper cryptd drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb dca usbhid ptp pps_core ahci hid i2c_algo_bit libahci fjes wmi
[16045.295956] CPU: 11 PID: 1051 Comm: kworker/11:2 Tainted: G O 4.4.1174.4.0-vgpu #1
[16045.295960] Hardware name: Supermicro X10DAi/X10DAI, BIOS 3.0a 02/05/2018
[16045.296005] Workqueue: events sched_work_handler [gim]
[16045.296010] task: ffff8808542d0e00 ti: ffff880855a40000 task.ti: ffff880855a40000
[16045.296020] RIP: 0010:[
The source code is in the function validate_link_status, lines 405~421 of gim_reset.c:
do {
    // to get position of a capability
    kcl_pci_read_config_byte(adapt->p2p_bridge_dev, pos, &data_8);
    if (data_8 == 0) // i guess, no capabilities left, then stop
        break;
The loop looks for the capability whose ID is PCI_CAP_ID_EXP. Unfortunately, when the capability data is broken, the position of a capability always reads back as 0xff, which causes an infinite loop. A workaround is to add a limit, which I borrowed from the function "vfio_cap_init" in vfio_pci_config.c, by adding:
and changing "} while (1);" to "} while (loops--);".
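To make the workaround concrete, here is a minimal standalone sketch of a bounded capability walk. It is not the gim code: it walks an in-memory copy of config space instead of calling kcl_pci_read_config_byte, and the helper name find_cap_bounded and the un-prefixed constants are hypothetical. The values are assumed to match the kernel's PCI_CFG_SPACE_SIZE, PCI_STD_HEADER_SIZEOF, PCI_CAP_SIZEOF, PCI_CAPABILITY_LIST and PCI_CAP_ID_EXP, and the loop limit follows what I believe vfio_cap_init computes.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's PCI_* constants; names are hypothetical. */
#define CFG_SPACE_SIZE  256   /* conventional PCI config space */
#define STD_HEADER_SIZE 64    /* capabilities live after the standard header */
#define CAP_SIZEOF      4     /* smallest possible capability */
#define CAP_LIST_PTR    0x34  /* offset of the first capability pointer */
#define CAP_ID_EXP      0x10  /* PCI Express capability ID */

/*
 * Hypothetical helper: return the offset of capability `id`, or 0 if it is
 * not found or the list looks broken. The walk can never take more steps
 * than there is room for capabilities, so it always terminates.
 */
static uint8_t find_cap_bounded(const uint8_t cfg[CFG_SPACE_SIZE], uint8_t id)
{
    int loops = (CFG_SPACE_SIZE - STD_HEADER_SIZE) / CAP_SIZEOF;
    uint8_t pos = cfg[CAP_LIST_PTR];

    while (loops-- > 0) {
        pos &= 0xfc;                 /* capability pointers are dword aligned */
        if (pos < STD_HEADER_SIZE)
            return 0;                /* end of list, or pointer into the header */

        uint8_t cap_id = cfg[pos];
        if (cap_id == 0xff)
            return 0;                /* all-ones read: config data is broken */
        if (cap_id == id)
            return pos;

        pos = cfg[pos + 1];          /* follow the next-capability pointer */
    }
    return 0;                        /* limit hit: treat as a corrupted list */
}
```

Because the helper returns an explicit result (an offset, or 0), the caller does not need to inspect pos, data_8 or the loop counter after the loop ends to decide whether the capability was found.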
However, this makes the following condition unpredictable: