ClangBuiltLinux / linux

Linux kernel source tree

Issue with Qualcomm kernel modules and BTI since at least LLVM 17 #2022

Open Gelbpunkt opened 2 months ago

Gelbpunkt commented 2 months ago

Hi, I'm not 100% sure if this is the correct place for the specific issue we're running into. My knowledge of BTI is very limited, so I can't say that it's not the kernel module code causing this.

Issue description

We're building the Google/Qualcomm provided android12-5.10 kernel for Xiaomi Qualcomm SM8450 devices (our kernel sources can be found here). The Google GKI defconfig enables BTI implicitly via some security-related configs and builds + runs fine if compiled with clang-r450784e from the Google prebuilts, which is based on LLVM 14.

Ever since we started building Android 14, which defaults to clang-r498229b (based on LLVM 17), devices are failing to boot to system unless BTI is disabled, which isn't an option for us because it makes it impossible to boot with a Google certified GKI boot image.

This is due to a kernel panic caused by a BTI violation:

[    2.672885][  T200] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b04 failed with error -2
[    2.672894][  T200] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b04
[    2.691145][  T784] type=1400 audit(42815046.691:11): avc:  denied  { read } for  comm="getprop" name="u:object_r:default_prop:s0" dev="tmpfs" ino=152 scontext=u:r:vendor_qti_init_shell:s0 tcontext=u:object_r:default_prop:s0 tclass=file permissive=0
[    2.733226][  T981] ipa ipa3_ioctl:2754 IPA not ready, waiting for init completion
[    2.753188][  T990] QSEECOM: qseecom_load_app: App (eseservice) does'nt exist, loading apps for first time
[    2.796023][  T990] QSEECOM: qseecom_load_app: App with id 4 (eseservice) now loaded
[    2.905986][  T200] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b01 failed with error -2
[    2.905996][  T200] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b01
[    2.908003][  T200] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b02 failed with error -2
[    2.908009][  T200] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b02
[    2.909632][  T200] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b03 failed with error -2
[    2.909637][  T200] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b03
[    2.914255][  T200] gsi soc:qcom,msm_gsi: gsi_register_device:1554 GSI irq is wake enabled 38
[    2.916797][  T200] ipa ipa3_rmnet_ctl_register_pm_client:743 rmnet_ctl register done
[    2.916816][  T200] ipa ipa3_rmnet_ll_register_pm_client:964 rmnet_ll register done
[    2.916932][  T200] ipa-lnx-stats ipa_spearhead_stats_ioctl_init:1949 IPA ipa_lnx_stats_ioctl major(465) initial ok :>>>>
[    2.916936][  T200] ipa-lnx-stats ipa_spearhead_stats_init:1977 IPA_LNX_STATS_IOCTL init success
[    2.917217][  T637] ipa-wan ipa3_wwan_register_netdev_pm_client:3477 rmnet_ipa%d register done
[    2.917414][  T637] ipa-wan ipa3_wwan_probe:3672 rmnet_ipa completed initialization
[    2.918361][    C0] Unexpected kernel BRK exception at EL1
[    2.918373][    C0] Internal error: BRK handler: f2005502 [#1] PREEMPT SMP
[    2.918399][    C0] Skip md ftrace buffer dump for: 0x1609e0
[    2.918408][    C0] Modules linked in: qca_cld3_qca6490(O) msm_drm(O) qdss_bridge usb_f_gsi coresight_tmc usb_f_qdss ipa_clientsm(O) machine_dlkm(O) wsa881x_dlkm(O) ipanetm(O) rndisipam(O) camera(O) swr_dmic_dlkm(O) wcd938x_dlkm(O) wcd937x_dlkm(O) mbhc_dlkm(O) ipam(O) lpass_cdc_rx_macro_dlkm(O) rmnet_perf(O) lpass_cdc_wsa_macro_dlkm(O) lpass_cdc_wsa2_macro_dlkm(O) lpass_cdc_va_macro_dlkm(O) lpass_cdc_tx_macro_dlkm(O) leds_qti_flash pinctrl_lpi_dlkm(O) msm_eva(O) wcd9xxx_dlkm(O) audio_pkt_dlkm(O) adsp_loader_dlkm(O) lpass_cdc_dlkm(O) swr_ctrl_dlkm(O) audio_prm_dlkm(O) slim_qcom_ngd_ctrl rmnet_aps(O) fts_touch_spi wcd937x_slave_dlkm(O) cs35l41_dlkm(O) rmnet_offload(O) icnss2 rmnet_shs(O) rmnet_perf_tether(O) wsa883x_dlkm(O) aw8697_haptic stm_p_basic charger_ulog_glink qti_battery_charger_main mhi_cntrl_qcom cnss2 qcedev_mod mhi_dev_drv mtdblock stm_p_ost spmi_glink_debug fsa4480_i2c spf_core_dlkm(O) msm_memshare coresight_stm bt_fm_slim gpr_dlkm(O) gsim(O) qti_battery_debug
[    2.918528][    C0]  qti_amoled_ecm stm_ftrace swr_haptics_dlkm(O) aw882xx_dlkm(O) msm_cvp(O) rmnet_core(O) q6_notifier_dlkm(O) stmvl53l5 msm_video(O) qcom_esoc stm_console coresight_tpdm wcd938x_slave_dlkm(O) qcom_pm8008_regulator goodix_fod coresight_funnel q6_dlkm(O) coresight_dummy qrtr_gunyah qcom_q6v5_pas rmnet_wlan(O) coresight_hwevent rmnet_sch(O) ep_pcie_drv msm_mmrm(O) hdmi_dlkm(O) coresight_remote_etm qcom_q6v5 coresight_replicator rmnet_ctl(O) ucsi_glink coresight_csr frpc_adsprpc altmode_glink wcd_core_dlkm(O) qcom_pil_info qrtr_mhi swr_dlkm(O) coresight_tpda synx_driver snd_usb_audio_qmi qrtr_smd coresight_tgu stm_core block2mtd hwmon chipreg memlat ir_spi qnoc_parrot nb7vpq904m cnss_plat_ipc_qmi_svc ofpart msm_sharedmem qcom_soc_wdt qce50 hdcp usb_bam cdsprm dwc3_msm heap_mem_ext_v01 snd_event_dlkm(O) q6_pdr_dlkm(O) coresight_cti stub_dlkm(O) audpkt_ion_dlkm(O) microdump_collector usb_f_diag msm_show_epoch wlan_firmware_service qcom_pon core_hang_detect qcom_i2c_pmic
[    2.918664][    C0]  radio_i2c_rtc6226_qca smp2p_sleepstate cnss_prealloc sys_pm_vx gh_virtio_backend atmel_mxt_ts nt36xxx_spi slimbus mtd_blkdevs qcom_spmi_adc5 icc_test coresight smp2p ipa_fmwk debugcc_diwali mi_thermal_interface sdhci_msm rdbg qsee_ipc_irq_bridge qcom_vadc_common rq_stats videocc_waipio msm_ext_display leds_qpnp_vibrator_ldo cpu_voltage_cooling dcc_v2 camcc_diwali rimps_log tz_log leds_qti_tri_led sysmon_subsystem_stats cdsp_loader phy_qcom_emu nt36xxx_i2c qti_cpufreq_cdev qcom_edac qcom_lpm qcom_sysmon guestvm_loader qcom_cpufreq_hw_debug subsystem_sleep_stats camcc_waipio phy_msm_ssusb_qmp mhi_dev_netdev qti_qmi_cdev sensors_ssc mhi_dev_net qti_devfreq_cdev sps_drv pm8941_pwrkey focaltech_fts qcom_cpuss_sleep_stats pinctrl_spmi_gpio bcl_soc msm_lmh_dcvs adsp_sleepmon eud f_fs_ipc_log glink_probe qcom_spmi_temp_alarm phy_qcom_ufs_qmp_14nm pwm_qti_lpg sdpm_clk qti_adc_tm qti_qmi_sensor_v2 policy_engine usb_f_ccid usb_f_cdev pmic_pon_log ddr_cdev hyp_core_ctl
[    2.918793][    C0]  qcom_logbuf_vendor_hooks hvc_gunyah msm_rng zram pinctrl_spmi_mpp i2c_msm_geni btpower gh_mem_notifier mtdoops phy_msm_snps_eusb2 pmic_glink repeater_i2c_eusb2 mhi_dev_uci qbt_handler qcom_ramdump sg qcom_ipc_lite pci_edma soc_sleep_stats cnss_nl qti_userspace_cdev i3c_master_msm_geni gplaf_scmi lzo gh_irq_lend plh_scmi shared_rail_scmi mhi_dev_dtr smcinvoke_mod plh_vendor shared_rail_vendor mhi gplaf_vendor xiaomi_touch boot_stats lt9611uxc hung_task_enh glink_pkt bam_dma qpnp_amoled_regulator spmi_pmic_arb_debug asix mtd qcom_sync_file phy_qcom_ufs_qmp_v4_parrot qcom_iommu_debug debugcc_waipio ehset pdr_interface phy_msm_snps_hs spi_msm_geni repeater synaptics_dsx ax88179_178a qmi_helpers qseecom_mod redriver panel_event_notifier qfprom_sys lvstest qpnp_pbs pci_msm_drv rproc_qcom_common cnss_utils qcom_smd phy_qcom_ufs_qmp_v3 gpucc_diwali videocc_diwali gpi qcom_glink_smem msm_show_resume_irq lzo_rle phy_qcom_ufs_qmp_v4_anarok phy_qcom_ufs_qmp_v4_lahaina
[    2.918927][    C0]  gpucc_waipio qcom_glink zsmalloc msm_sysstats msm_kgsl nvmem_qfprom msm_performance mdt_loader hwid nfc_i2c bcl_pmic5 c1dcvs_scmi c1dcvs_vendor qcom_rimps msm_qmp qcom_aoss mem_offline arm_smmu ufs_qcom ufshcd_crypto_qti stub_regulator rtc_pm8xxx qrtr qnoc_diwali qcom_reboot_reason spmi_pmic_arb qcom_spmi_pmic regmap_spmi qti_regmap_debugfs pmu_scmi pmu_vendor qcom_pmu_lib qcom_llcc_pmu qcom_iommu_util qcom_gic_intr_routing qcom_dload_mode pinctrl_diwali pinctrl_cape phy_qcom_ufs_qmp_v4_cape phy_qcom_ufs_qmp_v4_diwali phy_qcom_ufs_qmp_v4_waipio phy_qcom_ufs phy_generic nvmem_qcom_spmi_sdam msm_geni_serial msm_geni_se reboot_mode qti_fixed_regulator qnoc_waipio qnoc_qos pinctrl_waipio pinctrl_msm memory_dump_v2 mem_hooks mem_buf qcom_dma_heaps msm_dma_iommu_mapping mem_buf_dev secure_buffer mac80211 llcc_qcom kryo_arm64_edac iommu_logger gh_rm_drv gh_msgq gh_dbl gh_ctrl gh_arm_drv gcc_diwali dispcc_waipio dispcc_diwali cqhci crypto_qti_common crypto_qti_hwkm
[    2.919068][    C0]  hwkm tmecom_intf cfg80211 cpu_hotplug thermal_pause sched_walt qcom_cpufreq_hw bwmon qcom_dcvs dcvs_fp rpmh_regulator qcom_tsens qcom_pdc qcom_ipcc icc_rpmh socinfo icc_debug icc_bcm_voter gcc_waipio clk_dummy clk_qcom gdsc_regulator proxy_consumer debug_regulator clk_rpmh qcom_rpmh cmd_db qcom_ipc_logging qcom_cpu_vendor_hooks gh_virt_wdt qcom_wdt_core qcom_scm minidump smem qcom_hwspinlock
[    2.919139][    C0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S         O      5.10.205-gki-gc0dadeabae56 #2
[    2.919148][    C0] Hardware name: thor based on Qualcomm Technologies, Inc SM8475 (DT)
[    2.919155][    C0] pstate: 32400005 (nzCV daif +PAN -UAO +TCO BTYPE=--)
[    2.919510][    C0] pc : __cfi_check_fail+0x20/0x24 [ipam]
[    2.919524][    C0] lr : __cfi_slowpath+0x150/0x1ac
[    2.919530][    C0] sp : ffffffc008003c10
[    2.919537][    C0] x29: ffffffc008003c10 x28: 0000000000000040 
[    2.919547][    C0] x27: 000000000000004d x26: ffffffee006e4000 
[    2.919557][    C0] x25: 0000000000008300 x24: 0000000000000001 
[    2.919566][    C0] x23: ffffffee00511000 x22: ffffffee0024e000 
[    2.919575][    C0] x21: 0000000000000001 x20: 5ce0fbde194bb406 
[    2.919584][    C0] x19: ffffffee0064018c x18: ffffffc008005060 
[    2.919593][    C0] x17: 0000000000000001 x16: 00000000000002d5 
[    2.919602][    C0] x15: 0000000493e0eea8 x14: 00000000c4e2c6eb 
[    2.919611][    C0] x13: 0000000000007ffb x12: 0000000000004df4 
[    2.919621][    C0] x11: 000000000000ffff x10: 0000000000004cc5 
[    2.919630][    C0] x9 : 000ffffffee00511 x8 : 8000000000000000 
[    2.919639][    C0] x7 : 0000000000000000 x6 : ffffffc008003b40 
[    2.919648][    C0] x5 : 0000000000000028 x4 : 0000000000000000 
[    2.919657][    C0] x3 : ffffff8051389638 x2 : 0000000000000000 
[    2.919666][    C0] x1 : ffffffee0064018c x0 : 0000000000000000 
[    2.919675][    C0] Call trace:
[    2.920032][    C0]  __cfi_check_fail+0x20/0x24 [ipam]
[    2.920270][    C0]  ipa_pkt_status_parse_v5_0+0xf8/0x100 [ipam]
[    2.920507][    C0]  ipahal_pkt_status_parse+0xb4/0x170 [ipam]
[    2.920742][    C0]  ipa3_lan_rx_pyld_hdlr+0x17c/0x1138 [ipam]
[    2.920979][    C0]  ipa3_wq_rx_common+0x58/0x258 [ipam]
[    2.921214][    C0]  ipa3_lan_rx_poll+0x19c/0x444 [ipam]
[    2.921450][    C0]  ipa3_lan_poll+0x2c/0x3c [ipam]
[    2.921460][    C0]  net_rx_action+0x144/0x4f8
[    2.921469][    C0]  __do_softirq+0x12c/0x4cc
[    2.921477][    C0]  __irq_exit_rcu.llvm.430401720403190533+0xd4/0xec
[    2.921485][    C0]  __handle_domain_irq+0xa0/0x158
[    2.921491][    C0]  gic_handle_irq+0x5c/0x134
[    2.921498][    C0]  el1_irq+0xe4/0x1c0
[    2.921506][    C0]  cpuidle_enter_state+0x1d0/0x60c
[    2.921512][    C0]  cpuidle_enter+0x40/0x5c
[    2.921520][    C0]  do_idle.llvm.1008059615773262600+0x1e4/0x2c4
[    2.921527][    C0]  cpu_startup_entry+0x2c/0x30
[    2.921535][    C0]  kernel_init+0x0/0x1a0
[    2.921543][    C0]  start_kernel+0x0/0x4f8
[    2.921549][    C0]  start_kernel+0x3d8/0x4f8
[    2.921558][    C0] Code: 7100151f d50323bf 54000043 d65f03c0 (d42aa040) 
[    2.921564][    C0] ---[ end trace aa2532f5dc3038d4 ]---
[    2.921573][    C0] Kernel panic - not syncing: BRK handler: Fatal exception in interrupt
[    2.921583][    C0] SMP: stopping secondary CPUs

The relevant code is Qualcomm's IPA implementation. Here's a link to the specific function in the Qualcomm sources and one to it in our module sources. I don't see any obvious issues there, and I believe this is likely to be a false positive from LLVM?

Just to be sure, I've also compiled the kernel with clang-r522817, which is based on LLVM 18, and the issue persists there.

Again, if this is the wrong place to report this, I'd appreciate it if you could direct me to the right place. Thanks!

nathanchance commented 2 months ago

@samitolvanen does this issue ring a bell? I am not sure if it is just a straight CFI failure or if there is something else here. Based on the source, this is not kCFI, so maybe regular CFI regressed in clang?

samitolvanen commented 2 months ago

This looks like a CFI failure to me; nothing suggests it's related to BTI. It's possible that previous Clang versions inlined the function being called, or converted the indirect call into a direct call, and newer versions no longer do that, thus tripping CFI. I realize the error message isn't very informative, but can you identify which function is getting called here?

ArianK16a commented 2 months ago

I got a decoded stack trace here. However, I, at least, don't really gain more information from it:

[    3.116724][  T201] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b01
[    3.117979][  T201] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b02 failed with error -2
[    3.117983][  T201] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b02
[    3.119170][  T201] ipa 3e00000.qcom,ipa: Direct firmware load for ipa_fws.b03 failed with error -2
[    3.119177][  T201] ipa 3e00000.qcom,ipa: Falling back to sysfs fallback for: ipa_fws.b03
[    3.125174][  T201] gsi soc:qcom,msm_gsi: gsi_register_device:1554 GSI irq is wake enabled 38
[    3.125260][  T201] gsi soc:qcom,msm_gsi: saved msi 0 msg data 0 addr 0x0000000017150040
[    3.125291][  T201] gsi soc:qcom,msm_gsi: saved msi 1 msg data 1 addr 0x0000000017150040
[    3.127880][  T201] ipa ipa3_rmnet_ctl_register_pm_client:743 rmnet_ctl register done
[    3.127906][  T201] ipa ipa3_rmnet_ll_register_pm_client:964 rmnet_ll register done
[    3.128008][  T201] ipa-lnx-stats ipa_spearhead_stats_ioctl_init:1953 IPA ipa_lnx_stats_ioctl major(465) initial ok :>>>>
[    3.128013][  T201] ipa-lnx-stats ipa_spearhead_stats_init:1981 IPA_LNX_STATS_IOCTL init success
[    3.128563][  T712] ipa-wan ipa3_wwan_register_netdev_pm_client:3477 rmnet_ipa%d register done
[    3.128886][  T712] ipa-wan ipa3_wwan_probe:3672 rmnet_ipa completed initialization
[    3.129763][    C0] Unexpected kernel BRK exception at EL1
[    3.129778][    C0] Internal error: BRK handler: f2005502 [#1] PREEMPT SMP
[    3.129806][    C0] Skip md ftrace buffer dump for: 0x1609e0
[    3.129817][    C0] Modules linked in: qca_cld3_qca6490(O) ipa_clientsm(O) machine_dlkm(O) ipanetm(O) swr_dmic_dlkm(O) wsa881x_dlkm(O) rndisipam(O) msm_drm(O) camera(O) wcd938x_dlkm(O) wcd937x_dlkm(O) mbhc_dlkm(O) coresight_tmc lpass_cdc_rx_macro_dlkm(O) lpass_cdc_wsa_macro_dlkm(O) lpass_cdc_va_macro_dlkm(O) lpass_cdc_wsa2_macro_dlkm(O) qdss_bridge usb_f_gsi usb_f_qdss lpass_cdc_tx_macro_dlkm(O) hdcp rmnet_perf(O) wcd9xxx_dlkm(O) adsp_loader_dlkm(O) swr_ctrl_dlkm(O) audio_pkt_dlkm(O) pinctrl_lpi_dlkm(O) lpass_cdc_dlkm(O) audio_prm_dlkm(O) coresight_tpdm cs35l41_dlkm(O) rmnet_offload(O) rmnet_shs(O) msm_eva(O) rmnet_perf_tether(O) rmnet_aps(O) spf_core_dlkm(O) qcedev_mod gpr_dlkm(O) wsa883x_dlkm(O) msm_cvp(O) nt36xxx_i2c i3c_master_msm_geni pm8941_pwrkey stm_p_ost stm_p_basic qrtr_gunyah icnss2 mhi_dev_drv audpkt_ion_dlkm(O) shared_rail_scmi coresight_remote_etm leds_qti_tri_led stm_console ofpart smcinvoke_mod aw882xx_dlkm(O) focaltech_fts mtdblock wcd938x_slave_dlkm(O)
[    3.129957][    C0]  msm_video(O) rmnet_wlan(O) ipam(O) q6_notifier_dlkm(O) goodix_fod rmnet_sch(O) fts_touch_spi qcom_pm8008_regulator qnoc_parrot rmnet_core(O) qrtr_smd stm_ftrace wcd937x_slave_dlkm(O) coresight_replicator coresight_stm aw8697_haptic hdmi_dlkm(O) qrtr_mhi qcom_q6v5_pas coresight_tpda coresight_hwevent coresight_cti swr_haptics_dlkm(O) bt_fm_slim rmnet_ctl(O) qcom_pon usb_bam stub_dlkm(O) qcom_cpufreq_hw_debug gsim(O) q6_dlkm(O) qcom_q6v5 qti_devfreq_cdev snd_usb_audio_qmi wcd_core_dlkm(O) qcom_soc_wdt snd_event_dlkm(O) block2mtd qti_userspace_cdev usb_f_diag spmi_glink_debug qcom_logbuf_vendor_hooks q6_pdr_dlkm(O) coresight_csr msm_mmrm(O) coresight_funnel mtdoops mtd_blkdevs chipreg icc_test nb7vpq904m leds_qti_flash stm_core qce50 qti_qmi_sensor_v2 qcom_spmi_adc5 shared_rail_vendor ddr_cdev mhi_dev_net qcom_lpm usb_f_cdev cpu_voltage_cooling dcc_v2 qti_amoled_ecm fsa4480_i2c ehset radio_i2c_rtc6226_qca gh_mem_notifier charger_ulog_glink qti_battery_charger_main
[    3.130109][    C0]  rdbg mhi_cntrl_qcom swr_dlkm(O) coresight_dummy coresight_tgu dwc3_msm msm_lmh_dcvs policy_engine coresight slim_qcom_ngd_ctrl ipa_fmwk tz_log sdpm_clk qcom_vadc_common btpower qcom_pil_info pmic_pon_log bcl_soc sdhci_msm qfprom_sys mtd slimbus qti_battery_debug qcom_spmi_temp_alarm sys_pm_vx qsee_ipc_irq_bridge hwmon zram nt36xxx_spi hvc_gunyah qseecom_mod gh_virtio_backend synx_driver qcom_edac lzo sps_drv mi_thermal_interface ep_pcie_drv smp2p_sleepstate msm_rng qcom_esoc pwm_qti_lpg ucsi_glink qpnp_amoled_regulator camcc_waipio qti_cpufreq_cdev cdsp_loader cnss2 gh_irq_lend altmode_glink qti_adc_tm sg leds_qpnp_vibrator_ldo debugcc_diwali qcom_ipc_lite msm_sharedmem rq_stats i2c_msm_geni smp2p adsp_sleepmon cnss_nl synaptics_dsx ax88179_178a sysmon_subsystem_stats xiaomi_touch qcom_sysmon msm_show_epoch usb_f_ccid cdsprm qpnp_pbs boot_stats wlan_firmware_service mhi_dev_uci mhi_dev_netdev frpc_adsprpc cnss_plat_ipc_qmi_svc phy_msm_snps_eusb2 mhi_dev_dtr
[    3.130274][    C0]  pmic_glink mhi microdump_collector qti_qmi_cdev ir_spi atmel_mxt_ts repeater_i2c_eusb2 phy_msm_ssusb_qmp repeater qcom_ramdump subsystem_sleep_stats msm_memshare pinctrl_spmi_mpp f_fs_ipc_log debugcc_waipio phy_msm_snps_hs pdr_interface qcom_iommu_debug heap_mem_ext_v01 redriver videocc_waipio phy_qcom_ufs_qmp_v3 plh_scmi lvstest asix cnss_utils phy_qcom_emu cnss_prealloc gpucc_waipio eud memlat rimps_log qcom_i2c_pmic qcom_cpuss_sleep_stats soc_sleep_stats glink_probe plh_vendor spi_msm_geni msm_ext_display guestvm_loader videocc_diwali qmi_helpers qcom_sync_file core_hang_detect glink_pkt spmi_pmic_arb_debug sensors_ssc hung_task_enh panel_event_notifier pinctrl_spmi_gpio camcc_diwali phy_qcom_ufs_qmp_v4_lahaina pci_msm_drv phy_qcom_ufs_qmp_14nm gpucc_diwali msm_show_resume_irq lt9611uxc rproc_qcom_common gplaf_scmi pci_edma hyp_core_ctl phy_qcom_ufs_qmp_v4_parrot phy_qcom_ufs_qmp_v4_anarok qcom_smd gplaf_vendor bam_dma gpi lzo_rle qbt_handler qcom_glink_smem
[    3.130424][    C0]  qcom_glink zsmalloc msm_sysstats msm_kgsl nvmem_qfprom msm_performance mdt_loader hwid nfc_i2c bcl_pmic5 c1dcvs_scmi c1dcvs_vendor qcom_rimps msm_qmp qcom_aoss mem_offline arm_smmu ufs_qcom ufshcd_crypto_qti stub_regulator rtc_pm8xxx qrtr qnoc_diwali qcom_reboot_reason spmi_pmic_arb qcom_spmi_pmic regmap_spmi qti_regmap_debugfs pmu_scmi pmu_vendor qcom_pmu_lib qcom_llcc_pmu qcom_iommu_util qcom_gic_intr_routing qcom_dload_mode pinctrl_diwali pinctrl_cape phy_qcom_ufs_qmp_v4_cape phy_qcom_ufs_qmp_v4_diwali phy_qcom_ufs_qmp_v4_waipio phy_qcom_ufs phy_generic nvmem_qcom_spmi_sdam msm_geni_serial msm_geni_se reboot_mode qti_fixed_regulator qnoc_waipio qnoc_qos pinctrl_waipio pinctrl_msm memory_dump_v2 mem_hooks mem_buf qcom_dma_heaps msm_dma_iommu_mapping mem_buf_dev secure_buffer mac80211 llcc_qcom kryo_arm64_edac iommu_logger gh_rm_drv gh_msgq gh_dbl gh_ctrl gh_arm_drv gcc_diwali dispcc_waipio dispcc_diwali cqhci crypto_qti_common crypto_qti_hwkm hwkm tmecom_intf
[    3.130580][    C0]  cfg80211 cpu_hotplug thermal_pause sched_walt qcom_cpufreq_hw bwmon qcom_dcvs dcvs_fp rpmh_regulator qcom_tsens qcom_pdc qcom_ipcc icc_rpmh socinfo icc_debug icc_bcm_voter gcc_waipio clk_dummy clk_qcom gdsc_regulator proxy_consumer debug_regulator clk_rpmh qcom_rpmh cmd_db qcom_ipc_logging qcom_cpu_vendor_hooks gh_virt_wdt qcom_wdt_core qcom_scm minidump smem qcom_hwspinlock
[    3.130659][    C0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S         O      5.10.209-gki-g93023ce89c2e #4
[    3.130666][    C0] Hardware name: Zeus based on Qualcomm Technologies, Inc SM8450 (DT)
[    3.130675][    C0] pstate: 32400005 (nzCV daif +PAN -UAO +TCO BTYPE=--)
[ 3.130991][ C0] pc : __cfi_check_fail+0x20/0x24 ipam
[ 3.131005][ C0] lr : __cfi_slowpath ([...]/kernel/xiaomi/sm8450/kernel/cfi.c:330) 
[    3.131012][    C0] sp : ffffffc008003c10
[    3.131018][    C0] x29: ffffffc008003c10 x28: 0000000000000040
[    3.131029][    C0] x27: 000000000000004d x26: ffffffdf8f2a6000
[    3.131039][    C0] x25: 0000000000008300 x24: 0000000000000001
[    3.131049][    C0] x23: ffffffdf8f0d3000 x22: ffffffdf9305e000
[    3.131059][    C0] x21: 0000000000000001 x20: 5ce0fbde194bb406
[    3.131069][    C0] x19: ffffffdf8f202194 x18: ffffffc008005060
[    3.131079][    C0] x17: 0000000000000174 x16: 0000000000000174
[    3.131088][    C0] x15: 000000000000000b x14: ffffffdf9302da18
[    3.131098][    C0] x13: 0000000000007ffb x12: 0000000000002a7c
[    3.131108][    C0] x11: 000000000000ffff x10: 000000000000294d
[    3.131117][    C0] x9 : 000ffffffdf8f0d3 x8 : 8000000000000000
[    3.131127][    C0] x7 : 0000000000000000 x6 : ffffff804636215d
[    3.131136][    C0] x5 : 0000000000000028 x4 : 0000000000000000
[    3.131145][    C0] x3 : ffffff804bba7e08 x2 : 0000000000000000
[    3.131154][    C0] x1 : ffffffdf8f202194 x0 : 0000000000000000
[    3.131164][    C0] Call trace:
[ 3.131446][ C0] __cfi_check_fail+0x20/0x24 ipam
[ 3.131730][ C0] ipa_pkt_status_parse_v5_0 ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipahal/ipahal.c:0) ipam
[ 3.132012][ C0] ipahal_pkt_status_parse ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipahal/ipahal.c:1651) ipam
[ 3.132291][ C0] ipa3_lan_rx_pyld_hdlr ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipa_dp.c:3811) ipam
[ 3.132572][ C0] ipa3_wq_rx_common ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipa_dp.c:4659) ipam
[ 3.132851][ C0] ipa3_lan_rx_poll ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipa_dp.c:0) ipam
[ 3.133131][ C0] ipa3_lan_poll ([...]/kernel/xiaomi/sm8450-modules/qcom/opensource/dataipa/drivers/platform/msm/ipa/ipa_v3/ipa.c:8769) ipam
[ 3.133141][ C0] net_rx_action ([...]/kernel/xiaomi/sm8450/net/core/dev.c:6852 [...]/kernel/xiaomi/sm8450/net/core/dev.c:6922) 
[ 3.133150][ C0] __do_softirq ([...]/kernel/xiaomi/sm8450/arch/arm64/include/asm/jump_label.h:21 [...]/kernel/xiaomi/sm8450/include/linux/jump_label.h:223 [...]/kernel/xiaomi/sm8450/include/trace/events/irq.h:142 [...]/kernel/xiaomi/sm8450/kernel/softirq.c:306) 
[ 3.133160][ C0] __irq_exit_rcu.llvm.10073472537251660674 ([...]/kernel/xiaomi/sm8450/include/linux/interrupt.h:0 [...]/kernel/xiaomi/sm8450/kernel/softirq.c:402 [...]/kernel/xiaomi/sm8450/kernel/softirq.c:432) 
[ 3.133168][ C0] __handle_domain_irq ([...]/kernel/xiaomi/sm8450/kernel/softirq.c:457 [...]/kernel/xiaomi/sm8450/kernel/irq/irqdesc.c:699) 
[ 3.133176][ C0] gic_handle_irq ([...]/kernel/xiaomi/sm8450/include/linux/irqdesc.h:170 [...]/kernel/xiaomi/sm8450/drivers/irqchip/irq-gic.c:372) 
[ 3.133183][ C0] el1_irq ([...]/kernel/xiaomi/sm8450/arch/arm64/kernel/entry.S:777) 
[ 3.133193][ C0] cpuidle_enter_state ([...]/kernel/xiaomi/sm8450/drivers/cpuidle/cpuidle.c:272) 
[ 3.133200][ C0] cpuidle_enter ([...]/kernel/xiaomi/sm8450/drivers/cpuidle/cpuidle.c:366) 
[ 3.133210][ C0] do_idle.llvm.17894169910830688745 ([...]/kernel/xiaomi/sm8450/kernel/sched/idle.c:160 [...]/kernel/xiaomi/sm8450/kernel/sched/idle.c:241 [...]/kernel/xiaomi/sm8450/kernel/sched/idle.c:302) 
[ 3.133217][ C0] cpu_startup_entry ([...]/kernel/xiaomi/sm8450/kernel/sched/idle.c:397) 
[ 3.133226][ C0] kernel_init ([...]/kernel/xiaomi/sm8450/init/main.c:1392) 
[ 3.133235][ C0] start_kernel ([...]/kernel/xiaomi/sm8450/init/main.c:841) 
[ 3.133242][ C0] start_kernel ([...]/kernel/xiaomi/sm8450/init/main.c:1041) 
[ 3.133252][ C0] Code: 7100151f d50323bf 54000043 d65f03c0 (d42aa040)
All code
========
   0:   1f                      (bad)
   1:   15 00 71 bf 23          adc    $0x23bf7100,%eax
   6:   03 d5                   add    %ebp,%edx
   8:   43 00 00                rex.XB add %al,(%r8)
   b:   54                      push   %rsp
   c:   c0 03 5f                rolb   $0x5f,(%rbx)
   f:   d6                      (bad)
  10:*  40                      rex     <-- trapping instruction
  11:   a0                      .byte 0xa0
  12:   2a d4                   sub    %ah,%dl

Code starting with the faulting instruction
===========================================
   0:   40                      rex
   1:   a0                      .byte 0xa0
   2:   2a d4                   sub    %ah,%dl
[    3.133261][    C0] ---[ end trace 1a0cd8d5ea32a76c ]---
[    3.133270][    C0] Kernel panic - not syncing: BRK handler: Fatal exception in interrupt

Unsetting CONFIG_CFI_CLANG and keeping CONFIG_ARM64_BTI enabled is also sufficient for the device to boot again, without said panic.

samitolvanen commented 2 months ago

Looks like ipa_pkt_status_parse_v5_0 attempts to perform an indirect call where the target function type doesn't match the function pointer type. In order to fix this, you'll need to figure out which function it attempts to call and change the function signature to match the pointer. If you can't find the function just by reading the source code, you can always add a simple pr_warn("calling function %pS", function_pointer) statement before the call.

1: 15 00 71 bf 23 adc $0x23bf7100,%eax

I think you'd want to decode this into arm64 assembly instead. However, since the faulting instruction is in the CFI error handler in the module, this probably won't be of much help anyway.
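
The trapping word in the Code: line can also be checked by hand, without a disassembler. The following is a minimal userspace sketch, not kernel code. It assumes the A64 BRK encoding (fixed bits 0xd4200000 with a 16-bit immediate in bits [20:5]) and the ESR layout (exception class in bits [31:26], BRK immediate in the low 16 bits of the ISS). The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 32-bit word is an A64 BRK instruction.
 * The mask keeps every fixed bit and ignores the imm16 field. */
static bool is_brk(uint32_t insn)
{
    return (insn & 0xffe0001fu) == 0xd4200000u;
}

/* Extract the 16-bit immediate of a BRK instruction (bits [20:5]). */
static uint16_t brk_imm16(uint32_t insn)
{
    return (uint16_t)((insn >> 5) & 0xffffu);
}

/* Exception class field of an ESR value (bits [31:26]). */
static uint32_t esr_ec(uint32_t esr)
{
    return esr >> 26;
}

/* For a BRK exception, the low 16 bits of the ESR hold the BRK immediate. */
static uint16_t esr_brk_comment(uint32_t esr)
{
    return (uint16_t)(esr & 0xffffu);
}
```

With the values from the trace above, the trapping word d42aa040 decodes to a BRK with immediate 0x5502. That is the same value as the low 16 bits of the reported ESR f2005502, whose exception class is 0x3c (BRK instruction executed in AArch64 state). This fits a compiler-inserted trap in __cfi_check_fail.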

Gelbpunkt commented 2 months ago

Thanks, this was rather helpful. With some logs I was able to track the issue down. It is this function call that triggers the CFI error:

[ 2.980509][ C0] calling function __ipa_parse_gen_pkt_v5_0.ce1aea28ac0c05872853d0c83c2cb671.cfi_jt+0x0/0x4 [ipam]

The signature of the calling function is

static void ipa_pkt_status_parse_v5_0(
    const void *unparsed_status, struct ipahal_pkt_status *status)

and the signature of the function it calls here is

static void __ipa_parse_gen_pkt_v5_0(struct ipahal_pkt_status *status,
                const void *unparsed_status)

The call passes on the same arguments that the caller received as parameters:

    void (*__parse_gen_pkt)(struct ipahal_pkt_status *status,
                const void *unparsed_status);
[snip]
        ipahal_pkt_status_objs[ipahal_ctx->hw_type].\
            __parse_gen_pkt(status, unparsed_status);

To me, this looks like the types are perfectly matched. Any idea what is wrong here? Does CFI not like the void pointers?
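
One way to double-check a match like this at compile time is C11 `_Generic`, which selects on the type of its controlling expression. The following is a minimal userspace sketch under stated assumptions, not code from the IPA sources. Only the typedef mirrors the quoted pointer declaration. The three `parse_*` functions and the `MATCHES_PARSE_FN` macro are hypothetical stand-ins. The check covers C-level signatures within one translation unit only. It does not reproduce how Clang CFI compares types, and it cannot catch a declaration in another file that differs from the definition.

```c
/* Opaque here; only the pointer type matters for the comparison. */
struct ipahal_pkt_status;

/* Pointer type as quoted from the ops table. */
typedef void (*parse_gen_pkt_fn)(struct ipahal_pkt_status *status,
                                 const void *unparsed_status);

/* Evaluates to 1 if the named function has a type compatible with
 * parse_gen_pkt_fn, else 0. Pass a function name, not a pointer variable. */
#define MATCHES_PARSE_FN(fn) \
    _Generic(&(fn), parse_gen_pkt_fn: 1, default: 0)

/* Hypothetical: same signature as the pointer type. */
static void parse_same(struct ipahal_pkt_status *status,
                       const void *unparsed_status)
{
    (void)status;
    (void)unparsed_status;
}

/* Hypothetical: same parameters, but in swapped order. */
static void parse_swapped(const void *unparsed_status,
                          struct ipahal_pkt_status *status)
{
    (void)unparsed_status;
    (void)status;
}

/* Hypothetical: same parameters, different return type. */
static int parse_returns_int(struct ipahal_pkt_status *status,
                             const void *unparsed_status)
{
    (void)status;
    (void)unparsed_status;
    return 0;
}
```

In this sketch, `MATCHES_PARSE_FN(parse_same)` evaluates to 1, while the swapped-argument and int-returning variants evaluate to 0. A `_Static_assert(MATCHES_PARSE_FN(some_function), "type mismatch")` next to the ops table would turn such a mismatch into a build error.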

Gelbpunkt commented 2 months ago

We just discovered that there is more than just BTI that triggers this. LTO is another way to cause this:

samitolvanen commented 2 months ago

OK, since the function types do match, it sounds like you're running into some kind of a Clang CFI bug here. I have seen this before when the function declaration didn't match the definition, so you might want to double check that there are no subtle differences there.

Full LTO and/or dropping BTI probably causes the compiler to optimize this differently. If the call becomes a direct call during optimization or the called function is completely inlined, the CFI check is dropped and you won't see the failure.