espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

[esp-hosted-ng] Kernel Oops w/sdio driver on IMX93 #482

Closed mdeneen closed 1 month ago

mdeneen commented 1 month ago

How often does this bug occur?

often

Expected behavior

I expected the OS to detect a card and interface with the esp-hosted sdio driver, providing a wlan0 interface.

Actual behavior (suspected bug)

When inserting the driver, the kernel sometimes oopses. The board is based on an imx93 reference design, with an esp32c6 mounted on the PCB (read: the traces are equal in length and very short).

I also modified cmd_set_ip_address to print debug output between each line of code. It oopses right away, so either priv or priv->adapter points to garbage. It seems like the function shouldn't even be called for the lo interface, and that may be part of the problem.

Error logs or terminal output

[   91.747075] mmc2: new high speed SDIO card at address 0001
[   91.933242] esp32_sdio: esp_probe: ESP network device detected
[   91.933457] esp32_sdio: get_firmware_data: Rx Pre ====== 0
[   91.933468] esp32_sdio: get_firmware_data: Rx Pos ======  0
[   91.933494] esp32_sdio: get_firmware_data: Tx Pre ======  0
[   91.933500] esp32_sdio: get_firmware_data: Tx Pos ======  10
[   91.934627] esp32_sdio: probe of mmc2:0001:2 failed with error -22
[   91.937873] esp32_sdio: process_esp_bootup_event: Received ESP bootup event
[   91.937895] esp32_sdio: process_event_esp_bootup: Bootup Event tag: 3
[   91.937900] esp32_sdio: esp_validate_chipset: Chipset=ESP32-C6 ID=0d detected over SDIO
[   91.937905] esp32_sdio: process_event_esp_bootup: Bootup Event tag: 0
[   91.937909] esp32_sdio: process_event_esp_bootup: Bootup Event tag: 1
[   91.937912] esp32_sdio: process_fw_data: ESP chipset's last reset cause:
[   91.937916] esp32_sdio: print_reset_reason: POWERON_RESET
[   91.937921] esp32_sdio: check_esp_version: ESP-Hosted Version: NG-1.0.3.0.0
[   91.938687] esp32_sdio: esp_reg_notifier: Driver init is ongoing
[   91.951543] esp32_sdio: tx_process: not ready
[   92.279484] esp32_sdio: init_bt: ESP Bluetooth init
[   92.280547] esp32_sdio: print_capabilities: Capabilities: 0xd. Features supported are:
[   92.280569] esp32_sdio: print_capabilities:  * WLAN on SDIO
[   92.280572] esp32_sdio: print_capabilities:  * BT/BLE
[   92.280576] esp32_sdio: print_capabilities:    - HCI over SDIO
[   92.280579] esp32_sdio: print_capabilities:    - BLE only
[   92.441249] audit: type=1334 audit(1726165376.450:125): prog-id=39 op=LOAD
[   92.442093] audit: type=1334 audit(1726165376.450:126): prog-id=40 op=LOAD
[   92.442115] audit: type=1334 audit(1726165376.450:127): prog-id=41 op=LOAD
[   92.468193] esp32_sdio: esp_inetaddr_event: NETDEV_UP interface lo ip changed to  127.000.000.001
[   92.468218] 1
[   92.468240] Unable to handle kernel paging request at virtual address 000001a40000015c
[   92.468245] Mem abort info:
[   92.468247]   ESR = 0x0000000096000004
[   92.468250]   EC = 0x25: DABT (current EL), IL = 32 bits
[   92.468254]   SET = 0, FnV = 0
[   92.468257]   EA = 0, S1PTW = 0
[   92.468260]   FSC = 0x04: level 0 translation fault
[   92.468263] Data abort info:
[   92.468265]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   92.468268]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   92.468271]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   92.468275] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000088656000
[   92.468278] [000001a40000015c] pgd=0000000000000000, p4d=0000000000000000
[   92.468287] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[   92.474545] Modules linked in: esp32_sdio(O) can_raw can xt_conntrack xt_MASQUERADE xt_addrtype br_netfilter rpmsg_ctrl rpmsg_char imx_rpmsg_tty nft_ct nft_chain_nat ip6table_nat ip6table_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables x_tables option qmi_wwan usb_wwan cdc_wdm ti_adc081c layerscape_edac_mod crct10dif_ce polyval_ce polyval_generic flexcan can_dev iio_rescale overlay fuse
[   92.516424] CPU: 1 PID: 882 Comm: (ostnamed) Tainted: G           O       6.6.23-lts-next #1
[   92.524846] Hardware name: IMX93 (DT)
[   92.529968] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   92.536918] pc : cmd_set_ip_address+0x38/0x1c4 [esp32_sdio]
[   92.542499] lr : cmd_set_ip_address+0x34/0x1c4 [esp32_sdio]
[   92.548072] sp : ffff8000838138b0
[   92.551374] x29: ffff8000838138b0 x28: ffff000005a3bb00 x27: 0000000000000000
[   92.558498] x26: ffff800082197ae0 x25: ffff0000058c3980 x24: ffff80007a3f5da8
[   92.565622] x23: ffff0000058c3980 x22: ffff80007a3f5d48 x21: 000000000100007f
[   92.572746] x20: ffff000004caf880 x19: ffff0000058c3980 x18: fffffffffffeac80
[   92.579870] x17: 65676e6168632070 x16: 69206f6c20656361 x15: 0000000000000010
[   92.586994] x14: 0000000000000000 x13: ffff8000820510f0 x12: 00000000000006ff
[   92.592687] Bluetooth: MGMT ver 1.22
[   92.594118] x11: 0000000000000255 x10: ffff8000820a90f0 x9 : ffff8000820510f0
[   92.604799] x8 : 00000000ffffefff x7 : ffff8000820a90f0 x6 : 80000000fffff000
[   92.611928] x5 : ffff00007fbaed48 x4 : 0000000000000000 x3 : 0000000000000000
[   92.619052] x2 : 0000000000000000 x1 : ffff000005a3bb00 x0 : 000001a400000004
[   92.626177] Call trace:
[   92.628613]  cmd_set_ip_address+0x38/0x1c4 [esp32_sdio]
[   92.633837]  esp_inetaddr_event+0xc0/0x15c [esp32_sdio]
[   92.639063]  blocking_notifier_call_chain+0x6c/0xa0
[   92.643934]  __inet_insert_ifa+0x23c/0x330
[   92.648024]  inet_rtm_newaddr+0x1e8/0x288
[   92.652020]  rtnetlink_rcv_msg+0x128/0x378
[   92.656111]  netlink_rcv_skb+0x60/0x130
[   92.659941]  rtnetlink_rcv+0x18/0x24
[   92.663503]  netlink_unicast+0x300/0x36c
[   92.667412]  netlink_sendmsg+0x1a8/0x420
[   92.671321]  __sys_sendto+0x118/0x180
[   92.674978]  __arm64_sys_sendto+0x28/0x38
[   92.678973]  invoke_syscall+0x48/0x110
[   92.682717]  el0_svc_common.constprop.0+0xc0/0xe0
[   92.687414]  do_el0_svc+0x1c/0x28
[   92.690717]  el0_svc+0x40/0xe4
[   92.693767]  el0t_64_sync_handler+0x120/0x12c
[   92.698109]  el0t_64_sync+0x190/0x194
[   92.701762] Code: a9025bf5 2a0103f5 95738f3a f9438e60 (f940ac00) 
[   92.707843] ---[ end trace 0000000000000000 ]---
[   92.719862] NET: Registered PF_ALG protocol family
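
As a cross-check on the oops above, here is a minimal sketch (not part of the driver) that decodes the two A64 load instructions at the end of the "Code:" line and compares the result with the reported fault address. The helper name decode_ldr_x_uimm is made up for this illustration, and which driver structures x19 and x0 correspond to is an assumption that has not been checked against the source.

```python
import ipaddress
import struct


def decode_ldr_x_uimm(insn: int):
    """Decode an A64 'LDR Xt, [Xn, #imm]' (64-bit, unsigned offset) word.

    Returns (rt, rn, byte_offset), or None if the word is not that encoding.
    """
    # Bits 31:22 are fixed to 0b1111100101 for this encoding.
    if (insn >> 22) != 0b1111100101:
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8  # imm12 is scaled by the 8-byte access size


# Values copied from the oops above.
FAULT_INSN = 0xF940AC00  # word in parentheses on the "Code:" line
PREV_INSN = 0xF9438E60   # word just before it
X0 = 0x000001A400000004
X21 = 0x000000000100007F
FAULT_ADDR = 0x000001A40000015C

rt, rn, off = decode_ldr_x_uimm(FAULT_INSN)
print(f"faulting insn: ldr x{rt}, [x{rn}, #{off:#x}]")
print(f"x{rn} + offset = {X0 + off:#018x} (fault address {FAULT_ADDR:#018x})")

rt2, rn2, off2 = decode_ldr_x_uimm(PREV_INSN)
print(f"previous insn: ldr x{rt2}, [x{rn2}, #{off2:#x}]")

# The low 32 bits of x21, read as little-endian bytes, give the IPv4 address.
ip = ipaddress.IPv4Address(struct.pack("<I", X21 & 0xFFFFFFFF))
print(f"x21 as IPv4: {ip}")
```

Read this way, the instruction at +0x34 loads a pointer from x19 + 0x718, and the instruction at +0x38 faults when it dereferences that pointer at offset 0x158. The loaded value (0x000001a400000004) is not a kernel address, which fits the suspicion that the function is reading a private structure that does not belong to this driver.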

Steps to reproduce the behavior

  1. Flash esp32 with sdio firmware

modprobe esp32_sdio resetpin=608
ip link set up wlan0
wpa_supplicant -c /etc/wpa_supplicant.conf -i wlan0&
udhcpc -i wlan0

Project release version

20d939491fb20841ae0b56221f249be02ad0ac69

System architecture

other (details in Additional context)

Operating system

Linux

Operating system version

Yocto 6.6 (scarthgap), kernel 6.6.23

Shell

Bash

Additional context

The log line "[ 91.934627] esp32_sdio: probe of mmc2:0001:2 failed with error -22" looks suspicious to me, as does the esp_sdio driver attempting to do things when the loopback interface receives a link event.
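
For reference, kernel functions usually report failure as a negated errno value, so the -22 in the probe message can be looked up directly. This is a small sketch using the errno table of whatever machine runs it (22 is EINVAL on Linux). The helper name describe_kernel_ret is hypothetical.

```python
import errno
import os


def describe_kernel_ret(ret: int) -> str:
    """Map a negative kernel-style return code to its errno name and message."""
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{ret} = -{name} ({os.strerror(code)})"


print(describe_kernel_ret(-22))
```

This only names the error. It does not say which argument the probe path rejected.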

I seem to be able to influence the behavior by changing the clockspeed parameter.

One thing to note is that my reset pin controls the power supply to the esp32c6 chip. I've modified the reset logic to turn off the supply, turn it back on, and leave it on instead of toggling it and returning it to an input.

mdeneen commented 1 month ago

Just for fun, I set the bus-width to 1 in my device tree and experienced the same result.

mantriyogesh commented 1 month ago

@kapilkedawat ++

mdeneen commented 1 month ago

Some more notes / questions:

The bootstrap pins MTMS and MTDI are floating on our board. Are these signals used only for SDIO boot or are they also used to tune SDIO communication?

mantriyogesh commented 1 month ago

The question is: are you through with the porting process? Your earlier kernel crash was in cmd_set_ip_address.

For your question,

The bootstrap pins MTMS and MTDI are floating on our board. Are these signals used only for SDIO boot or are they also used to tune SDIO communication?

For esp32-c6, it is not needed. There is no conflict between the SDIO interface and the bootstrap process, unlike on esp32.

We recommend using a PCB for SDIO, as it largely avoids signal integrity issues.

As long as the wires are short (<5 cm) and equal in length for all signals, and the expected pull-ups are in place, we are good.

The focus of the issue is still unclear, though. Can you explain the current issue you face and share logs from both sides?

mdeneen commented 1 month ago

Hi Yogesh,

Yes, I have gone through the porting process and I was sometimes able to use it to connect to an AP and ping the AP itself -- the cmd_set_ip_address does not always crash.

Our esp32-c6 is on the PCB with the required pull-ups:

image

image

You can see that the esp32-c6 is very close to the IMX93 -- maybe even less than 1cm away. You can see one pull-up, for CMD, on the top of the board. The other four are on the bottom:

image

Both SPI and SDIO are connected, as we are hoping to use SDIO but can also use SPI if needed. I'll gather logs from both ends today, but, in the meantime, this is what happens on the host side when any meaningful amount of data is sent over the SDIO interface:

root@imx93:~# iperf -c 192.168.2.10 -p 8000
Client connecting to 192.168.2.10, TCP port 8000
TCP window size: 85.0 KByte (default)
[ 3] local 192.168.2.31 port 35710 connected with 192.168.2.10 port 8000
<3>[60133.607379] mmc2: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
<3>[60133.607382] mmc2: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
<3>[60133.607385] mmc2: sdhci: Host ctl2: 0x00000000
<3>[60133.607389] mmc2: sdhci: ADMA Err: 0x00000003 | ADMA Ptr: 0x847c6204
<3>[60133.607392] mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
<3>[60133.607395] mmc2: sdhci-esdhc-imx: cmd debug status: 0x3100
<3>[60133.607399] mmc2: sdhci-esdhc-imx: data debug status: 0x32a0
<3>[60133.607402] mmc2: sdhci-esdhc-imx: trans debug status: 0x33a2
<3>[60133.607405] mmc2: sdhci-esdhc-imx: dma debug status: 0x3400
<3>[60133.607409] mmc2: sdhci-esdhc-imx: adma debug status: 0x35b4
<3>[60133.607412] mmc2: sdhci-esdhc-imx: fifo debug status: 0x3680
<3>[60133.607415] mmc2: sdhci-esdhc-imx: async fifo debug status: 0x3750
<3>[60133.607418] mmc2: sdhci: ============================================
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.3 sec 107 KBytes 85.3 Kbits/sec
root@imx93:~# wlan0: CTRL-EVENT-DISCONNECTED bssid=24:5a:4c:13:af:69 reason=3 locally_generated=1
<3>[60138.727338] esp32_sdio: wait_and_decode_cmd_resp: Command[0x6] timed out
<3>[60138.727363] esp32_sdio: cmd_disconnect_request: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
root@imx93:~# <3>[60143.851336] esp32_sdio: wait_and_decode_cmd_resp: Command[0xF] timed out
<3>[60143.851361] esp32_sdio: cmd_get_tx_power: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
<3>[60143.851563] esp32_sdio: prepare_command_request: command queue init is not done yet
<3>[60143.851579] esp32_sdio: cmd_get_tx_power: Failed to get command node

mdeneen commented 1 month ago

I also want to mention that I was looking at MTMS / MTDI to see if I was, perhaps, not treating them as they should be treated.

mdeneen commented 1 month ago

Yogesh,

I have identified the problem and we can close out this issue. The esp32-c6 is powered by a 500mA linear regulator and it is not strong enough to handle 20 dBm TX. I have reduced the ESP_PHY_MAX_TX_POWER in the sdkconfig and everything is working now. The behavior was unpredictable when the voltage dropped, and it didn't drop far enough for the brown out detection to kick in.

For now we will reduce the max TX power, but in the future the linear regulator will be beefed up.
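
To give a sense of scale for the TX power setting, here is a rough sketch of the standard dBm to milliwatt conversion. The values other than 20 dBm are illustrative and do not come from this thread. The result is RF output power only. Supply current also depends on amplifier efficiency and the rest of the chip, so it cannot be derived from these figures.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm is 1 mW)."""
    return 10 ** (dbm / 10)


# 20 dBm is the level mentioned above; the lower values are only examples.
for dbm in (20, 17, 14, 10):
    print(f"{dbm:>2} dBm = {dbm_to_mw(dbm):6.1f} mW")
```

Each 3 dB reduction roughly halves the RF output power, which is why a modest reduction in the maximum TX power can noticeably lower peak load on the supply.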

mantriyogesh commented 1 month ago

Oh this looks concerning.

I will try to involve the PHY team on this. Btw, I'm sure you would have taken care of this, but is the input power sufficient?

mantriyogesh commented 1 month ago

I got an update from the PHY team on this.

If they use the C6 module, we recommend that the module's power supply current be no less than 500 mA. The following table shows the Wi-Fi power consumption of the C6-WROOM-1 module.

image (Wi-Fi power consumption table for the C6-WROOM-1 module)

Also,

If the antenna matching circuit is mismatched, it will impact the RF PHY power. When users put an ESP chip on their own board design, they should run an antenna matching test or send the board to our RF test lab to verify it.

I am not entirely sure, but I sense the linear regulator and the antenna design might also play some role here? I am not deep into electronics. But let me know if something needs to be done about the above response from either side.

mdeneen commented 1 month ago

We are using a module:

image

The antenna connector you see in the photo is for a GPS; it is not related to the esp32-c6 module.

Are you suggesting that there is additional impedance matching to be done in this case?

At the moment we are looking at replacing the 500 mA linear supply with either an 800 mA or a 1 A one, but I'm quite interested in knowing whether we should be doing additional RF design, or perhaps disabling 802.11b and g.