SolidHal / PrawnOS

Libre Mainline Kernel and Debian for arm laptops
https://www.PrawnOS.com
GNU General Public License v2.0
114 stars 29 forks source link

Boot sometimes hangs at "cfg80211: failed to load regulatory.db" #47

Closed SolidHal closed 5 years ago

SolidHal commented 5 years ago

Boot sometimes stops at

[    2.176922] vcc18_lcd: supplied by vcc_18
[    2.179482] ALSA device list:
[    2.181976]   #0: VEYRON-I2S
[    2.184627] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    2.188464] cfg80211: failed to load regulatory.db

If you wait a loooooong time, it will eventually boot.

[    2.184627] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    2.188464] cfg80211: failed to load regulatory.db
[  602.203292] mmc2: Card stuck being busy! mmc_poll_for_busy
[  602.205906] mmc2: cache flush error -110
[  602.708471] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  603.211032] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[  603.229577] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[  603.324767] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  603.585485] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173
[  603.633887] Primary GPT is being ignored, using alternate GPT.
[  603.636626]  mmcblk2: p1 p2
[  603.646121] EXT4-fs (mmcblk2p2): warning: mounting unchecked fs, running e2fsck is recommended
[  603.653761] EXT4-fs (mmcblk2p2): mounted filesystem without journal. Opts: (null)
[  603.656679] VFS: Mounted root (ext4 filesystem) on device 179:2.

Attaching full log for later.

Workaround: Hold power button, reboot. Loooooong_db_emmc_error.txt

SolidHal commented 5 years ago

11 is definitely related. When waking, get a spam of the same emmc errors and some about ath9k as well.

Attaching log from #11 situation

wakeup_hang_kern.log

TODO: Recreate that log w/ the wifi dongle and without. See how much of a role ath9k plays, if any.

SolidHal commented 5 years ago

For reference, heres the lof of a successful sleep then wake.

successfull_sleep_wake.txt

This section seems especially useful:

[ 1375.146362] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 1375.203666] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[ 1375.205184] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[ 1375.206703] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[ 1375.209430] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
[ 1375.238313] rockchip-dp ff970000.dp: Timeout of video streamclk ok
[ 1375.238316] rockchip-dp ff970000.dp: unable to config video
[ 1375.263318] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 1375.405812] elan_i2c 4-0015: invalid report id data (ff)
[ 1375.471409] usb 3-1: reset high-speed USB device number 2 using ehci-platform
[ 1375.517518] dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to 196
[ 1375.907082] usb 3-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[ 1376.156702] ath9k_htc 3-1:1.0: ath9k_htc: HTC initialized with 33 credits
[ 1376.157940] OOM killer enabled.
[ 1376.157944] Restarting tasks ... done.
[ 1376.177633] PM: suspend exit
[ 1376.203719] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 1376.292281] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 1376.392468] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 1376.430572] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 1376.762010] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 1376.828618] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[ 1376.975042] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173
SolidHal commented 5 years ago

Recreated the boot issue with grf patch, emmc addr/size patch, and powerseq patch while no usb wifi dongle was plugged in on a 4GB model.

Got nearly identical log to the original posted error log

[    2.184627] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    2.188464] cfg80211: failed to load regulatory.db
[  602.203292] mmc2: Card stuck being busy! mmc_poll_for_busy
[  602.205906] mmc2: cache flush error -110
[  602.708471] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  603.211032] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[  603.229577] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[  603.324767] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  603.585485] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 175
SolidHal commented 5 years ago

Checking chromeos to see if they may have a fix that didn't get mainlined.

Some good candidates I'll be trying: This one has to do with power off/on https://chromium.googlesource.com/chromiumos/third_party/kernel/+/68136398112894eb79b9c62e300abe9760cb06a7%5E%21/#F0

This one is more suspend/ resume directed: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/edea1b23d78ae9c9c66b0551baee11ef79dd0192

SolidHal commented 5 years ago

Tried Implementing both. The second one can't be applied correctly as the mmc host driver has changed too much.

The first one can be applied after some slight modification. First sleep after reinstall crashes and NEVER recovers. It can be left for hours and won't recover.

By accident, I found that it is possible this configuration will wake from sleep properly sometimes. TL;DR: Neither fix this issue.

SolidHal commented 5 years ago

For reboot testing, do something like

Put the following in: /lib/systemd/system/cycletest.service

[Unit]
Description=Reboots unit after 30s

[Service]
StandardOutput=syslog+console
ExecStart=/bin/sh -c "\
test -f /cycle-count || echo 0 > /cycle-count;\
echo 'starting cycletest';\
sleep 30;\
expr `cat /cycle-count` + 1 > /cycle-count;\
systemctl reboot;\
"

[Install]
WantedBy=multi-user.target

To install and start the script:

systemctl daemon-reload
systemctl enable cycletest.service (enable the service to start on reboot)
systemctl start cycletest.service (start the service, should reboot in 30s)
SolidHal commented 5 years ago

What is at spi0.0?

[  324.445382] OOM killer enabled.
[  324.445386] Restarting tasks ... done.
[  324.478729] PM: suspend exit
[  324.509574] cros-ec-spi spi0.0: Command xfer error (err:-11)
[  324.509596] cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring EC i2c message -11
SolidHal commented 5 years ago

Tested kernel version 4.9 to see if this is a regression. It doesn't seem to be. Not only can I recreate the same issues, it seems to have some further emmc troubles.

Heres the part with the same issue:

[  600.910977] mmc2: Card stuck in programming state! __mmc_switch
[  600.913037] mmc2: cache flush error -110
[  601.415969] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  601.417979] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x0)
[  601.438206] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[  601.498686] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  601.729184] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173

And a line earlier in boot with context:

[    0.795554] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    0.795655] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    0.795749] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    0.795835] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB
[    0.809010] usb 2-1: SerialNumber: AC220B2805A4BFA0C9727846
[    0.811852] usb-storage 2-1:1.0: USB Mass Storage device detected
[    0.814616] scsi host0: usb-storage 2-1:1.0
[    0.892839] input: Elan Touchpad as /devices/platform/ff160000.i2c/i2c-4/4-0015/input/input3
[    0.909405] mmcblk2: error -110 transferring data, sector 0, nr 8, cmd response 0x900, card status 0x0

I wonder if these are in any way related?

Attaching the full log. 4.9-emmc-boot-test.txt

SolidHal commented 5 years ago

Something interesting to not. Between 4.9 and 4.17, the error partially changed.

4.9:

[  600.910977] mmc2: Card stuck in programming state! __mmc_switch
[  600.913037] mmc2: cache flush error -110
[  601.415969] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway

vs

4.17.2

[  602.203292] mmc2: Card stuck being busy! mmc_poll_for_busy
[  602.205906] mmc2: cache flush error -110
[  602.708471] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway

Both seem to a cache flush error? Was it simply renamed?

SolidHal commented 5 years ago

Tested multiple patches, all which don't fix the issue:

00010-dont-send-clock-off-command-if-power-off.patch
0004-fix-mvsdio-eMMC-timing.patch
0006-ARM-DTSI-rk3288-Missing-GRF-handles.patch
0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch
0009-ARM-DTSI-rk3288-Adding-cells-addresses-and-size.patch
0009-power-add-power-sequence-library.patch

Of those, these should probably be applied anyway, but will test later

0006-ARM-DTSI-rk3288-Missing-GRF-handles.patch
0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch
SolidHal commented 5 years ago

Tracing log:

  1. mmc2: Card stuck being busy! mmc_poll_for_busy
  2. mmc2: cache flush error -110
  3. dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
  4. mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
  5. mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
  6. mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
  7. dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173

1 is printed by mmc_poll_for_busy() in drivers/mmc/core/mmc_ops.c (duh, but confirmed anyway) 1 is called by: __mmc_switch in drivers/mmc/core/mmc_ops.c 1 returns: -ETIMEDOUT

2 is printed by mmc_flush_cache in drivers/mmc/core/mmc_ops.c 2 is called by: mmc_reset in core/mmc.c

So, hw_reset calls for the cache to be cleared, and __mmc_switch sends that command and then polls for an answer with mmc_poll_for_busy, which then times out as the data lines never read all 0's.

This explains these lines:

mmc2: Card stuck being busy! mmc_poll_for_busy
mmc2: cache flush error -110

mmc_poll_for_busy just checks the data lines to see it is not longer busy. Chrome os uses CMD13 and a status check to see if it busy.

TODO: Confirm the c201 uses host->ops->card_busy, if so: see if disabling host->ops->card_busy and using the CMD13 check by calling mmc_send_status instead fixes the issue. Maybe theres a device tree flag?

Now, what about these?

dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)

3 is printed by dw_mci_wait_while_busy in drivers/mmc/host/dw_mmc.c 3 is called by: mci_send_cmd

4 is printed by mci_send_cmd after dw_mci_wait_while_busy gives up 4' is most likely called bydw_mci_setup_bus`

TODO: Put a dump_stack() in dw_mci_wait_while_busy to confirm all of this

IDEA: When the issue happens, does dw_mci_setup_bus somehow not complete before __mmc_switch gets caught in poll for busy?

Log/ stack trace from the hang with mmc_reset


[    2.098984] HAL_DEBUG: mmc2: IN __mmc_switch, blocking retuning
[    2.132689] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[    2.258439] dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to 198
[    2.503736] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    2.505883] cfg80211: failed to load regulatory.db
[    8.323410] mmc1: new ultra high speed SDR104 SDIO card at address 0001
[  172.411772] random: crng init done
[  602.151267] mmc2: Card stuck being busy! mmc_poll_for_busy returning this value ffffff92 
[  602.153518] HAL_DEBUG: mmc2: IN __mmc_switch, and mmc_poll_for_busy returned err, goto out
[  602.155787] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_return_release 
[  602.158060] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  602.160345] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: ffffff92 
[  602.162625] mmc2: cache flush error -110
[  602.164902] CPU: 0 PID: 118 Comm: kworker/0:1H Not tainted 4.17.2-gnu+ #1
[  602.167221] Hardware name: Rockchip (Device Tree)
[  602.169569] Workqueue: kblockd mmc_blk_mq_complete_work
[  602.171920] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[  602.174296] [<c020ceac>] (show_stack) from [<c0ac1f84>] (dump_stack+0x84/0xa4)
[  602.176714] [<c0ac1f84>] (dump_stack) from [<c07f3a74>] (mmc_flush_cache+0x84/0x94)
[  602.179140] [<c07f3a74>] (mmc_flush_cache) from [<c07f2450>] (mmc_reset+0x24/0xa0)
[  602.181593] [<c07f2450>] (mmc_reset) from [<c07eb7a0>] (mmc_hw_reset+0xbc/0x138)
[  602.184074] [<c07eb7a0>] (mmc_hw_reset) from [<c07fd244>] (mmc_blk_reset+0x40/0x10c)
[  602.186566] [<c07fd244>] (mmc_blk_reset) from [<c07fd540>] (mmc_blk_mq_rw_recovery+0x230/0x408)
[  602.189088] [<c07fd540>] (mmc_blk_mq_rw_recovery) from [<c07fdc34>] (mmc_blk_mq_complete_prev_req.part.5+0x64/0x1a0)
[  602.191650] [<c07fdc34>] (mmc_blk_mq_complete_prev_req.part.5) from [<c07fe6ac>] (mmc_blk_mq_complete_work+0x30/0x34)
[  602.194283] [<c07fe6ac>] (mmc_blk_mq_complete_work) from [<c0244a28>] (process_one_work+0x250/0x474)
[  602.196914] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[  602.199542] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[  602.202202] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  602.204841] Exception stack(0xec5d5fb0 to 0xec5d5ff8)
[  602.207464] 5fa0:                                     00000000 00000000 00000000 00000000
[  602.210135] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  602.212817] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  602.715535] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  603.218237] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[  603.236522] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)

Final thing of note, this line IDEA: dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to 198 happens after __mmmc_switch_ disabled retuning. Could the retuning of one of the other dwmmc_rockchip devices cause this issue?

Test for this by disabling all retuning when mmc_retune_hold(host) is called in __mmc_switch? Or see if that node gets retuned when it boots successfully? However, if it doesn't return far off what it was tuned to it may not cause an issue.

This is unlikely the problem.

SolidHal commented 5 years ago

Two questions:

  1. Why does mmc_reset hang, but complete
  2. Why is mmc_reset called in the first place?

So, mmc_reset calls

TODO: determine which call to mci_send_cmd fails, is it really required if the emmc works after? Add an error return to mci_send_cmd and have dw_mci_setup_bus report which one failed

That answers the first question.

On to number 2:

As part of its work queue,

Questions raised:

  1. Is it due to mmc_blk_rq_error(&mqrq->brq) or due to mmc_blk_card_busy(mq->card, req) that mmc_blk_mq_rw_recovery gets called?

  2. does mmc_blk_reset call mmc_hw_reset both times it runs (theres some logic to exit early in certain conditions)

  3. which call to mmc_blk_reset leads to the problem?

Heres the portion of the kernel log I got all of this from:

[    2.218093] mmc2: in mmc_reset, flushing cache
[    2.220377] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[    2.222637] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  602.226643] mmc2: Card stuck being busy! mmc_poll_for_busy returning this value ffffff92 
[  602.229158] HAL_DEBUG: mmc2: IN __mmc_switch, and mmc_poll_for_busy returned err, goto out
[  602.231703] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  602.234213] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: ffffff92 
[  602.236733] mmc2: cache flush error -110
[  602.239247] CPU: 0 PID: 119 Comm: kworker/0:1H Not tainted 4.17.2-gnu+ #1
[  602.241814] Hardware name: Rockchip (Device Tree)
[  602.244374] Workqueue: kblockd mmc_blk_mq_complete_work
[  602.246939] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[  602.249549] [<c020ceac>] (show_stack) from [<c0ac1fa4>] (dump_stack+0x84/0xa4)
[  602.252198] [<c0ac1fa4>] (dump_stack) from [<c07f3a8c>] (mmc_flush_cache+0x84/0x94)
[  602.254840] [<c07f3a8c>] (mmc_flush_cache) from [<c07f2498>] (mmc_reset+0x38/0xb8)
[  602.257475] [<c07f2498>] (mmc_reset) from [<c07eb7a0>] (mmc_hw_reset+0xbc/0x138)
[  602.260097] [<c07eb7a0>] (mmc_hw_reset) from [<c07fd25c>] (mmc_blk_reset+0x40/0x10c)
[  602.262705] [<c07fd25c>] (mmc_blk_reset) from [<c07fd558>] (mmc_blk_mq_rw_recovery+0x230/0x408)
[  602.265330] [<c07fd558>] (mmc_blk_mq_rw_recovery) from [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5+0x64/0x1a0)
[  602.267982] [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5) from [<c07fe6c4>] (mmc_blk_mq_complete_work+0x30/0x34)
[  602.270644] [<c07fe6c4>] (mmc_blk_mq_complete_work) from [<c0244a28>] (process_one_work+0x250/0x474)
[  602.273310] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[  602.275976] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[  602.278629] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  602.281246] Exception stack(0xec613fb0 to 0xec613ff8)
[  602.283848] 3fa0:                                     00000000 00000000 00000000 00000000
[  602.286493] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  602.289126] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  602.791735] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  602.794281] CPU: 0 PID: 119 Comm: kworker/0:1H Not tainted 4.17.2-gnu+ #1
[  602.796813] Hardware name: Rockchip (Device Tree)
[  602.799335] Workqueue: kblockd mmc_blk_mq_complete_work
[  602.801856] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[  602.804424] [<c020ceac>] (show_stack) from [<c0ac1fa4>] (dump_stack+0x84/0xa4)
[  602.806988] [<c0ac1fa4>] (dump_stack) from [<c0808a24>] (dw_mci_wait_while_busy+0x88/0xb4)
[  602.809539] [<c0808a24>] (dw_mci_wait_while_busy) from [<c080a540>] (mci_send_cmd.constprop.6+0x44/0xf0)
[  602.812097] [<c080a540>] (mci_send_cmd.constprop.6) from [<c080a6d0>] (dw_mci_setup_bus+0x58/0x1dc)
[  602.814689] [<c080a6d0>] (dw_mci_setup_bus) from [<c080a9dc>] (dw_mci_set_ios+0x188/0x214)
[  602.817309] [<c080a9dc>] (dw_mci_set_ios) from [<c07ed34c>] (mmc_set_initial_state+0x104/0x114)
[  602.819973] [<c07ed34c>] (mmc_set_initial_state) from [<c07eda60>] (mmc_power_off+0x44/0x54)
[  602.822626] [<c07eda60>] (mmc_power_off) from [<c07edbc8>] (mmc_power_cycle+0x20/0x3c)
[  602.825278] [<c07edbc8>] (mmc_power_cycle) from [<c07f24d0>] (mmc_reset+0x70/0xb8)
[  602.827917] [<c07f24d0>] (mmc_reset) from [<c07eb7a0>] (mmc_hw_reset+0xbc/0x138)
[  602.830561] [<c07eb7a0>] (mmc_hw_reset) from [<c07fd25c>] (mmc_blk_reset+0x40/0x10c)
[  602.833179] [<c07fd25c>] (mmc_blk_reset) from [<c07fd558>] (mmc_blk_mq_rw_recovery+0x230/0x408)
[  602.835814] [<c07fd558>] (mmc_blk_mq_rw_recovery) from [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5+0x64/0x1a0)
[  602.838492] [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5) from [<c07fe6c4>] (mmc_blk_mq_complete_work+0x30/0x34)
[  602.841204] [<c07fe6c4>] (mmc_blk_mq_complete_work) from [<c0244a28>] (process_one_work+0x250/0x474)
[  602.843935] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[  602.846673] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[  602.849430] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  602.852167] Exception stack(0xec613fb0 to 0xec613ff8)
[  602.854869] 3fa0:                                     00000000 00000000 00000000 00000000
[  602.857621] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  602.860389] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  603.363149] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[  603.381022] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
SolidHal commented 5 years ago

Can commands be sent when the clock is disabled?

        /* disable clock */
        mci_writel(host, CLKENA, 0);
        mci_writel(host, CLKSRC, 0);

        /* inform CIU */
        mci_send_cmd(slot, sdmmc_cmd_bits, 0);

        /* set clock to desired speed */
        mci_writel(host, CLKDIV, div);

        /* inform CIU */
        mci_send_cmd(slot, sdmmc_cmd_bits, 0);

        /* enable clock; only low power if no SDIO */
        clk_en_a = SDMMC_CLKEN_ENABLE << slot->id;
        if (!test_bit(DW_MMC_CARD_NO_LOW_PWR, &slot->flags))
            clk_en_a |= SDMMC_CLKEN_LOW_PWR << slot->id;
        mci_writel(host, CLKENA, clk_en_a);

        /* inform CIU */
        mci_send_cmd(slot, sdmmc_cmd_bits, 0);
SolidHal commented 5 years ago

On a successful boot,

This chunk of the boot log regarding the mmc and mmc block device looks something like this

[    1.954060] mmc2: new HS200 MMC card at address 0001
[    1.955214] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    1.956564] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    1.958037] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    1.958296] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
[   1.964869] Primary GPT is being ignored, using alternate GPT.
[   1.964893]  mmcblk2: p1 p2

On an unsuccessful boot, it looks more like this:

[    1.954060] mmc2: new HS200 MMC card at address 0001
[    1.955214] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    1.956564] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    1.958037] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    1.958296] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
[    1.970965] Console: switching to colour frame buffer device 170x48
[    2.022840] usb 2-1: New USB device found, idVendor=0951, idProduct=1666, bcdDevice= 1.10
[    2.023442] rockchip-drm display-subsystem: fb0:  frame buffer device
[    2.026514] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    2.033098] [drm] Initialized rockchip 1.0.0 20140818 for display-subsystem on minor 0
[    2.035889] usb 2-1: Product: DataTraveler 3.0
[    2.039883] input: gpio-keys as /devices/platform/gpio-keys/input/input2
[    2.042467] usb 2-1: Manufacturer: Kingston
[    2.066931] rk808-rtc rk808-rtc: setting system clock to 2018-10-13 14:52:37 UTC (1539442357)
[    2.067101] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    2.068772] usb 2-1: SerialNumber: AC220B2805A4BFA0C9727846
[    2.105571] usb-storage 2-1:1.0: USB Mass Storage device detected
[    2.108026] scsi host0: usb-storage 2-1:1.0
[    2.177081] elan_i2c 4-0015: Elan Touchpad: Module ID: 0x0061, Firmware: 0x0001, Sample: 0x0002, IAP: 0x0009
[    2.179560] input: Elan Touchpad as /devices/platform/ff160000.i2c/i2c-4/4-0015/input/input3
[    2.180568] elan_i2c 4-0015: invalid report id data (ff)
[    2.218093] mmc2: in mmc_reset, flushing cache
[    2.220377] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[    2.222637] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[    2.251798] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    2.254130] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    2.256441] cfg80211: failed to load regulatory.db
[    2.256907] vcc18_lcd: supplied by vcc_18
[    2.261101] ALSA device list:
[    2.263372]   #0: VEYRON-I2S
[    3.214667] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0 PMAP PQ: 0 ANSI: 6
[    3.217385] sd 0:0:0:0: [sda] 15138816 512-byte logical blocks: (7.75 GB/7.22 GiB)
[    3.219894] sd 0:0:0:0: [sda] Write Protect is off
[    3.222186] sd 0:0:0:0: [sda] Mode Sense: 45 00 00 00
[    3.222386] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    3.239351] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.241724] GPT:4095999 != 15138815
[    3.244061] GPT:Alternate GPT header not at the end of the disk.
[    3.246425] GPT:4095999 != 15138815
[    3.248780] GPT: Use GNU Parted to correct GPT errors.
[    3.251182]  sda: sda1 sda2
[    3.254512] sd 0:0:0:0: [sda] Attached SCSI removable disk
[  602.226643] mmc2: Card stuck being busy! mmc_poll_for_busy returning this value ffffff92 
[  602.229158] HAL_DEBUG: mmc2: IN __mmc_switch, and mmc_poll_for_busy returned err, goto out
[  602.231703] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  602.234213] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: ffffff92 
[  602.236733] mmc2: cache flush error -110
[  602.239247] CPU: 0 PID: 119 Comm: kworker/0:1H Not tainted 4.17.2-gnu+ #1
[  602.241814] Hardware name: Rockchip (Device Tree)
[  602.244374] Workqueue: kblockd mmc_blk_mq_complete_work
[  602.246939] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[  602.249549] [<c020ceac>] (show_stack) from [<c0ac1fa4>] (dump_stack+0x84/0xa4)
[  602.252198] [<c0ac1fa4>] (dump_stack) from [<c07f3a8c>] (mmc_flush_cache+0x84/0x94)
[  602.254840] [<c07f3a8c>] (mmc_flush_cache) from [<c07f2498>] (mmc_reset+0x38/0xb8)
[  602.257475] [<c07f2498>] (mmc_reset) from [<c07eb7a0>] (mmc_hw_reset+0xbc/0x138)
[  602.260097] [<c07eb7a0>] (mmc_hw_reset) from [<c07fd25c>] (mmc_blk_reset+0x40/0x10c)
[  602.262705] [<c07fd25c>] (mmc_blk_reset) from [<c07fd558>] (mmc_blk_mq_rw_recovery+0x230/0x408)
[  602.265330] [<c07fd558>] (mmc_blk_mq_rw_recovery) from [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5+0x64/0x1a0)
[  602.267982] [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5) from [<c07fe6c4>] (mmc_blk_mq_complete_work+0x30/0x34)
[  602.270644] [<c07fe6c4>] (mmc_blk_mq_complete_work) from [<c0244a28>] (process_one_work+0x250/0x474)
[  602.273310] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[  602.275976] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[  602.278629] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  602.281246] Exception stack(0xec613fb0 to 0xec613ff8)
[  602.283848] 3fa0:                                     00000000 00000000 00000000 00000000
[  602.286493] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  602.289126] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  602.791735] dwmmc_rockchip ff0f0000.dwmmc: Busy; trying anyway
[  602.794281] CPU: 0 PID: 119 Comm: kworker/0:1H Not tainted 4.17.2-gnu+ #1
[  602.796813] Hardware name: Rockchip (Device Tree)
[  602.799335] Workqueue: kblockd mmc_blk_mq_complete_work
[  602.801856] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[  602.804424] [<c020ceac>] (show_stack) from [<c0ac1fa4>] (dump_stack+0x84/0xa4)
[  602.806988] [<c0ac1fa4>] (dump_stack) from [<c0808a24>] (dw_mci_wait_while_busy+0x88/0xb4)
[  602.809539] [<c0808a24>] (dw_mci_wait_while_busy) from [<c080a540>] (mci_send_cmd.constprop.6+0x44/0xf0)
[  602.812097] [<c080a540>] (mci_send_cmd.constprop.6) from [<c080a6d0>] (dw_mci_setup_bus+0x58/0x1dc)
[  602.814689] [<c080a6d0>] (dw_mci_setup_bus) from [<c080a9dc>] (dw_mci_set_ios+0x188/0x214)
[  602.817309] [<c080a9dc>] (dw_mci_set_ios) from [<c07ed34c>] (mmc_set_initial_state+0x104/0x114)
[  602.819973] [<c07ed34c>] (mmc_set_initial_state) from [<c07eda60>] (mmc_power_off+0x44/0x54)
[  602.822626] [<c07eda60>] (mmc_power_off) from [<c07edbc8>] (mmc_power_cycle+0x20/0x3c)
[  602.825278] [<c07edbc8>] (mmc_power_cycle) from [<c07f24d0>] (mmc_reset+0x70/0xb8)
[  602.827917] [<c07f24d0>] (mmc_reset) from [<c07eb7a0>] (mmc_hw_reset+0xbc/0x138)
[  602.830561] [<c07eb7a0>] (mmc_hw_reset) from [<c07fd25c>] (mmc_blk_reset+0x40/0x10c)
[  602.833179] [<c07fd25c>] (mmc_blk_reset) from [<c07fd558>] (mmc_blk_mq_rw_recovery+0x230/0x408)
[  602.835814] [<c07fd558>] (mmc_blk_mq_rw_recovery) from [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5+0x64/0x1a0)
[  602.838492] [<c07fdc4c>] (mmc_blk_mq_complete_prev_req.part.5) from [<c07fe6c4>] (mmc_blk_mq_complete_work+0x30/0x34)
[  602.841204] [<c07fe6c4>] (mmc_blk_mq_complete_work) from [<c0244a28>] (process_one_work+0x250/0x474)
[  602.843935] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[  602.846673] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[  602.849430] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  602.852167] Exception stack(0xec613fb0 to 0xec613ff8)
[  602.854869] 3fa0:                                     00000000 00000000 00000000 00000000
[  602.857621] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  602.860389] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  603.363149] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[  603.381022] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[  603.429175] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.452366] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.475258] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[  603.499270] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  603.502036] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.504757] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[  603.507454] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.510444] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.513095] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[  603.516020] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  603.518592] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.521203] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[  603.523838] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.526716] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.529260] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[  603.532100] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  603.534475] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.536830] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[  603.547830] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.553243] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.558263] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.563219] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[  603.568515] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  603.824555] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173
[  603.857299] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.890957] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.924122] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[  603.926388] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  603.928640] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.930848] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[  603.933012] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[  603.935140] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[  603.937267] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[  603.939385] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[  603.941471] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[  603.943508] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
**[  603.949265] Primary GPT is being ignored, using alternate GPT.**
**[  603.957849]  mmcblk2: p1 p2**
[  603.987520] EXT4-fs (mmcblk2p2): mounted filesystem without journal. Opts: (null)
[  603.991085] VFS: Mounted root (ext4 filesystem) on device 179:2.
[  603.996829] Freeing unused kernel memory: 2048K
[  604.148502] random: systemd: uninitialized urandom read (16 bytes read)
[  604.151963] random: systemd: uninitialized urandom read (16 bytes read)
[  604.156758] systemd[1]: systemd 232 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[  604.162430] systemd[1]: Detected architecture arm.

See how the lines with ** around them get shoved way down below? TODO: see if these lines:

**[  603.949265] Primary GPT is being ignored, using alternate GPT.**
**[  603.957849]  mmcblk2: p1 p2**

Are calling the functions that hang.... Could this have to do with the GPT patch brought in from ChromeOS?

SolidHal commented 5 years ago

When dw_mci_setup_bus gets called during the issue segment, this call to mci_send_cmd returns an error:

        /* disable clock */
        mci_writel(host, CLKENA, 0);
        mci_writel(host, CLKSRC, 0);

        /* inform CIU */
        if (mci_send_cmd(slot, sdmmc_cmd_bits, 0) == 1){
      dev_err(&slot->mmc->class_dev,
               "HAL_DEBUG: Error informing CIU just after clock disable!\n");
    }

TODO : (Question 6) Does dw_mci_setup_bus get called before the time it errors? If so, why does mci_send_cmd only fail this time? - added "HAL_DEBUG: ENTERED dw_mci_setup_bus!" to check

Summary of previous questions:

  1. Why does mmc_reset hang, but complete

    • Short answer, because the timeout errors that are reported are purposely ignored because:
    • When mci_send_cmd errors out, dw_mci_setup_bus never gets it because mci_send_cmd doesn't report errors
    • When mmc_reset calls mmc_flush_cache it purposely ignore errors
  2. Why is mmc_reset called in the first place?

    • mmc_reset is called by mmc_blk_mq_recovery for the sake of resetting the card
    • Furthur info in the answer to (3)
  3. Is it due to mmc_blk_rq_error(&mqrq->brq) or due to mmc_blk_card_busy(mq->card, req) that mmc_blk_mq_rw_recovery gets called?

    • mmc_blk_mq_rw_recovery gets called twice. Both times because mmc_blk_rq_error is TRUE
    • The first time it returns before getting to calling mmc_blk_reset, the second time it reaches mmc_blk_reset
  4. does mmc_blk_reset call mmc_hw_reset both times it runs (theres some logic to exit early in certain conditions)

    • No, mmc_blk_reset actually only gets called once in mmc_blk_mq_rw_recovery
    • So no double reset issues
  5. which call to mmc_blk_reset leads to the problem?

    • This one:
      if (!mmc_host_is_spi(mq->card->host) &&
      err && mmc_blk_reset(md, card->host, type)) {
      pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
      mqrq->retries = MMC_NO_RETRIES;
      return;
      }

So some follow ups.

Multiple potential issues at three levels:

  1. Should mmc_reset really call `mmc_flush_cache?

    • I think not, as mmc_reset is likely caused when there is an error with the mmc, so mmc_flush_cache hanging and then timing out is likely
    • Anyway, two lines lower mmc_power_cycle is called, which eventually calls dw_mci_setup_bus which re-sets-up the cache
    • ? Should the cache be immediately cleared here, after the cache gets re-set-up?
    • does dw_mci_setup_bus get called any other times? How should it run? (expands on the question on the top of this comment)
  2. How does dw_mci_setup_bus run usually?

  3. Why is there a mmc_blk_rq_error ???? Is it related to these lines?

**[  603.949265] Primary GPT is being ignored, using alternate GPT.**
**[  603.957849]  mmcblk2: p1 p2**

Attaching log that highlights the hang. debug_log_mci_write_and_mmc_blk_rq.txt

SolidHal commented 5 years ago

2) dw_mci_setup_bus is called a ton. A ton during boot, and then continuously after boot. Why is that? Lets look into the stack traces of each call to dw_mci_setup_bus

First dw_mci_setup_bus (for mmc2)

[    1.634118] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.634136] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.2-gnu+ #1
[    1.634149] Hardware name: Rockchip (Device Tree)
[    1.634169] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.634189] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.634209] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.634229] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.634249] [<c080aad0>] (dw_mci_set_ios) from [<c07ed878>] (mmc_power_up+0x134/0x1a0)
[    1.634269] [<c07ed878>] (mmc_power_up) from [<c07ee71c>] (mmc_start_host+0x5c/0x88)
[    1.634289] [<c07ee71c>] (mmc_start_host) from [<c07ef9e8>] (mmc_add_host+0x68/0x8c)
[    1.634309] [<c07ef9e8>] (mmc_add_host) from [<c080a330>] (dw_mci_probe+0x8e4/0xba8)
[    1.634329] [<c080a330>] (dw_mci_probe) from [<c080bcf0>] (dw_mci_pltfm_register+0xa4/0xb4)
[    1.634349] [<c080bcf0>] (dw_mci_pltfm_register) from [<c080c45c>] (dw_mci_rockchip_probe+0x88/0x108)
[    1.634371] [<c080c45c>] (dw_mci_rockchip_probe) from [<c0686424>] (platform_drv_probe+0x60/0xb0)
[    1.634393] [<c0686424>] (platform_drv_probe) from [<c0684220>] (driver_probe_device+0x20c/0x468)
[    1.634415] [<c0684220>] (driver_probe_device) from [<c0684518>] (__driver_attach+0x9c/0x100)
[    1.634436] [<c0684518>] (__driver_attach) from [<c068233c>] (bus_for_each_dev+0x68/0x8c)
[    1.634456] [<c068233c>] (bus_for_each_dev) from [<c0683b1c>] (driver_attach+0x28/0x30)
[    1.634477] [<c0683b1c>] (driver_attach) from [<c0683448>] (bus_add_driver+0xe8/0x240)
[    1.634497] [<c0683448>] (bus_add_driver) from [<c06852a0>] (driver_register+0xac/0xf0)
[    1.634518] [<c06852a0>] (driver_register) from [<c0686364>] (__platform_driver_register+0x40/0x54)
[    1.634539] [<c0686364>] (__platform_driver_register) from [<c103e570>] (dw_mci_rockchip_pltfm_driver_init+0x20/0x28)
[    1.634563] [<c103e570>] (dw_mci_rockchip_pltfm_driver_init) from [<c0202f78>] (do_one_initcall+0xcc/0x270)
[    1.634586] [<c0202f78>] (do_one_initcall) from [<c10011ec>] (kernel_init_freeable+0x2a0/0x3e4)
[    1.634609] [<c10011ec>] (kernel_init_freeable) from [<c0ad6424>] (kernel_init+0x18/0x128)
[    1.634629] [<c0ad6424>] (kernel_init) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.634645] Exception stack(0xed13dfb0 to 0xed13dff8)
[    1.634658] dfa0:                                     00000000 00000000 00000000 00000000
[    1.634675] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.634693] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.634727] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)

Second entry, a little below

[    1.650408] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.650424] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.650433] Hardware name: Rockchip (Device Tree)
[    1.650445] Workqueue: events_freezable mmc_rescan
[    1.650462] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.650476] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.650490] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.650505] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.650519] [<c080aad0>] (dw_mci_set_ios) from [<c07ecf7c>] (mmc_set_chip_select+0x94/0xa4)
[    1.650533] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f3078>] (mmc_go_idle+0x44/0xb8)
[    1.650547] [<c07f3078>] (mmc_go_idle) from [<c07ee5bc>] (mmc_rescan+0x290/0x394)
[    1.650560] [<c07ee5bc>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.650574] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.650588] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.650601] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.650611] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.650620] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.650632] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.650644] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000

And third entry, directly following the last

[    1.653188] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.653209] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.653218] Hardware name: Rockchip (Device Tree)
[    1.653229] Workqueue: events_freezable mmc_rescan
[    1.653243] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.653257] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.653271] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.653285] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.653298] [<c080aad0>] (dw_mci_set_ios) from [<c07ecf7c>] (mmc_set_chip_select+0x94/0xa4)
[    1.653312] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f30c8>] (mmc_go_idle+0x94/0xb8)
[    1.653326] [<c07f30c8>] (mmc_go_idle) from [<c07ee5bc>] (mmc_rescan+0x290/0x394)
[    1.653339] [<c07ee5bc>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.653352] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.653366] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.653379] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.653389] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.653398] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.653410] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.653421] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000

It runs mmc_rescan on one of the other mmc devices, then does this:

[    1.663601] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.663617] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.663626] Hardware name: Rockchip (Device Tree)
[    1.663638] Workqueue: events_freezable mmc_rescan
[    1.663655] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.663669] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.663683] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.663697] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.663711] [<c080aad0>] (dw_mci_set_ios) from [<c07ed194>] (mmc_set_bus_mode+0x94/0xa4)
[    1.663725] [<c07ed194>] (mmc_set_bus_mode) from [<c07f2810>] (mmc_attach_mmc+0x4c/0x15c)
[    1.663739] [<c07f2810>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.663753] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.663767] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.663780] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.663795] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.663805] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.663814] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.663826] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.663838] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.664180] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.664192] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.664201] Hardware name: Rockchip (Device Tree)
[    1.664210] Workqueue: events_freezable mmc_rescan
[    1.664224] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.664238] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.664251] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.664264] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.664278] [<c080aad0>] (dw_mci_set_ios) from [<c07ed194>] (mmc_set_bus_mode+0x94/0xa4)
[    1.664292] [<c07ed194>] (mmc_set_bus_mode) from [<c07f0b2c>] (mmc_init_card+0x54/0x1988)
[    1.664306] [<c07f0b2c>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.664319] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.664332] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.664345] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.664359] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.664373] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.664383] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.664391] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.664403] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.664415] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.664429] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.664439] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.664448] Hardware name: Rockchip (Device Tree)
[    1.664456] Workqueue: events_freezable mmc_rescan
[    1.664469] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.664483] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.664495] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.664509] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.664522] [<c080aad0>] (dw_mci_set_ios) from [<c07ecf7c>] (mmc_set_chip_select+0x94/0xa4)
[    1.664536] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f3078>] (mmc_go_idle+0x44/0xb8)
[    1.664550] [<c07f3078>] (mmc_go_idle) from [<c07f0b34>] (mmc_init_card+0x5c/0x1988)
[    1.664564] [<c07f0b34>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.664577] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.664590] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.664603] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.664617] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.664630] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.664640] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.664648] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.664660] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.664672] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.667181] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.667193] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.667202] Hardware name: Rockchip (Device Tree)
[    1.667212] Workqueue: events_freezable mmc_rescan
[    1.667226] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.667240] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.667253] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.667266] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.667280] [<c080aad0>] (dw_mci_set_ios) from [<c07ecf7c>] (mmc_set_chip_select+0x94/0xa4)
[    1.667294] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f30c8>] (mmc_go_idle+0x94/0xb8)
[    1.667308] [<c07f30c8>] (mmc_go_idle) from [<c07f0b34>] (mmc_init_card+0x5c/0x1988)
[    1.667321] [<c07f0b34>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.667335] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.667348] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.667361] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.667375] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.667388] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.667398] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.667407] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.667419] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.667431] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.668659] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[    1.670181] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.671696] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.674408] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

Does a couple other things, then rescans again...

[    1.694366] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.694389] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.694398] Hardware name: Rockchip (Device Tree)
[    1.694414] Workqueue: events_freezable mmc_rescan
[    1.694438] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.694454] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.694471] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.694485] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.694500] [<c080aad0>] (dw_mci_set_ios) from [<c07ed194>] (mmc_set_bus_mode+0x94/0xa4)
[    1.694514] [<c07ed194>] (mmc_set_bus_mode) from [<c07f0c54>] (mmc_init_card+0x17c/0x1988)
[    1.694529] [<c07f0c54>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.694543] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.694558] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.694572] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.694588] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.694602] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.694612] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.694621] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.694633] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.694645] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000

Couple more things, then again

[    1.715471] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.715479] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.715482] Hardware name: Rockchip (Device Tree)
[    1.715493] Workqueue: events_freezable mmc_rescan
[    1.715513] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.715524] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.715535] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.715544] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.715552] [<c080aad0>] (dw_mci_set_ios) from [<c07ed238>] (mmc_set_bus_width+0x94/0xa4)
[    1.715562] [<c07ed238>] (mmc_set_bus_width) from [<c07f0488>] (mmc_select_bus_width+0x7c/0x2b8)
[    1.715571] [<c07f0488>] (mmc_select_bus_width) from [<c07f1e0c>] (mmc_init_card+0x1334/0x1988)
[    1.715580] [<c07f1e0c>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.715587] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.715597] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.715605] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.715614] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.715623] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.715626] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.715631] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.715636] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.715641] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000

and again

[    1.724837] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.724842] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.724844] Hardware name: Rockchip (Device Tree)
[    1.724849] Workqueue: events_freezable mmc_rescan
[    1.724861] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.724869] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.724876] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.724882] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.724889] [<c080aad0>] (dw_mci_set_ios) from [<c07ed620>] (mmc_set_timing+0x94/0xa4)
[    1.724896] [<c07ed620>] (mmc_set_timing) from [<c07f1e90>] (mmc_init_card+0x13b8/0x1988)
[    1.724903] [<c07f1e90>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.724910] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.724916] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.724922] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.724929] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.724936] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.724938] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.724942] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.724947] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.724951] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000

and again, and prints the bus speed info and all that:

[    1.725312] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    1.725317] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.725319] Hardware name: Rockchip (Device Tree)
[    1.725324] Workqueue: events_freezable mmc_rescan
[    1.725335] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.725342] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    1.725348] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    1.725355] [<c080a730>] (dw_mci_setup_bus) from [<c080aad0>] (dw_mci_set_ios+0x178/0x214)
[    1.725361] [<c080aad0>] (dw_mci_set_ios) from [<c07ed03c>] (mmc_set_clock+0xb0/0xd4)
[    1.725368] [<c07ed03c>] (mmc_set_clock) from [<c07efca4>] (mmc_set_bus_speed+0x58/0x5c)
[    1.725375] [<c07efca4>] (mmc_set_bus_speed) from [<c07f1bc8>] (mmc_init_card+0x10f0/0x1988)
[    1.725382] [<c07f1bc8>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.725389] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)
[    1.725395] [<c07ee680>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.725401] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.725407] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.725414] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.725417] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.725420] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.725425] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.725429] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.725436] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)

Its left alone for awhile, then this familiar chunk is printed

[    1.984716] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 173
[    1.984744] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[    1.984806] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[    1.986466] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[    1.986532] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[    1.986547] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[    1.986563] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[    1.986577] HAL_DEBUG: mmc2: IN __mmc_switch, pause retuning 
[    1.986633] HAL_DEBUG: mmc2:   mmc_poll_for_busy called by: __mmc_switch
[    1.986663] HAL_DEBUG: mmc2: IN __mmc_switch, calling mmc_switch_status
[    1.986717] HAL_DEBUG: mmc2: IN __mmc_switch, returned from mmc_set_timing 
[    1.986731] HAL_DEBUG: mmc2: IN __mmc_switch, resuming tuning 
[    1.986746] HAL_DEBUG: mmc2: IN __mmc_switch, RETURNING ERROR: 0 
[    1.986789] mmc2: new HS200 MMC card at address 0001
[    1.990076] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    1.992736] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    1.995408] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    1.996058] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
[    2.002004] Primary GPT is being ignored, using alternate GPT.
[    2.002168]  mmcblk2: p1 p2

And just below that, this familiar chunk:

[    5.436092] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    5.438529] cfg80211: failed to load regulatory.db
[    5.441204] mmc_host mmc2: HAL_DEBUG: ENTERED dw_mci_setup_bus!
[    5.443563] CPU: 1 PID: 18 Comm: kworker/1:0H Not tainted 4.17.2-gnu+ #1
[    5.445886] Hardware name: Rockchip (Device Tree)
[    5.448229] Workqueue: kblockd blk_mq_run_work_fn
[    5.450575] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    5.452946] [<c020ceac>] (show_stack) from [<c0ac2124>] (dump_stack+0x84/0xa4)
[    5.455316] [<c0ac2124>] (dump_stack) from [<c080a730>] (dw_mci_setup_bus+0x3c/0x264)
[    5.457699] [<c080a730>] (dw_mci_setup_bus) from [<c080ad00>] (dw_mci_runtime_resume+0x194/0x1a4)
[    5.460084] [<c080ad00>] (dw_mci_runtime_resume) from [<c068d07c>] (pm_generic_runtime_resume+0x3c/0x48)
[    5.462488] [<c068d07c>] (pm_generic_runtime_resume) from [<c069017c>] (__rpm_callback+0x12c/0x1e4)
[    5.464911] [<c069017c>] (__rpm_callback) from [<c0690280>] (rpm_callback+0x4c/0x90)
[    5.467374] [<c0690280>] (rpm_callback) from [<c068fdd8>] (rpm_resume+0x414/0x5c8)
[    5.469867] [<c068fdd8>] (rpm_resume) from [<c068ffe8>] (__pm_runtime_resume+0x5c/0x74)
[    5.472370] [<c068ffe8>] (__pm_runtime_resume) from [<c07ebb60>] (__mmc_claim_host+0x1e0/0x1f0)
[    5.474886] [<c07ebb60>] (__mmc_claim_host) from [<c07ebba8>] (mmc_get_card+0x38/0x3c)
[    5.477420] [<c07ebba8>] (mmc_get_card) from [<c07ff324>] (mmc_mq_queue_rq+0x118/0x200)
[    5.479912] [<c07ff324>] (mmc_mq_queue_rq) from [<c0538714>] (blk_mq_dispatch_rq_list+0x278/0x490)
[    5.482381] [<c0538714>] (blk_mq_dispatch_rq_list) from [<c053c670>] (blk_mq_do_dispatch_sched+0xdc/0xec)
[    5.484898] [<c053c670>] (blk_mq_do_dispatch_sched) from [<c053cdcc>] (blk_mq_sched_dispatch_requests+0x15c/0x1a8)
[    5.487475] [<c053cdcc>] (blk_mq_sched_dispatch_requests) from [<c0536958>] (__blk_mq_run_hw_queue+0xf8/0x128)
[    5.490099] [<c0536958>] (__blk_mq_run_hw_queue) from [<c05369c0>] (blk_mq_run_work_fn+0x38/0x3c)
[    5.492754] [<c05369c0>] (blk_mq_run_work_fn) from [<c0244a28>] (process_one_work+0x250/0x474)
[    5.495428] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    5.498136] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    5.500873] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    5.503632] Exception stack(0xed181fb0 to 0xed181ff8)
[    5.506385] 1fa0:                                     00000000 00000000 00000000 00000000
[    5.509195] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.511996] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.533578] EXT4-fs (mmcblk2p2): mounted filesystem without journal. Opts: (null)

The first call clearly stands out. This must be the very basic kernel initialization.

The last call to dw_mci_setup_bus and the subsequent calls all seem to just be block read/write requests. Since the emmc uses pwrseq, defined by emmc_pwrseq in rk32888-veyron.dtsi and the emmc device in the same file, the card has to be "woken-up" somehow.

As dw_mci_runtime_resume states before calling dw_mci_setup_bus, this is done to "Force setup but to guarantee available clock output". Seem blunt, but should work.

At first glance, the other calls to dw_mci_setup_bus between the first and last all seem the same. They start at mmc_rescan and end up in dw_mci_set_ios and call dw_mci_set_ios causing our stack dump. ios is like the parameters of the mmc, like the timing you can check it in linux by running cat /sys/kernel/debug/mmc*/ios When you start to look at the whole stack however, a pattern emerges. Here are the other stack traces, with the call to mmc_rescan and to dw_mci_set_ios anddw_mci_setup_bus removed:

[    1.650533] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f3078>] (mmc_go_idle+0x44/0xb8)
[    1.650547] [<c07f3078>] (mmc_go_idle) from [<c07ee5bc>] (mmc_rescan+0x290/0x394)
[    1.653312] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f30c8>] (mmc_go_idle+0x94/0xb8)
[    1.653326] [<c07f30c8>] (mmc_go_idle) from [<c07ee5bc>] (mmc_rescan+0x290/0x394)

So mmc_go_idle gets called to set the chip select, twice, kinda strange.

[    1.663725] [<c07ed194>] (mmc_set_bus_mode) from [<c07f2810>] (mmc_attach_mmc+0x4c/0x15c)
[    1.663739] [<c07f2810>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

Then we enter mmc_attach_mmc, and set the bus mode.

[    1.664292] [<c07ed194>] (mmc_set_bus_mode) from [<c07f0b2c>] (mmc_init_card+0x54/0x1988)
[    1.664306] [<c07f0b2c>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.664319] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

Then mmc_attach_mmc calls mmc_init_card which feels the need to set the bus mode, again. Looking at the source, this is done for safety.

[    1.664536] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f3078>] (mmc_go_idle+0x44/0xb8)
[    1.664550] [<c07f3078>] (mmc_go_idle) from [<c07f0b34>] (mmc_init_card+0x5c/0x1988)
[    1.664564] [<c07f0b34>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.664577] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

mmc_init_card tells the card to go idle to cover certain cards quirks. Does our emmc need this?

[    1.667294] [<c07ecf7c>] (mmc_set_chip_select) from [<c07f30c8>] (mmc_go_idle+0x94/0xb8)
[    1.667308] [<c07f30c8>] (mmc_go_idle) from [<c07f0b34>] (mmc_init_card+0x5c/0x1988)
[    1.667321] [<c07f0b34>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.667335] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

go_idle doin its thing

[    1.694514] [<c07ed194>] (mmc_set_bus_mode) from [<c07f0c54>] (mmc_init_card+0x17c/0x1988)
[    1.694529] [<c07f0c54>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.694543] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

sets bus mode to push pull

[    1.715562] [<c07ed238>] (mmc_set_bus_width) from [<c07f0488>] (mmc_select_bus_width+0x7c/0x2b8)
[    1.715571] [<c07f0488>] (mmc_select_bus_width) from [<c07f1e0c>] (mmc_init_card+0x1334/0x1988)
[    1.715580] [<c07f1e0c>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.715587] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

Select and set the buswidth, since we don't yet know it from the timing.

mmc_init_card then calls mmc_select_timing which doesn't show up on the stack dump for some reason (bug?) which then calls the appropriate timing function.

[    1.724896] [<c07ed620>] (mmc_set_timing) from [<c07f1e90>] (mmc_init_card+0x13b8/0x1988)
[    1.724903] [<c07f1e90>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.724910] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

and then calls mmc_set_bus_speed

[    1.725368] [<c07ed03c>] (mmc_set_clock) from [<c07efca4>] (mmc_set_bus_speed+0x58/0x5c)
[    1.725375] [<c07efca4>] (mmc_set_bus_speed) from [<c07f1bc8>] (mmc_init_card+0x10f0/0x1988)
[    1.725382] [<c07f1bc8>] (mmc_init_card) from [<c07f288c>] (mmc_attach_mmc+0xc8/0x15c)
[    1.725389] [<c07f288c>] (mmc_attach_mmc) from [<c07ee680>] (mmc_rescan+0x354/0x394)

Also, I noticed in mmc_init_card that the cache is only set up if HPI (High performance interrupts) aren't broken with the card.

Looking at a boot that hangs, the full mmc_attach_mmc function completes before the hang. So probably not an issue there.

These line shows we have an HS200 mmc, and show up the same with or without the hang, so further proof that the hardwar einit is not to blame.

[    1.978646] mmc2: new HS200 MMC card at address 0001
[    1.980018] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    1.981499] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    1.983016] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    1.983376] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)

QUESTION RAISED: Do we use a command queue? Added "ENABLING COMMAND QUEUE" message for debugging

dump the ios after init for debugging, dump the ios again in cache flush to see if it changed

lets shift to the block layer. Almost definitely related to the delay of these lines

[  606.505323] Primary GPT is being ignored, using alternate GPT.
[  606.507779]  mmcblk2: p1 p2
[  606.524384] EXT4-fs (mmcblk2p2): mounted filesystem without journal. Opts: (null)
[  606.527328] VFS: Mounted root (ext4 filesystem) on device 179:2.
SolidHal commented 5 years ago

Primary GPT is being ignored, using alternate GPT. is added by this patch: https://github.com/SolidHal/PrawnOS/blob/develop/resources/BuildResources/patches-tested/kernel/0001-block-partitions-efi-Add-support-for-IGNOREME-GPT-si.patch, and is in the file block/partitions/efi.c

SolidHal commented 5 years ago

Do 2 things: 1) Continue finding the source of the corruption/block mis-read

2) try to create a kernel and bootable filesystem on the emmc WITHOUT the gpt patch.

Most interesting is that:

From this I can conclude that something is wrong with using the secondary gpt header. 1) Is the is problem with loading the secondary gpt header itself, or 2) maybe with determining the primary is ignored/invalid? 3) Why are we forced to use the secondary gpt header anyway? I think we have to use the secondary because there is a problem in reading the primary. Reading to/writing to the early sectors of the emmc seems to always result in async errors.

To further refine what I mean in number 3, if I dd to the first 10 logical blocks, I get buffer I/O errors:

[ 2964.366869] print_req_error: I/O error, dev mmcblk2, sector 0
[ 2964.370977] Buffer I/O error on dev mmcblk2, logical block 0, lost async page write
[ 2964.375301] Buffer I/O error on dev mmcblk2, logical block 1, lost async page write
[ 2964.379617] Buffer I/O error on dev mmcblk2, logical block 2, lost async page write
[ 2964.381746] Buffer I/O error on dev mmcblk2, logical block 3, lost async page write
[ 2964.383457] Buffer I/O error on dev mmcblk2, logical block 4, lost async page write
[ 2964.385160] Buffer I/O error on dev mmcblk2, logical block 5, lost async page write
[ 2964.386867] Buffer I/O error on dev mmcblk2, logical block 6, lost async page write
[ 2964.388577] Buffer I/O error on dev mmcblk2, logical block 7, lost async page write
[ 2964.390225] Buffer I/O error on dev mmcblk2, logical block 8, lost async page write
[ 2964.391875] Buffer I/O error on dev mmcblk2, logical block 9, lost async page write

Never more than logical block 9, even if I write to 1000 512 bytes sectors. (maybe rate limited?)

There are 512 bytes in a sectors, and 8 sectors in a logical block. So to write to the first 10 logical blocks you can do dd if=/dev/zero of=/dev/mmcblk2 bs=512 count=80

So, clearly some buffering problems. What if we try and skip the buffer with oflag=direct? dd if=/dev/zero of=/dev/mmcblk2 bs=512 count=80 oflag=direct It just throws:

[   67.602658] print_req_error: I/O error, dev mmcblk2, sector 0
dd: error writing /dev/mmcblk2: Input/Out error

Using the seek feature of dd, we can find out more about these broken sectors. Lets seek 100 sectors in:

dd if=/dev/zero of=/dev/mmcblk2 bs=512 count=80 seek=100 oflag=direct
[   67.602658] print_req_error: I/O error, dev mmcblk2, sector 100
dd: error writing /dev/mmcblk2: Input/Out error

so still no good. Lets look without the direct flag:

dd if=/dev/zero of=/dev/mmcblk2 bs=512 count=80 seek=100 oflag=direct
[ 2964.366869] print_req_error: I/O error, dev mmcblk2, sector 96
[ 2964.370977] Buffer I/O error on dev mmcblk2, logical block 12, lost async page write
[ 2964.375301] Buffer I/O error on dev mmcblk2, logical block 13, lost async page write
[ 2964.379617] Buffer I/O error on dev mmcblk2, logical block 13, lost async page write
[ 2964.381746] Buffer I/O error on dev mmcblk2, logical block 14, lost async page write
[ 2964.383457] Buffer I/O error on dev mmcblk2, logical block 15, lost async page write
[ 2964.385160] Buffer I/O error on dev mmcblk2, logical block 16, lost async page write
[ 2964.386867] Buffer I/O error on dev mmcblk2, logical block 17, lost async page write
[ 2964.388577] Buffer I/O error on dev mmcblk2, logical block 18 lost async page write
[ 2964.390225] Buffer I/O error on dev mmcblk2, logical block 19, lost async page write
[ 2964.391875] Buffer I/O error on dev mmcblk2, logical block 20, lost async page write
[ 2964.392675 Buffer I/O error on dev mmcblk2, logical block 21, lost async page write

Ahh, so these sectors are still broken.

I know the data sector of the chrome os partition map started at 20480, so lets try there as that has to work...:

dd if=/dev/zero of=/dev/mmcblk2 bs=512 count=80 seek=20480 oflag=direct
80+0 records in
80+0 records out
40960 bytes (41 kB, 40 KiB) copied, 0.0883362 s, 464 KB/s

It works!

How low can we go! The first sector I can successfully write to is # 16384. That makes sense, 16384 * 512 /1024 /1024 is 8MB, so the first 8MB of the device is completely unusable. I have a feeling I know why. There are two "boot" partitions, each 4M in size.

This isn't the case. I read both of the boot partitions. They both were filled with zeros. I confirmed this by running, which will show non-zero output if it exists

od /dev/mmcblk2boot# | head

I then dumped the first 16384 sectors of /dev/mmcblk2:

dd if=/dev/mmcblk2 of=mmcblk2_dump.bin bs=512 count=16384

The only non zero sections are:

E5 04 AB B3 74 A1 41 8E BD E4 48 FE F1 D5 7D 0A 00 00 00 00 1D 9A 00 00 02 00 EE FF FF FF 01 00 00 00 FF BF D5 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA 49 47 4E 4F 52 45 4D 45 00 00 00 00 00 00 00 00 C4 67 A4 53

Which is: Anything that doesn't render properly has been replaced with *

******U™IGNOREME******

soooooo... The invalid primary gpt header is baked in for some reason.

To get some more info on why this might be, I ran the chromeos recovery. Im attaching the recovery log here: chromeos_recovery.log

I found that they use cgpt to dump the partition map, so maybe they use it to create the partitions as well? Heres the partition map of the emmc dumped with debug info using cgpt:

       start        size    part  contents
           0           1          PMBR (Boot GUID: 7CE2DD5A-EF83-0247-A2FF-05965ADF2E22)
           1           1 IGNORED  Pri GPT header
    30785503          32          Sec GPT table
     8704000    22077440       1  Label: "STATE"
                                  Type: Linux data
                                  UUID: B663593D-F9F5-F04F-90B6-88690A87BC90
       20480       32768       2  Label: "KERN-A"
                                  Type: ChromeOS kernel
                                  UUID: B6DED3FF-55A5-A249-B023-83C4C96193FB
                                  Attr: priority=1 tries=6 successful=1
     4509696     4194304       3  Label: "ROOT-A"
                                  Type: ChromeOS rootfs
                                  UUID: EC18BDF4-5F74-2C45-AB3F-A73300C53FFA
       53248       32768       4  Label: "KERN-B"
                                  Type: ChromeOS kernel
                                  UUID: B41EC191-68F0-9045-8AAF-3389B7A438E0
                                  Attr: priority=0 tries=15 successful=0
      315392     4194304       5  Label: "ROOT-B"
                                  Type: ChromeOS rootfs
                                  UUID: 1384D932-90D2-654D-AA24-102E4425EF3F
       16448           1       6  Label: "KERN-C"
                                  Type: ChromeOS kernel
                                  UUID: 7FEBD683-66F3-E84E-8F3F-6016B7FC1A75
                                  Attr: priority=0 tries=15 successful=0
       16449           1       7  Label: "ROOT-C"
                                  Type: ChromeOS rootfs
                                  UUID: 5E929AF3-AD37-7349-8A2C-3F65ECFD3A73
       86016       32768       8  Label: "OEM"
                                  Type: Linux data
                                  UUID: 4D8FB2B6-731F-B04A-ACA7-8E94FE5E1CB9
       16450           1       9  Label: "reserved"
                                  Type: ChromeOS reserved
                                  UUID: 38B962D7-E470-0A43-865F-6560D767CB74
       16451           1      10  Label: "reserved"
                                  Type: ChromeOS reserved
                                  UUID: 544D5A16-BA7A-DA40-B019-EEAE66A26D5F
          64       16384      11  Label: "RWFW"
                                  Type: ChromeOS firmware
                                  UUID: 7F111A09-ECB4-EB4A-9DB2-B7162D8C5093
      249856       65536      12  Label: "EFI-SYSTEM"
                                  Type: EFI System Partition
                                  UUID: 48A3ADD5-E9B4-AB47-90B7-9FE1D9C9D25F
    30785535           1          Sec GPT header
                                  Sig: [EFI PART]
                                  Rev: 0x00010000
                                  Size: 92
                                  Header CRC: 0x4eb39c6f 
                                  My LBA: 30785535
                                  Alternate LBA: 1
                                  First LBA: 34
                                  Last LBA: 30785502
                                  Disk UUID: 73C380DB-31B6-B343-A674-05AF1C6C3FCC
                                  Entries LBA: 30785503
                                  Number of entries: 128
                                  Size of entry: 128
                                  Entries CRC: 0xe7d15af9 

I mounted p8, the OEM partition. But it had nothing in it. p12 is interesting though.

Its all zeros after the first section which is:

EB 3C 90 6D 6B 66 73 2E 66 61 74 00 02 04 01 00 02 00 02 00 00 F8 40 00 20 00 40 00 00 00 00 00 00 00 01 00 80 00 29 87 9F C5 DB 45 46 49 2D 53 59 53 54 45 4D 20 46 41 54 31 36 20 20 20 0E 1F BE 5B 7C AC 22 C0 74 0B 56 B4 0E BB 07 00 CD 10 5E EB F0 32 E4 CD 16 CD 19 EB FE 54 68 69 73 20 69 73 20 6E 6F 74 20 61 20 62 6F 6F 74 61 62 6C 65 20 64 69 73 6B 2E 20 20 50 6C 65 61 73 65 20 69 6E 73 65 72 74 20 61 20 62 6F 6F 74 61 62 6C 65 20 66 6C 6F 70 70 79 20 61 6E 64 0D 0A 70 72 65 73 73 20 61 6E 79 20 6B 65 79 20 74 6F 20 74 72 79 20 61 67 61 69 6E 20 2E 2E 2E 20 0D 0A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA F8 FF FF FF

Which roughly translates to this. Anything that doesn't render properly has been replaced with *

*****EFI-SYSTEM FAT16  *****This is not a bootable disk.  Please insert a bootable floppy and
press any key to try again ... *****U™*****

seems like a hacky version of a pmbr.

So, that why the alternative gpt header must be used, and thats also why the first section of the emmc can't be read or written to. Because it is purposely broken.

TODO:

Test making a bootable partition map using the chromeos one as a base. Test including p12, and dd it over from the chromeos installl. Do all of this WITH the IGNOREME patch.

SolidHal commented 5 years ago

Did the above TODO, no improvement.

So, we end up in find_valid_gpt() from this stack:

[    1.952496] [<c0544e78>] (efi_partition) from [<c0543c78>] (check_partition+0x110/0x1d8)
[    1.952508] [<c0543c78>] (check_partition) from [<c0541f64>] (rescan_partitions+0x78/0x424)
[    1.952521] [<c0541f64>] (rescan_partitions) from [<c03fac64>] (__blkdev_get+0x214/0x404)
[    1.952531] [<c03fac64>] (__blkdev_get) from [<c03faf18>] (blkdev_get+0xc4/0x2dc)
[    1.952540] [<c03faf18>] (blkdev_get) from [<c05407f8>] (__device_add_disk+0x3fc/0x468)
[    1.952551] [<c05407f8>] (__device_add_disk) from [<c0540880>] (device_add_disk+0x1c/0x20)
[    1.952564] [<c0540880>] (device_add_disk) from [<c07fc17c>] (mmc_add_disk+0x20/0xf8)
[    1.952576] [<c07fc17c>] (mmc_add_disk) from [<c07fc984>] (mmc_blk_probe+0x4fc/0x68c)
[    1.952586] [<c07fc984>] (mmc_blk_probe) from [<c07ee928>] (mmc_bus_probe+0x24/0x28)
[    1.952601] [<c07ee928>] (mmc_bus_probe) from [<c0684210>] (driver_probe_device+0x20c/0x468)
[    1.952614] [<c0684210>] (driver_probe_device) from [<c0684678>] (__device_attach_driver+0xc0/0x108)
[    1.952625] [<c0684678>] (__device_attach_driver) from [<c0682410>] (bus_for_each_drv+0x94/0xa4)
[    1.952636] [<c0682410>] (bus_for_each_drv) from [<c0683ef0>] (__device_attach+0x9c/0x138)
[    1.952647] [<c0683ef0>] (__device_attach) from [<c06846dc>] (device_initial_probe+0x1c/0x20)
[    1.952657] [<c06846dc>] (device_initial_probe) from [<c06831dc>] (bus_probe_device+0x38/0x90)
[    1.952668] [<c06831dc>] (bus_probe_device) from [<c0681228>] (device_add+0x2f4/0x574)
[    1.952678] [<c0681228>] (device_add) from [<c07eef30>] (mmc_add_card+0x1f8/0x25c)
[    1.952689] [<c07eef30>] (mmc_add_card) from [<c07f2894>] (mmc_attach_mmc+0xe0/0x15c)
[    1.952699] [<c07f2894>] (mmc_attach_mmc) from [<c07ee670>] (mmc_rescan+0x354/0x394)
[    1.952711] [<c07ee670>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)

Seems mmc_attach_mmc is our common contact. What part of this process results in a bad read?

Looking at some further debug info:

[    1.938794] mmc2: new HS200 MMC card at address 0001
[    1.940658] mmcblk2: mmc2:0001 HAG2e 14.7 GiB 
[    1.942205] mmcblk2boot0: mmc2:0001 HAG2e partition 1 4.00 MiB
[    1.945144] mmcblk2boot1: mmc2:0001 HAG2e partition 2 4.00 MiB
[    1.945554] mmcblk2rpmb: mmc2:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
[    1.946102] HAL_DEBUG: in __blkdev_get
[    1.946116] HAL_DEBUG: mmcblk2: in __blkdev_get got a disk
[    1.946129] HAL_DEBUG: mmcblk2: in __blkdev_get, calling first rescan
[    1.946137] HAL_DEBUG: mmcblk2: in rescan_partition, may have gotten here from the goto
[    1.946315] HAL_DEBUG: In find_valid_gpt
[    1.946332] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.17.2-gnu+ #1
[    1.946339] Hardware name: Rockchip (Device Tree)
[    1.946362] Workqueue: events_freezable mmc_rescan
[    1.946406] [<c02126b0>] (unwind_backtrace) from [<c020ceac>] (show_stack+0x20/0x24)
[    1.946430] [<c020ceac>] (show_stack) from [<c0ac2264>] (dump_stack+0x84/0xa4)
[    1.946456] [<c0ac2264>] (dump_stack) from [<c0544f88>] (efi_partition+0xb0/0x9d4)
[    1.946480] [<c0544f88>] (efi_partition) from [<c0543d88>] (check_partition+0x110/0x1d8)
[    1.946502] [<c0543d88>] (check_partition) from [<c0542060>] (rescan_partitions+0xa4/0x464)
[    1.946526] [<c0542060>] (rescan_partitions) from [<c03fac88>] (__blkdev_get+0x23c/0x4dc)
[    1.946545] [<c03fac88>] (__blkdev_get) from [<c03fafec>] (blkdev_get+0xc4/0x2dc)
[    1.946563] [<c03fafec>] (blkdev_get) from [<c05408c8>] (__device_add_disk+0x3fc/0x468)
[    1.946583] [<c05408c8>] (__device_add_disk) from [<c0540950>] (device_add_disk+0x1c/0x20)
[    1.946606] [<c0540950>] (device_add_disk) from [<c07fc28c>] (mmc_add_disk+0x20/0xf8)
[    1.946628] [<c07fc28c>] (mmc_add_disk) from [<c07fca94>] (mmc_blk_probe+0x4fc/0x68c)
[    1.946647] [<c07fca94>] (mmc_blk_probe) from [<c07eea38>] (mmc_bus_probe+0x24/0x28)
[    1.946673] [<c07eea38>] (mmc_bus_probe) from [<c0684320>] (driver_probe_device+0x20c/0x468)
[    1.946698] [<c0684320>] (driver_probe_device) from [<c0684788>] (__device_attach_driver+0xc0/0x108)
[    1.946719] [<c0684788>] (__device_attach_driver) from [<c0682520>] (bus_for_each_drv+0x94/0xa4)
[    1.946739] [<c0682520>] (bus_for_each_drv) from [<c0684000>] (__device_attach+0x9c/0x138)
[    1.946760] [<c0684000>] (__device_attach) from [<c06847ec>] (device_initial_probe+0x1c/0x20)
[    1.946780] [<c06847ec>] (device_initial_probe) from [<c06832ec>] (bus_probe_device+0x38/0x90)
[    1.946800] [<c06832ec>] (bus_probe_device) from [<c0681338>] (device_add+0x2f4/0x574)
[    1.946820] [<c0681338>] (device_add) from [<c07ef040>] (mmc_add_card+0x1f8/0x25c)
[    1.946840] [<c07ef040>] (mmc_add_card) from [<c07f29a4>] (mmc_attach_mmc+0xe0/0x15c)
[    1.946859] [<c07f29a4>] (mmc_attach_mmc) from [<c07ee780>] (mmc_rescan+0x354/0x394)
[    1.946879] [<c07ee780>] (mmc_rescan) from [<c0244a28>] (process_one_work+0x250/0x474)
[    1.946898] [<c0244a28>] (process_one_work) from [<c0245938>] (worker_thread+0x2ec/0x430)
[    1.946918] [<c0245938>] (worker_thread) from [<c024a4e4>] (kthread+0x144/0x160)
[    1.946939] [<c024a4e4>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[    1.946948] Exception stack(0xed157fb0 to 0xed157ff8)
[    1.946959] 7fa0:                                     00000000 00000000 00000000 00000000
[    1.946972] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.946984] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    2.008755] dwmmc_rockchip ff0d0000.dwmmc: Successfully tuned phase to 189
[    2.015351] mmc1: new ultra high speed SDR104 SDIO card at address 0001
[    2.040201] Console: switching to colour frame buffer device 170x48
[    2.061676] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!
[    2.061678] mmcblk2: HAL_DEBUG: in mmc_blk_mq_recovery!
[    2.061712] mmcblk2: HAL_DEBUG: in mmc_blk_mq_recovery, about to call problem reset!
[    2.175949] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!
[    2.175951] mmcblk2: HAL_DEBUG: in mmc_blk_mq_recovery!
[    2.175981] mmcblk2: HAL_DEBUG: in mmc_blk_mq_recovery, about to call problem reset!
[    2.175982] mmcblk2: HAL_DEBUG: in mmc_blk_reset
[    2.175985] mmc2: in mmc_reset, flushing cache

looks like we are in find_valid_gpt, then efi_partition. Does efi_partition return? Does find_valid_gpt return?

SolidHal commented 5 years ago

find_valid_gpt does not return. This order of events suggests that its an issue in reading the mmc's pmbr:

[    1.976457] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
[    1.989443] Console: switching to colour frame buffer device 170x48
[    2.089782] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!

as heres the code:

    pr_warn("HAL_DEBUG: %s, reading legacy mbr \n", __func__);
        read_lba(state, 0, (u8 *)legacymbr, sizeof(*legacymbr));

    pr_warn("HAL_DEBUG: %s, read legacy mbr, checking if is valid pmbr \n", __func__);

and heres the log of a clean boot:

[    1.960233] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
[    1.960940] HAL_DEBUG: find_valid_gpt, read legacy mbr, checking if is valid pmbr 
[    1.960943] HAL_DEBUG: find_valid_gpt, checked if pmbr is valid, freeing legacymbr 
SolidHal commented 5 years ago

Looks like it has trouble reading the pmbr:

[    1.984256] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
[    1.984260] HAL_DEBUG: In read-lba
[    1.984264] HAL_DEBUG: In read-lba, getting n sector
[    1.984271] HAL_DEBUG: In read-lba, got n sector, pre while. count = 512
[    1.984275] HAL_DEBUG: In read-lba, at start of while, count = 512
[    1.984280] HAL_DEBUG: In read-lba, reading n++ sector, count = 512
[    1.996057] Console: switching to colour frame buffer device 170x48
[    2.097711] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!

Confirmed hang location with second hang.

Checked out the the pmbr, which is baked in and unchangable https://groups.google.com/a/chromium.org/forum/?hl=en#!topic/chromium-os-dev/SNIVpVjm40c

Seemed to be fine, although the disk GUID field didn't seem to match any know GUID?

Continuing, find_valid_gpt calls read_lba which calls read_part_sectorcalls read_dev_sector calls read_mapping_page calls read_cache_page calls do_read_cache_page calls

heres the hang boot:

[    1.977478] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
[    1.977482] HAL_DEBUG: In read-lba
[    1.977485] HAL_DEBUG: In read-lba, getting n sector
[    1.977489] HAL_DEBUG: In read-lba, got n sector, pre while. count = 512
[    1.977493] HAL_DEBUG: In read-lba, at start of while, count = 512
[    1.977496] HAL_DEBUG: In read-lba, reading n++ sector, count = 512
[    1.977499] HAL_DEBUG: in read_part_sector
[    1.977502] HAL_DEBUG: in read_part_sector, calling read_dev_sector
[    1.977505] HAL_DEBUG: in read_DEV_sector, calling read_mapping_page
[    1.977508] HAL_DEBUG: in read_mapping_page, returning read_cache_page
[    1.977512] HAL_DEBUG: in read_cache_page, calling DO_read_cache_page
[    1.977515] HAL_DEBUG: in DO_read_cache_page, entering repeat for first time
[    1.977518] HAL_DEBUG: in DO_read_cache_page, in repeat, calling find_get_page
[    1.977523] HAL_DEBUG: in DO_read_cache_page, first if not page, calling __page_cache_alloc
[    1.977527] HAL_DEBUG: in DO_read_cache_page, calling add_to_page_cache_lru
[    1.977540] HAL_DEBUG: in DO_read_cache_page, returned from add_to_cache_lru
[    1.977544] HAL_DEBUG: in DO_read_cache_page, pre first filler flag
[    1.977547] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling filler
[    1.977727] HAL_DEBUG: in DO_read_cache_page, in filler flag, RETURNED from filler
[    1.977844] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling wait_on_page_read
[    1.996805] Console: switching to colour frame buffer device 170x48
[    2.091106] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!

Versus a clean boot:

[    1.990933] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
[    1.990945] HAL_DEBUG: In read-lba
[    1.990956] HAL_DEBUG: In read-lba, getting n sector
[    1.990971] HAL_DEBUG: In read-lba, got n sector, pre while. count = 512
[    1.990984] HAL_DEBUG: In read-lba, at start of while, count = 512
[    1.990996] HAL_DEBUG: In read-lba, reading n++ sector, count = 512
[    1.991007] HAL_DEBUG: in read_part_sector
[    1.991018] HAL_DEBUG: in read_part_sector, calling read_dev_sector
[    1.991030] HAL_DEBUG: in read_DEV_sector, calling read_mapping_page
[    1.991042] HAL_DEBUG: in read_mapping_page, returning read_cache_page
[    1.991054] HAL_DEBUG: in read_cache_page, calling DO_read_cache_page
[    1.991066] HAL_DEBUG: in DO_read_cache_page, entering repeat for first time
[    1.991078] HAL_DEBUG: in DO_read_cache_page, in repeat, calling find_get_page
[    1.991094] HAL_DEBUG: in DO_read_cache_page, first if not page, calling __page_cache_alloc
[    1.991129] HAL_DEBUG: in DO_read_cache_page, calling add_to_page_cache_lru
[    1.991166] HAL_DEBUG: in DO_read_cache_page, returned from add_to_cache_lru
[    1.991179] HAL_DEBUG: in DO_read_cache_page, pre first filler flag
[    1.991191] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling filler
[    1.991719] HAL_DEBUG: in DO_read_cache_page, in filler flag, RETURNED from filler
[    1.992049] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling wait_on_page_read
[    1.992647] HAL_DEBUG: in DO_read_cache_page, in filler flag, returned from wait_on_page_read
[    1.992661] HAL_DEBUG: in DO_read_cache_page, in filler flag, GOTO out
[    1.992673] HAL_DEBUG: in DO_read_cache_page, AT OUT FLAG
[    1.992685] HAL_DEBUG: in DO_read_cache_page, AT OUT FLAG, returning page!
[    1.992699] HAL_DEBUG: in read_DEV_sector, RETURNING pag address of page plus some shift
[    1.992715] HAL_DEBUG: In read-lba, read n++ sector, count = 512

Seems wait_on_page_locked is the issue

wait_on_page_locked calls wait_on_page_bit in both the hang and the clean boot, but doesn't return in the hang.

SolidHal commented 5 years ago

So it seems the train ends in io_sched

Heres the clean boot:

Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955556] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955558] HAL_DEBUG: In read-lba
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955559] HAL_DEBUG: In read-lba, getting n sector
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955561] HAL_DEBUG: In read-lba, got n sector, pre while. count = 512
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955563] HAL_DEBUG: In read-lba, at start of while, count = 512
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955564] HAL_DEBUG: In read-lba, reading n++ sector, count = 512
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955566] HAL_DEBUG: in read_part_sector
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955567] HAL_DEBUG: in read_part_sector, calling read_dev_sector
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955569] HAL_DEBUG: in read_DEV_sector, calling read_mapping_page
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955571] HAL_DEBUG: in read_mapping_page, returning read_cache_page
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955573] HAL_DEBUG: in read_cache_page, calling DO_read_cache_page
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955575] HAL_DEBUG: in DO_read_cache_page, entering repeat for first time
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955576] HAL_DEBUG: in DO_read_cache_page, in repeat, calling find_get_page
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955579] HAL_DEBUG: in DO_read_cache_page, first if not page, calling __page_cache_alloc
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955582] HAL_DEBUG: in DO_read_cache_page, calling add_to_page_cache_lru
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955592] HAL_DEBUG: in DO_read_cache_page, returned from add_to_cache_lru
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955595] HAL_DEBUG: in DO_read_cache_page, pre first filler flag
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955596] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling filler
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955675] HAL_DEBUG: in DO_read_cache_page, in filler flag, RETURNED from filler
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955677] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling wait_on_page_read
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955678] HAL_DEBUG: in wait_on_page_read
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955680] HAL_DEBUG: in wait_on_page_read, calling wait_on_page_locked
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955682] HAL_DEBUG: in wait_on_page_LOCKED, calling PageLocked(page)
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955683] HAL_DEBUG: in wait_on_page_LOCKED, pageLocked is true, calling wait_on_page_bit
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955685] HAL_DEBUG: in wait_on_page_bit, getting a waitqueue for page
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955707] HAL_DEBUG: in wait_on_page_bit, got a waitqueue for page, calling wait_on_bit_common
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955709] HAL_DEBUG: in wait_on_bit_common
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955710] HAL_DEBUG: in wait_on_bit_common, calling init_wait
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955712] HAL_DEBUG: in wait_for_wait_bit entering infinite for loop...
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955713] HAL_DEBUG: in wait_on_bit_common, calling spin_lock_irq
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955715] HAL_DEBUG: in wait_on_bit_common, spin_lock_irq returned
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955717] HAL_DEBUG: in wait_on_bit_common, likely list empty is true
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955719] HAL_DEBUG: in wait_on_bit_common, added wait queue entry tail, calling set page waiters
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955720] HAL_DEBUG: in wait_on_bit_common, set page waiters returned
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955722] HAL_DEBUG: in wait_on_bit_common, calling set cur state
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955723] HAL_DEBUG: in wait_on_bit_common, set cur state returned, calling spin_unlock_irq num 2
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955725] HAL_DEBUG: in wait_on_bit_common, spin lock irq num 2 returned, call test bit
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.955726] HAL_DEBUG: in wait_on_bit_common, likely test bit is true, calling io_sched
Oct 17 20:37:07 PrawnOS-Alpha kernel: [    1.956311] HAL_DEBUG: in wait_on_bit_common, io_sched returned

And the hang:

Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975861] HAL_DEBUG: find_valid_gpt, reading legacy mbr 
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975866] HAL_DEBUG: In read-lba
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975870] HAL_DEBUG: In read-lba, getting n sector
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975875] HAL_DEBUG: In read-lba, got n sector, pre while. count = 512
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975880] HAL_DEBUG: In read-lba, at start of while, count = 512
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975885] HAL_DEBUG: In read-lba, reading n++ sector, count = 512
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975889] HAL_DEBUG: in read_part_sector
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975893] HAL_DEBUG: in read_part_sector, calling read_dev_sector
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975897] HAL_DEBUG: in read_DEV_sector, calling read_mapping_page
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975902] HAL_DEBUG: in read_mapping_page, returning read_cache_page
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975906] HAL_DEBUG: in read_cache_page, calling DO_read_cache_page
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975911] HAL_DEBUG: in DO_read_cache_page, entering repeat for first time
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975915] HAL_DEBUG: in DO_read_cache_page, in repeat, calling find_get_page
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975921] HAL_DEBUG: in DO_read_cache_page, first if not page, calling __page_cache_alloc
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975927] HAL_DEBUG: in DO_read_cache_page, calling add_to_page_cache_lru
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975945] HAL_DEBUG: in DO_read_cache_page, returned from add_to_cache_lru
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975950] HAL_DEBUG: in DO_read_cache_page, pre first filler flag
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.975954] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling filler
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976353] HAL_DEBUG: in DO_read_cache_page, in filler flag, RETURNED from filler
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976357] HAL_DEBUG: in DO_read_cache_page, in filler flag, calling wait_on_page_read
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976359] HAL_DEBUG: in wait_on_page_read
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976362] HAL_DEBUG: in wait_on_page_read, calling wait_on_page_locked
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976365] HAL_DEBUG: in wait_on_page_LOCKED, calling PageLocked(page)
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976368] HAL_DEBUG: in wait_on_page_LOCKED, pageLocked is true, calling wait_on_page_bit
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976370] HAL_DEBUG: in wait_on_page_bit, getting a waitqueue for page
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976373] HAL_DEBUG: in wait_on_page_bit, got a waitqueue for page, calling wait_on_bit_common
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976376] HAL_DEBUG: in wait_on_bit_common
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976378] HAL_DEBUG: in wait_on_bit_common, calling init_wait
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976381] HAL_DEBUG: in wait_for_wait_bit entering infinite for loop...
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976383] HAL_DEBUG: in wait_on_bit_common, calling spin_lock_irq
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976386] HAL_DEBUG: in wait_on_bit_common, spin_lock_irq returned
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976389] HAL_DEBUG: in wait_on_bit_common, likely list empty is true
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976392] HAL_DEBUG: in wait_on_bit_common, added wait queue entry tail, calling set page waiters
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976394] HAL_DEBUG: in wait_on_bit_common, set page waiters returned
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976396] HAL_DEBUG: in wait_on_bit_common, calling set cur state
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976399] HAL_DEBUG: in wait_on_bit_common, set cur state returned, calling spin_unlock_irq num 2
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976402] HAL_DEBUG: in wait_on_bit_common, spin lock irq num 2 returned, call test bit
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.976405] HAL_DEBUG: in wait_on_bit_common, likely test bit is true, calling io_sched
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    1.988364] Console: switching to colour frame buffer device 170x48
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    2.089309] HAL_DEBUG: in mmc_blk_poll_completion, and mmc_blk_rq_error is TRUE!
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    2.090657] rockchip-drm display-subsystem: fb0:  frame buffer device
Oct 17 21:34:34 PrawnOS-Alpha kernel: [    2.092771] mmcblk2: HAL_DEBUG: in mmc_blk_mq_recovery!
SolidHal commented 5 years ago

Does chromeos enable/flush the cache?

SolidHal commented 5 years ago

Tested reversing this commit for kicks. https://github.com/torvalds/linux/commit/93c23ae385299f889606b42507b12b40e50d6088 no improvements.

Have potential fix, assembling patch.

SolidHal commented 5 years ago

Talking with the mmc kernel mailing list. as the patches I've created are more workarounds than perfect fixes.

Suggest I tried two things:

1)run the mmc_reset test using the mmc_test driver. Attach the test driver with instructing from here: https://forum.kernelnewbies.org/read.php?12,1991,116405 use mmc2:0001

2) Replace the mmc cache flush with another command

replaced it with a call to mmc_alive()

SolidHal commented 5 years ago

this issue is resolved in PrawnOS. the patch can be found here: https://github.com/SolidHal/PrawnOS/blob/master/resources/BuildResources/patches-tested/kernel/Don-t-try-to-flush-cache-on-reset.patch