dreemurrs-embedded / Pine64-Arch

🐧 Arch Linux ARM for your PinePhone/Pro and PineTab/2

Frequent freezing and crashing with eMMC VCCQ mod #404

Closed: dariox86 closed this issue 2 months ago

dariox86 commented 2 years ago

Steps to reproduce

Perform eMMC VCCQ mod as described here and copy pinephone-vccq-mod.dtbo and user.scr to /boot.

Expected behavior

I expect the device to work with no stability issues.

Actual behavior

I performed the hardware modification in January 2022. Since then I have been experiencing frequent freezing and crashing, as if the eMMC suddenly becomes unreadable and unwritable during normal operation. Sometimes the device does not freeze, as long as everything needed at that moment is already in RAM. As soon as I do something that touches internal storage, like launching an application that is not already in RAM, the device freezes. Sufficiently long eMMC I/O activity is enough to reproduce the issue. This happens about ten times a day on average during normal operation.

Logfiles and additional information

I don't know what log could be useful. Suggestions are welcome.

Danct12 commented 2 years ago

As this is a hardware mod, any problem caused by the mod is outside of my support. But it's possible that the eMMC used in your device is not compatible with the mod.

You may want to reach out to dsimic in the PinePhone chat.

On Thu, 02 Jun 2022 12:22:58 -0700 Dario wrote:

  • Device: PinePhone
  • Kernel version: 5.17.6-1-danctnix
  • UI: Phosh

dariox86 commented 2 years ago

Update: I tried to pinpoint the issue and concluded that it is specific to Arch Linux ARM DanctNIX. I connected a USB stick to the PinePhone through the hub and booted archlinux-pinephone-phosh-20220502 from a microSD card. From the live system I started copying my /home/alarm from the eMMC to the USB stick. Each time, the copy froze before it finished; I tried it four times just to be sure.

Then I tried the same with 20220601-0442-postmarketOS-v21.12-phosh-17-pine64-pinephone. It took literally hours, but in the end it worked on the first try.

Of course, in both cases I had to copy pinephone-vccq-mod.dtbo and user.scr to the boot partition.
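The copy test above is essentially a way to generate sustained eMMC reads. A minimal Python sketch of the same idea, assuming only the read side matters, walks a directory tree in fixed-size chunks and records how long each chunk took. The function name `read_tree_with_stall_check`, the 1 MiB chunk size, the 5-second threshold and the `on_file` callback are illustrative choices, not anything from the thread. Two limits apply: files already in the page cache never touch the eMMC (as root, `echo 3 > /proc/sys/vm/drop_caches` empties the cache first), and a hard hang like the one reported here blocks the read indefinitely, so the function never returns; in that case the last path reported by `on_file` shows where it stopped.

```python
import os
import sys
import time


def read_tree_with_stall_check(root, chunk_size=1 << 20, stall_threshold=5.0, on_file=None):
    """Read every regular file under `root` in chunks of `chunk_size` bytes.

    Returns (bytes_read, longest_read_seconds, stalls), where `stalls` lists
    (path, seconds) for each chunk read that took longer than `stall_threshold`.
    """
    total = 0
    longest = 0.0
    stalls = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and the like; only regular files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if on_file is not None:
                on_file(path)  # report the file *before* reading it
            try:
                with open(path, "rb") as f:
                    while True:
                        start = time.monotonic()
                        chunk = f.read(chunk_size)
                        elapsed = time.monotonic() - start
                        if not chunk:
                            break
                        total += len(chunk)
                        longest = max(longest, elapsed)
                        if elapsed > stall_threshold:
                            stalls.append((path, elapsed))
            except OSError as exc:  # e.g. permission denied or an I/O error from the driver
                print(f"skipped {path}: {exc}", file=sys.stderr)
    return total, longest, stalls
```

Example call: `read_tree_with_stall_check("/home/alarm", on_file=lambda p: print(p, file=sys.stderr, flush=True))`. Printing with `flush=True` matters here, since buffered output would be lost if the device freezes.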

Danct12 commented 2 years ago

Does the whole system hang when the copy operation hangs? Can you please post dmesg?

dariox86 commented 2 years ago

When running from the microSD I can still regain control of the system by killing the copy process. When the same problem occurs while running from the eMMC, the system behaves erratically. It may or may not freeze, but even when it is not completely frozen I cannot launch any new application or load a file, because the system is unable to communicate with the eMMC. I will try again and post my dmesg for you to check.

dariox86 commented 2 years ago

I launched the copy command and when the copy froze I dumped dmesg output. The only relevant lines I see are:

[  710.656872] sunxi-mmc 1c11000.mmc: data error, sending stop command
[  711.661903] sunxi-mmc 1c11000.mmc: send stop command failed

Attached the full dmesg log: dmesg.txt.
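To pull only lines like these out of a long dmesg dump, a small filter is enough. This Python sketch assumes the usual `[ seconds.micros] message` dmesg format and defaults to the `sunxi-mmc 1c11000.mmc:` prefix seen in the log above; the function name `mmc_errors` and the `error|fail` keyword match are illustrative choices, so adjust them if the driver words its messages differently.

```python
import re

# dmesg lines look like "[  710.656872] sunxi-mmc 1c11000.mmc: data error, ..."
DMESG_LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def mmc_errors(lines, prefix="sunxi-mmc 1c11000.mmc:"):
    """Return (timestamp, message) for lines from `prefix` mentioning an error or failure."""
    hits = []
    for line in lines:
        m = DMESG_LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a timestamped dmesg line
        msg = m.group("msg")
        if msg.startswith(prefix) and re.search(r"error|fail", msg, re.IGNORECASE):
            hits.append((float(m.group("ts")), msg))
    return hits
```

Example call: `mmc_errors(open("dmesg.txt"))`. Comparing the timestamps against when the freeze was noticed helps tie each error to a specific hang.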

Danct12 commented 2 years ago

That looks like the eMMC driver tried to read/write some data, but failed.

Are you sure the VCCQ patch files are installed? Non-VCCQ images do not work properly on a modded device.

dariox86 commented 2 years ago

pinephone-vccq-mod.dtbo and user.scr are in the boot partition. It would not boot without these files in place.
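A quick way to double-check this on any boot partition (eMMC or a mounted microSD) is to confirm that both files exist and are non-empty. This is a minimal Python sketch; `missing_vccq_files` is a hypothetical helper name, and it only checks that the files are present. It cannot tell whether the bootloader actually applied the overlay at boot.

```python
from pathlib import Path

# File names as given in the thread.
REQUIRED = ("pinephone-vccq-mod.dtbo", "user.scr")


def missing_vccq_files(boot_dir="/boot"):
    """Return the names from REQUIRED that are missing or empty in boot_dir."""
    boot = Path(boot_dir)
    missing = []
    for name in REQUIRED:
        path = boot / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Example call: `missing_vccq_files("/mnt/sdboot")`, where `/mnt/sdboot` is a hypothetical mount point for the microSD boot partition; an empty list means both files are in place.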

bfra2373 commented 1 year ago

@dariox86 Any luck lately with this issue?

dariox86 commented 1 year ago

A while ago something changed. It no longer freezes as long as the screen is on. It only happens when the device is idle with the screen off. Turning off the screen with the power button, even for a second, can be enough to trigger the issue. If I am unlucky, it can happen up to four times in a row. If I am lucky, it will stay at rest for a night and still be working in the morning. It is very random. On average, it happens a dozen times a day. It seems less frequent when the device is plugged into power via USB. On a side note, I have been experiencing a bunch of unrelated regressions, but I have not had the time to pinpoint their causes.

dariox86 commented 1 year ago

When a new release is out I may try to reinstall everything from scratch.

bfra2373 commented 1 year ago

Does it run fine on Mobian or pmOS?

dariox86 commented 1 year ago

Back then, Arch Linux ARM DanctNIX ran fine when booted from a microSD card. I could reproduce the problem by issuing a long copy command from the eMMC; eventually the eMMC would stop responding. Doing the same from postmarketOS did not cause problems. At the moment I cannot reinstall another operating system on the eMMC because I use my device as my daily driver. I would need to copy a whole bunch of data off the device and back.

bfra2373 commented 1 year ago

I understand! I had a bit of the same issue with data management, but now I use Syncthing to sync /home to my home computer. So no more headaches when I need to replace the OS!

dariox86 commented 1 year ago

I do the same with my computer, so I can afford to lose everything on it at any time, but I have yet to set up something similar for my smartphone.

dariox86 commented 1 year ago

Tried again with Arch Linux ARM DanctNIX installed from scratch from the latest release 2023/02/03: same issue all along, turning the screen off even for a split second is enough to randomly trigger the issue. Then I tried with postmarketOS: went smooth for the last twenty-four hours.

sorry-i-am-late commented 1 year ago

I read that some hardware sub-models do not support this mod. What revision do you have? I would like to do this on my device, but mine is a UBports edition (v1.2), so I want to be as sure as possible that I am not bricking anything.

dariox86 commented 1 year ago

If I am not mistaken, I had version 1.2. It is the one that came with Manjaro preinstalled.

AndreySV commented 1 year ago
  1. Different batches may have different eMMC chips installed. Could you post information about yours, for statistics?

This one is on my 1.2a (3/32) revision:

root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/oemid
0x0100
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/manfid 
0x000045
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/date
06/2020
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/fwrev
0x3034313430363139
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/hwrev
0x0
  2. Could you capture kernel logs from the serial (UART) console when the device hangs? I've modded my PP and get freezes once every several days on Mobian with 6.1. The last one was related to eMMC.

After a couple of minutes I got the following messages on the serial console:

[293876.407089] INFO: task systemd-journal:305 blocked for more than 241 seconds.
[293876.415187]       Tainted: G         C  E      6.1-sunxi64 #1
[293876.422697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[293876.432247] task:systemd-journal state:D stack:0     pid:305   ppid:1      flags:0x0000080c
[293876.441568] Call trace:
[293876.445748]  __switch_to+0xc0/0x130
[293876.450013]  __schedule+0x388/0x994
[293876.454458]  schedule+0x54/0xdc
[293876.458566]  io_schedule+0x40/0x60
[293876.463711]  bit_wait_io+0x1c/0x70
[293876.468078]  __wait_on_bit+0x78/0xcc
[293876.472584]  out_of_line_wait_on_bit+0x8c/0xb4
[293876.478772]  __wait_on_buffer+0x3c/0x50
[293876.483552]  ext4_read_bh+0xd8/0xf0 [ext4]
[293876.488961]  ext4_read_bh_lock+0x5c/0xa0 [ext4]
[293876.495315]  ext4_bread+0x78/0xb0 [ext4]
[293876.500260]  __ext4_read_dirblock+0x5c/0x3c0 [ext4]
[293876.506904]  ext4_dx_find_entry+0x11c/0x1e4 [ext4]
[293876.512713]  __ext4_find_entry+0x3c4/0x410 [ext4]
[293876.519083]  ext4_lookup+0x1ac/0x2a0 [ext4]
[293876.524249]  __lookup_hash+0x80/0xd0
[293876.528864]  do_renameat2+0x264/0x49c
[293876.534280]  __arm64_sys_renameat+0x5c/0x70
[293876.539442]  invoke_syscall+0x4c/0x110
[293876.544188]  el0_svc_common.constprop.0+0xc8/0xf0
[293876.550498]  do_el0_svc+0x30/0xb0
[293876.554776]  el0_svc+0x14/0x4c
[293876.558825]  el0t_64_sync_handler+0x10c/0x120
[293876.564881]  el0t_64_sync+0x14c/0x150
[293876.575193] INFO: task kworker/1:2H:209260 blocked for more than 241 seconds.
[293876.583958]       Tainted: G         C  E      6.1-sunxi64 #1
[293876.590641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[293876.600359] task:kworker/1:2H    state:D stack:0     pid:209260 ppid:2      flags:0x00000008
[293876.610372] Workqueue: kblockd blk_mq_run_work_fn
[293876.616147] Call trace:
[293876.619553]  __switch_to+0xc0/0x130
[293876.624784]  __schedule+0x388/0x994
[293876.628834]  schedule+0x54/0xdc
[293876.632890]  schedule_timeout+0x14c/0x180
[293876.637856]  __wait_for_common+0xe4/0x234
[293876.642816]  wait_for_completion+0x24/0x2c
[293876.647895]  mmc_wait_for_req_done+0x30/0xf4
[293876.653935]  mmc_wait_for_req+0xac/0xfc
[293876.658613]  mmc_wait_for_cmd+0x6c/0xb0
[293876.663387]  __mmc_send_status+0x7c/0xc0
[293876.669047]  mmc_blk_mq_rw_recovery+0x5c/0x3d0
[293876.674466]  mmc_blk_mq_poll_completion+0x7c/0x210
[293876.680249]  mmc_blk_rw_wait+0x11c/0x210
[293876.685898]  mmc_blk_mq_issue_rq+0x26c/0x8e0
[293876.691126]  mmc_mq_queue_rq+0x150/0x320
[293876.696786]  blk_mq_dispatch_rq_list+0x1b8/0x960
[293876.702361]  blk_mq_do_dispatch_sched+0x2e0/0x360
[293876.708021]  __blk_mq_sched_dispatch_requests+0x128/0x180
[293876.715135]  blk_mq_sched_dispatch_requests+0x3c/0x7c
[293876.721143]  __blk_mq_run_hw_queue+0x7c/0xb0
[293876.727165]  blk_mq_run_work_fn+0x24/0x2c
[293876.731955]  process_one_work+0x1e4/0x440
[293876.736825]  worker_thread+0x180/0x4a0
[293876.742163]  kthread+0xd8/0xe0
[293876.746202]  ret_from_fork+0x10/0x20

Make sure your kernel has the following options enabled:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120

To check, use zcat /proc/config.gz.

It would be interesting to see whether your backtrace is the same (or similar) as mine.
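Both checks can be scripted. The Python sketch below assumes `/proc/config.gz` is available (it only exists when the kernel is built with CONFIG_IKCONFIG_PROC) and treats the two values above as the targets; `config_mismatches` and `hung_tasks` are hypothetical helper names. The second function extracts the task name, PID and reported duration from `INFO: task ... blocked for more than N seconds` lines like those in the serial log, which makes it easier to compare reports from several freezes.

```python
import gzip
import re

# Target values from the comment above.
WANTED = {
    "CONFIG_DETECT_HUNG_TASK": "y",
    "CONFIG_DEFAULT_HUNG_TASK_TIMEOUT": "120",
}

# Task names may contain colons (e.g. "kworker/1:2H"), so the PID is the last ":<digits>".
HUNG = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def config_mismatches(path="/proc/config.gz", wanted=WANTED):
    """Return {option: actual value or None} for options that differ from `wanted`."""
    found = {}
    with gzip.open(path, "rt") as f:
        for line in f:
            line = line.strip()
            # "# CONFIG_X is not set" lines are skipped, so unset options map to None.
            if line.startswith("CONFIG_") and "=" in line:
                key, value = line.split("=", 1)
                found[key] = value
    return {key: found.get(key) for key, value in wanted.items() if found.get(key) != value}


def hung_tasks(lines):
    """Return (comm, pid, seconds) for each hung-task report in a kernel log."""
    reports = []
    for line in lines:
        m = HUNG.search(line)
        if m:
            reports.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return reports
```

A different CONFIG_DEFAULT_HUNG_TASK_TIMEOUT is not fatal: as the log itself notes, the live value is exposed in /proc/sys/kernel/hung_task_timeout_secs and can be changed at runtime.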

sorry-i-am-late commented 1 year ago
Mine is the UBports edition (2/16); my eMMC is the same as the one shown in the original write-up for this mod. I should have time to mess around with it more either tomorrow or Tuesday.

AndreySV commented 4 months ago
Could you post the technical details of the eMMC chip in your modded PinePhone here?

JFYI: the problem still happens for me with kernel 6.6 on Mobian.
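The eMMC identification values posted earlier in the thread come from sysfs, so they can be gathered in one go. This Python sketch assumes the eMMC is `mmcblk2`, as on the device in the thread (the index can differ between kernels), and also reads `name`, the product-name attribute the kernel's MMC driver exposes alongside the ones already posted. `fwrev_as_ascii` is a hypothetical convenience: the fwrev posted above, 0x3034313430363139, happens to consist of printable ASCII bytes ("04140619"), but whether that string means anything to the vendor is an assumption.

```python
from pathlib import Path

# Attributes read from the eMMC's sysfs device directory. "name" is not in the
# thread's output; it is added here as the product name reported by the kernel.
ATTRS = ("name", "manfid", "oemid", "date", "fwrev", "hwrev")


def emmc_info(device_dir="/sys/block/mmcblk2/device"):
    """Return {attribute: value or None} for the attributes in ATTRS."""
    base = Path(device_dir)
    info = {}
    for attr in ATTRS:
        path = base / attr
        info[attr] = path.read_text().strip() if path.is_file() else None
    return info


def fwrev_as_ascii(fwrev):
    """Decode a hex fwrev string as ASCII if every byte is printable, else None."""
    if not fwrev or not fwrev.startswith("0x"):
        return None
    try:
        raw = bytes.fromhex(fwrev[2:])
    except ValueError:  # odd number of hex digits, e.g. "0x0"
        return None
    if not raw or not all(0x20 <= b < 0x7F for b in raw):
        return None
    return raw.decode("ascii")
```

Example call: `emmc_info()` on the phone itself returns a dict that can be pasted into the issue as-is; missing attributes show up as `None` rather than raising.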