OpenNuvoton / NUC970_Linux_Kernel

Linux Kernel Source Code for NUC970 Series Microprocessor

NUC970 watchdog fires during the first UBI scan after programming a PACK image #53

Closed chenxy1988 closed 4 years ago

chenxy1988 commented 4 years ago

Hi NUC team,

Currently, our company uses the NUC970 in our product, with a 256 MB NAND flash as storage. 246 MB of it is a user region managed by UBI. As we understand it, UBI must scan the PEBs and LEBs of a partition the first time the partition is initialized. We found that after erasing the whole NAND flash and writing a PACK image to it, the watchdog fires because the UBI scan takes too long. The issue does not appear on subsequent reboots; it returns only after the whole flash is erased and the PACK image is programmed again. It reproduces 100% of the time. I used the NUC970 Linux kernel with only the MTD partitions changed, and the NUC970 buildroot at the latest commit on GitHub.

Have you seen this issue before? How can we solve or avoid it?

Thanks!

================================================================================
Below is my platform information & kernel messages

MTD partition:

root@(none)/root#cat /proc/mtd
dev: size erasesize name
mtd0: 00200000 00020000 "u-boot"
mtd1: 00800000 00020000 "Kernel"
mtd2: 0f600000 00020000 "user"

---------------Error message-----------------

[ 1.740000] UBI: scanning is finished
[ 1.780000] gluebi (pid 1): gluebi_resized: got update notification for unknown UBI device 0 volume 0
[ 1.790000] UBI: volume 0 ("system") re-sized from 1322 to 1395 LEBs
[ 1.800000] UBI: attached mtd2 (name "user", size 246 MiB) to ubi0
[ 1.800000] UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 1.810000] UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 1.820000] UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 1.830000] UBI: good PEBs: 1964, bad PEBs: 4, corrupted PEBs: 0
[ 1.830000] UBI: user volume: 2, internal volumes: 1, max. volumes count: 128
[ 1.840000] UBI: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 745765970
[ 1.850000] UBI: available PEBs: 0, total reserved PEBs: 1964, PEBs reserved for bad PEB handling: 36
[ 1.860000] UBI: background thread "ubi_bgt0d" started, PID 577
[ 1.880000] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 1.970000] UBIFS: background thread "ubifs_bgt0_0" started, PID 595
[ 2.010000] UBIFS: start fixing up free space
[ 5.570000] UBIFS: free space fixup complete
[ 5.610000] UBIFS: mounted UBI device 0, volume 0, name "system"
[ 5.610000] UBIFS: LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 5.630000] UBIFS: FS size: 161132544 bytes (153 MiB, 1269 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 5.630000] UBIFS: reserved for root: 0 bytes (0 KiB)
[ 5.650000] UBIFS: media format: w4/r0 (latest is w4/r0), UUID 03E640EF-504B-4DB9-B63E-3E41859AE333, small LPT model
[ 5.670000] VFS: Mounted root (ubifs filesystem) on device 0:11.
[ 5.670000] devtmpfs: mounted
[ 5.690000] Freeing unused kernel memory: 140K
Starting logging: OK OK
Starting mdev...
nand_boot

U-Boot 2013.04-rc2 (Aug 05 2020 - 13:54:53)

CPU: NUC972
DRAM: 128 MiB
NAND: 256 MiB
MMC: mmc: 0
In: serial

chenxy1988 commented 4 years ago

I can provide debug output from my hardware environment if you want me to add some code for debugging.

Thanks,

chenxy1988 commented 4 years ago

I tried another approach that seems to work, but we would still like an official, generic solution from the NUC team. My workaround: build the NUC970 watchdog kernel drivers as modules and load them from an init.d/ script in user space, so that they are inserted only after the UBI scan has finished and the file system is mounted. With this change the issue no longer appears.

CONFIG_NUC970_WDT=m
CONFIG_NUC970_WDT_WKUP=m
CONFIG_NUC970_WWDT=m

Thanks, Xiangyu

yachen commented 4 years ago

Hi,

Since you can build the driver as a module and load it after the file system is mounted, I assume the WDT is not enabled by the power-on setting? For the NUC970's WDT to reset the system properly, it must be enabled by the power-on setting, so please modify the power-on setting to enable the WDT at boot time.

I think one workaround is to start a kernel timer that refreshes the WDT every few seconds (refer to the probe() function in at91sam9_wdt.c). Once user space starts to ping the dog, stop this kernel timer and hand the ping task over to user space.

Sincerely,

Yi-An Chen
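
To make the suggested handover concrete, here is a small user-space C model of the timing involved. It is not kernel code and not taken from the NUC970 or at91sam9 drivers: the function name is hypothetical, and it assumes the hardware timeout restarts at driver registration and at every ping.

```c
#include <stdbool.h>

/*
 * Model only: returns true if the watchdog would expire before user space
 * takes over pinging. All times are in milliseconds, measured from the
 * moment the driver is registered. kernel_ping_ms == 0 means "no kernel
 * timer is refreshing the watchdog".
 */
bool wdt_fires_before_handover(unsigned timeout_ms,
                               unsigned kernel_ping_ms,
                               unsigned user_start_ms)
{
    unsigned deadline = timeout_ms;      /* when the counter would expire */
    unsigned next_ping = kernel_ping_ms; /* next kernel-timer refresh */

    while (deadline < user_start_ms) {
        /* No kernel timer, or the next refresh comes too late: reset. */
        if (kernel_ping_ms == 0 || next_ping >= deadline)
            return true;
        /* The kernel timer refreshes the watchdog and re-arms itself. */
        deadline = next_ping + timeout_ms;
        next_ping += kernel_ping_ms;
    }
    return false; /* user space starts pinging before the deadline */
}
```

With illustrative numbers, a 2000 ms timeout and a feeder that starts 5700 ms after registration resets the board when there is no kernel timer, while a kernel timer firing every 1000 ms carries the system through to the handover.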

chenxy1988 commented 4 years ago

Hi Yi'an,

Thanks for your fast response. Sorry, I did not fully understand: does "it must be enabled by power on setting" mean enabling the watchdog in the bootloader?

My earlier comments were brief, so let me add more information: we enable the watchdog in U-Boot via the environment setting setenv "watchdog on". After Linux starts and the file system has been mounted, a daemon feeds the watchdog periodically.

Thanks, Xiangyu

yachen commented 4 years ago

Hi,

The WDT can be enabled automatically after power-on. This function is controlled by PA.3; please check TRM section 5.2.5. If the WDT is enabled by software after boot-up instead of by the power-on setting, it cannot reset the system.

Sincerely,

Yi-An Chen

chenxy1988 commented 4 years ago

Hi Yi-An,

It's very strange: I checked the register value in the bootloader before the WDT driver init, and PA.3 is already set to 1.

Code as below:

//Before calling nuc970 wdt init
#define REG_PWRON 0xB0000004
printf("## REG PWR ON value is 0x%x \n",readl(REG_PWRON));

Result as below:

## REG PWR ON value is 0x20007fe

Best regards, Xiangyu
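
The register check above can be reduced to a single bit test. The sketch below assumes that bit 3 of REG_PWRON latches the PA.3 power-on setting, which is how the thread reads the value; the helper name and the bit macro are hypothetical and should be confirmed against TRM section 5.2.5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: REG_PWRON bit 3 mirrors the PA.3 power-on setting (WDT enable). */
#define PWRON_WDT_BIT 3u

/* Returns true when the assumed WDT power-on bit is set in a PWRON value. */
bool pwron_wdt_enabled(uint32_t pwron)
{
    return ((pwron >> PWRON_WDT_BIT) & 1u) != 0;
}
```

For the value printed above, 0x20007fe, the low nibble is 0xe (binary 1110), so bit 3 is set and the helper returns true.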

yachen commented 4 years ago

Hi,

There are built-in pull-up resistors on the power-on setting pins that take effect during boot. As long as you did not add a pull-down resistor on PA.3, the WDT function works without problems.

So the issue now is that the UBI mount takes so long that the user application cannot ping the WDT in time, which causes a system reset, right? Maybe you can consider my suggestion to keep the WDT serviced during the kernel boot stage by adding a kernel timer to ping it.

Sincerely,

Yi-An Chen

chenxy1988 commented 4 years ago

Hi Yi-An,

OK, we will try that solution, thanks!

Regarding this behavior, I have another question. As described in my third comment, when I build the WDT driver outside the kernel image as a module and load it after the UBI file system is mounted, the watchdog does not fire. (I also tried not loading the WDT driver module at all: the system rebooted, which means the bootloader's WDT configuration was working.) But when the WDT driver is built into the kernel image, the watchdog fires. In both cases the user-space watchdog feeder application starts and feeds the dog at the same place and time. Does the in-kernel WDT driver change or overwrite some register so that the bootloader's configuration no longer takes effect?

Br, Xiangyu

yachen commented 4 years ago

Hi,

Considering that the kernel may take some time to boot, the NUC970 U-Boot WDT driver sets the timeout interval to 14 seconds. But the WDT driver in Linux sets the default timeout to 2 seconds in its probe function, so once it is registered, the WDT needs to be pinged every 2 seconds.

Another option is to set a longer default timeout in the probe function, and let the application set a proper timeout interval later.

Sincerely,

Yi-An Chen

chenxy1988 commented 4 years ago

Hi Yi-An,

I tried changing the default timeout to 9, but the watchdog still fires. The code is below:

//probe function:
nuc970_wdd.timeout = 9; // default time out = 2 sec (2.03)
nuc970_wdd.min_timeout = 1; // min time out = 1 sec (0.53)
nuc970_wdd.max_timeout = 9; // max time out = 9 sec (8.03)

//global variable:
static int heartbeat = 9; // default 2 second

I have another question. According to the watchdog section of the TRM, the timeout range is 0.48828125 ms to 8 s, but the bootloader manual says the timeout is 14 s. I have not seen this gap mentioned anywhere; could you help explain the difference between the two documents? Thanks!

WDT description in the TRM: (image)

WDT description in the BL manual: (image)

Thanks! Xiangyu

yachen commented 4 years ago

The WDT can use different clock sources. If the clock source is the 32 kHz clock, the max interval is about 8 s; if the clock source is PCLK/4096, the max interval is about 14 s.
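
The two figures can be checked with a short calculation. This sketch assumes the timeout is a power-of-two count of WDT clock cycles, with exponents from 4 to 18 inferred from the TRM range quoted above (16 / 32768 Hz = 0.48828125 ms, 262144 / 32768 Hz = 8 s), and a PCLK of 75 MHz. Neither the exponent range nor the PCLK value is stated in this thread, so treat both as assumptions; the function name is hypothetical.

```c
/*
 * Timeout interval in seconds for a watchdog that counts 2^count_exp
 * cycles of a clock running at wdt_clk_hz.
 * Assumption: count_exp ranges from 4 to 18 on this part.
 */
double wdt_interval_s(double wdt_clk_hz, unsigned count_exp)
{
    return (double)(1UL << count_exp) / wdt_clk_hz;
}
```

Under these assumptions, `wdt_interval_s(32768.0, 18)` gives 8 s, and `wdt_interval_s(75e6 / 4096.0, 18)` gives about 14.3 s, which is consistent with the 14 s quoted for the bootloader.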

chenxy1988 commented 4 years ago

Hi,

Thanks for the info. Regarding this issue, will the NUC team apply the fix to the NUC series driver for all customers, or should we fix it ourselves?

BTW, one final question unrelated to this topic: which way would you or your team prefer that we submit tickets, GitHub directly or the NUC technical-support site? I understand it might be related to KPIs, and answering technical questions might be outside your team's scope, so please let me know the preferred channel for submitting tickets next time.

Br, Xiangyu

yachen commented 4 years ago

Hi,

OK, I'll update the default timeout value. Other users who need a smaller timeout interval can always use ioctl() to change it. Posting software issues here is fine, but for hardware-related issues it's better to post on our support forum or contact the FAE of our distributor.

Sincerely,

Yi-An Chen
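
For readers who want to change the interval from user space as mentioned above, a minimal sketch using the standard Linux watchdog ioctls from `<linux/watchdog.h>` could look like this. The wrapper names are hypothetical, and whether the NUC970 driver accepts a given value depends on its min_timeout and max_timeout settings.

```c
#include <linux/watchdog.h>
#include <sys/ioctl.h>

/*
 * Ask the driver for a new timeout in seconds.
 * Returns the timeout reported back by the driver, or -1 on error.
 */
int wdt_set_timeout(int fd, int seconds)
{
    int t = seconds;

    if (ioctl(fd, WDIOC_SETTIMEOUT, &t) < 0)
        return -1;
    return t; /* the driver may round to a value it supports */
}

/* Ping the watchdog once. Returns 0 on success, -1 on error. */
int wdt_ping(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0) < 0 ? -1 : 0;
}
```

A feeder daemon would typically open the watchdog device (commonly /dev/watchdog, assumed here), call `wdt_set_timeout()` once, and then call `wdt_ping()` in a loop at an interval shorter than the timeout. Note that on most drivers opening the device starts the watchdog.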

chenxy1988 commented 4 years ago

OK, thanks for your great help. I'm closing this issue. Have a nice weekend ;)