danielschwierzeck / u-boot-lantiq


What branch should I use/refer to? #19

Open danijelt opened 5 years ago

danijelt commented 5 years ago

There are multiple branches merged with master at various times, with various levels of Lantiq device support merged and different openwrtX version tags. Which one should be used to build the latest stable release, and on which branch should I start working on a new device?

danielschwierzeck commented 5 years ago

The latest and most stable branch is openwrt/2014.07. It has full support for Danube, VRX200 and GRX330 SoCs, including NAND booting. I also use this branch in production at my company, so I can provide support for it. I think OpenWRT itself uses a modified patch set based on the 2013.10 branch; you can also look there for more board support or other bugfixes. With the lantiq/upstream branch I'm still trying to create a mainline version of the Lantiq port. But due to mainline changes like Kconfig, driver model, device-tree support etc., the core code needs a bigger rewrite and refactoring, for which I haven't had enough time or motivation yet ;)

danijelt commented 5 years ago

Thanks. In that case, I guessed it right.

OpenWrt appears to use generic U-Boot 2013.10 with patches taken from this repository.

Do you have any idea why v2014.07 crashes on my board when trying tftpboot, while OpenWrt version works? It's Arcadyan VGV9510, and I use P2812 config. The only difference, as far as I can see, is the commit 07473f50f7 (add driver for ethernet and switch subsystem on Lantiq VRX200 SoC devices), but CONFIG_LANTIQ_VRX200_SWITCH isn't used on P2812. Or is there some other change that could cause this?

Using ltq-eth device
TFTP from server 192.168.1.2; our IP address is 192.168.1.1
Filename '0101A8C0.img'.
Load address: 0x81000000
Loading: *
Ooops:
Relocation offset: 7e7d000
$ 0   : 00000000 00000000 45000014 87fffa7c
$ 4   : 87ffbd8e c0a80157 c0a80101 00000c32
$ 8   : 87ffbd86 0000005a 00000023 87ffbdd6
$12   : 00000000 87b79a4b 00000000 00000004
$16   : 87ffbd8e 0000002c 00000045 00000c32
$20   : 87fffa9c 0000002c 00000045 00000000
$24   : 87b79a68 87fd15f8                  
$28   : 87ffc000 87b79c68 00000000 87fd16f0
Hi    : 00000000
Lo    : 00000005
epc   : 87fd1620 (original 80154620)
ra    : 87fd16f0 (original 801546f0)
Status: 10000002
Cause : 40008014 (ExcCode 05)
BadVA : ffffffff
PrId  : 00019556
### ERROR ### Please RESET the board ###
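
A note for reading the dump above: ExcCode sits in bits 6..2 of the MIPS Cause register, and code 05 is an address error on a store (AdES), which fits the BadVA of ffffffff. The "original" addresses are the relocated ones minus the relocation offset, so they can be looked up in the build's u-boot.map or System.map. A minimal sketch of both calculations follows; the helper names are made up for illustration and are not part of U-Boot.

```c
#include <stdint.h>

/* MIPS32 Cause register: the exception code occupies bits 6..2. */
static inline unsigned mips_exc_code(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}

/*
 * Map a relocated address back to its link-time address, i.e. the
 * value printed as "original" in the dump. Hypothetical helper.
 */
static inline uint32_t unrelocate(uint32_t addr, uint32_t reloc_off)
{
	return addr - reloc_off;
}
```

With the values from the dump, Cause 40008014 decodes to exception code 5, and epc 87fd1620 with relocation offset 7e7d000 maps back to 80154620.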

danielschwierzeck commented 5 years ago

Maybe it's related to upstream commit 704f3acfcf55343043bbed01c5fb0a0094a68e8a and newer gcc versions. OpenWRT already has this one backported.

danijelt commented 5 years ago

You're right, I somehow missed that one when comparing. I backported it to 2014.07 and it works now.

The only problem left is NAND operations. When I write anything to NAND with 2014.07 from this repository, it can be read correctly only by that same U-Boot.
Trying to write U-Boot to NAND causes a bootloop, and older U-Boot releases (OpenWrt's RAM version) fail to read that data.
My guess is that it's related to ECC changes. I went through the files and ported the CONFIG_SYS_NAND_ECC* variables, and I also tried removing CONFIG_SYS_NAND_5_ADDR_CYCLE, which was added here, but neither changed anything.

On the other hand, 2014.07 boots fine when written to NAND with older U-Boot.

danielschwierzeck commented 5 years ago

Only in 2014.07 have I carefully chosen and verified all ECC settings. They are likely not correct in older versions, and they can differ from the OpenWRT or original vendor settings. So you have to decide whether you want to keep the vendor settings or the ones from 2014.07. In the latter case I recommend flashing U-Boot from the UART image. Otherwise you have to adapt your U-Boot board config to match the vendor settings. In both cases you also have to make sure that Linux uses exactly the same settings.

Regarding the CONFIG_SYS_NAND_* options: normally all settings are chosen by the driver based on the probe results. But for NAND SPL/TPL this is not possible due to size constraints, so parameters like erase block size, page size, OOB size and address cycles must be configured manually. Thus those settings are specific to your board and you need to choose the right ones.

For various platforms I'm using the following settings:

danijelt commented 5 years ago

Do you have any production examples with ECC configuration? Arcadyan VRV9510 has MX30LF1G08AA NAND. Datasheet mentions "1-bit ECC per 528-byte" but it doesn't say anything about on-die ECC explicitly.

Earlier releases had ECC configuration hardcoded in arch/mips/include/asm/lantiq/config.h. 2014.07 does not, and copying that configuration from older releases doesn't help.

Regardless of ECC and RAM/SPL configuration, something is wrong because anything that is written by this U-Boot can't be read by anything else - Boot ROM, Linux or another U-Boot. Whatever I do, OOB remains empty (all FF), while ECC is usually kept at offset 40-63.
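
To make the "OOB remains empty" check repeatable, the comparison can be sketched as a small helper over a 64-byte OOB buffer, for example one copied from a `nand dump` of the page. The function name and the fixed 40-63 range are assumptions taken from the observation above, not from the driver source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OOB_SIZE   64
#define ECC_OFFSET 40	/* assumed: ECC bytes live at OOB offsets 40..63 */
#define ECC_BYTES  24

/* Returns true if the assumed ECC area of the OOB is still erased (all 0xFF). */
static bool oob_ecc_is_blank(const uint8_t oob[OOB_SIZE])
{
	for (size_t i = ECC_OFFSET; i < ECC_OFFSET + ECC_BYTES; i++)
		if (oob[i] != 0xff)
			return false;
	return true;
}
```

A page that holds data but still reports a blank ECC area was written without software ECC, which is the symptom described here.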

danijelt commented 5 years ago

Update:
1) Sorry, I didn't notice earlier that the soft ECC config is now in lantiq_nand.c.
2) After tracing what lantiq_nand is doing, I found that, despite explicitly setting #define CONFIG_SPL_NAND_SOFTECC, the driver calls ltq_nand_ecc_none_setup because NAND_HAS_ONDIE_ECC returns 1. I patched it to always call ltq_nand_ecc_soft_setup and now it works, but why is this happening in the first place?

Edit: I was too fast; this U-Boot now writes bad ECC data when writing the SPL image to NAND. What's interesting is that the ECC data for the environment is OK (OpenWrt can read it), and it can read a kernel that was flashed with OpenWrt's sysupgrade.

Mafketel commented 5 years ago

Regarding the CONFIG_SYS_NAND_* options: normally all settings are chosen by the driver based on the probe results. But for NAND SPL/TPL this is not possible due to size constraints, so parameters like erase block size, page size, OOB size and address cycles must be configured manually. Thus those settings are specific to your board and you need to choose the right ones.

Since you have to start with the UART boot anyway, would it be an idea to read that data from the device while running the UART U-Boot and then write those values somewhere, before or after you flash the SPL/TPL U-Boot image to NAND?

Mafketel commented 5 years ago

Here is the diff of https://github.com/danielschwierzeck/u-boot-lantiq/commit/704f3acfcf55343043bbed01c5fb0a0094a68e8a applied to 2014.07; tftpboot works afterwards.

u-boot-headers-packed.zip

danielschwierzeck commented 5 years ago

@danijelt sorry that I missed your latest questions.

I checked the datasheet for MX30LF1G08AA. It is an 8 Bit ECC chip without on-die ECC. The problem with the auto-detection in nand_decode_ondie_ecc() is that it only compares ID[3] to 0x80. Maybe I should add an extra flag or config option to optionally disable this detection.
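
As a rough illustration of that idea, the override could look like the sketch below. This is not the actual nand_decode_ondie_ecc() code, whose signature is not shown in this thread; both function names and the `detection_disabled` parameter are hypothetical, and only the "ID[3] compared to 0x80" check is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ID-based heuristic described above. */
static bool id_suggests_ondie_ecc(const uint8_t *id)
{
	return id[3] == 0x80;
}

/*
 * Hypothetical wrapper: lets a board config veto the heuristic, so a
 * chip that merely matches the ID pattern is not treated as having
 * on-die ECC.
 */
static bool use_ondie_ecc(const uint8_t *id, bool detection_disabled)
{
	if (detection_disabled)
		return false;
	return id_suggests_ondie_ecc(id);
}
```

With such a switch, a board whose chip is misdetected could fall back to the soft ECC setup without patching the driver.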

For reliable usage of an 8 Bit ECC chip you actually need to use the BCH ECC algorithm. Also, on those chip types the first erase block is no longer guaranteed to be readable without ECC. But the VRX200 BootROM doesn't do any ECC correction; it only reads the first page. If you're getting bitflips in that page, you're already doomed. So the only reliable solution for VRX200 is to use a chip with on-die ECC. That's what we did in my company, and that's why no BCH support is currently implemented in the NAND SPL code. It supports only 1 Bit Hamming ECC, but the older chip types that need only that usually came with this guarantee for the first erase block.

There are two solutions:

  1. you configure SPL, U-Boot and Linux to use 1 Bit Hamming ECC (NAND_ECC_SOFT) and disable the on-die ECC detection
    • the parameters for SPL should be

      #define CONFIG_SYS_NAND_PAGE_COUNT 64
      #define CONFIG_SYS_NAND_PAGE_SIZE 2048
      #define CONFIG_SYS_NAND_OOBSIZE 64
      #define CONFIG_SYS_NAND_BLOCK_SIZE (128 * 1024)
      #define CONFIG_SYS_NAND_5_ADDR_CYCLE

    • neither define CONFIG_NAND_ECC_BCH nor CONFIG_NAND_ECC_NONE
  2. we add support for BCH ECC in the NAND SPL code and configure SPL, U-Boot and Linux to use that
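
The SPL parameters listed under the first option can be sanity-checked at compile time. The sketch below assumes the usual software Hamming geometry of 3 ECC bytes per 256 data bytes; the SOFT_ECC_* names are made up for this example. Under that assumption a 2048-byte page needs 24 ECC bytes, which matches the OOB offsets 40-63 mentioned earlier in the thread.

```c
#include <assert.h>

#define CONFIG_SYS_NAND_PAGE_COUNT 64
#define CONFIG_SYS_NAND_PAGE_SIZE  2048
#define CONFIG_SYS_NAND_OOBSIZE    64
#define CONFIG_SYS_NAND_BLOCK_SIZE (128 * 1024)

/* Assumption: 1 Bit Hamming soft ECC, 3 ECC bytes per 256-byte step. */
#define SOFT_ECC_STEP_SIZE  256
#define SOFT_ECC_STEP_BYTES 3
#define SOFT_ECC_TOTAL \
	((CONFIG_SYS_NAND_PAGE_SIZE / SOFT_ECC_STEP_SIZE) * SOFT_ECC_STEP_BYTES)

/* Pages per block times page size must equal the erase block size. */
static_assert(CONFIG_SYS_NAND_PAGE_COUNT * CONFIG_SYS_NAND_PAGE_SIZE ==
	      CONFIG_SYS_NAND_BLOCK_SIZE, "block size mismatch");

/* The ECC bytes for one page must fit into the OOB area. */
static_assert(SOFT_ECC_TOTAL <= CONFIG_SYS_NAND_OOBSIZE,
	      "ECC does not fit into OOB");
```

A mismatch between these values and the real chip geometry would show up as a build error here instead of as unreadable pages on the board.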

I think Arcadyan (and therefore OpenWRT) chose the first option. If you use it, this U-Boot should be fully compatible with OpenWRT. But you should check whether NAND_BBT_USE_FLASH is used in OpenWRT, as this will overwrite the last four erase blocks. The disadvantage is that this setup is not reliable in the long term, as you can only detect and correct 1 Bit errors instead of the required 8.

The second option requires some work. Although everything is already available in nand_bch.c, it will likely be tough to get this working in the constrained SPL context.