dgibson / dtc

Device Tree Compiler
215 stars 127 forks source link

fdt.c crashes at absoffset = offset + fdt_off_dt_struct(fdt); #119

Closed symdeb closed 9 months ago

symdeb commented 10 months ago

stm32mp157a-board-mx.zip STM32MP1 TF-A crash.zip I am using the ECO system 5.0 TF-A build environment from ST used for the STM32MP1. built the t-fa. When booting the board from SD card (or via USB using the Cubeprogrammer) TF-A crashed without a message.. Put tons of NOTICE messages in the code. Finally nailed it it down, It runs for 45 minutes before it stops in fdt.c. Repeated this 12 times, the point it stops is always the same (after DEBUG: fdt_offset_ptr 0); it always happen at the 3rd node for power (vin, vdd and then the 3rd) of the DT even the node is replaced with another one. Approached ST for direct customer support, but so far no answers/clues.

How do I identify which version of fdt.c ST is using ? ST removed the version file. Has this been, or is this a known issue, if yes how to workaround it ? What could be causes. Anything else to try.

Replaced the code in the ST tool chain with the one this repo and the same result. The did an attempt to remove the 3rd node (only 2 nodes) and the TF-A boot continued (!) to load OPTEE !

#include <common/debug.h>
const void *fdt_offset_ptr(const void *fdt, int offset, unsigned int len)
{
    unsigned int uoffset = offset;

    NOTICE ("DEBUG: fdt_offset_ptr 0\n");

    unsigned int absoffset = offset + fdt_off_dt_struct(fdt);

    NOTICE ("DEBUG: fdt_offset_ptr 1\n");`

DT:


    vin: vin {
        compatible = "regulator-fixed";
        regulator-name = "vin";
        regulator-min-microvolt = <5000000>;
        regulator-max-microvolt = <5000000>;
        regulator-always-on;
    };  

    vdd: vdd {
        compatible = "regulator-fixed";
        regulator-name = "vdd";
        regulator-min-microvolt = <3300000>;
        regulator-max-microvolt = <3300000>;
        regulator-always-on;
    };

    v3v3: regulator-3p3v {
        compatible = "regulator-fixed";
        regulator-name = "v3v3";
        regulator-min-microvolt = <3300000>;
        regulator-max-microvolt = <3300000>;
        regulator-always-on;
    };
Log:

replacing with latest  fdt repo code (no debug messages in fdt:

NOTICE:  DEBUG: fixed_regulator_register:80 regulator_register regulator-3p3v
NOTICE:  DEBUG:: register regulator-3p3v
NOTICE:  DEBUG: parse_dt
NOTICE:  DEBUG @ parse_dt: regulator-3p3v: parse dt
NOTICE:  DEBUG:fdt_get_phandle
NOTICE:  DEBUG: fdt_get_prop (phandle)
NOTICE:  DEBUG: fdt_get_prop_namelen
NOTICE:  fdt_get_property_namelen_
NOTICE:  DEBUG: fdt_first_property_offset
NOTICE:  DEBUG: fdt_check_node_offset_
NOTICE:  DEBUG: can assume
NOTICE:  DEBUG: fdt_next_tag
NOTICE:  DEBUG: fdt_nextag_1
NOTICE:  DEBUG: fdt_offset_ptr 0

After removing the 3rd node:

NOTICE:  DEBUG: fixed_regulator_register:80 regulator_register vdd
NOTICE:  DEBUG:: registe� vdd
NOTICE:  DEBUG: parse_dt
NOTICE:  DEBUG @ parse_dt: vdd: parse dt
NOTICE:  DEBUG:fdt_get_phandle
NOTICE:  DEBUG:fdt_getprop
NOTICE:  DEBUG:fdt32_to_cpu
NOTICE:  DEBUG: vdd: min_mv=3300
NOTICE:  DEBUG:fdt_getprop
NOTICE:  DEBUG:fdt32_to_cpu
NOTICE:  DEBUG: vdd: max_mv=3300
NOTICE:  DEBUG: regulator_list_voltages
NOTICE:  DEBUG: parse_propterties
NOTICE:  DEBUG: regulator_register done
NOTICE:  DEBUG: assert
NOTICE:  DEBUG: dt_pmic_status
NOTICE:  DEBUG: stm32mp1_syscfg_init
NOTICE:  DEBUG: stm32mp1_iwdg_init
NOTICE:  DEBUG: stm32mp1_iwdg_refresh
NOTICE:  DEBUG: bsec_read_debug_conf
NOTICE:  DEBUG: stm32mp1_arch_security_setup
NOTICE:  DEBUG: print_reset_reason
INFO:    Reset reason (0x14):
INFO:      Pad Reset from NRST
NOTICE:  DEBUG: stm32mp1_syscfg_enable_io_compensation_finish
NOTICE:  DEBUG: fconf_populate
INFO:    FCONF: Reading TB_FW firmware configuration file from: 0x2ffe2000
INFO:    FCONF: Reading firmware configuration information for: stm32mp_io
NOTICE:  DEBUG: stm32_skip_boot_device_after_standby
NOTICE:  DEBUG: stm32mp_io_setup
INFO:    Using SDMMC
INFO:      Instance 1
INFO:    Boot used partition fsbl1
NOTICE:  BL2: v2.8-stm32mp1-r1.0(debug):()
dgibson commented 10 months ago

STM32MP1 TF-A crash.zip I am using the ECO system 5.0 TF-A build environment from ST used for the STM32MP1. built the t-fa.

From context, I'm assuming the STM32MP1 is a specific board, and ST are its makers. I know nothing else specific about this. I'm not sure what a "TF-A" means either.

When booting the board from SD card (or via USB using the Cubeprogrammer) TF-A crashed without a message.. Put tons of NOTICE messages in the code. Finally nailed it it down, It runs for 45 minutes before it stops in fdt.c. Repeated this 12 times, the point it stops is always the same (after DEBUG: fdt_offset_ptr 0); it always happen at the 3rd node for power (vin, vdd and then the 3rd) of the DT even the node is replaced with another one. Approached ST for direct customer support, but so far no answers/clues.

That's really weird. As you can see that line is just an addition, which isn't an instruction which can cause a crash on most CPUs. It does load a value fdt_off_dt_struct(fdt) from the fdt header, which is a bit more likely to cause problems. However it sounds like it's not typically crashing on the first time here, so I'm not sure why a later call would fail.

Nonetheless, it's probably worth trying to add a temporary variable and load fdt_off_dt_struct(fdt) into it, then add a NOTICE then do the addition, to confirm which operation is causing the crash. Seeing the actual values of fdt and offset might help too.

How do I identify which version of fdt.c ST is using ? ST removed the version file.

If they've removed the version, I don't think there's any easy way to do this. The only approach I can think of is just to compare it to various versions of the upstream source and try to find a match.

Has this been, or is this a known issue, if yes how to workaround it ? What could be causes. Anything else to try.

No, certainly not a known issue - as noted, it's very odd behaviour. In fact I suspect the bug is not actually here, but something before the call which crashes corrupting either the fdt pointer or something in the fdt itself.

symdeb commented 10 months ago

Thanks for the reply . It's indeed a very weird issue but there is nothing I can actually change. the whole toolchain and source code is as provided by ST and the FDT is part it. TF-A is the name used for the FSBL https://trustedfirmware-a.readthedocs.io/en/latest/index.html. is there a way to compile the DT and decompile the DT with another standalone tool that uses the same FDT library to see it's reproduceable using the attched DT ? I also wonder if there was an invalid pointer and wanted to add a check, but what are correct and incorrect values ?

dgibson commented 10 months ago

Thanks for the reply . It's indeed a very weird issue but there is nothing I can actually change.

Uh... I'm slightly confused. If you were able to add the NOTICE commands you should also be able to rearrange the load/addition there, yes? What am I missing?

the whole toolchain and source code is as provided by ST and the FDT is part it. TF-A is the name used for the FSBL https://trustedfirmware-a.readthedocs.io/en/latest/index.html. is there a way to compile the DT and decompile the DT with another standalone tool that uses the same FDT library to see it's reproduceable using the attched DT ?

What form did the dt itself come in? Source (dts) or binary (dtb)? Or is it built into the firmware image somehow?

You can use dtc (git://git.kernel.org/pub/scm/utils/dtc/dtc.git) to compile and decompile device trees. It's not strictly speaking based on libfdt, but dtc and libfdt are part of the same upstream source tree, and they are cross-tested.

I also wonder if there was an invalid pointer and wanted to add a check, but what are correct and incorrect values ?

There isn't a simple answer to that - it depends on the memory map of the firmware. 0 or something unaligned is probably bad, but not necessarily, and those certainly aren't the only possibilities for a bad address. The value changing between one call to fdt_offset_ptr() and the next would also be a bad sign.

symdeb commented 10 months ago

I tried to add a 2nd root note in the DT, but even adding one additional node in it, the same crash occurs after

NOTICE: CPU: STM32MP157AAA Rev.B NOTICE: Model: STMicroelectronics STM32MP157A-DK1 Discovery Board

The problem this causes is that is is not possible to add additional regulator entries for the chipset code to enable the power rails. I can't tell which power rails TF-A really needs. Currently only added vdd (3.3v) and vdd-core (1.2v).

All ST board use an PMIC regulator, I2C based , so all regulator entries are in that node, So I had to modiy the DT from PMIC to fixed regulator nodes. Probably that why this issue was never found because this board is discrete.

This DT issue does not happen with OP-TEE and U-boot or kernel DT's, only with TF-A.

I created two source tree. one using CubeMX generated DTS and one using the157a-DK1 dts files. In both this issue occurs in TF-A DTS . Can't add more than two regulators in the root of the dts. Anyhow two regulator seems sufficient to get TF-A running. one for 1.2V vddcore (1.2V) and one forvdd ( 3V3.)

dgibson commented 9 months ago

Sorry, I can barely follow what you're talking about here. It really seems like this is an issue with the broader board support package rather than in libfdt itself.

symdeb commented 9 months ago

Sorry for that. its a hard failure, 100% reproduce-able with the ST ECO source package. Perhaps it is ST unique. Anyhow as long as there are not more than two regulator nodes in the TF-A root (for example vin and vdd and remove v3v3 as the example above ), the freeze does not happen. When I have time in the future will look at this further,