enclustra-bsp / bsp-xilinx


Upgrading to v2020.1 kernel, not starting. #27

Closed markus-k closed 4 years ago

markus-k commented 4 years ago

I'm trying to upgrade the supplied Linux kernel to the current v2020.1 upstream version, but I'm having trouble booting it. I need the newer version for some updated drivers and APIs, and would like to take advantage of the Vitis device-tree generator (whose output needs fixing to work with the current kernel). I'm on a Mars XU3 (with ST3 base board). The kernel built by the Enclustra BSP has no problems booting.

All attempts just resulted in the console output stopping at "Starting kernel...". Using a debugger I was able to verify that the kernel actually starts, but doesn't create any console output, and sometimes crashes after a few seconds depending on the used device-tree. Network doesn't seem to get up at any point, and I wasn't able to figure out where the crash happens exactly.

I attempted this in two ways: (1) Merging the xilinx-v2020.1 tag into the current Enclustra kernel: This has quite a few merge conflicts, some of them I don't know how to solve correctly (e.g. mdio conflicts). Even when it compiles, the behaviour described above occurs. (2) Cloning the original linux-xlnx repo, copying the config from the Enclustra BSP and rerunning the menuconfig. This doesn't work either.

Comparing the supplied device-trees from the Enclustra and Xilinx Linux repositories shows a fair number of changes, but none of them has turned out to be the problem so far.

Do you have any advice on the best way of updating the kernel, or know of any breaking changes between the Enclustra kernel and the Xilinx kernel that could lead to this behaviour? I have spent quite a lot of time on this so far, and haven't gotten it to boot properly.

I have only used the Enclustra U-Boot with its supplied device-tree so far, maybe that's a problem?

Thank you!

tholzsche commented 4 years ago

Hi

It's actually quite simple and I've just tested it. Basically, the macb driver was modified to access two PHYs via one MDIO interface. Therefore only minimal changes are necessary (macb driver and device-tree), e.g.:

  1. git clone https://github.com/enclustra-bsp/xilinx-linux
  2. cd xilinx-linux
  3. git pull https://github.com/Xilinx/linux-xlnx xilinx-v2020.1
  4. resolve conflicts
     4.1. fix "./include/linux/mtd/spi-nor.h" (remove duplicate prototype "spi_nor_shutdown(struct spi_nor *nor)")
     4.2. fix device-tree (replace "&clk" with "&zynqmp_clk")
     4.3. macb ...
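The device-tree step (replacing "&clk" with "&zynqmp_clk") could be scripted roughly as follows. This is only a sketch: the function name `fix_clk_phandle` is made up for illustration, and it assumes that every bare `&clk` reference in the file should be renamed, which needs checking against the actual device-tree sources.

```shell
# Hypothetical helper (not part of the BSP): print a device-tree file with
# "&clk" references renamed to "&zynqmp_clk".
# "&clk" is only matched when it is not followed by another identifier
# character, so a label such as "&clkc" is left untouched.
fix_clk_phandle() {
    sed -E 's/&clk([^A-Za-z0-9_]|$)/\&zynqmp_clk\1/g' "$1"
}
```

Usage would be along the lines of `fix_clk_phandle board.dts > board.dts.new`, followed by a manual diff before replacing the original file. While the merge is still in progress, `git diff --name-only --diff-filter=U` lists the files that have unresolved conflicts.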

But if it hangs this early it most likely has nothing to do with the linux kernel itself and I would suspect a bug in the firmware.

Greets Till

markus-k commented 4 years ago

After some more debugging I found that when replacing earlyprintk with earlycon in the U-Boot def_args variable, console output finally works. The crash seems to be caused by some device-tree issue, but that'll be easy to fix.
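As an illustration of that change, the sketch below swaps the option in a kernel command line string. The function name `swap_early_console` and the example arguments are assumptions (the real contents of def_args are not shown in this thread), and it only handles a bare `earlyprintk` token, not an `earlyprintk=...` form.

```shell
# Hypothetical helper: replace a standalone "earlyprintk" token in a kernel
# command line with "earlycon". Other arguments are passed through unchanged.
swap_early_console() {
    printf '%s\n' "$1" | sed -E 's/(^| )earlyprintk( |$)/\1earlycon\2/'
}

# Example with made-up boot arguments (console name is an assumption):
swap_early_console 'console=ttyPS0,115200 earlyprintk root=/dev/mmcblk0p2 rw'
```

The resulting string would then be stored back into the U-Boot environment (for example with `setenv` and `saveenv` at the U-Boot prompt).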

tholzsche commented 4 years ago

Okay, thanks for sharing.

markus-k commented 4 years ago

So, I'm not sure if it's really a device-tree issue, probably not. The clock-controller driver seems to try to allocate too much memory for some reason (size > KMALLOC_MAX_SIZE). Since the clock-controller driver fails to probe, nothing that depends on a clock works:

[    3.751985] zynqmp_firmware_probe Platform Management API v1.1
[    3.757715] zynqmp_firmware_probe Trustzone version v1.0
[    3.766277] zynqmp-pinctrl firmware:zynqmp-firmware:pinctrl: zynqmp pinctrl initialized
[    3.771100] WARNING: CPU: 2 PID: 1 at mm/slab_common.c:1031 kmalloc_slab+0x60/0x68
[    3.778468] Modules linked in:
[    3.781499] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.19.0-00001-ge38978d140d4-dirty #5
[    3.789625] Hardware name: xlnx,zynqmp (DT)
[    3.793780] pstate: 20000005 (nzCv daif -PAN -UAO)
[    3.798537] pc : kmalloc_slab+0x60/0x68
[    3.802346] lr : __kmalloc+0x18/0x110
[    3.805974] sp : ffffff800803b6f0
[    3.809261] x29: ffffff800803b6f0 x28: 0000000000000007 
[    3.814538] x27: 0000000000000000 x26: 0000000000000000 
[    3.819814] x25: 0000000000000000 x24: ffffff8008f7fea0 
[    3.825091] x23: ffffff8009058960 x22: 0000000000000000 
[    3.830367] x21: ffffff8008f7fea0 x20: ffffffc06c89f410 
[    3.835644] x19: 00000000006080c0 x18: ffffffffffffffff 
[    3.840920] x17: 000000002fa88302 x16: 0000000000000000 
[    3.846197] x15: ffffff8008f58648 x14: ffffffc06c94470a 
[    3.851473] x13: ffffffc06c944709 x12: 0000000000000030 
[    3.856750] x11: 0000000000000003 x10: 0101010101010101 
[    3.862026] x9 : fffffffffffffffa x8 : 7f7f7f7f7f7f7f7f 
[    3.867303] x7 : 0000000000000000 x6 : 0000000000000000 
[    3.872579] x5 : 0000000000000000 x4 : 0000000000000000 
[    3.877856] x3 : 0000000000000000 x2 : 0000000000000000 
[    3.883133] x1 : 00000000006080c0 x0 : 00000007fffffff8 
[    3.888410] Call trace:
[    3.890834]  kmalloc_slab+0x60/0x68
[    3.894297]  zynqmp_clock_probe+0x90/0x490
[    3.898362]  platform_drv_probe+0x50/0xa0
[    3.902338]  really_probe+0x228/0x3e0
[    3.905970]  driver_probe_device+0x68/0x158
[    3.910123]  __device_attach_driver+0xac/0x178
[    3.914537]  bus_for_each_drv+0x68/0xd0
[    3.918340]  __device_attach+0xd8/0x160
[    3.922146]  device_initial_probe+0x10/0x18
[    3.926299]  bus_probe_device+0x94/0xa0
[    3.930105]  device_add+0x428/0x650
[    3.933568]  of_device_add+0x38/0x48
[    3.937112]  of_platform_device_create_pdata+0xb8/0x120
[    3.942301]  of_platform_bus_create+0x15c/0x508
[    3.946799]  of_platform_populate+0x94/0x130
[    3.951041]  zynqmp_firmware_probe+0x194/0x3a0
[    3.955449]  platform_drv_probe+0x50/0xa0
[    3.959428]  really_probe+0x228/0x3e0
[    3.963061]  driver_probe_device+0x68/0x158
[    3.967213]  __driver_attach+0x124/0x140
[    3.971107]  bus_for_each_dev+0x74/0xc8
[    3.974911]  driver_attach+0x20/0x28
[    3.978459]  bus_add_driver+0x1f8/0x288
[    3.982265]  driver_register+0x60/0x110
[    3.986071]  __platform_driver_register+0x40/0x48
[    3.990744]  zynqmp_firmware_driver_init+0x1c/0x24
[    3.995502]  do_one_initcall+0x5c/0x180
[    3.999308]  kernel_init_freeable+0x148/0x1f0
[    4.003633]  kernel_init+0x10/0x100
[    4.007091]  ret_from_fork+0x10/0x1c
[    4.010638] ---[ end trace 1d3f3982cd24c89e ]---
[    4.015251] zynqmp_clock: probe of firmware:zynqmp-firmware:clock-controller failed with error -12
[    4.024645] usbcore: registered new interface driver usbhid
[    4.029672] usbhid: USB HID core driver
[    4.034495] ARM CCI_400_r1 PMU driver probed
[    4.034846] zynqmp_fpga_manager firmware:zynqmp-firmware:pcap: failed to to get pcp ref_clk (-517)
[    4.047470] pktgen: Packet Generator for packet performance testing. Version: 2.75
[    4.054627] Initializing XFRM netlink socket
[    4.058482] NET: Registered protocol family 10
[    4.063273] Segment Routing with IPv6
[    4.066542] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    4.072734] NET: Registered protocol family 17
[    4.076739] NET: Registered protocol family 15
[    4.081154] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    4.094081] can: controller area network core (rev 20170425 abi 9)
[    4.100219] NET: Registered protocol family 29
[    4.104584] can: raw protocol (rev 20170425)
[    4.108820] can: broadcast manager protocol (rev 20170425 t)
[    4.114446] can: netlink gateway (rev 20170425) max_hops=1
[    4.119967] Bluetooth: RFCOMM TTY layer initialized
[    4.124745] Bluetooth: RFCOMM socket layer initialized
[    4.129862] Bluetooth: RFCOMM ver 1.11
[    4.133564] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    4.138836] Bluetooth: BNEP filters: protocol multicast
[    4.144030] Bluetooth: BNEP socket layer initialized
[    4.148957] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[    4.154841] Bluetooth: HIDP socket layer initialized
[    4.159937] 9pnet: Installing 9P2000 support
[    4.164029] Key type dns_resolver registered
[    4.169029] registered taskstats version 1
[    4.172318] Loading compiled-in X.509 certificates
[    4.177466] Btrfs loaded, crc32c=crc32c-generic
[    4.183483] zynq-gpio ff0a0000.gpio: input clock not found.
[    4.188254] xilinx-vdma 80040000.dma: failed to get axi_aclk (-517)
[    4.193574] xilinx-zynqmp-dma fd500000.dma: main clock not found.
[    4.199574] xilinx-zynqmp-dma fd510000.dma: main clock not found.
[    4.205623] xilinx-zynqmp-dma fd520000.dma: main clock not found.
[    4.211681] xilinx-zynqmp-dma fd530000.dma: main clock not found.
[    4.217735] xilinx-zynqmp-dma fd540000.dma: main clock not found.
[    4.223791] xilinx-zynqmp-dma fd550000.dma: main clock not found.
[    4.229842] xilinx-zynqmp-dma fd560000.dma: main clock not found.
[    4.235901] xilinx-zynqmp-dma fd570000.dma: main clock not found.
[    4.241982] xilinx-zynqmp-dma ffa80000.dma: main clock not found.
[    4.248013] xilinx-zynqmp-dma ffa90000.dma: main clock not found.
[    4.254063] xilinx-zynqmp-dma ffaa0000.dma: main clock not found.
[    4.260120] xilinx-zynqmp-dma ffab0000.dma: main clock not found.
[    4.266173] xilinx-zynqmp-dma ffac0000.dma: main clock not found.
[    4.272234] xilinx-zynqmp-dma ffad0000.dma: main clock not found.
[    4.278285] xilinx-zynqmp-dma ffae0000.dma: main clock not found.
[    4.284340] xilinx-zynqmp-dma ffaf0000.dma: main clock not found.
[    4.290471] zynqmp-qspi ff0f0000.spi: pclk clock not found.
[    4.296629] macb ff0e0000.ethernet: failed to get macb_clk (4294966779)
[    4.303915] zynqmp_gpd_attach_dev() domain7 request failed for node 22: -13
[    4.309248] dwc3-of-simple ff9d0000.usb0: failed to add to PM domain domain7: -13
[    4.316697] dwc3-of-simple: probe of ff9d0000.usb0 failed with error -13
[    4.323858] cdns-i2c ff020000.i2c: input clock not found.
[    4.328939] cdns-i2c ff030000.i2c: input clock not found.
[    4.334516] xilinx-video amba_pl@0:video_cap: /amba_pl@0/video_cap/ports/port@0 initialization failed
[    4.343240] xilinx-video amba_pl@0:video_cap: DMA initialization failed
[    4.350663] sdhci-arasan ff160000.mmc: clk_ahb clock not found.
[    4.356252] sdhci-arasan ff170000.mmc: clk_ahb clock not found.
[    4.362404] zynqmp_fpga_manager firmware:zynqmp-firmware:pcap: failed to to get pcp ref_clk (-517)
[    4.371506] zynq-gpio ff0a0000.gpio: input clock not found.
[    4.376482] xilinx-vdma 80040000.dma: failed to get axi_aclk (-517)
[    4.382458] xilinx-zynqmp-dma fd500000.dma: main clock not found.
[    4.388488] xilinx-zynqmp-dma fd510000.dma: main clock not found.
[    4.394539] xilinx-zynqmp-dma fd520000.dma: main clock not found.
[    4.400596] xilinx-zynqmp-dma fd530000.dma: main clock not found.
[    4.406650] xilinx-zynqmp-dma fd540000.dma: main clock not found.
[    4.412705] xilinx-zynqmp-dma fd550000.dma: main clock not found.
[    4.418758] xilinx-zynqmp-dma fd560000.dma: main clock not found.
[    4.424816] xilinx-zynqmp-dma fd570000.dma: main clock not found.
[    4.430871] xilinx-zynqmp-dma ffa80000.dma: main clock not found.
[    4.436924] xilinx-zynqmp-dma ffa90000.dma: main clock not found.
[    4.442980] xilinx-zynqmp-dma ffaa0000.dma: main clock not found.
[    4.449035] xilinx-zynqmp-dma ffab0000.dma: main clock not found.
[    4.455089] xilinx-zynqmp-dma ffac0000.dma: main clock not found.
[    4.461145] xilinx-zynqmp-dma ffad0000.dma: main clock not found.
[    4.467199] xilinx-zynqmp-dma ffae0000.dma: main clock not found.
[    4.473254] xilinx-zynqmp-dma ffaf0000.dma: main clock not found.
[    4.479370] zynqmp-qspi ff0f0000.spi: pclk clock not found.
[    4.485482] macb ff0e0000.ethernet: failed to get macb_clk (4294966779)
[    4.492344] cdns-i2c ff020000.i2c: input clock not found.
[    4.496811] cdns-i2c ff030000.i2c: input clock not found.
[    4.502371] xilinx-video amba_pl@0:video_cap: /amba_pl@0/video_cap/ports/port@0 initialization failed
[    4.511138] xilinx-video amba_pl@0:video_cap: DMA initialization failed
[    4.518501] sdhci-arasan ff160000.mmc: clk_ahb clock not found.
[    4.524113] sdhci-arasan ff170000.mmc: clk_ahb clock not found.
[    4.530260] zynqmp_fpga_manager firmware:zynqmp-firmware:pcap: failed to to get pcp ref_clk (-517)
[    4.539432] zynq-gpio ff0a0000.gpio: input clock not found.
[    4.546791] rtc_zynqmp ffa60000.rtc: setting system clock to 2020-06-20 13:08:41 UTC (1592658521)
[    4.552916] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    4.690835] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    4.691737] ALSA device list:
[    4.694650]   No soundcards found.
[    4.698062] Warning: unable to open an initial console.
[    4.703285] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    4.707451] Waiting for root device /dev/mmcblk0p2...
[    4.711790] cfg80211: failed to load regulatory.db

The debugger won't break in this driver for some reason, so I have no idea how to debug this. My best guess is that zynqmp_pm_clock_get_num_clocks returns some crazy high number, but I can't see how this could be a firmware issue.
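The numbers in the log can be checked with a bit of arithmetic. The sketch below assumes that x0 in the register dump still holds the size argument passed to kmalloc_slab, and that the allocation uses 8 bytes per clock entry (the pointer size on arm64); neither is confirmed by the log itself.

```shell
# x0 from the register dump, assumed to be the requested allocation size.
size=$((0x7fffffff8))
# Assumed 8 bytes per clock entry: how many entries would that size imply?
count=$((size / 8))
printf 'size=%d bytes, implied entries=%d (0x%x)\n' "$size" "$count" "$count"

# The macb line prints its error code as an unsigned 32-bit value;
# reinterpret it as a signed number.
err=$((4294966779 - (1 << 32)))
printf 'macb_clk error=%d\n' "$err"
```

Under those assumptions the implied entry count is 0xffffffff, which would fit the guess that the clock count comes back as an invalid value (such as an error code treated as unsigned) rather than a real number of clocks. The macb value decodes to -517 (EPROBE_DEFER), the same deferred-probe code the other drivers report while waiting for their clocks.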

markus-k commented 4 years ago

Turns out Vitis is one buggy giant mess not creating proper PMU firmware. Creating a separate project for the PMU firmware solved this problem.