andy-shev / linux

Linux kernel source tree

Intel Edison does not work on Linux 4.7+ kernels #3

Closed amcduffee closed 7 years ago

amcduffee commented 7 years ago

Hi Andy,

I have attached the kernel config and dmesg log for a mainline 4.6.7 kernel that boots and has serial console output on Intel Edison.

I have attempted to use the attached config as a base for 4.7.0 and newer kernels, but can't seem to get the 8250 serial console to work.

Let me know if there is anything else I can do to help and thanks for taking a look at this!

-Anderson

dmesg.txt edison_config.txt

andy-shev commented 7 years ago

What about the kernel command line? Have you tried just the standard one, i.e. console=ttyS2,115200n8 rootfstype=ramfs?
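For readers unfamiliar with the console= syntax, here is a minimal sketch that splits such an argument into its parts. It assumes the usual ttyS option layout (baud rate, then optional parity and data bits, with 9600n8 as the fallback when no options are given); the function name parse_console is made up for illustration.

```python
import re

def parse_console(arg):
    """Split a 'console=<device>[,<baud>[<parity>[<bits>]]]' argument.

    Assumption: serial options follow the ttyS convention and default
    to 9600n8 when omitted.
    """
    key, _, value = arg.partition("=")
    if key != "console" or not value:
        raise ValueError("not a console= argument: %r" % arg)
    device, _, options = value.partition(",")
    baud, parity, bits = 9600, "n", 8
    if options:
        match = re.fullmatch(r"(\d+)([noe])?(\d)?(r)?", options)
        if match is None:
            raise ValueError("unrecognized console options: %r" % options)
        baud = int(match.group(1))
        parity = match.group(2) or parity
        bits = int(match.group(3)) if match.group(3) else bits
    return {"device": device, "baud": baud, "parity": parity, "bits": bits}

print(parse_console("console=ttyS2,115200n8"))
```

The device name is the part that matters most in this thread: upstream kernels expose the Edison console UART as ttyS2.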

amcduffee commented 7 years ago

Yes, the kernel command line I have been using is: root=/dev/ram0 rw earlycon=uart8250,mmio,0xff010180,keep console=ttyS2,115200

With the above command line I get serial console output on a 4.6.7 kernel using the 'edison_config' from my original message. When I take the same 'edison_config' and use it as a base config for 4.7.0 and 4.8.0-rc8 kernels I no longer get any serial console output.

I just tested it and the same issue exists with your proposed command line: console=ttyS2,115200n8 rootfstype=ramfs

So, it looks like something happened from 4.6.7 -> 4.7.0 that might have broken serial console on Edison?

andy-shev commented 7 years ago

Looks like I may have a clue. At some point I removed CONFIG_SERIAL_8250_MID=y from my config since it's on by default (it basically follows CONFIG_SERIAL_8250). Try adding that to i386_defconfig and recompiling the kernel, or try the most recent one from my eds branch (and keep the command line as I pointed out). Besides that, be sure that your kernel has this commit: https://github.com/andy-shev/linux/commit/47b34d2ef266e2c283b514d65c8963c2ccd42474
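A quick way to confirm that an option really ended up in the generated .config is sketched below. This is only an illustration: the helper name config_value and the sample text are made up, and in practice the text would be read from the .config in the build directory.

```python
import re

def config_value(config_text, option):
    """Return the value assigned to a Kconfig option, or None if it is unset."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

# Illustrative sample, not the config attached to this issue.
sample = """\
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DMA is not set
CONFIG_SERIAL_8250_MID=y
"""

for option in ("CONFIG_SERIAL_8250", "CONFIG_SERIAL_8250_MID"):
    print(option, "=", config_value(sample, option))
```

Options that appear only as "# ... is not set" comments are reported as None, which is how a disabled option shows up in a .config.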

amcduffee commented 7 years ago

I have built a fresh 4.9-rc2 kernel from your 'eds' (21af512) branch. I used the i386_defconfig from the same branch and verified that CONFIG_SERIAL_8250_MID=y. Using your recommended kernel command line I still do not get any serial console output.

All of my attempts to get a post-mortem dump of __log_buf from the u-boot environment have so far failed. I use u-boot to load the kernel at 0x100000 and then boot it. After ~15 seconds, without serial console output, the Edison reboots and comes back to the u-boot environment. After the reboot I have observed that:

  1. 0x01000000 (phys_startup_32 in System.map) contains the ELF header of the kernel.
  2. 0x01cf2940 (__log_buf) doesn't contain anything resembling log entries.
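For reference, a minimal sketch of how a physical address like the one above can be derived from System.map. It assumes a 32-bit kernel with PAGE_OFFSET at 0xC0000000 and a kernel that was not relocated at boot; the sample map lines are illustrative and only chosen to match the addresses quoted above.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on 32-bit x86

def symbol_phys_addr(system_map_text, symbol, page_offset=PAGE_OFFSET):
    """Look up a symbol's virtual address and convert it to a physical one."""
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == symbol:
            return int(parts[0], 16) - page_offset
    raise KeyError(symbol)

# Illustrative lines only; the real values come from the build's System.map.
sample_map = """\
c1000000 T _text
c1cf2940 b __log_buf
"""

print(hex(symbol_phys_addr(sample_map, "__log_buf")))
```

If the kernel relocates itself at boot, the run-time address can differ from what System.map suggests, so an empty buffer at the computed address does not by itself prove that nothing was logged.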

I have attached the .config and System.map for the new 4.9-rc2 kernel. If you know any tricks or see where I might be making a mistake on extracting the post-mortem log please let me know!

System.map.txt config.txt

andy-shev commented 7 years ago

And what address are you using for the initrd, and how big is it?

amcduffee commented 7 years ago

The initrd that I am using is a 24MB cpio.gz loaded at 0x32000000. The initrd and load offset work just fine for a 4.6.7 mainline kernel and a 3.10.98 kernel built from the official Intel 'edison-linux.git' repository.

In an attempt to narrow down this problem much further I decided to go ahead and git bisect v4.6.7..v4.7 on the official 'linux-stable' repository since the break occurred between those kernel versions.

The result is a bit surprising but does explain why I don't get serial console output. The first bad commit from the bisect is 974f221 which is a change to how the kernel is decompressed in place. It occurred to me, after seeing this, that the issue was probably that I am selecting LZMA compression for my kernels and that 974f221 might have only been tested against GZIP. So, a quick check produced the following result:

  1. v4.6.7 full boot with functioning serial console with both LZMA and GZIP.
  2. v4.7 w/LZMA compression results in no serial console output. Decompression failure?
  3. v4.7 w/GZIP compression results in serial console output, but then I see the following stack trace due to an invalid OpCode which appears to be another unrelated issue:
Invalid Opcode (UnDefined Opcode)
EIP: 0008:[<00000000>] EFLAGS: 00010006
EAX: 01000000 EBX: 0000011e ECX: 00000002 EDX: 00000002
ESI: 00000000 EDI: 016321b0 EBP: 00000000 ESP: 01645e50
 DS: 0018 ES: 0018 FS: 0018 GS: 0018 SS: 0018
CR0: 00000011 CR2: 00000000 CR3: 00000000 CR4: 00000000
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Stack:
    0x01645e90 : 0x00000000
    0x01645e8c : 0x00000000
    0x01645e88 : 0x00000000
    0x01645e84 : 0x00000000
    0x01645e80 : 0x00000000
    0x01645e7c : 0x00000000
    0x01645e78 : 0x01645e96
    0x01645e74 : 0x016321b0
    0x01645e70 : 0x01633eb3
    0x01645e6c : 0x00e2b000
    0x01645e68 : 0x0062f76b
    0x01645e64 : 0x00000000
    0x01645e60 : 0x30303235
    0x01645e5c : 0x31312c32
    0x01645e58 : 0x53797474
    0x01645e54 : 0x01634f70
--->0x01645e50 : 0x00000000
    0x01645e4c : 0x00010006
    0x01645e48 : 0x00000008
    0x01645e44 : 0x00000000
### ERROR ### Please RESET the board ###

andy-shev commented 7 years ago

Thanks for the information. I'm going to update my Edison board to the newest official Yocto (in order to get the newest possible U-Boot). After that I will try to look at this issue.

zyp commented 7 years ago

Hi, I ran into the same problem, bisected and ended up at the same commit that you did.

After experimenting with different load addrs without any luck, I tried CONFIG_RELOCATABLE=n, and now it's booting properly.

kees commented 7 years ago

Can you paste/attach your u-boot configuration? It seems like something is being placed somewhere the kernel isn't expecting and things are getting stomped on. And actually, can you attach the kernel image too? I want to double-check the header to make sure the kernel actually knows how large it is expecting to be after decompression, etc.
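A rough sketch of the kind of header check meant here, reading a few fields of the x86 boot protocol setup header from a bzImage. The field offsets follow the kernel's x86 boot protocol documentation; the function name parse_bzimage_header is made up for illustration, and this is not necessarily how the check was actually done.

```python
import struct

def parse_bzimage_header(image):
    """Extract a few x86 boot-protocol fields from the start of a bzImage."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("no HdrS signature (not a bzImage, or protocol < 2.00)")
    setup_sects = image[0x1F1] or 4  # a value of 0 means 4
    version = struct.unpack_from("<H", image, 0x206)[0]
    header = {
        "boot_flag": struct.unpack_from("<H", image, 0x1FE)[0],  # 0xAA55 expected
        "setup_size": (setup_sects + 1) * 512,
        "version": "%d.%02x" % (version >> 8, version & 0xFF),
    }
    if version >= 0x0205:
        header["kernel_alignment"] = struct.unpack_from("<I", image, 0x230)[0]
        header["relocatable_kernel"] = image[0x234]
    if version >= 0x020A:
        header["pref_address"] = struct.unpack_from("<Q", image, 0x258)[0]
        # init_size: memory the kernel needs during decompression and init
        header["init_size"] = struct.unpack_from("<I", image, 0x260)[0]
    return header
```

It could be called as parse_bzimage_header(open("bzImage32-4.7", "rb").read(0x300)). The init_size and pref_address fields are the ones that say where the kernel wants to live and how much room it expects after decompression.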

amcduffee commented 7 years ago

I am not sure exactly what you are asking for in terms of the u-boot configuration, but here is what is shown on the serial console at boot:

******************************
PSH KERNEL VERSION: b0182727
                WR: 20104000
******************************

SCU IPC: 0x800000d0  0xfffce92c

PSH miaHOB version: TNG.B0.VVBD.0000000c

microkernel built 23:15:13 Apr 24 2014

******* PSH loader *******
PCM page cache size = 192 KB 
Cache Constraint = 0 Pages
Arming IPC driver ..
Adding page store pool ..
PagestoreAddr(IMR Start Address) = 0x04899000
pageStoreSize(IMR Size)          = 0x00080000

*** Ready to receive application *** 

U-Boot 2014.04 (Aug 20 2014 - 16:08:32)

       Watchdog enabled
DRAM:  980.6 MiB
MMC:   tangier_sdhci: 0

Here is the 4.7.0 kernel image that doesn't decompress: bzImage32-4.7.zip

andy-shev commented 7 years ago

The U-Boot configuration is everything you get in the U-Boot console when you run the printenv command.

amcduffee commented 7 years ago

The u-boot environment is the default one that came with the Edison. I am currently just copying and pasting my own values on each boot until I get a stable configuration worth saving in u-boot.

I have attached the u-boot configuration anyway for reference. uboot_config.txt

andy-shev commented 7 years ago

@amcduffee, it can't be the default, since upstream kernels (v4.1+) use ttyS2 as the console. Besides that, the addresses you pointed out are different. So, can you replace the default environment (it doesn't make sense to look into it) with the one you actually tried?

amcduffee commented 7 years ago

I think a bit of confusion might have developed here. I have left the default u-boot configuration in place so that I have something to fall back to. However, I am not using the default u-boot configuration to boot the upstream kernels.

From my last comment, I am currently just copying and pasting my own values on each boot until I get a stable configuration worth saving permanently in u-boot. The boot values I am inputting on every boot correspond to the ones discussed throughout this thread, but for brevity here they are:

setenv bootargs root=/dev/ram0 rw console=ttyS2,115200n8; printenv bootargs
ext4load mmc 0:a 0x100000 /bzImage32-4.6.7
ext4load mmc 0:a 0x32000000 /rootfs32.cpio.gz
zboot 0x100000 0 0x32000000 0x171cd94

The only value that changes is the bzImage that I want to boot (e.g. 4.6.7, 4.7.0, etc).
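Since overlapping load regions were raised as a possible cause, the sketch below checks a set of address ranges for overlap. The load addresses and the initrd size come from the commands above, and 0x01000000 is the phys_startup_32 address mentioned earlier; the bzImage size and the decompressed kernel size are placeholders (assumptions) that would need to be replaced with real values, for example init_size from the image header.

```python
MiB = 1 << 20

# name -> (start address, size in bytes)
regions = {
    "bzImage as loaded": (0x00100000, 5 * MiB),     # size: placeholder
    "decompressed kernel": (0x01000000, 16 * MiB),  # size: placeholder
    "initrd": (0x32000000, 0x0171CD94),             # from the zboot command
}

def find_overlaps(regions):
    """Return every pair of region names whose address ranges intersect."""
    names = sorted(regions)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_size = regions[a]
            b_start, b_size = regions[b]
            if a_start < b_start + b_size and b_start < a_start + a_size:
                hits.append((a, b))
    return hits

print(find_overlaps(regions))
```

A check like this only covers regions that are listed explicitly. Anything the boot loader itself keeps in memory (its own code, stack, or the boot_params structure) would have to be added by hand.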

I think it is pretty clear at this point that there is NOT an issue with the serial console on 4.7.0+ kernels, as the bug report originally purported. Instead, as shown by the bisect, the issue is that the serial console is never initialized due to a possible decompression error when using LZMA for kernel compression. If GZIP compression is used instead then the kernel initializes the serial console and boots up to the point of encountering the 'Invalid Opcode' stack trace.

I think maybe this bug should be closed as invalid? Also, does it make sense to open new bugs regarding the following two points?

  1. The 'Invalid Opcode' stack trace.
  2. The issue related to commit 974f221 and using LZMA compression.

zyp commented 7 years ago

The stack trace is printed by u-boot, not the kernel, so it happens early enough that the kernel has not set up its own fault handlers.

I've only tested GZIP decompression and I've seen both behaviors depending on which kernel build I'm trying to boot. I assume that's just a matter of whether the resulting code from the decompression error causes a deadlock or a fault first.

amcduffee commented 7 years ago

@zyp Hmm, that is an interesting point. I had thought after searching the upstream kernel source that the stack trace was coming from somewhere in arch/x86/kernel/, but after just now searching u-boot source as well it looks like you are probably right. Is there any way to know for sure that the stack trace comes from u-boot?

> I've seen both behaviors depending on which kernel build I'm trying to boot.

I am not sure what you mean by this. When you say both behaviors are you referring to no serial console output and the stack trace? Also, how do your two builds differ?

andy-shev commented 7 years ago

I managed to update U-Boot on the Edison board and load an x86_64 kernel directly. I still can't reproduce the issue. @kees, do you have anything in mind?

kees commented 7 years ago

The only thing I can think of is that where the kernel was being loaded by u-boot was overlapping with something else. I don't see anything obvious. :(

andy-shev commented 7 years ago

So far I have tried to boot your bzImage32-4.7 on the original U-Boot. It fails. Kexec'ing works, so the kernel itself looks fine. Next, I updated U-Boot (perhaps an unnecessary step), added ignore_loglevel to the command line, and booted it again with the default (Ostro OS) environment. Here is what I see:

...
bootcmd_47=ext4load mmc 0:9 ${loadaddr} /bzImage32-4.7; setenv bootargs_console "console=ttyS2,115200n8"; setenv bootargs_debug "ignore_loglevel"; run mmc-bootargs; zboot ${loadaddr}
...
=> run bootcmd_47
5198368 bytes read in 140 ms (35.4 MiB/s)
Valid Boot Flag
Setup Size = 0x00003c00
Magic signature found
Using boot protocol version 2.0d
Linux kernel version 4.7.0-edison-77af84c8 (anderson@anderson-probook) #1 SMP PREEMPT Wed Sep 28 15:12:16 PDT 2016
Building boot_params at 0x00090000
Loading bzImage at address 100000 (5183008 bytes)
Magic signature found
Kernel command line: "rootwait root= rootfstype=ext4 console=ttyS2,115200n8 ignore_loglevel g_multi.ethernet_config=cdc systemd.unit=multi-user.target hardware_id=00 g_multi.iSerialNumber=2cdf1183f3d2f15b5b98ed0dcbdecbda g_multi.dev_addr=02:00:86:de:cb:da platform_mrfld_audio.audio_codec=dummy fsck.mode=skip"
Starting kernel ...
[    0.000000] Linux version 4.7.0-edison-77af84c8 (anderson@anderson-probook) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-159-g6e7c616) ) #1 SMP PREEMPT Wed Sep 28 15:12:16 PDT 2016
...
[    0.000000] Kernel command line: rootwait root= rootfstype=ext4 console=ttyS2,115200n8 ignore_loglevel g_multi.ethernet_config=cdc systemd.unit=multi-user.target hardware_id=00 g_multi.iSerialNumber=2cdf1183f3d2f15b5b98ed0dcbdecbda g_multi.dev_addr=02:00:86:de:cb:da platform_mrfld_audio.audio_codec=dummy fsck.mode=skip
...
[    2.887382] Waiting for root device ...
[    2.899899] mmc0: new DDR MMC card at address 0001
[    2.906492] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
[    2.912507] mmcblk0boot0: mmc0:0001 H4G1d partition 1 4.00 MiB
[    2.918872] mmcblk0boot1: mmc0:0001 H4G1d partition 2 4.00 MiB
[    2.925207] mmcblk0rpmb: mmc0:0001 H4G1d partition 3 4.00 MiB
[    2.935150]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10

After that, the watchdog fires.

So my conclusion is that something is wrong with your initrd, and serial apparently works.

UPDATE: I configured the root parameter properly and now get output up to systemd, which fails to run getty for an obvious reason (ttyMFD2 vs. ttyS2).

And even more...

Ostro OS 1.0+snapshot-20160930 edison ttyS2
edison login: root (automatic login)
************************************
*** This is a development image! ***
*** Do not use in production.    ***
************************************
[   94.254124] audit: type=1006 audit(1475210147.039:10): pid=1561 uid=0 subj=kernel old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=1 res=1
root@edison:~# uname -a
Linux edison 4.7.0-edison-77af84c8 #1 SMP PREEMPT Wed Sep 28 15:12:16 PDT 2016 i686 GNU/Linux

andy-shev commented 7 years ago

So, based on the above, closing the issue for now. Feel free to open a new one with a more particular problem description (serial works, your kernel works).

amcduffee commented 7 years ago

Thanks for looking into this and providing an update. Can you clarify what version of U-Boot you mean by 'original U-Boot' because it might be the version I am still on. Also, what version of U-Boot did you upgrade to and was it built from your own Edison U-Boot repo here on Github?

andy-shev commented 7 years ago

@amcduffee, the original U-Boot dates back to v2014.04. The newest one I have ported, and which is thus available in my tree here on GitHub, is v2016.11.

amcduffee commented 7 years ago

Thanks. I can now say with a high amount of certainty that the U-Boot version was the issue here.

I updated one of my Edisons to the newest v3.5 firmware and the U-Boot version stayed at v2014.04. After noticing this I went ahead and built the newest v2016.11 U-Boot from your GitHub repo and all kernels that were previously not booting are now providing serial console output and booting all the way into my initrd.

Simply put, it seems that 4.7+ upstream kernels don't work with the original v2014.04 U-Boot.

andy-shev commented 7 years ago

Ah, here is the discussion: https://lkml.org/lkml/2016/8/16/7