beagleboard / linux

The official Read Only BeagleBoard and BeagleBone kernel repository https://git.beagleboard.org/beagleboard/linux
http://beagleboard.org/source
Other
722 stars 565 forks source link

Over 1 minute boot on any 4.x RT on PocketBeagle #168

Closed patrickpoirier51 closed 6 years ago

patrickpoirier51 commented 6 years ago

This issue has been addressed but never resolved

Here is an example with 4_14 , the file system is mounted @ 17.7 [ 17.749711] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. O

Here is the same setup with 4_9 RT the file system is mounted @ 63.06 [ 63.061824] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

The same SD card (with 4_9 RT) installed on a BBBlue; the file system is mounted @ 9.41 [ 9.412861] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. O pts: (null)

jadonk commented 6 years ago

Bug is confusing. 4.9 RT is slow? In what instances? What are the software and hardware deltas?

RobertCNelson commented 6 years ago

yes 4.14.x should be faster then 4.9.x, as this was a big mmc mainline change in 4.13.x or so...

patrickpoirier51 commented 6 years ago

Sorry for mixing releases , but 4_14 RT is not faster . I will upload timing later today, but it is more than 60 seconds

patrickpoirier51 commented 6 years ago

Here is the result 62.5 seconds

Linux beaglebone 4.14.40-bone-rt-r14 #1 PREEMPT RT Fri May 11 08:14:22 UTC 2018 armv7l GNU/Linux [ 62.523230] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

RobertCNelson commented 6 years ago

Yeah, looks like RT performance of the mmc sucks:

uname -r ; dmesg | grep 'mounted filesystem'

4.14.39-ti-r47
[   14.660326] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

4.14.39-ti-r47
[   14.665146] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

4.14.39-ti-r47
[   14.672096] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

4.14.39-ti-rt-r47
[   71.331787] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

4.14.39-ti-rt-r47
[   70.419849] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

Regards,

patrickpoirier51 commented 6 years ago

I dont think its related to mmc. Take the card and insert it in a BBBlue = You will get 9 seconds (as the example above) They have the same memory circuitry and almost the same system architecture both on the Blue and the Pocket.

RobertCNelson commented 6 years ago

Those runs where the same hardware/mmc, just between reboots and the two kernel versions..

Another observation: while cross building a new kernel, the initrd was missing on first bootup..

beagle@beaglebone:~$ uname -r ; dmesg | grep 'mounted filesystem'
4.14.40-ti-rt-r48
[    1.764171] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

beagle@beaglebone:~$ sudo update-initramfs -ck `uname -r`
[sudo] password for beagle: 
update-initramfs: Generating /boot/initrd.img-4.14.40-ti-rt-r48

4.14.40-ti-rt-r48
[   73.123701] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

Regards,

RobertCNelson commented 6 years ago

CONFIG_PREEMPT_RT_FULL=y

4.14.40-ti-rt-r48
[   72.719747] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

CONFIG_PREEMPT_RTB=y

4.14.40-ti-rt-r48
[   15.231329] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
patrickpoirier51 commented 6 years ago

Oh yes :+1: What would we be missing if we downgrade from FULL to RTB ?

RobertCNelson commented 6 years ago
config PREEMPT_RTB
    bool "Preemptible Kernel (Basic RT)"
    select PREEMPT_RT_BASE
    help
      This option is basically the same as (Low-Latency Desktop) but
      enables changes which are preliminary for the full preemptible
      RT kernel.

config PREEMPT_RT_FULL
    bool "Fully Preemptible Kernel (RT)"
    depends on IRQ_FORCED_THREADING
    select PREEMPT_RT_BASE
    select PREEMPT_RCU
    help
      All and everything

@jadonk what's your thoughts, on dropping to RTB?

RobertCNelson commented 6 years ago

Some more comparsions:

4.16.8-bone-rt-r10
[  112.855089] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
4.14.40-bone-rt-r14
[   44.803267] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
4.9.99-bone-rt-r10
[   42.525493] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)

PREEMPT_RT_FULL seems to be getting worse..

Regards,

patrickpoirier51 commented 6 years ago

Robert, Do you still getting 62.2% cpu load with top == 62.2 0.0 35:31.06 ktimersoftd ? Might resolve this as well ? https://github.com/beagleboard/linux/issues/166 ?

RobertCNelson commented 6 years ago

i'll have a deb file for testing shortly.. ;)

RobertCNelson commented 6 years ago

@patrickpoirier51 try this:

http://rcn-ee.online/builds/linux-image-4.14.40-ti-rt-r48.1_1cross_armhf.deb

Regards,

patrickpoirier51 commented 6 years ago

4.14.40-ti-rt-r48.1 [ 16.657148] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) And TOP is normal And the arducopter programs starts Getting closer to a fully working PocketPilot :-)

Thanks Robert

RobertCNelson commented 6 years ago

awesome, i'll have a new image in a few hours with that kernel by default..

patrickpoirier51 commented 6 years ago

Great , I'll test when available and report

patrickpoirier51 commented 6 years ago

Well,
I cant make it work properly. A) The system is extremely slow (even is cpuclock @ 1000Mhz) current CPU frequency is 1000 MHz.

B) Cannot get the ecap running and its mapped properly config-pin -q p2_18 P2_18 Mode: pru_ecap PRU out put seems to work OK

C) some errors unhandled fault while the ArduCopter program is running beaglebone login: [ 34.331887] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f28000 [ 34.339608] pgd = dc604000 [ 34.342327] [b6f28000] pgd=9c5d7831, pte=4a322343, ppte=4a322833 [ 37.255909] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f4e000 [ 37.263629] pgd = db43c000 [ 37.266348] [b6f4e000] pgd=9ac78831, pte=4a322343, ppte=4a322833 [ 40.201485] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f33000 [ 40.209205] pgd = dc604000 [ 40.211924] [b6f33000] pgd=9c676831, pte=4a322343, ppte=4a322833 [ 43.257497] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fcd000 [ 43.265217] pgd = dc618000 [ 43.267936] [b6fcd000] pgd=9c5d9831, pte=4a322343, ppte=4a322833 [ 46.300052] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6fbd000 [ 46.307774] pgd = dc604000 [ 46.310493] [b6fbd000] pgd=9b455831, pte=4a322343, ppte=4a322833 [ 49.115576] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f45000 [ 49.123298] pgd = dc61c000 [ 49.126018] [b6f45000] pgd=9c676831, pte=4a322343, ppte=4a322833 [ 51.821591] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f8b000 [ 51.829312] pgd = dc6a4000 [ 51.832031] [b6f8b000] pgd=9c5c2831, pte=4a322343, *ppte=4a322833

jadonk commented 6 years ago

A) Can you run 'top' or anything else to see where the CPU cycles are going? Is this before you start running anything? How are you logging in?

B) Did you have pru_ecap working with any older kernel/device-tree? Did you confirm you are using the correct pin based on the PocketPilot pin?

C) Seems you are accessing a peripheral who doesn't have clocks enabled or isn't mapped properly. https://e2e.ti.com/support/embedded/linux/f/354/t/49197?Unhandled-fault-external-abort-on-non-linefetch-0x1018-at-0x4002100c-Bus-error has some info on the type of error. I'm not sure which of the numbers corresponds to the physical address being reached. I'd need to search the Linux source for that debug statement to see what the variables mean.

patrickpoirier51 commented 6 years ago

A) Running top show arducopter using twice the power (70-80%) as other rt kernel (30-40%), so for the same program its using mostly all the ressources.

B)Yes, there is oubviously something broken here.

C) Please note that this happens now on 4_14 as well, there is something broken on this release since a few days... I had to revert back to an older 4_14 rt to make the PRU start and not having this mapping error

jadonk commented 6 years ago

@RobertCNelson Do we have a way to profile the RT performance under "RTB"? I think it would be very reasonable to try.

jadonk commented 6 years ago

@patrickpoirier51 can you perform a bisect to identify the offending patch?

patrickpoirier51 commented 6 years ago

AS I reported on May 20 , Both option are bad The Full RT takesover 90 seconds The stripped down is unstable and is overloading the CPU

Please note that the latest kernel releases have broken the PRU on ardupilot, I have opened an issue on this with no response from Robert

patrickpoirier51 commented 6 years ago

This is the issue I am referring above https://github.com/RobertCNelson/dtb-rebuilder/issues/29

RobertCNelson commented 6 years ago

@jadonk good point, i'll break out the "RT" patchset into it's individual patches and test..

RobertCNelson commented 6 years ago

@patrickpoirier51 & @jadonk

Just to document the fun of RT... I see why RT isn't in mainline..

Currently 414 patches..

So far i've been able to test the first 250 and see boot-up times vary around 18~20 seconds... So yay nothing broken in the first half..

Now the second half (just built up to 301), pretty much every patch either add's a new opps or warning on bootup so to actually figure out if the next patch broke something is difficult..

I'll keep digging...

https://github.com/RobertCNelson/ti-linux-kernel-dev/blob/ti-linux-rt-4.14.y/patches/rt-patchset/series#L279

Regards,

patrickpoirier51 commented 6 years ago

Wrote on OCTAVO forum about this issue https://octavosystems.com/forums/topic/slow-bootup-on-preemptible-kernel-rt/

Here is the answer: Unfortunately, this is probably a software issue related to the boot process. The hardware differences between the OSD335x on the BeagleBone Blue and the OSD335x-SM on the PocketBeagle are small (the OSD335x-SM integrates the EEPROM on I2C0 and reduces the size of the devices). The differences in hardware would not account for the 4x difference in boot times you are seeing.

Our speculation is that there is an issue in the boot procedure whereby the board ID value for the PocketBeagle is not enumerated properly in a some case statement. This in turn causes some kind of time out, or alternate execution path which takes much longer than the execution path for the BeagleBone Blue, which is properly enumerated. Our suggestion would be to understand the startup entire Linux boot procedure for the PocketBeagle vs the BeagleBone Blue. If you start with the BeagleBone Blue, it should be straight forward to understand when the PocketBeagle does something different on boot.

As a test, you can re-write the EEPROM on the PocketBeagle with the value from the BeagleBone Blue, but still use the PocketBeagle device tree. If you ground the “WP” test point on the back of the PocketBeagle, you can update the value of the EEPROM. See our app note (https://octavosystems.com/app_notes/osd335x-eeprom-during-boot/) for more information about how to program the EEPROM.

What do you think of this ? Best Regards

RobertCNelson commented 6 years ago

Nah, it's just cape-universal + RT that slows things down.. If you disable cape-universal (which currently can't be disabled on the PocketBeagle in my default images) things speed up, plus v4.14.x after rt and the random hwrgn shanges things are just slow.. v4.19.x-rt just laned.. will have something to test tomorrow..

https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/4.19/

Regards,

patrickpoirier51 commented 6 years ago

ok will test

patrickpoirier51 commented 6 years ago

..