damentz / liquorix-package

Liquorix Debian Package
https://liquorix.net
GNU General Public License v2.0

Latest update kernel not booting #29

Closed bubobih closed 4 years ago

bubobih commented 4 years ago

Hello, after the latest update I'm stuck at boot:

https://i.imgur.com/UMXvbJc.png

bubobih commented 4 years ago

After googling, I found that adding nosmt in GRUB and regenerating the boot configuration fixes the problem.
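For readers who want to script this workaround, here is a minimal sketch of appending a kernel parameter to the `GRUB_CMDLINE_LINUX_DEFAULT` line. It assumes a Debian/Ubuntu-style `/etc/default/grub`; the function name `add_kernel_param` is made up for illustration, and the code works on the file's text only, so nothing is written to disk.

```python
import re

# Matches e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
_CMDLINE_RE = re.compile(r"""^(GRUB_CMDLINE_LINUX_DEFAULT=)(["'])(.*)\2\s*$""")


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: it leaves all other lines untouched and does
    nothing if the parameter is already present.
    """
    out = []
    for line in grub_text.splitlines():
        match = _CMDLINE_RE.match(line)
        if match:
            params = match.group(3).split()
            if param not in params:
                params.append(param)
            joined = " ".join(params)
            line = f'{match.group(1)}"{joined}"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "nosmt"), end="")
```

After saving the edited file, the GRUB configuration still has to be regenerated (on Debian and Ubuntu, with `update-grub`) before the parameter takes effect on the next boot.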

damentz commented 4 years ago

Ok, thanks for the update. Here's the changelog for the last update:

linux-liquorix (5.6-4) unstable; urgency=medium

It could be that the change back to tickless idle (due to fixes in MuQSS v0.199/0.200) is causing issues with your system.

When you get a chance, can you see if nothreadirqs also solves your boot issues? My experience is that threaded IRQs cause issues on tickless idle systems, but not with periodic tick (the previous configuration). If this solves your boot issues, I'll disable forced IRQ threading.

bubobih commented 4 years ago

That's all the info I can see from the KVM... Adding "nosmt" currently fixes it and the dedicated server works normally.

damentz commented 4 years ago

Did you just update from a 5.5 kernel or have you been doing incremental updates? Just trying to understand if this was caused by a recent change or not. It seems to me though from what you're saying that it's an outstanding bug in kernel 5.6 that some systems can't boot without nosmt.

Do you have a link to the source that says to try nosmt?

bubobih commented 4 years ago

I use regular updates via apt-get, so I had the latest version before this one.

bubobih commented 4 years ago

I can't find it now. I googled some error from that screen, found that suggestion, and tried it.

bubobih commented 4 years ago

With the latest release it's the same problem. Also, my hyperthreading is off... I will recheck the BIOS to see if it's a BIOS problem.

bubobih commented 4 years ago

Yep, either nosmt or the latest kernel disables HT; I have only 8 cores... For now I'm using the old one until you solve it :)

damentz commented 4 years ago

The latest kernel I put out yesterday restores periodic tick + tickless idle. Can you verify if you need nosmt to boot?

bubobih commented 4 years ago

Yes, I still need to use nosmt to boot, and it shows only 8 threads, not 16.
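To confirm what the kernel thinks about SMT after booting, a small sketch like the following can read the SMT control files that recent kernels expose under `/sys/devices/system/cpu/smt`. The helper name `read_smt_state` is hypothetical, and the files may be missing on older kernels, in which case the value is reported as `None`.

```python
import os
from pathlib import Path


def read_smt_state(base: Path = Path("/sys/devices/system/cpu/smt")) -> dict:
    """Return the contents of the SMT 'control' and 'active' sysfs files.

    Hypothetical helper: a missing or unreadable file yields None
    instead of raising.
    """
    state = {}
    for name in ("control", "active"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    print("SMT state:", read_smt_state())
    # Number of logical CPUs Python can see (8 vs. 16 in the case above).
    print("Logical CPUs:", os.cpu_count())
```

With nosmt on the command line I would expect `control` to read `off` and `active` to read `0`, which matches seeing 8 threads instead of 16.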

damentz commented 4 years ago

Ok, so this sounds like a bug introduced with an upstream stable patch. I'd say as long as you don't need the extra threads, nosmt is fine. Especially if it's an Intel CPU, every vulnerability that comes out makes SMT less effective.

bubobih commented 4 years ago

I'm using the old one; I need the threads :) No hurry, you will fix it some day :)

bubobih commented 4 years ago

After your last few updates I decided to try again. This is what I can read over KVM:

https://i.imgur.com/FpyynYD.jpg https://i.imgur.com/l8QPlyL.jpg

https://i.imgur.com/eDSpkUG.jpg https://i.imgur.com/1H0l4rq.jpg

mrc4tt commented 4 years ago

Hi, @damentz

I have a new problem booting on an AMD Ryzen CPU.

The server is stuck booting version 5.6.0-13.2-liquorix-amd64:

RDRAND gives funky smelling output, might consider not using it by booting with "nordrand"

damentz commented 4 years ago

Hi @MikkelDK

I think you're having two issues. One is that you may need a BIOS and/or microcode update to solve the RDRAND issues. Or you can follow the instructions in the message and boot with nordrand to disable RDRAND.

According to this reddit thread, they also recommend installing haveged to force the system to generate entropy: https://www.reddit.com/r/archlinux/comments/f2qhyt/rdrand_gives_funky_smelling_output/

The rdrand check was initially added for buggy rdrand implementations on old AMD systems. But surprisingly, Ryzen processors are affected too if you have a bad BIOS and/or old microcode version.
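As a rough way to inspect the situation from a running system, the sketch below checks whether the kernel currently lists the `rdrand` CPU flag in `/proc/cpuinfo` and reads the entropy estimate from `/proc/sys/kernel/random/entropy_avail`. Both helper names are made up for illustration, and a present flag only means the instruction is exposed, not that its output is sound.

```python
def cpu_has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False


def read_entropy_avail(path: str = "/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate as an int, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("rdrand flag present:", cpu_has_flag(f.read(), "rdrand"))
    except OSError:
        print("could not read /proc/cpuinfo")
    print("entropy_avail:", read_entropy_avail())
```

Comparing the entropy estimate before and after installing haveged is one way to see whether it makes a difference on a given machine.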

As for your system not booting, that may be caused by one of the release candidate patches. I checked and Greg just dropped "drm/amdgpu: bump version for invalidate L2 before SDMA IBs" this morning. Probably unrelated, but considering we don't need the stable patches, I'll just revert them all and spin a new version.

mrc4tt commented 4 years ago

Hi @damentz

Thanks! =) I will update the BIOS or install haveged! =)

damentz commented 4 years ago

@MikkelDK how is the latest kernel working on your system, are you able to boot?

@bubobih I disabled the Pressure Stall Information framework (CONFIG_PSI). I saw a report on the CK blog that enabling it caused boot issues on a person's system. Not sure if it's system specific, but can you see if the latest kernel boots correctly without requiring nosmt?

bubobih commented 4 years ago

Same problem...

jonathonf commented 4 years ago

Also still having issues with a system running Ubuntu 18.04 on a Ryzen 2700X (with the latest available BIOS).

5.6.0-10.1 boots fine, 5.6.0-10.2 through 5.6.0-13.5 fail to boot.

damentz commented 4 years ago

@jonathonf thanks for the feedback, I'm starting to wonder if it's caused by the cgroup options I enabled.

Next release I'll disable the CGROUP options I enabled as part of this thread:

https://techpatterns.com/forums/about2796.html - Is it possible to enable cgroup-related config?

It's also important to note that the person who requested these options be enabled wasn't interested in running Liquorix long term, so I don't think anyone will care if they're turned off again.

bubobih commented 4 years ago

I still have the same problem with the latest one.

damentz commented 4 years ago

@bubobih the only thing left for me to revert is the actual MuQSS v0.200 update itself, but this update fixes a lot of time accounting issues that have plagued MuQSS for the longest time.

One thing that did change is the code that detects runqueue sharing. Can you try booting with rqshare=mc or rqshare=all? Also, historically MuQSS has had issues on some systems with fewer run queues, and rqshare=smt was required to boot at all.
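When testing options such as rqshare=mc, it helps to verify that the parameter actually reached the kernel. The following is a minimal sketch that splits a kernel command line (as found in `/proc/cmdline`) into flags and key=value pairs; `parse_cmdline` is a hypothetical helper and it does not handle quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a dict.

    Bare flags (e.g. 'nosmt') map to True; 'key=value' tokens map to the
    value string. Quoted values with spaces are NOT handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


if __name__ == "__main__":
    try:
        params = parse_cmdline(Path("/proc/cmdline").read_text())
    except OSError:
        params = {}
    print("rqshare:", params.get("rqshare"))
    print("nosmt:", params.get("nosmt", False))
    print("nothreadirqs:", params.get("nothreadirqs", False))
```

Running this after each test boot gives a quick record of which combination of rqshare, nosmt and nothreadirqs was really in effect.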

bubobih commented 4 years ago

I didn't try it, because it's a pain to request KVM access, break a live production dedicated server, and then fix it again every few days :) I did it a few times, but customers complain about downtime... so for now I haven't tried what you suggested.

damentz commented 4 years ago

@bubobih so you didn't try with the change that disabled the PIDS/RDMA cgroups? I can revert it again but I restored the configuration assuming it didn't change your boot behavior.

Or are you referring to the rqshare boot options?

bubobih commented 4 years ago

I didn't try this:

"One thing that did change is the code that detects runqueue sharing. Can you try booting with rqshare=mc or rqshare=all? Also, historically MuQSS has had issues on some systems with fewer run queues, and rqshare=smt was required to boot at all."

bubobih commented 4 years ago

Hello, today I tried the latest kernel again. With no extra options it's the same problem: I can't boot...

I even changed hardware at the datacenter. I set:

rqshare=mc - nothing, same problem
rqshare=all - nothing, same problem

bbedward commented 4 years ago

I'm failing to boot as well: Ubuntu 20.04 with a Ryzen 3800X on the latest package from the PPA.

If you need any more information let me know.

thubble commented 4 years ago

I don't use Liquorix but my own custom kernel which includes MuQSS. I have a very similar issue with a hang/error on boot unless I disable forced IRQ threading ("nothreadirqs" on kernel command line). I know this was suggested in an earlier comment but it wasn't clear whether anyone tried it.

jonathonf commented 4 years ago

nothreadirqs

There are a couple of references to this elsewhere.

Firstly, here it appeared to be an issue with USB/EHCI (though that's a low-latency kernel 3.13).

Secondly, with the ck patchset they may have made nothreadirqs the default because it can cause issues? Might it be worth asking CK if that's still the case with 5.6-ck2?

damentz commented 4 years ago

@jonathonf / @thubble thanks for the feedback. I'll try disabling forced IRQ threads for the next release.

bbedward commented 4 years ago

The latest release resolved the issue for me (5.6-19), works out of the box. Thanks damentz!

bubobih commented 4 years ago

I can confirm that this kernel finally boots :)

damentz commented 4 years ago

Great! Marking as resolved.