clearlinux / distribution

Placeholder repository to allow filing of general bugs/issues/etc against the Clear Linux OS for Intel Architecture linux distribution

version 33070 update breaks Clear Linux boots to black frozen screen #1964

Open frostysnowman2 opened 4 years ago

frostysnowman2 commented 4 years ago

After upgrading to version 33070, Clear Linux fails to boot properly (black, frozen screen). Is there any fix for this issue?

miguelinux commented 4 years ago

@frostysnowman2, can you describe your platform for us?

bwarden commented 4 years ago

https://community.clearlinux.org/t/version-33070-problems/4560/3

lebensterben commented 4 years ago

There are also reports on the IRC channel, and booting from an old kernel doesn't help. A manual rollback to an older OS version (with swupd repair -m) worked.

thiagomacieira commented 4 years ago

I rebooted 3 machines today to 33070 and saw no issue.

ahkok commented 4 years ago

At least one amdgpu system has confirmed this issue (via IRC).

lebensterben commented 4 years ago

Also reported on the forum: https://community.clearlinux.org/t/version-33070-problems/4560

inmanturbo commented 4 years ago

I had to roll back a bunch of NUCs to 33060 today. Skull Canyon NUC8i7HVC (Radeon RX Vega M GH Graphics).

ahkok commented 4 years ago

Product Name: NUC8i7HVK
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] [1002:694c] (rev c0)
5.6.11-949.native

I installed the 33070 kernel (949) on this system, which is running 33040, and it boots without issues.

If the issue is with another part of the update, it's certainly the mesa update...

meeow commented 4 years ago

Confirming black screen issue with both MSI and Gigabyte RX 570 4GB, i9 7960x, build 33070

frostysnowman2 commented 4 years ago

> @frostysnowman2, Can you help us to describe your platform?

XFX AMD Radeon R9 Fury series GPU, Intel i5-6600K CPU, ASUS Z170 motherboard.

LargePrime commented 4 years ago

VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] (rev f1)

I think mine was the crash reported on IRC. swupd auto-updated and crashed the UI.

jeremiah commented 4 years ago

> Product Name: NUC8i7HVK
> 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] [1002:694c] (rev c0)
> 5.6.11-949.native

I'm on 5.6.12-950.native and 33080, but this bug is still present. NUC i7-8705G Radeon RX Vega M GL.

nidjan commented 4 years ago

I also have this problem, on an i7-6700 with an MSI RX 480. For now I have run swupd repair back to 33060 and disabled automatic updates (swupd autoupdate --disable), and I'm waiting for a fix.

ahkok commented 4 years ago

Quick update from the team: we were able to reproduce the issue, but not all amdgpu systems appear to be affected, which complicates things a bit. We have not yet identified whether reverting mesa or the amdgpu Xorg driver fixes the issue, but that is the course we're taking. At the moment, there is no release in progress with these bits reverted. Once there is, I'll post here.

The underlying issue is an Xorg segfault while the display is being initialized. After that, the system no longer responds to keyboard or mouse input. SSH still functions, however, and the kernel seems to be working properly.

You can at any time boot and add 3 to the kernel command line to avoid auto-starting Xorg.
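
As a rough illustration of that workaround, here is a minimal Python sketch that checks whether a kernel command line string requests a non-graphical boot. The function name and the example command lines are hypothetical. It only looks for a bare `3` or `systemd.unit=multi-user.target` and ignores the case where several conflicting arguments are present.

```python
def boots_without_graphics(cmdline: str) -> bool:
    """Return True if the command line asks for a text-mode (non-graphical) boot.

    Simplification: only a bare "3" and an explicit multi-user target are
    recognized; conflicting arguments are not resolved.
    """
    text_mode_args = {"3", "systemd.unit=multi-user.target"}
    # Compare whole whitespace-separated arguments, so "console=tty3" is not a match.
    return any(arg in text_mode_args for arg in cmdline.split())


# Hypothetical command lines, for illustration only.
print(boots_without_graphics("quiet console=tty0 3"))
print(boots_without_graphics("quiet console=tty3"))
```

On a running system, the string to pass in would be the contents of /proc/cmdline.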

wheerdam commented 4 years ago

I have this problem with 33090 (updated from 33050) with amdgpu and an AMD RX 570 card. X fails to start, but booting to multi-user.target works. Here's the relevant stack trace from the Xorg log:

[   231.851] (EE)
[   231.851] (EE) Backtrace:
[   231.853] (EE) 0: /usr/bin/X (OsSigHandler+0x2c) [0x55aadc5181ec]
[   231.856] (EE) 1: /usr/lib64/libpthread.so.0 (__funlockfile+0x70) [0x7f7ffd48ff9f]
[   231.932] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   231.932] (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (?+0x0) [0x7f7ffcb8bfb0]
[   231.932] (EE) 3: /usr/bin/X (xf86CrtcCreateScreenResources+0x32) [0x55aadc3fcbf2]
[   231.932] (EE) 4: /usr/bin/X (dix_main+0x239) [0x55aadc37b579]
[   231.937] (EE) 5: /usr/lib64/haswell/libc.so.6 (__libc_start_main+0x102) [0x7f7ffd2a0362]
[   231.937] (EE) 6: /usr/bin/X (_start+0x2e) [0x55aadc35ef3e]
[   231.937] (EE)
[   231.937] (EE) Segmentation fault at address 0x10
[   231.937] (EE)
Fatal server error:
[   231.937] (EE) Caught signal 11 (Segmentation fault). Server aborting
[   231.937] (EE)
[   231.937] (EE)
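
For anyone comparing their own log against this one, the following is a minimal Python sketch that pulls the numbered frames out of an Xorg backtrace and reports the first frame that lives in a driver module. The regular expression, the function names, and the `/modules/drivers/` path check are assumptions made for illustration. The sample lines are copied from the trace above.

```python
import re

# Matches lines such as:
# "[   231.853] (EE) 0: /usr/bin/X (OsSigHandler+0x2c) [0x55aadc5181ec]"
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<symbol>[^+)]+)\+(?P<offset>0x[0-9a-fA-F]+)\)"
)

SAMPLE = """\
[   231.853] (EE) 0: /usr/bin/X (OsSigHandler+0x2c) [0x55aadc5181ec]
[   231.932] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   231.932] (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (?+0x0) [0x7f7ffcb8bfb0]
[   231.932] (EE) 3: /usr/bin/X (xf86CrtcCreateScreenResources+0x32) [0x55aadc3fcbf2]
"""


def parse_backtrace(log_text):
    """Return a list of (frame index, module path, symbol) tuples."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match["index"]), match["module"], match["symbol"]))
    return frames


def first_driver_frame(frames):
    """Return the module path of the first frame inside an Xorg driver, or None."""
    for _index, module, _symbol in frames:
        if "/modules/drivers/" in module:
            return module
    return None


frames = parse_backtrace(SAMPLE)
print(first_driver_frame(frames))
```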

ahkok commented 4 years ago

@wheerdam that's the same backtrace as we are seeing, thanks for the details.

ahkok commented 4 years ago

33100 will have the downgraded mesa bits. Please report your findings here with any release equal to or newer than that number.
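
A minimal Python sketch of that version check, assuming the build number appears as an integer `VERSION_ID` in the os-release file (for example /usr/lib/os-release). The function names and the `FIXED_BUILD` constant are hypothetical. The threshold 33100 comes from the comment above.

```python
FIXED_BUILD = 33100  # first release with the downgraded mesa, per the comment above


def build_from_os_release(text):
    """Return the integer VERSION_ID from os-release text, or None if absent."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_ID":
            value = value.strip().strip("\"'")
            # Assumption: the build number is a plain integer.
            return int(value) if value.isdigit() else None
    return None


def has_mesa_downgrade(text):
    """Return True if the reported build is at or after FIXED_BUILD."""
    build = build_from_os_release(text)
    return build is not None and build >= FIXED_BUILD


print(has_mesa_downgrade("VERSION_ID=33070\n"))
print(has_mesa_downgrade("VERSION_ID=33100\n"))
```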

frostysnowman2 commented 4 years ago

> 33100 will have the downgraded mesa bits. Please report your findings with any release equal or newer that number in here.

It works. The issue with my AMD R9 Fury GPU has gone away. Thanks for your quick fix!

inmanturbo commented 4 years ago

> 33100 will have the downgraded mesa bits. Please report your findings with any release equal or newer that number in here.

I can confirm success on 8 separate NUCs:

Product Name: NUC8i7HVK
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH]

We will definitely be staying on the rolling release, renewing our contract with our Intel vendor, and upgrading our NUCs at the end of the year. Thanks, team!

nidjan commented 4 years ago

Confirmed. Desktop with Polaris 10 now works.

ahkok commented 4 years ago

Thanks for confirming the fix is good. We still have to figure out how to upgrade mesa without breaking this, so I'd like to keep this issue open until we complete that.

frostysnowman2 commented 4 years ago

> Thanks for confirming the fix is good. We now still have to figure out how to upgrade mesa without breaking this, so I'd like to keep this open until we complete that.

Just to be safe, should we keep autoupdate disabled until you give us the go-ahead on a good mesa upgrade?

ahkok commented 4 years ago

You can re-enable autoupdate. We will make sure to properly test the mesa update to v20 on the affected hardware before we push it out in an update.

LargePrime commented 4 years ago

works