MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

CentOS 7.6 / xorg 1.20 segfaults #26014

Closed wandhydrant closed 5 years ago

wandhydrant commented 5 years ago

Hello! Currently, after following the above instructions, the Xorg server segfaults on both an NC6 and an NV6 when running on a CentOS 7.6 VM - either a VM that was 7.6 from the start, or a 7.4 (RogueWave "7-CI cloud-init") that has been updated to 7.6.

In all cases, the nvidia module loads well into the kernel, "nvidia-smi" output looks good, but I get the following traceback in the Xorg logs:

[ 856.434] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 856.434] (II) Platform probe for /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:08/VMBUS:01/47505500-0001-0000-3130-444531303244/pcidc52:00/dc52:00:00.0/drm/card0
[ 856.438] (EE)
[ 856.438] (EE) Backtrace:
[ 856.438] (EE) 0: /usr/bin/X (xorg_backtrace+0x55) [0x55f519e4c185]
[ 856.438] (EE) 1: /usr/bin/X (0x55f519c9b000+0x1b4e09) [0x55f519e4fe09]
[ 856.438] (EE) 2: /lib64/libpthread.so.0 (0x7f3479e3f000+0xf5d0) [0x7f3479e4e5d0]
[ 856.438] (EE) 3: /usr/bin/X (0x55f519c9b000+0xb5d48) [0x55f519d50d48]
[ 856.438] (EE) 4: /usr/bin/X (xf86BusProbe+0x9) [0x55f519d2a0b9]
[ 856.438] (EE) 5: /usr/bin/X (InitOutput+0x718) [0x55f519d37d78]
[ 856.438] (EE) 6: /usr/bin/X (0x55f519c9b000+0x601b0) [0x55f519cfb1b0]
[ 856.438] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f3479a943d5]
[ 856.438] (EE) 8: /usr/bin/X (0x55f519c9b000+0x4a4ce) [0x55f519ce54ce]
[ 856.438] (EE)
[ 856.438] (EE) Segmentation fault at address 0x0
[ 856.438] (EE) Fatal server error:
[ 856.439] (EE) Caught signal 11 (Segmentation fault). Server aborting

Between CentOS 7.5 and 7.6, Xorg was upgraded from 1.19 to 1.20, but the Nvidia drivers seem to have taken this into account since mid-2018. When using CentOS 7.5 (with fixed versions in all /etc/yum.repos.d/*.repo files), Xorg does work! When using CentOS 7.4, installing Xorg creates dependency errors with EPEL, because EPEL only supports the latest CentOS.

I don't know when the breakage occurred, as 7.6 has been out for a couple of months now and is marked as supported in this document - I can't be the only person to run Xorg on it? As of last night, an NC6 freshly started with the image "RogueWave CentOS 7.6" was unable to start Xorg.



mimckitt commented 5 years ago

Thanks for the feedback! We are currently investigating and will update you shortly.

Karishma-Tiwari-MSFT commented 5 years ago

@dlepow Would you be able to share your insights on this issue? Thanks :)

Karishma-Tiwari-MSFT commented 5 years ago

@wandhydrant I am reaching out to the product teams internally and will share once I have an update.

Karishma-Tiwari-MSFT commented 5 years ago

The product team has recommended that the way to prevent Xorg and other tools from failing with long PCI ids on RHEL 7.6 is to install the LIS download. Thanks for bringing this to our attention. We will close this issue for now. I will share once I have more updates on this here. If you have more questions, tag me in a comment, I will reopen it and we will gladly continue the discussion.

wandhydrant commented 5 years ago

@Karishma-Tiwari-MSFT I do not fully understand what you mean by "install the LIS download"; of course I did follow the step "Install the latest Linux Integration Services for Hyper-V and Azure.", "wget https://aka.ms/lis" etc. I also specified the PCI Bus ID for the NV6 instance type. Still, Xorg fails for me on 7.6, and it works on 7.5, always having followed the instructions to the letter.

Karishma-Tiwari-MSFT commented 5 years ago

@wandhydrant Thanks for sharing those details. I am talking internally to the product team and will let you know as soon as I hear back. :)

raidlman commented 5 years ago

@Karishma-Tiwari-MSFT Why is this issue closed? I still get the exact same errors as @wandhydrant.

Karishma-Tiwari-MSFT commented 5 years ago

@wandhydrant @raidlman I am still talking internally with the product team on this. Can you please share which version of LIS you are using? CC: @shirgall

wandhydrant commented 5 years ago

I kept a VM stopped to be able to answer such questions :-)

# rpm -qa | grep hyper
microsoft-hyper-v-4.2.8.1-20190205.x86_64
microsoft-hyper-v-debuginfo-4.2.8.1-20190205.x86_64
kmod-microsoft-hyper-v-4.2.8.1-20190205.x86_64

# rpm -qa | grep -i nvidia
nvidia-driver-418.39-4.el7.x86_64
nvidia-driver-libs-418.39-4.el7.x86_64
dkms-nvidia-418.39-1.el7.x86_64
nvidia-driver-cuda-418.39-4.el7.x86_64
nvidia-driver-cuda-libs-418.39-4.el7.x86_64
nvidia-driver-devel-418.39-4.el7.x86_64
nvidia-driver-NVML-418.39-4.el7.x86_64
nvidia-libXNVCtrl-418.39-1.el7.x86_64
nvidia-driver-NvFBCOpenGL-418.39-4.el7.x86_64
nvidia-modprobe-418.39-1.el7.x86_64
nvidia-settings-418.39-1.el7.x86_64
nvidia-xconfig-418.39-1.el7.x86_64
nvidia-libXNVCtrl-devel-418.39-1.el7.x86_64
nvidia-persistenced-418.39-1.el7.x86_64

I just re-installed LIS, which gave me a minor update from 4.2.8.1 to 4.2.8.2. Same result, Xorg still segfaults.

raidlman commented 5 years ago

For me, it's the GRID version of the drivers that is not working, and I'm testing with NV60 instance types. Please let me know if you need any further information.

OS version

# hostnamectl
   Static hostname: XYZ
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 97da09219a2d42489c8b8f748e6d2fb7
           Boot ID: 4195bdf4ba264918958b57db5e813f4b
    Virtualization: microsoft
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-957.5.1.el7.x86_64
      Architecture: x86-64
# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

LIS version

# rpm -qa | grep hyper
microsoft-hyper-v-debuginfo-4.2.8.2-20190220.x86_64
kmod-microsoft-hyper-v-4.2.8.2-20190220.x86_64
microsoft-hyper-v-4.2.8.2-20190220.x86_64

# ./NVIDIA-Linux-x86_64-grid.run --version
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.92....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

nvidia-installer:  version 410.92
  The NVIDIA Software Installer for Unix/Linux.

  This program is used to install, upgrade and uninstall The NVIDIA Accelerated Graphics Driver Set for Linux-x86_64.

Nvidia SMI output

# nvidia-smi
Mon Mar 18 12:09:20 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.92       Driver Version: 410.92       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 000033B8:00:00.0 Off |                  Off |
| N/A   32C    P8    14W / 150W |      0MiB /  8129MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

PCI bus address

# nvidia-xconfig --query-gpu-info | awk '/PCI BusID/{print $4}'
PCI:0@13240:0:0
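
The BusID above is the decimal form of the hexadecimal address that nvidia-smi reports (000033B8 hex is 13240 decimal). As a minimal sketch, assuming the nvidia-smi layout is domain:bus:device.function and the xorg.conf layout is PCI:bus@domain:device:function (as the two outputs in this comment suggest), the conversion could be done like this. The function name `to_xorg_busid` is made up for illustration.

```shell
# Convert an nvidia-smi style PCI address (hex "domain:bus:device.function")
# into the decimal "PCI:bus@domain:device:function" form used in xorg.conf.
# to_xorg_busid is a hypothetical helper, not part of any NVIDIA tool.
to_xorg_busid() {
    addr="$1"
    domain=${addr%%:*}      # text before the first ':'
    rest=${addr#*:}         # drop the domain
    bus=${rest%%:*}         # text before the next ':'
    rest=${rest#*:}         # leaves "device.function"
    dev=${rest%%.*}
    func=${rest#*.}
    # A "0x" prefix makes printf read each field as hexadecimal.
    printf 'PCI:%d@%d:%d:%d\n' "0x$bus" "0x$domain" "0x$dev" "0x$func"
}

to_xorg_busid 000033B8:00:00.0   # prints PCI:0@13240:0:0
```

This only reproduces the value that `nvidia-xconfig --query-gpu-info` already prints; it can be handy for cross-checking the xorg.conf entry against nvidia-smi.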

xorg.conf (slimmed down)

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla M60"
    BusID          "PCI:0@13240:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection

/var/log/Xorg.0.log

# cat /var/log/Xorg.0.log
[  3594.782]
X.Org X Server 1.20.1
X Protocol Version 11, Revision 0
[  3594.782] Build Operating System:  3.10.0-693.17.1.el7.x86_64
[  3594.782] Current Operating System: Linux gns-systems-minimal-vpn 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64
[  3594.782] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.5.1.el7.x86_64 root=UUID=12907c8a-6b2f-4981-b94c-f3cd772270a7 ro console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 net.ifnames=0 LANG=en_US.UTF-8
[  3594.782] Build Date: 29 January 2019  06:03:26PM
[  3594.782] Build ID: xorg-x11-server 1.20.1-5.2.el7_6
[  3594.782] Current version of pixman: 0.34.0
[  3594.782]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  3594.782] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  3594.783] (==) Log file: "/var/log/Xorg.0.log", Time: Mon Mar 18 12:15:20 2019
[  3594.783] (==) Using config file: "/etc/X11/xorg.conf"
[  3594.783] (==) Using config directory: "/etc/X11/xorg.conf.d"
[  3594.783] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  3594.784] (==) No Layout section.  Using the first Screen section.
[  3594.784] (**) |-->Screen "Screen0" (0)
[  3594.784] (**) |   |-->Monitor "<default monitor>"
[  3594.784] (**) |   |-->Device "Device0"
[  3594.784] (==) No monitor specified for screen "Screen0".
        Using a default monitor configuration.
[  3594.784] (==) Automatically adding devices
[  3594.784] (==) Automatically enabling devices
[  3594.784] (==) Automatically adding GPU devices
[  3594.784] (==) Automatically binding GPU devices
[  3594.784] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  3594.784] (==) FontPath set to:
        catalogue:/etc/X11/fontpath.d,
        built-ins
[  3594.784] (==) ModulePath set to "/usr/lib64/xorg/modules"
[  3594.784] (II) The server relies on udev to provide the list of input devices.
        If no devices become available, reconfigure udev or disable AutoAddDevices.
[  3594.784] (II) Loader magic: 0x5608870a2020
[  3594.784] (II) Module ABI versions:
[  3594.784]    X.Org ANSI C Emulation: 0.4
[  3594.784]    X.Org Video Driver: 24.0
[  3594.784]    X.Org XInput driver : 24.1
[  3594.784]    X.Org Server Extension : 10.0
[  3594.784] (II) xfree86: Adding drm device (/dev/dri/card0)
[  3594.784] (II) Platform probe for /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:08/VMBUS:01/47505500-0001-0000-3130-444531334632/pci33b8:00/33b8:00:00.0/drm/card0
[  3594.788] (EE)
[  3594.788] (EE) Backtrace:
[  3594.788] (EE) 0: /usr/bin/X (xorg_backtrace+0x55) [0x560886e14185]
[  3594.788] (EE) 1: /usr/bin/X (0x560886c63000+0x1b4e09) [0x560886e17e09]
[  3594.788] (EE) 2: /lib64/libpthread.so.0 (0x7fb554596000+0xf5d0) [0x7fb5545a55d0]
[  3594.788] (EE) 3: /usr/bin/X (0x560886c63000+0xb5d48) [0x560886d18d48]
[  3594.788] (EE) 4: /usr/bin/X (xf86BusProbe+0x9) [0x560886cf20b9]
[  3594.788] (EE) 5: /usr/bin/X (InitOutput+0x718) [0x560886cffd78]
[  3594.788] (EE) 6: /usr/bin/X (0x560886c63000+0x601b0) [0x560886cc31b0]
[  3594.789] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7fb5541eb3d5]
[  3594.789] (EE) 8: /usr/bin/X (0x560886c63000+0x4a4ce) [0x560886cad4ce]
[  3594.789] (EE)
[  3594.789] (EE) Segmentation fault at address 0x0
[  3594.789] (EE)
Fatal server error:
[  3594.789] (EE) Caught signal 11 (Segmentation fault). Server aborting
[  3594.789] (EE)
[  3594.789] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  3594.789] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  3594.789] (EE)
[  3594.789] (EE) Server terminated with error (1). Closing log file.

Xorg fails right after probing the device path. The path is available though:

# ls -la /sys/devices/LNXSYSTM\:00/device\:00/PNP0A03\:00/device\:08/VMBUS\:01/47505500-0001-0000-3130-444531334632/pci33b8\:00/33b8\:00\:00.0/drm/card0/
total 0
drwxr-xr-x. 3 root root    0 Mar 18 11:33 .
drwxr-xr-x. 4 root root    0 Mar 18 11:33 ..
-r--r--r--. 1 root root 4096 Mar 18 11:32 dev
lrwxrwxrwx. 1 root root    0 Mar 18 11:32 device -> ../../../33b8:00:00.0
drwxr-xr-x. 2 root root    0 Mar 18 11:33 power
lrwxrwxrwx. 1 root root    0 Mar 18 11:15 subsystem -> ../../../../../../../../../../../class/drm
-rw-r--r--. 1 root root 4096 Mar 18 11:32 uevent

Karishma-Tiwari-MSFT commented 5 years ago

Thanks, @wandhydrant @raidlman for sharing the logs and details on this issue. Our product team confirmed that they are aware of this issue and have a bug filed internally. I will keep you informed on the updates.

JonMarbach commented 5 years ago

Same issue here (same callstack in the Xorg log with RH7.6). I have to say I ran into similar issues about a year ago and became very discouraged about using NV+Linux+Azure. Someone on the product team needs to be regularly QC-ing this process, as it seems to regress very frequently.
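
To check whether a given Xorg log shows the same call stack as the ones posted in this thread, a rough sketch is to look for a segfault together with an `xf86BusProbe` frame. The function name `has_busprobe_segfault` is hypothetical, and matching on these two strings is only a heuristic, not an official diagnostic.

```shell
# Succeed if the log contains a segfault and a backtrace frame in
# xf86BusProbe, the combination seen in the logs posted in this thread.
# has_busprobe_segfault is a hypothetical helper name.
has_busprobe_segfault() {
    grep -q 'Segmentation fault at address' "$1" &&
        grep -q 'xf86BusProbe' "$1"
}

# Default to the log path used in this thread; a different path can be
# passed as the first argument.
xorg_log="${1:-/var/log/Xorg.0.log}"
if [ -r "$xorg_log" ] && has_busprobe_segfault "$xorg_log"; then
    echo "$xorg_log: segfault during xf86BusProbe"
fi
```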

Karishma-Tiwari-MSFT commented 5 years ago

@wandhydrant @raidlman @JonMarbach Thanks for bringing this to our attention. The product team is working on getting this fixed. In the meantime, the workaround suggested is to use RHEL version 7.5. I will share here as and when there is an update on this.

raidlman commented 5 years ago

@Karishma-Tiwari-MSFT I just tried with RHEL7.5. Same problem here. Could you please check with the product team which exact version of the RHEL image they are using?

Could you please add a link to the bug tracker where this issue is tracked by the product team?

OS version

# hostnamectl
   Static hostname: XYZ
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 4ba4debe48a14ef699ade86d1e7f0cfd
           Boot ID: d03ff16397b94456b27f770fb424bde7
    Virtualization: microsoft
  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
            Kernel: Linux 3.10.0-862.11.6.el7.x86_64
      Architecture: x86-64
You have new mail in /var/spool/mail/root
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Xorg version

# rpm -qa | grep xorg-x11-server
xorg-x11-server-Xorg-1.20.1-5.3.el7_6.x86_64
xorg-x11-server-common-1.20.1-5.3.el7_6.x86_64
xorg-x11-server-utils-7.7-20.el7.x86_64

LIS version

# rpm -qa | grep hyper
kmod-microsoft-hyper-v-4.2.8.2-20190220.x86_64
microsoft-hyper-v-4.2.8.2-20190220.x86_64
microsoft-hyper-v-debuginfo-4.2.8.2-20190220.x86_64

NVidia version

# ./NVIDIA-Linux-x86_64-grid.run --version
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.92....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

nvidia-installer:  version 410.92
  The NVIDIA Software Installer for Unix/Linux.

  This program is used to install, upgrade and uninstall The NVIDIA Accelerated Graphics Driver Set for Linux-x86_64.

NVidia SMI output

# nvidia-smi
Fri Mar 22 11:12:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.92       Driver Version: 410.92       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00001D3B:00:00.0 Off |                  Off |
| N/A   34C    P8    14W / 150W |      0MiB /  8129MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

PCI bus address

# nvidia-xconfig --query-gpu-info | awk '/PCI BusID/{print $4}'
PCI:0@7483:0:0

xorg.conf

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla M60"
    BusID          "PCI:0@7483:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection

/var/log/Xorg.0.log

# cat /var/log/Xorg.0.log
[   225.938]
X.Org X Server 1.20.1
X Protocol Version 11, Revision 0
[   225.938] Build Operating System:  2.6.32-754.2.1.el6.x86_64
[   225.938] Current Operating System: Linux gns-systems-minimal-vpn 3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC 2018 x86_64
[   225.938] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=59d38512-270e-426a-91bc-670559a31c10 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300
[   225.938] Build Date: 13 February 2019  01:35:02PM
[   225.938] Build ID: xorg-x11-server 1.20.1-5.3.el7_6
[   225.938] Current version of pixman: 0.34.0
[   225.938]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[   225.938] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   225.938] (==) Log file: "/var/log/Xorg.0.log", Time: Fri Mar 22 11:07:23 2019
[   225.939] (==) Using config file: "/etc/X11/xorg.conf"
[   225.939] (==) Using config directory: "/etc/X11/xorg.conf.d"
[   225.939] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[   225.939] (==) No Layout section.  Using the first Screen section.
[   225.939] (==) No screen section available. Using defaults.
[   225.939] (**) |-->Screen "Default Screen Section" (0)
[   225.939] (**) |   |-->Monitor "<default monitor>"
[   225.940] (==) No device specified for screen "Default Screen Section".
        Using the first device section listed.
[   225.940] (**) |   |-->Device "Device0"
[   225.940] (==) No monitor specified for screen "Default Screen Section".
        Using a default monitor configuration.
[   225.940] (==) Automatically adding devices
[   225.940] (==) Automatically enabling devices
[   225.940] (==) Automatically adding GPU devices
[   225.940] (==) Automatically binding GPU devices
[   225.940] (==) Max clients allowed: 256, resource mask: 0x1fffff
[   225.940] (==) FontPath set to:
        catalogue:/etc/X11/fontpath.d,
        built-ins
[   225.940] (==) ModulePath set to "/usr/lib64/xorg/modules"
[   225.940] (II) The server relies on udev to provide the list of input devices.
        If no devices become available, reconfigure udev or disable AutoAddDevices.
[   225.940] (II) Loader magic: 0x558fd56c1020
[   225.940] (II) Module ABI versions:
[   225.940]    X.Org ANSI C Emulation: 0.4
[   225.940]    X.Org Video Driver: 24.0
[   225.940]    X.Org XInput driver : 24.1
[   225.940]    X.Org Server Extension : 10.0
[   225.940] (II) xfree86: Adding drm device (/dev/dri/card0)
[   225.940] (II) Platform probe for /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:08/VMBUS:01/47505500-0001-0000-3130-444531334632/pci1d3b:00/1d3b:00:00.0/drm/card0
[   225.944] (EE)
[   225.944] (EE) Backtrace:
[   225.944] (EE) 0: /usr/bin/X (xorg_backtrace+0x55) [0x558fd54330f5]
[   225.944] (EE) 1: /usr/bin/X (0x558fd5282000+0x1b4d79) [0x558fd5436d79]
[   225.944] (EE) 2: /lib64/libpthread.so.0 (0x7f0fed870000+0xf5d0) [0x7f0fed87f5d0]
[   225.944] (EE) 3: /usr/bin/X (0x558fd5282000+0xb5d48) [0x558fd5337d48]
[   225.944] (EE) 4: /usr/bin/X (xf86BusProbe+0x9) [0x558fd53110b9]
[   225.944] (EE) 5: /usr/bin/X (InitOutput+0x718) [0x558fd531ed78]
[   225.944] (EE) 6: /usr/bin/X (0x558fd5282000+0x601b0) [0x558fd52e21b0]
[   225.944] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f0fed4c53d5]
[   225.944] (EE) 8: /usr/bin/X (0x558fd5282000+0x4a4ce) [0x558fd52cc4ce]
[   225.944] (EE)
[   225.944] (EE) Segmentation fault at address 0x0
[   225.944] (EE)
Fatal server error:
[   225.944] (EE) Caught signal 11 (Segmentation fault). Server aborting
[   225.945] (EE)
[   225.945] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[   225.945] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[   225.945] (EE)
[   225.945] (EE) Server terminated with error (1). Closing log file.

Device path

# ls -la /sys/devices/LNXSYSTM\:00/device\:00/PNP0A03\:00/device\:08/VMBUS\:01/47505500-0001-0000-3130-444531334632/pci1d3b\:00/1d3b\:00\:00.0/drm/card0
total 0
drwxr-xr-x. 3 root root    0 Mar 22 11:05 .
drwxr-xr-x. 4 root root    0 Mar 22 11:05 ..
-r--r--r--. 1 root root 4096 Mar 22 11:05 dev
lrwxrwxrwx. 1 root root    0 Mar 22 11:05 device -> ../../../1d3b:00:00.0
drwxr-xr-x. 2 root root    0 Mar 22 11:05 power
lrwxrwxrwx. 1 root root    0 Mar 22 11:04 subsystem -> ../../../../../../../../../../../class/drm
-rw-r--r--. 1 root root 4096 Mar 22 11:05 uevent

wandhydrant commented 5 years ago

@raidlman I think you may have started out with RHEL 7.5 but then you installed the latest Xorg packages (from 7.6). On 7.5 you should have Xorg 1.19.5, not 1.20.
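
A quick way to see which Xorg series is installed is to pull the major.minor part out of the package string. This is only a sketch: the function name `xorg_series` is invented here, and it assumes the name-version-release.arch format shown in the `rpm -qa` output earlier in this thread.

```shell
# Print the major.minor series (e.g. "1.20") from an xorg-x11-server-Xorg
# package string; prints nothing for any other package.
# xorg_series is a hypothetical helper name.
xorg_series() {
    printf '%s\n' "$1" |
        sed -n 's/^xorg-x11-server-Xorg-\([0-9][0-9]*\.[0-9][0-9]*\)\..*/\1/p'
}

# Package string copied from the rpm output posted above.
pkg='xorg-x11-server-Xorg-1.20.1-5.3.el7_6.x86_64'
case "$(xorg_series "$pkg")" in
    1.20) echo "Xorg 1.20 series: same series as the failing setups in this thread" ;;
    *)    echo "Xorg series: $(xorg_series "$pkg")" ;;
esac
```

On a live system the package string could come from `rpm -q xorg-x11-server-Xorg` instead of the hard-coded value.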

To stay inside 7.5, I force the version inside the repository URLs like this:

find /etc/yum.repos.d -name '*.repo' -print0 | xargs -0 sed -i 's/\$releasever/7.5.1804/g'
yum clean all

raidlman commented 5 years ago

@wandhydrant Thanks for the hint. CentOS 7.5 is working now with the fixed releasever.
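
For anyone who wants to see what the releasever substitution does before editing files under /etc/yum.repos.d, here is a small sketch that applies the same sed expression to a sample line on standard input. The function name `pin_releasever` and the mirror URL are made up for illustration.

```shell
# Replace every literal "$releasever" on stdin with the release given as
# the first argument. pin_releasever is a hypothetical helper name.
pin_releasever() {
    sed 's/\$releasever/'"$1"'/g'
}

# Sample baseurl line with a placeholder host, not a real mirror.
printf '%s\n' 'baseurl=http://mirror.example.invalid/centos/$releasever/os/$basearch/' |
    pin_releasever 7.5.1804
# prints baseurl=http://mirror.example.invalid/centos/7.5.1804/os/$basearch/
```

Only `$releasever` is rewritten; other yum variables such as `$basearch` are left alone.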

Karishma-Tiwari-MSFT commented 5 years ago

@raidlman Thanks for sharing the update. The bug is in internal Azure DevOps, but I am tracking it as the product team makes progress there. I will keep you informed. @wandhydrant Thank you for helping out. I will keep this issue closed but will be sharing updates here.

acastaing commented 5 years ago

@Karishma-Tiwari-MSFT Do you have any news? This issue is causing a lot of problems for our customers (we started using our support plan, but we were redirected here). Thank you for the feedback and, if possible, a fix date.

Karishma-Tiwari-MSFT commented 5 years ago

@acastaing I am trying to get an update on this from the product team and will share it here. In the meantime, the workaround shared by the product team is to use RHEL version 7.5.

dcui commented 5 years ago

This is a bug in the Xorg shipped in RHEL 7.6: https://bugzilla.redhat.com/show_bug.cgi?id=1665433

Karishma-Tiwari-MSFT commented 5 years ago

We currently do not have an ETA on this. We are relying on a bug fix from Red Hat (link shared above). We will update the thread as we get a timeline on the fix for that.

milazzom commented 5 years ago

I have a customer using RHEL 7.6 in Azure (using the marketplace image) that is having the same issue.

Karishma-Tiwari-MSFT commented 5 years ago

@milazzom This is a bug in the Xorg shipped in RHEL 7.6: https://bugzilla.redhat.com/show_bug.cgi?id=1665433 The workaround shared by the product team is to use RHEL version 7.5.

milazzom commented 5 years ago

This is affecting existing VMs, so I'm not sure they can just revert to RHEL 7.5. I'll get involved on the Red Hat side and see what's going on with getting a fix for RHEL 7.6.

Karishma-Tiwari-MSFT commented 5 years ago

This bug is being tracked under a new Bugzilla item: https://bugzilla.redhat.com/show_bug.cgi?id=1704513