NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

NVIDIA I2C driver issues #41

Open CalcProgrammer1 opened 2 years ago

CalcProgrammer1 commented 2 years ago

Copy/paste from an email I sent to linux-bugs@nvidia.com. I haven't verified this against the new open driver, but it has been an issue for a while with the proprietary kernel driver.

I am the lead developer of OpenRGB, an open source RGB lighting control application for Windows and Linux. Our goal is to create a universal RGB control app, talking directly to as many RGB lighting devices as possible. As RGB control is often only supported by official software in Windows, Linux users get left out. That's where we come in.

As you are probably aware, a lot of graphics cards have built-in RGB lighting that features software control. Most cards implement RGB control using the GPU's I2C interface. We're facing an issue controlling certain RGB devices over the NVIDIA GPU's I2C interface in Linux with the proprietary NVIDIA driver. The same code is working fine using NvAPI on Windows and using the Nouveau I2C implementation on Linux, so we believe this to be an issue specific to NVIDIA's proprietary Linux driver.

The cards we've been focused on lately are the ASUS 3xxx series cards, which all use a similar I2C RGB chip that is also found on some ASUS motherboards and various manufacturers' RGB DRAM modules. The chip comes from ENE. The I2C protocol used by this chip is a 16-bit address scheme where you first write a 16-bit address to the 0x00 register of the ENE chip, then perform either a read or write operation to a fixed chip register. This chip appears to be an SMBus chip, so we're using the SMBus functions in the Linux kernel as shown in these two accessor functions:

(ene_dev_id is the 8-bit I2C address of the ENE chip, ene_register is the 16-bit address in the chip that we are reading or writing)

unsigned char ENESMBusInterface_i2c_smbus::ENERegisterRead(ene_dev_id dev, ene_register reg)
{
    //Write ENE register
    bus->i2c_smbus_write_word_data(dev, 0x00, ((reg << 8) & 0xFF00) | ((reg >> 8) & 0x00FF));

    //Read ENE value
    return(bus->i2c_smbus_read_byte_data(dev, 0x81));
}

void ENESMBusInterface_i2c_smbus::ENERegisterWrite(ene_dev_id dev, ene_register reg, unsigned char val)
{
    //Write ENE register
    bus->i2c_smbus_write_word_data(dev, 0x00, ((reg << 8) & 0xFF00) | ((reg >> 8) & 0x00FF));

    //Write ENE value
    bus->i2c_smbus_write_byte_data(dev, 0x01, val);
}
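
For readers wondering about the byte swap in the two accessors above, here is a minimal sketch of what ends up on the wire. It relies on the SMBus Write Word convention of sending the low data byte before the high data byte. The helper names (ene_swap_bytes, smbus_write_word_bytes, ene_select_register_bytes) are made up for illustration and are not OpenRGB functions.

```cpp
#include <array>
#include <cstdint>

// Swap the two bytes of a 16-bit ENE register address, the same way the
// accessors above do before calling i2c_smbus_write_word_data.
constexpr std::uint16_t ene_swap_bytes(std::uint16_t reg)
{
    return static_cast<std::uint16_t>(((reg << 8) & 0xFF00) | ((reg >> 8) & 0x00FF));
}

// Bytes that follow the device address for an SMBus Write Word:
// command code, then data low byte, then data high byte.
constexpr std::array<std::uint8_t, 3> smbus_write_word_bytes(std::uint8_t command,
                                                             std::uint16_t value)
{
    return { command,
             static_cast<std::uint8_t>(value & 0xFF),
             static_cast<std::uint8_t>(value >> 8) };
}

// Hypothetical helper: the frame used to select an ENE register (command 0x00).
constexpr std::array<std::uint8_t, 3> ene_select_register_bytes(std::uint16_t reg)
{
    return smbus_write_word_bytes(0x00, ene_swap_bytes(reg));
}

// Compile-time check of the swap itself.
static_assert(ene_swap_bytes(0x1234) == 0x3412, "bytes are exchanged");
```

Because of the swap, the high byte of the ENE register address is the first data byte on the bus and the low byte is the second.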

The issue here appears to be with i2c_smbus_write_word_data. To detect and verify that the chip exists on the bus, we do a series of i2c_smbus_read_byte_data calls, and these all work fine on the NVIDIA proprietary Linux driver. However, after detecting the chip we then write a 16-bit address and attempt to read from it. Specifically, we try to read a region of the ENE chip's memory known to contain a version string. The expectation is that the word data write puts the 16-bit address of the byte we want to read into the chip, and the following byte data read from 0x81 returns one byte from that 16-bit address.
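
The expected behaviour described above can be modelled without hardware. The following is only a sketch: FakeEneChip is an in-memory stand-in written for this illustration, the base address 0x1000 used in the usage note is a placeholder rather than the chip's documented version-string location, and the latch-then-read behaviour is inferred from the description in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// In-memory model of the ENE register protocol. This is NOT real hardware access.
class FakeEneChip
{
public:
    // Test fixture: preload bytes at a 16-bit on-chip address.
    void load(std::uint16_t address, const std::string& bytes)
    {
        for (std::size_t i = 0; i < bytes.size(); ++i)
            memory_[static_cast<std::uint16_t>(address + i)] = static_cast<std::uint8_t>(bytes[i]);
    }

    // Word write to command 0x00: the chip latches a 16-bit address. The low byte
    // of the SMBus word arrives first and is treated as the address high byte.
    void write_word_data(std::uint8_t command, std::uint16_t value)
    {
        assert(command == 0x00);
        pointer_ = static_cast<std::uint16_t>(((value << 8) & 0xFF00) | ((value >> 8) & 0x00FF));
    }

    // Byte read from command 0x81: return the byte at the latched address.
    std::uint8_t read_byte_data(std::uint8_t command) const
    {
        assert(command == 0x81);
        const auto it = memory_.find(pointer_);
        return it == memory_.end() ? 0x00 : it->second;
    }

private:
    std::map<std::uint16_t, std::uint8_t> memory_;
    std::uint16_t pointer_ = 0;
};

// Same two-step sequence as ENERegisterRead in the post, against the fake chip.
std::uint8_t ene_register_read(FakeEneChip& chip, std::uint16_t reg)
{
    chip.write_word_data(0x00, static_cast<std::uint16_t>(((reg << 8) & 0xFF00) | ((reg >> 8) & 0x00FF)));
    return chip.read_byte_data(0x81);
}

// Read `length` consecutive bytes, one register access per byte.
std::string ene_read_string(FakeEneChip& chip, std::uint16_t base, std::size_t length)
{
    std::string out;
    for (std::size_t i = 0; i < length; ++i)
        out.push_back(static_cast<char>(ene_register_read(chip, static_cast<std::uint16_t>(base + i))));
    return out;
}
```

Loading a string at the placeholder address 0x1000 and calling ene_read_string(chip, 0x1000, length) returns that string in this model. If the word write never reached the chip, the latched address would stay stale and every read would come from the wrong location, which is one way the "garbage" described here could arise.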

With other SMBus host controllers (Intel and AMD chipsets) as well as the NVIDIA GPU I2C on both Windows (NvAPI) and Linux (Nouveau), this works fine and we successfully retrieve the ENE RGB controller's version string. With the NVIDIA proprietary Linux driver, we read garbage. Since we know the i2c_smbus_read_byte_data function works with other manufacturers' NVIDIA GPU boards and for detecting the chip, I can only assume the issue is that the i2c_smbus_write_word_data function isn't working correctly. Note that i2c_smbus_write_byte_data does appear to work on several other manufacturers' GPU RGB chips so I have to assume it's specific to word data.

We have also observed issues with SMBus block operations, though doing the same block operations using the I2C_RDWR ioctl (thus avoiding the SMBus layer) seems to work on the NVIDIA proprietary driver.
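
As a rough illustration of the I2C_RDWR route mentioned above, a raw write through the Linux i2c-dev interface can look like the sketch below. The function name i2c_raw_write is made up for this example, fd is assumed to be an already opened /dev/i2c-N descriptor, and this is not OpenRGB's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#include <linux/i2c.h>      // struct i2c_msg
#include <linux/i2c-dev.h>  // I2C_RDWR, struct i2c_rdwr_ioctl_data
#include <sys/ioctl.h>

// Send `payload` to the 7-bit address `addr` as a single raw I2C write.
// Returns true only if the kernel reports that the one message was transferred.
bool i2c_raw_write(int fd, std::uint16_t addr, std::vector<std::uint8_t> payload)
{
    assert(payload.size() <= 0xFFFF);  // i2c_msg::len is a 16-bit field

    i2c_msg msg{};
    msg.addr  = addr;
    msg.flags = 0;  // 0 means write; I2C_M_RD would make it a read
    msg.len   = static_cast<std::uint16_t>(payload.size());
    msg.buf   = payload.data();

    i2c_rdwr_ioctl_data request{};
    request.msgs  = &msg;
    request.nmsgs = 1;

    // On success I2C_RDWR returns the number of messages transferred, else -1.
    return ioctl(fd, I2C_RDWR, &request) == 1;
}
```

With this path the caller builds the whole frame itself (command byte, count byte, data), so nothing depends on how the adapter driver implements the SMBus block calls.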

owenmylotte commented 2 years ago

I /think/ we have block commands working now, though it didn't make it into the 515 (or, now 520) release series. It should be in the release series after 520, though I can't give a firm schedule.

This is great news! I've been following this thread for months. Thank you for looking into this.

atta2022 commented 2 years ago

Any news? I tried OpenRGB on my rig with 6 cards: 4 of them work, 2 are not recognised.

$ sudo openrgb -l
Attempting to connect to local OpenRGB server.
Connection attempt failed
Local OpenRGB server unavailable.
Running standalone.
[i2c_smbus_linux] Failed to read i2c device PCI device ID
[i2c_smbus_linux] Failed to read i2c device PCI device ID
0: Gigabyte RTX3060 EAGLE OC 12G V2
  Type:           GPU
  Description:    RGB Fusion GPU
  Location:       I2C: /dev/i2c-23, address 0x63
  Modes: [Direct] Breathing Flashing 'Dual Flashing' 'Color Cycle' 'Spectrum Cycle'
  Zones: 'GPU Zone'
  LEDs: 'GPU LED'

1: Gigabyte RTX3060 EAGLE OC 12G V2
  Type:           GPU
  Description:    RGB Fusion GPU
  Location:       I2C: /dev/i2c-18, address 0x63
  Modes: [Direct] Breathing Flashing 'Dual Flashing' 'Color Cycle' 'Spectrum Cycle'
  Zones: 'GPU Zone'
  LEDs: 'GPU LED'

2: ASUS TUF RTX 3080Ti O12G GAMING
  Type:           GPU
  Description:    ENE SMBus Device
  Version:        AUMA0-E6K5-0107
  Location:       I2C: /dev/i2c-28, address 0x67
  Modes: Direct [Off] Static Breathing Flashing 'Spectrum Cycle' Rainbow 'Chase Fade' Chase 'Random Flicker'
  Zones: Unknown
  LEDs: 'Unknown LED 1' 'Unknown LED 2' 'Unknown LED 3' 'Unknown LED 4'

3: ASUS ROG STRIX 3090 O24G GAMING
  Type:           GPU
  Description:    ENE SMBus Device
  Version:        AUMA0-E6K5-0107
  Location:       I2C: /dev/i2c-12, address 0x67
  Modes: Direct [Off] Static Breathing Flashing 'Spectrum Cycle' Rainbow 'Chase Fade' Chase 'Random Flicker'
  Zones: Unknown
  LEDs: 'Unknown LED 1' 'Unknown LED 2' 'Unknown LED 3' 'Unknown LED 4' 'Unknown LED 5' 'Unknown LED 6' 'Unknown LED 7' 'Unknown LED 8' 'Unknown LED 9' 'Unknown LED 10' 'Unknown LED 11' 'Unknown LED 12' 'Unknown LED 13' 'Unknown LED 14' 'Unknown LED 15' 'Unknown LED 16' 'Unknown LED 17' 'Unknown LED 18' 'Unknown LED 19' 'Unknown LED 20' 'Unknown LED 21' 'Unknown LED 22'

$ sudo i2cdetect -l
i2c-0   smbus       SMBus I801 adapter at f040          SMBus adapter
i2c-1   i2c         i915 gmbus dpc                      I2C adapter
i2c-2   i2c         i915 gmbus dpb                      I2C adapter
i2c-3   i2c         i915 gmbus dpd                      I2C adapter
i2c-4   i2c         AUX B/DDI B/PHY B                   I2C adapter
i2c-5   i2c         AUX D/DDI D/PHY D                   I2C adapter
i2c-6   i2c         NVIDIA i2c adapter 1 at 1:00.0      I2C adapter
i2c-7   i2c         NVIDIA i2c adapter 4 at 1:00.0      I2C adapter
i2c-8   i2c         NVIDIA i2c adapter 5 at 1:00.0      I2C adapter
i2c-9   i2c         NVIDIA i2c adapter 6 at 1:00.0      I2C adapter
i2c-10  i2c         NVIDIA i2c adapter 7 at 1:00.0      I2C adapter
i2c-11  i2c         NVIDIA i2c adapter 8 at 1:00.0      I2C adapter
i2c-12  i2c         NVIDIA i2c adapter 1 at 3:00.0      I2C adapter
i2c-13  i2c         NVIDIA i2c adapter 3 at 3:00.0      I2C adapter
i2c-14  i2c         NVIDIA i2c adapter 5 at 3:00.0      I2C adapter
i2c-15  i2c         NVIDIA i2c adapter 6 at 3:00.0      I2C adapter
i2c-16  i2c         NVIDIA i2c adapter 7 at 3:00.0      I2C adapter
i2c-17  i2c         NVIDIA i2c adapter 8 at 3:00.0      I2C adapter
i2c-18  i2c         NVIDIA i2c adapter 1 at 4:00.0      I2C adapter
i2c-19  i2c         NVIDIA i2c adapter 5 at 4:00.0      I2C adapter
i2c-20  i2c         NVIDIA i2c adapter 6 at 4:00.0      I2C adapter
i2c-21  i2c         NVIDIA i2c adapter 7 at 4:00.0      I2C adapter
i2c-22  i2c         NVIDIA i2c adapter 8 at 4:00.0      I2C adapter
i2c-23  i2c         NVIDIA i2c adapter 1 at 5:00.0      I2C adapter
i2c-24  i2c         NVIDIA i2c adapter 5 at 5:00.0      I2C adapter
i2c-25  i2c         NVIDIA i2c adapter 6 at 5:00.0      I2C adapter
i2c-26  i2c         NVIDIA i2c adapter 7 at 5:00.0      I2C adapter
i2c-27  i2c         NVIDIA i2c adapter 8 at 5:00.0      I2C adapter
i2c-28  i2c         NVIDIA i2c adapter 1 at 6:00.0      I2C adapter
i2c-29  i2c         NVIDIA i2c adapter 3 at 6:00.0      I2C adapter
i2c-30  i2c         NVIDIA i2c adapter 5 at 6:00.0      I2C adapter
i2c-31  i2c         NVIDIA i2c adapter 6 at 6:00.0      I2C adapter
i2c-32  i2c         NVIDIA i2c adapter 7 at 6:00.0      I2C adapter
i2c-33  i2c         NVIDIA i2c adapter 8 at 6:00.0      I2C adapter
i2c-34  i2c         NVIDIA i2c adapter 1 at 7:00.0      I2C adapter
i2c-35  i2c         NVIDIA i2c adapter 5 at 7:00.0      I2C adapter
i2c-36  i2c         NVIDIA i2c adapter 6 at 7:00.0      I2C adapter
i2c-37  i2c         NVIDIA i2c adapter 7 at 7:00.0      I2C adapter
i2c-38  i2c         NVIDIA i2c adapter 8 at 7:00.0      I2C adapter

Fri Oct 28 16:00:56 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 50%   44C    P5    82W / 350W |  11800MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 50%   39C    P5   106W / 350W |  11786MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 50%   40C    P5    38W / 200W |   4722MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 50%   37C    P5    32W / 200W |   1136MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
| 50%   40C    P5    85W / 300W |  11524MiB / 12288MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:07:00.0 Off |                  N/A |
| 50%   35C    P5    80W / 300W |  11524MiB / 12288MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

$ cat /proc/driver/nvidia/gpus/0000\:01\:00.0/information
Model: NVIDIA GeForce RTX 3090
IRQ: 135
GPU UUID: GPU-5b301eb8-e2db-77ef-647a-4cbc1a540cdb
Video BIOS: 94.02.59.00.5f
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Excluded: No

$ cat /proc/driver/nvidia/gpus/0000\:03\:00.0/information
Model: NVIDIA GeForce RTX 3090
IRQ: 136
GPU UUID: GPU-1b345b76-87ed-1017-e656-e93560d55088
Video BIOS: 94.02.42.00.a9
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:03:00.0
Device Minor: 1
GPU Excluded: No

$ cat /proc/driver/nvidia/gpus/0000\:04\:00.0/information
Model: NVIDIA GeForce RTX 3060
IRQ: 137
GPU UUID: GPU-d9439839-dc25-e26d-e64d-22c29f46b1a7
Video BIOS: 94.06.2f.00.f5
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:04:00.0
Device Minor: 2
GPU Excluded: No

$ cat /proc/driver/nvidia/gpus/0000\:05\:00.0/information
Model: NVIDIA GeForce RTX 3060
IRQ: 138
GPU UUID: GPU-9d950fbe-fa66-b53d-323c-ff7079cd223c
Video BIOS: 94.06.2f.00.f5
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:05:00.0
Device Minor: 3
GPU Excluded: No

$ cat /proc/driver/nvidia/gpus/0000\:06\:00.0/information
Model: NVIDIA GeForce RTX 3080 Ti
IRQ: 139
GPU UUID: GPU-e9a3381e-0cef-29d1-d96c-a9abf81593b1
Video BIOS: 94.02.71.80.79
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:06:00.0
Device Minor: 4
GPU Excluded: No

$ cat /proc/driver/nvidia/gpus/0000\:07\:00.0/information
Model: NVIDIA GeForce RTX 3080 Ti
IRQ: 140
GPU UUID: GPU-43262d09-645f-44ea-e4b7-9514bebc8e75
Video BIOS: 94.02.71.80.cc
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:07:00.0
Device Minor: 5
GPU Excluded: No

LarryDCJ commented 2 years ago

I'm available at larrydcj@icloud.com if you need a tester. I have an EVGA 3090 FTW3 Ultra.

aritger commented 1 year ago

I think all of the items mentioned in this Issue are addressed by the 525.35 release. I'd appreciate if testers could confirm.

SapphirusBeryl commented 1 year ago

I think all of the items mentioned in this Issue are addressed by the 525.35 release. I'd appreciate if testers could confirm.

I've got an EVGA 3080 FTW3, and can confirm the i2c interface is now working with 525.35. All functionality in OpenRGB pertaining to the RGB bar is in fact working.

$ i2cdetect -l
i2c-3   i2c         NVIDIA i2c adapter 1 at 8:00.0      I2C adapter
i2c-4   i2c         NVIDIA i2c adapter 5 at 8:00.0      I2C adapter
i2c-5   i2c         NVIDIA i2c adapter 6 at 8:00.0      I2C adapter
i2c-6   i2c         NVIDIA i2c adapter 7 at 8:00.0      I2C adapter
i2c-7   i2c         NVIDIA i2c adapter 8 at 8:00.0      I2C adapter
$ openrgb -l
0: EVGA GeForce RTX 3080 FTW3 Ultra LHR
  Type:           GPU
  Description:    EVGA Ampere RGB GPU Device
  Version:        1.01.15
  Location:       I2C: /dev/i2c-3, address 0x2D
  Modes: Off Direct Breathing 'Spectrum Cycle' 'Color Cycle' ['Rainbow Wave'] Wave Star 'Color Stack'
  Zones: 'Front Logo' 'End plate Logo' 'Back Logo' 'Addressable Header'
  LEDs: 'Front Logo' 'End plate Logo' 'Back Logo' 'Addressable Header'
atta2022 commented 1 year ago

Can confirm it's still NOT working for me:

atta2k22@atta2k22-BTC-B250:~/LINUX/bzminer$ sudo openrgb -l
Attempting to connect to local OpenRGB server.
Connection attempt failed
Local OpenRGB server unavailable.
Running standalone.
[i2c_smbus_linux] Failed to read i2c device PCI device ID
[i2c_smbus_linux] Failed to read i2c device PCI device ID
0: Gigabyte RTX3060 EAGLE OC 12G V2
  Type:           GPU
  Description:    RGB Fusion GPU
  Location:       I2C: /dev/i2c-23, address 0x63
  Modes: [Direct] Breathing Flashing 'Dual Flashing' 'Color Cycle' 'Spectrum Cycle'
  Zones: 'GPU Zone'
  LEDs: 'GPU LED'

1: Gigabyte RTX3060 EAGLE OC 12G V2
  Type:           GPU
  Description:    RGB Fusion GPU
  Location:       I2C: /dev/i2c-18, address 0x63
  Modes: [Direct] Breathing Flashing 'Dual Flashing' 'Color Cycle' 'Spectrum Cycle'
  Zones: 'GPU Zone'
  LEDs: 'GPU LED'

2: ASUS TUF RTX 3080Ti O12G GAMING
  Type:           GPU
  Description:    ENE SMBus Device
  Version:        AUMA0-E6K5-0107
  Location:       I2C: /dev/i2c-28, address 0x67
  Modes: Direct [Off] Static Breathing Flashing 'Spectrum Cycle' Rainbow 'Chase Fade' Chase 'Random Flicker'
  Zones: Unknown
  LEDs: 'Unknown LED 1' 'Unknown LED 2' 'Unknown LED 3' 'Unknown LED 4'

3: ASUS ROG STRIX 3090 O24G GAMING
  Type:           GPU
  Description:    ENE SMBus Device
  Version:        AUMA0-E6K5-0107
  Location:       I2C: /dev/i2c-12, address 0x67
  Modes: Direct [Off] Static Breathing Flashing 'Spectrum Cycle' Rainbow 'Chase Fade' Chase 'Random Flicker'
  Zones: Unknown
  LEDs: 'Unknown LED 1' 'Unknown LED 2' 'Unknown LED 3' 'Unknown LED 4' 'Unknown LED 5' 'Unknown LED 6' 'Unknown LED 7' 'Unknown LED 8' 'Unknown LED 9' 'Unknown LED 10' 'Unknown LED 11' 'Unknown LED 12' 'Unknown LED 13' 'Unknown LED 14' 'Unknown LED 15' 'Unknown LED 16' 'Unknown LED 17' 'Unknown LED 18' 'Unknown LED 19' 'Unknown LED 20' 'Unknown LED 21' 'Unknown LED 22'

Sun Nov 13 12:23:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.53       Driver Version: 525.53       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+

CalcProgrammer1 commented 1 year ago

Which card isn't working for you? It looks like your two ASUS cards are properly detected, as it reads the version string. RGB Fusion GPU should work even without the changes made here, as it just uses byte accesses. Do note that OpenRGB has not been well tested on systems with multiple GPUs, so you may be running into an issue due to that. Would you be able to test them one at a time?

NoX1De commented 1 year ago

I can sadly confirm that it is still not working for me after upgrading to the new 525.53 drivers (ASUS 3080 12GB ROG Strix GPU on Fedora 35). What can I do to help? Am I missing something here or doing something wrong? I don't believe I am, but please let me know. I'm happy to assist with further testing.

Mon Nov 14 12:17:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.53   Driver Version: 525.53       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
:~$ sudo i2cdetect -l
i2c-0   i2c         NVIDIA i2c adapter 1 at 29:00.0     I2C adapter
i2c-1   i2c         NVIDIA i2c adapter 3 at 29:00.0     I2C adapter
i2c-2   i2c         NVIDIA i2c adapter 5 at 29:00.0     I2C adapter
i2c-3   i2c         NVIDIA i2c adapter 6 at 29:00.0     I2C adapter
i2c-4   i2c         NVIDIA i2c adapter 7 at 29:00.0     I2C adapter
i2c-5   i2c         NVIDIA i2c adapter 8 at 29:00.0     I2C adapter
:~$ sudo openrgb -vv -l | grep ASUS
[ASUS Aura SMBus Motherboard] is enabled
[ASUS Aura SMBus Motherboard] no devices found
[ASUS Aura SMBus Motherboard] detection end
[ASUS Aura GPU (ENE)] is enabled
[ASUS Aura GPU (ENE)] no devices found
[ASUS Aura GPU (ENE)] detection end
[ASUS Aura GPU] is enabled
[ASUS Aura GPU] no devices found
[ASUS Aura GPU] detection end
:~$ sudo openrgb -l -v
Attempting to connect to local OpenRGB server.
Connection attempt failed
Local OpenRGB server unavailable.
Running standalone.
------------------------------------------------------
|               Start device detection               |
------------------------------------------------------
Initializing HID interfaces: Success
------------------------------------------------------
|             Detecting I2C interfaces               |
------------------------------------------------------
Registering I2C interface: /dev/i2c-3 Device 10DE:220A Subsystem: 1043:886B
Registering I2C interface: /dev/i2c-1 Device 10DE:220A Subsystem: 1043:886B
Registering I2C interface: /dev/i2c-4 Device 10DE:220A Subsystem: 1043:886B
Registering I2C interface: /dev/i2c-2 Device 10DE:220A Subsystem: 1043:886B
Registering I2C interface: /dev/i2c-0 Device 10DE:220A Subsystem: 1043:886B
Registering I2C interface: /dev/i2c-5 Device 10DE:220A Subsystem: 1043:886B
------------------------------------------------------
|               Detecting I2C devices                |
------------------------------------------------------
------------------------------------------------------
|               Detecting HID devices                |
------------------------------------------------------
[X570 AORUS ELITE] Registering RGB controller
[Gigabyte RGB Fusion 2 USB] successfully added
------------------------------------------------------
|              Detecting other devices               |
------------------------------------------------------
------------------------------------------------------
|                Detection completed                 |
------------------------------------------------------
0: X570 AORUS ELITE
  Type:           Motherboard
  Description:    IT8297BX-GBX570
  Version:        0x00060001
  Location:       HID: /dev/hidraw1
  Serial:         redacted
  Modes: [Direct] Static Breathing Blinking 'Color Cycle' Flashing
  Zones: 'D_LED1 Bottom' 'D_LED2 Top' Motherboard
  LEDs: 'Back I/O' 'CPU Header' PCIe 'LED C1/C2'
CalcProgrammer1 commented 1 year ago

If the GPU isn't being detected at all it may not yet be added to OpenRGB. Even without the patch it looked like ASUS ENE GPUs were being detected.

NoX1De commented 1 year ago

I see. Is there anything I can do to help get the GPU added to OpenRGB, or is there already an open issue for supporting the ASUS 3080 12GB ROG Strix and additional ASUS GPUs, now that the Nvidia drivers seem to have the required code changes? Having re-read the original issue here, I would assume there's an open issue already, so maybe I just need to wait for the changes in OpenRGB. That said, let me know if I can help at all.

CalcProgrammer1 commented 1 year ago

We just need the PCI Vendor ID, Device ID, Subvendor ID, and Subdevice ID along with the official marketing name of the card (link to ASUS website would be good). There's a 99% chance it has the same i2c chip at the same address as all of the other 3xxx ASUS cards but to be safe we have an allow list of known GPU IDs so we don't accidentally try controlling any I2C device with the wrong protocol.
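
To make the allow-list idea concrete, here is a minimal sketch of a lookup keyed on the four PCI IDs. The type and function names are hypothetical and this is not OpenRGB's actual table. The single example entry uses the IDs from the "Registering I2C interface" lines posted earlier in this thread (10DE:220A, subsystem 1043:886B), paired with the card name OpenRGB reports later in the thread.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The four IDs requested above.
struct PciGpuId
{
    std::uint16_t vendor;
    std::uint16_t device;
    std::uint16_t subvendor;
    std::uint16_t subdevice;
};

// One allow-list entry: IDs plus the marketing name of the card.
struct AllowedGpu
{
    PciGpuId    id;
    std::string name;
};

bool same_id(const PciGpuId& a, const PciGpuId& b)
{
    return a.vendor == b.vendor && a.device == b.device &&
           a.subvendor == b.subvendor && a.subdevice == b.subdevice;
}

// Return the matching entry, or nullptr if the card is not on the list.
// Only cards that match should ever be probed with the ENE protocol.
const AllowedGpu* find_allowed_gpu(const std::vector<AllowedGpu>& allow_list, const PciGpuId& id)
{
    for (const AllowedGpu& entry : allow_list)
    {
        if (same_id(entry.id, id))
            return &entry;
    }
    return nullptr;
}

// Example list with one entry taken from the logs in this thread.
std::vector<AllowedGpu> example_allow_list()
{
    return { { { 0x10DE, 0x220A, 0x1043, 0x886B }, "ASUS ROG STRIX RTX 3080 O12G" } };
}
```

Matching on all four IDs rather than just vendor and device is what keeps a card from another board partner, which shares the same GPU but may carry a different RGB controller, off the list.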

CalcProgrammer1 commented 1 year ago

Also, make sure you're using the latest pipeline/git version of OpenRGB. A lot of GPUs have been added to the list since 0.7.

aritger commented 1 year ago

Thanks for all the discussion.

If I understand correctly, the issues @atta2022 and @NoX1De mention are different than what was originally reported here. I'd like to avoid this Issue turning into an unbounded "OpenRGB+NVIDIA doesn't work" thread that morphs over time and stays open forever.

Given @SapphirusBeryl's test results, I think the originally-reported problems are resolved, and anything else can be treated as separate Issues.

I'll give it another day or two for any other feedback that suggests the original problems tracked here are still not working correctly with 525.35.

atta2022 commented 1 year ago

ok, will report my issue to /dev/null, thank you.

aritger commented 1 year ago

There is no need for snark, @atta2022

All I suggested was that separate problems should be tracked as separate Issues, which doesn't seem an unreasonable engineering practice. If you believe the problem you are seeing is the same as the problem originally reported in this Issue, then let's work it here. Otherwise, please file a separate Issue.

From the comments above, @CalcProgrammer1 already suggested to help isolate the problem you are seeing by testing the GPUs individually. Further, there was an indication that the path you are hitting through OpenRGB wouldn't have been impacted by the particular driver bug we were originally tracking in this Issue. Maybe your configuration is tripping on an issue in OpenRGB, or maybe you've found a different driver bug. In either case, separating different problems into different github Issues will make it easier to solve the problems.

NoX1De commented 1 year ago

Thanks @CalcProgrammer1. Given the comment above, and to avoid cluttering this issue if my problem isn't the one originally reported here, I've pinged you at the link below to move the conversation there. Judging by that issue, my card may already be supported in OpenRGB, so I'm not sure what is going on at this point. Am I missing something? This may also raise the question of whether there are still some problems related to the original issue reported here, hence this comment on this thread again. For now, I will take the conversation elsewhere unless we discover my problem is due to the originally reported issue, since related dialog is seemingly discouraged here.

https://gitlab.com/CalcProgrammer1/OpenRGB/-/issues/2508

NoX1De commented 1 year ago

Hi there @aritger, I've done some testing with the new 525.53 drivers and modifying the LED colors for my ASUS 3080 12GB ROG Strix GPU and it seems that there may still be some issues with the Nvidia driver. I swapped back to the Nouveau drivers and everything works great, any color I specify is set as expected. The issue is that when attempting to specify a certain color to change the LED(s) to when using the latest version of OpenRGB with the 525.53 drivers it will not set the LED(s) to the specified color code and just always sets the color to blue no matter what color code is specified. The Nvidia drivers do work with some "preset" modes (e.g. 'Spectrum Cycle', Rainbow) but any mode where you attempt to manually specify a color/colors does not work and it will always just be set to blue when using the Nvidia 525.53 drivers.

See more detailed comments here: https://gitlab.com/CalcProgrammer1/OpenRGB/-/issues/2508

CalcProgrammer1 commented 1 year ago

I believe this is due to the handling of SMBus block operations. The mode is set using standard byte operations but the colors are set using 3 byte blocks. An SMBus block write operation of 3 bytes of data actually contains 5 bytes of data and I expect NVIDIA is getting it wrong in your case. It should send the on-chip register address, the data size (3), and then the 3 data bytes (color data).

In addition, the ENE controller code first performs a word write containing the 16-bit ENE register address, but this part of the NVIDIA driver has already been fixed if your version string looks good and the mode setting works.
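
The 5-byte layout described above for a 3-byte block write can be sketched as follows. The function name is made up for this illustration, and the command value 0x03 in the usage note is a placeholder rather than the real ENE colour register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes that follow the device address for an SMBus Block Write:
// command (the on-chip register), byte count, then the data bytes.
std::vector<std::uint8_t> smbus_block_write_bytes(std::uint8_t command,
                                                  const std::vector<std::uint8_t>& data)
{
    // Classic SMBus block transfers carry 1 to 32 data bytes.
    assert(!data.empty() && data.size() <= 32);

    std::vector<std::uint8_t> frame;
    frame.reserve(data.size() + 2);
    frame.push_back(command);
    frame.push_back(static_cast<std::uint8_t>(data.size()));
    frame.insert(frame.end(), data.begin(), data.end());
    return frame;
}
```

For example, smbus_block_write_bytes(0x03, {0xFF, 0x00, 0x80}) yields the five bytes 0x03, 0x03, 0xFF, 0x00, 0x80: the command, the count of 3, and the three colour bytes. A driver that drops or misplaces the count byte would shift the colour data by one position.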

darrellenns commented 1 year ago

Tested successfully with 525.60 and EVGA 3070 FTW3 Ultra. Setting various RGB modes and colors via OpenRGB all worked as expected.

NoX1De commented 1 year ago

I updated to 525.60.11 and the latest version of OpenRGB (0.81), and I just wanted to report/confirm that setting specific colors is still not working with my ASUS 3080 12GB ROG Strix GPU. Every attempt to set a specific color still just sets the LED(s) to blue; only the pre-configured modes (e.g. Spectrum Cycle, Rainbow...) work.

$ sudo openrgb -l
0: ASUS ROG STRIX RTX 3080 O12G
  Type:           GPU
  Description:    ENE SMBus Device
...snip...
$ openrgb -V
OpenRGB 0.81, for controlling RGB lighting.
  Version:       0.81
  Build Date         Wed, 07 Dec 2022 00:33:26 -0500
  Git Commit ID      cf610fa559fcde6f2d49ee8b9c1a53f7b3694642
  Git Commit Date    2022-12-06 21:43:50 +0100
  Git Branch         master
  Wed Dec  7 00:42:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
atta2022 commented 1 year ago

I gave up. Even with OpenRGB version 0.81 and the new NVIDIA drivers, it's still NOT possible to set all my cards. 2 are simply not identified, which is weird because the temp sensor is also being read by (mining) software.

gardotd426 commented 1 year ago

I wanted to make sure I came back to confirm that my EVGA XC3 Ultra 3090 is now detected when using the 525 branch of the NV driver and current master of OpenRGB (built from source yesterday).

The XC3 Ultra has two zones - both EVGA logos, one along the top of the card and another on the backplate. Both show up in the main tab of the OpenRGB GUI, and I can set them to specific colors with no issue, and the two zones also show up as options for Hardware-Sync lighting in the HWSync plugin. I would say that for my card at least, the current state of things amounts to 99% compatibility. The only thing that I believe a reasonable user would expect to work but that still doesn't is using the GPU as the SOURCE for the HWSync plugin.

To clarify, I can go right now and add both zones to my AIO and have the GPU lighting change depending on my CPU temperature, or I could sync it to anything else detected by lm-sensors. I imagine most of us here already know why it doesn't work - Nvidia doesn't use the standard sysfs method of reporting thermals, power draw, etc. Instead they use the NV-CONTROL X11 extension, aka libXNVCtrl. This is how GreenWithEnvy is able to show temps, fan speeds, set fan curves, etc. It's also why we don't have fan control/thermal monitoring in Wayland (@ the NV engineers here: surely this is coming soon along with VRR in Wayland, right? I mean VRR support plus a method for controlling temps, clocks, power limits, etc are basically the only things NV lacks on Wayland vs AMD, and NVControl can't make the jump, and X is dead).

But even without an lm-sensors way to monitor temps for OpenRGB, I think we could probably pretty easily get it working in the interim. @CalcProgrammer1 should I post on the GitLab about how best to go about sending thermal data to OpenRGB? I mean GWE has a button you can click that will pop out a new window with rolling graphs showing current and past fan speed, temps, clocks, everything. The code Rob uses to get the data is already there, it's just in Python. But he would probably be willing to lend a hand, and maybe until NV has a universal method for reporting such things, we could add a patch that checks to see if the user is on X11 and has libXNVCtrl installed, and if so then OpenRGB (or the plugin, idk) would show the GPU data as options in stuff like HWSync.

Or maybe we could send the data from GWE to OpenRGB through its server protocol?
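
As a rough idea of what the "is the user on X11" half of such a patch might check, here is a small sketch based on the usual session environment variables (XDG_SESSION_TYPE, WAYLAND_DISPLAY, DISPLAY). The function names are hypothetical, nothing like this exists in OpenRGB as far as this thread shows, and detecting whether libXNVCtrl is installed is deliberately left out.

```cpp
#include <cstdlib>
#include <string>

// Decide from environment values whether the session looks like X11.
// Null pointers mean "variable not set".
bool looks_like_x11(const char* xdg_session_type, const char* wayland_display, const char* display)
{
    if (xdg_session_type != nullptr)
    {
        const std::string type(xdg_session_type);
        if (type == "x11")
            return true;
        if (type == "wayland")
            return false;
    }

    // Fallback heuristic: an X display is set and no Wayland display is.
    const bool has_display = display != nullptr && *display != '\0';
    const bool has_wayland = wayland_display != nullptr && *wayland_display != '\0';
    return has_display && !has_wayland;
}

// Convenience wrapper that reads the real environment.
bool session_is_x11()
{
    return looks_like_x11(std::getenv("XDG_SESSION_TYPE"),
                          std::getenv("WAYLAND_DISPLAY"),
                          std::getenv("DISPLAY"));
}
```

Keeping the decision in a function that takes plain strings makes it easy to test without touching the real environment. Note that under XWayland DISPLAY is also set, which is why XDG_SESSION_TYPE is consulted first.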

gardotd426 commented 1 year ago

I gave up. Even with OpenRGB version 0.81 and the new NVIDIA drivers, it's still NOT possible to set all my cards. 2 are simply not identified, which is weird because the temp sensor is also being read by (mining) software.

@atta2022 ....mining software? What are you mining, literal coal? It'd be more of a real thing than GPU cryptomining. So if you're not mining, then why on earth would you use mining software to get temps?

Because we already have a great GPU clocks/power limits/fan curve/temperature app for NV on X11 on Linux. GreenWithEnvy.

And there's not a single thing weird about 2 of your cards reporting their temps to your mining software, because that's literally a completely different plane of existence than using SMBus/I2C to control RGB lighting. Thermals get reported through a different subsystem.

I would follow the devs' recommendations and isolate your issue and then report it in a new thread either here or at the OpenRGB gitlab, depending on whose issue it is.

CalcProgrammer1 commented 1 year ago

I wanted to make sure I came back to confirm that my EVGA XC3 Ultra 3090 is now detected when using the 525 branch of the NV driver and current master of OpenRGB (built from source yesterday).

The XC3 Ultra has two zones - both EVGA logos, one along the top of the card and another on the backplate. Both show up in the main tab of the OpenRGB GUI, and I can set them to specific colors with no issue, and the two zones also show up as options for Hardware-Sync lighting in the HWSync plugin. I would say that for my card at least, the current state of things amounts to 99% compatibility. The only thing that I believe a reasonable user would expect to work but that still doesn't is using the GPU as the SOURCE for the HWSync plugin.

To clarify, I can go right now and add both zones to my AIO and have the GPU lighting change depending on my CPU temperature, or I could sync it to anything else detected by lm-sensors. I imagine most of us here already know why it doesn't work - Nvidia doesn't use the standard sysfs method of reporting thermals, power draw, etc. Instead they use the NV-CONTROL X11 extension, aka libXNVCtrl. This is how GreenWithEnvy is able to show temps, fan speeds, set fan curves, etc. It's also why we don't have fan control/thermal monitoring in Wayland (@ the NV engineers here: surely this is coming soon along with VRR in Wayland, right? I mean VRR support plus a method for controlling temps, clocks, power limits, etc. are basically the only things NV lacks on Wayland vs AMD, and NVControl can't make the jump, and X is dead).

But even without an lm-sensors way to monitor temps for OpenRGB, I think we could probably pretty easily get it working in the interim. @CalcProgrammer1 should I post on the GitLab about how best to go about sending thermal data to OpenRGB? I mean GWE has a button you can click that will pop out a new window with rolling graphs showing current and past fan speed, temps, clocks, everything. The code Rob uses to get the data is already there, it's just in Python. But he would probably be willing to lend a hand, and maybe until NV has a universal method for reporting such things, we could add a patch that checks to see if the user is on X11 and has libXNVCtrl installed, and if so then OpenRGB (or the plugin, idk) would show the GPU data as options in stuff like HWSync.

Or maybe we could send the data from GWE to OpenRGB through its server protocol?

This is not an issue with NVIDIA's driver at this point. I think the appropriate place to post this would be on the Hardware Sync Plugin GitLab (https://gitlab.com/OpenRGBDevelopers/OpenRGBHardwareSyncPlugin IIRC) if you're having an issue with a missing data source in the plugin, or you could post a feature request to GWE to add OpenRGB SDK integration (not sure that's in scope of their project). OpenRGB itself does not provide hardware sync or effects, so opening an issue on the OpenRGB GitLab is not the way to go.

colinjmatt commented 1 year ago

I updated to 525.60.11 and the latest version of OpenRGB (0.81)...and I just wanted to report/confirm that setting specific colors is still not working with my ASUS 3080 12GB ROG Strix GPU. Every attempt to set a specific color still just sets the LED(s) to blue; only the pre-configured modes (e.g. Spectrum Cycle, Rainbow...) work.

Confirmed this is also still the case on an ASUS Strix LC 3080 TI with nvidia 525.60.11-3 on Arch Linux

dev-m-mulvey commented 1 year ago

Confirming that the same issues mentioned by @colinjmatt and @NoX1De occur using openrgb v0.81 and nvidia 525.60.11 on a system using an asus TUF RTX 3080Ti O12G card and manjaro linux (xfce) OS.

owenmylotte commented 1 year ago

Hello. I'm wondering if there is a specific thread for the issue mentioned above? (@CalcProgrammer1 @KidA3995 @colinjmatt @NoX1De)

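I'll expand on the symptoms a bit in case it gives a clue to what is going wrong:

  • OpenRGB successfully recognizes my card (ASUS ROG STRIX 3060 O12G V2 GAMING) and is able to set different modes as far as default color cycles and time dependent flashing things are concerned.
  • However, the colors are being written incorrectly and this is happening in a very specific way.

    • The red channel writes blue to the LEDs.
    • The blue channel and green channel both write an extremely faint but just barely visible red to the LEDs.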

So, the only color that can come out of the LEDs is blue, effectively.

Thanks for all of the help so far.

NoX1De commented 1 year ago

Hello. I'm wondering if there is a specific thread for the issue mentioned above? (@CalcProgrammer1 @KidA3995 @colinjmatt @NoX1De)

I'll expand on the symptoms a bit in case it gives a clue to what is going wrong:

  • OpenRGB successfully recognizes my card (ASUS ROG STRIX 3060 O12G V2 GAMING) and is able to set different modes as far as default color cycles and time dependent flashing things are concerned.
  • However, the colors are being written incorrectly and this is happening in a very specific way.

    • The red channel writes blue to the LEDs.
    • The blue channel and green channel both write an extremely faint but just barely visible red to the LEDs.

So, the only color that can come out of the LEDs is blue, effectively.

Thanks for all of the help so far.

AFAIK this is still an Nvidia driver issue, per the details found in this comment: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/41#issuecomment-1320561614 I am not sure where this currently stands, but @aritger was the last Nvidia person to comment. Is there perhaps any update on the issue described in the comment I've linked here from @CalcProgrammer1, in terms of this being rectified in a newer driver version or otherwise? There seem to be a number of people wondering if this will be fixed at this point.

WACOMalt commented 11 months ago

I just want to confirm that I see the same behavior with manual color control on my "Asus Rog Strix 3090 O24G Gaming". Spectrum Cycle and Rainbow work fine. But any manual color control results in Blue. I am testing in v0.9 of OpenRGB with NVidia drivers 535 (535.129.03-0ubuntu0.22.04.1 full version name).

I'm happy to test further if there's anything I can contribute. Is there any separate issue yet for this on NVidia's end?

aritger commented 11 months ago

My apologies, this fell off my radar. Thanks to @CalcProgrammer1's diagnosis in https://github.com/NVIDIA/open-gpu-kernel-modules/issues/41#issuecomment-1320561614, hopefully this is fairly easy to resolve. I've filed NVIDIA internal bug 4396674 for the always-blue symptom.

MG-5 commented 9 months ago

My apologies, this fell off my radar. Thanks to @CalcProgrammer1's diagnosis in #41 (comment), hopefully this is fairly easy to resolve. I've filed NVIDIA internal bug 4396674 for the always-blue symptom.

Is there any progress on bug 4396674?

I also can confirm the same problem described in https://github.com/NVIDIA/open-gpu-kernel-modules/issues/41#issuecomment-1431452938 on ASUS ROG STRIX 3070Ti

Kwandos commented 8 months ago

My apologies, this fell off my radar. Thanks to @CalcProgrammer1's diagnosis in #41 (comment), hopefully this is fairly easy to resolve. I've filed NVIDIA internal bug 4396674 for the always-blue symptom.

I have the same problem on my ASUS TUF GAMING V2 3070 ti, wondering if there will be a fix?

Andresdiaz16 commented 8 months ago

I have the same issue with an ASUS ROG STRIX 3080ti, Openrgb -v 0.9.2, nvidia-drivers = 550.54.14, on Arch Linux. If I can help with testing, let me know.

GHJebus commented 7 months ago

I'm seeing what I believe to be this same issue with my ASUS 4070 Ti Super. Windows works fine with setting addressable colors via Armoury Crate or OpenRGB; however, under Linux, OpenRGB colors are 'garbled'. And it's random on reboot. In one case only Red could be set and it appeared Blue, and in another case Red is Teal, and Green and Blue are both Green. After a fresh boot from a power off state, though, it presents consistently as Red is Blue and Green/Blue are ignored.

Because OpenRGB color addressing works in Windows, I can only assume it's something with the linux nvidia driver. I'm currently using driver 550.54-14, and a fork of OpenRGB 0.91 with added support for the ASUS TUF RTX 4070 Ti Super O16G Gaming card.

I'm happy to help with debugging/patching to assist getting to the root of the issue. Though, in the mean time, are there any updates with the filed bug 4396674?

Thanks!

colinjmatt commented 4 months ago

My apologies, this fell off my radar. Thanks to @CalcProgrammer1's diagnosis in #41 (comment), hopefully this is fairly easy to resolve. I've filed NVIDIA internal bug 4396674 for the always-blue symptom.

@aritger has any progress been made on this?

aritger commented 4 months ago

Sorry, I'm not aware of any progress, yet.

MG-5 commented 3 months ago

It is quite disappointing to see that after two years this issue is still open. Obviously, no emphasis is being placed on rectifying the error. Does it have to be?

To summarise for you: It also works with the Windows and Nouveau drivers, so it can't really be difficult to see what the differences are in I2C/SMBus communication between the working drivers and this Linux driver. In addition, the author of the thread has already described a possible source of error in quite some detail.

colinjmatt commented 3 months ago

It is quite disappointing to see that after two years this issue is still open. Obviously, no emphasis is being placed on rectifying the error. Does it have to be?

To summarise for you:

It also works with the Windows and Nouveau drivers, so it can't really be difficult to see what the differences are in I2C/SMBus communication between the working drivers and this Linux driver. In addition, the author of the thread has already described a possible source of error in quite some detail.

Sadly the Linux community is a second class one to companies like nVidia. No one will prioritise this high enough to be resolved. I will very likely look to AMD for my next GPU as I've had far better results with Mesa.

steventid commented 3 weeks ago

I'm fairly confident I've come up with a workaround for this. Since I currently dual boot between Windows 10 and Pop OS, I can also confirm that OpenRGB works just fine to set the LEDs in Win10 but not under Linux. Specs: ASUS TUF RTX 4090 O24G, shows up as ENE SMBus Device version AUMA0-E6K5-0107, I2C bus 12, address is 0x67

Edit: forgot to add I'm currently on nvidia-driver 560.35.03

I'm working on a better write up and a merge request right now for OpenRGB. Hopefully I can find some testers to confirm it's working on hardware other than my own.

Hopefully something I found may be of use to someone at Nvidia to help them fix a possible bug in the driver that's been around for at least 2 years from the age of this post. More news to come soon!

colinjmatt commented 3 weeks ago

Hopefully something I found may be of use to someone at Nvidia to help them fix a possible bug in the driver that's been around for at least 2 years

Don't count on nvidia fixing anything here. At least the OpenRGB team would be open to your workaround though.

steventid commented 3 weeks ago

TL;DR: Merge request here. It's an off-by-one error, most likely in the Nvidia driver.

Detailed write up to hopefully help Nvidia because I still have faith in them!

Disclaimer: I'm extremely new to hardware programming and i2c in general, and spent about 3 days testing things on my own hardware to figure out what was going wrong with various methods, including using a python script and smbus2 to poke random values into the lights on the side of a $2,000 piece of hardware. For science!

I've only been able to test on 2 cards, both of which are registered by OpenRGB as: ASUS TUF RTX 4090 O24G ENE SMBus Device AUMA0-E6K5-0107 I2C: /dev/i2c-12, address 0x67

Dual boot Windows 10 and Pop OS 22.04 LTS using Nvidia Driver 560.35.03

This helped me confirm that the problem isn't in OpenRGB, as the packets being sent from the 2 different i2c_smbus interfaces are identical, and the light works exactly as intended under Windows.

Each packet contains a data block, which should contain 1 byte for the length of data in the packet, then the actual bytes of data. OpenRGB makes 34 bytes available per packet. The ones I'm looking at all come from an ENESMBusController and have a length of 3, specifically the Red, Blue, Green channel data in that order (this will be relevant soon).

So to set 1 LED to red, we send: [3, 255, 0, 0] //Length: 3 bytes of data, Red=255, Blue=0, Green=0
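The packet layout described above can be sketched in a few lines of Python. This is only an illustration: `make_block_packet` is a hypothetical helper, not an OpenRGB function, and the 32-byte payload limit is the `I2C_SMBUS_BLOCK_MAX` value from the kernel's `linux/i2c.h`.

```python
I2C_SMBUS_BLOCK_MAX = 32  # maximum payload bytes, value from linux/i2c.h


def make_block_packet(colors):
    """Build [length, data...] for a list of (red, green, blue) tuples.

    The channel bytes are emitted in Red, Blue, Green order, as described
    for these ENE packets in the post above.
    """
    payload = []
    for red, green, blue in colors:
        payload += [red, blue, green]
    if len(payload) > I2C_SMBUS_BLOCK_MAX:
        raise ValueError("payload does not fit in one SMBus block")
    return [len(payload)] + payload


# One LED set to full red.
print(make_block_packet([(255, 0, 0)]))  # [3, 255, 0, 0]
```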

Due to a probable bug in the proprietary Nvidia driver's handling of i2c packets, the packet is misinterpreted.

I noticed @CalcProgrammer1 said he was able to set the lights correctly 1 byte at a time. So I wrote up a python script using smbus2 and started poking at things. Sure enough, setting 1 byte at a time worked as intended and the lights were the correct colors!

Then, because the data seemed to be shifted by a color channel (blue instead of red) I started poking things more. I found that if I put an extra byte at the front of the data, like this: [255] + [255, 0, 0] this would give me a correct RED led from my python script. (python already calculated the length for me internally so I didn't have to do that, so effectively I was sending [4, 255, 255, 0, 0], I just didn't completely understand that yet)

If I sent a longer packet, something like [255, 0, 0, 255, 0, 0, 0, 0, 255, 0, 0, 255], as long as I added 1 extra byte on the front, it would color all 4 lights correctly, as the registers on the card are all sequentially ordered (I found this out when doing the single byte writes earlier). Or I could do groups of 2 lights, etc. I did quite a bit of testing to make sure I understood what values in what positions were making which changes to the LEDs.

The only other difference is that OpenRGB was sending packets with transaction type 5 and python was using type 8, which I learned from the header files are constants defined as:

#define I2C_SMBUS_BLOCK_DATA        5
#define I2C_SMBUS_I2C_BLOCK_BROKEN  6
#define I2C_SMBUS_BLOCK_PROC_CALL   7           /* SMBus 2.0 */
#define I2C_SMBUS_I2C_BLOCK_DATA    8
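For reference, the kernel's SMBus protocol documentation describes different wire layouts for types 5 and 8: an SMBus block write puts a count byte on the bus before the data, while an I2C block write does not. The sketch below only models that documented layout (the bytes that follow the device address). `wire_bytes` is a hypothetical helper, and how the NVIDIA driver actually handles either type is not something this code can show.

```python
I2C_SMBUS_BLOCK_DATA = 5
I2C_SMBUS_I2C_BLOCK_DATA = 8


def wire_bytes(size, command, block):
    """Bytes that follow the device address for a block write.

    block uses the kernel buffer layout: block[0] is the length and
    block[1:] holds the data bytes.
    """
    length = block[0]
    data = list(block[1:1 + length])
    if size == I2C_SMBUS_BLOCK_DATA:
        # SMBus block write: command, count, then the data bytes.
        return [command, length] + data
    if size == I2C_SMBUS_I2C_BLOCK_DATA:
        # I2C block write: command, then the data bytes (no count byte).
        return [command] + data
    raise ValueError("unsupported transaction type")


print(wire_bytes(I2C_SMBUS_BLOCK_DATA, 0x03, [3, 255, 0, 0]))
print(wire_bytes(I2C_SMBUS_I2C_BLOCK_DATA, 0x03, [3, 255, 0, 0]))
```

Note that in the second form the first data byte sits in the position that the count byte occupies in the first form.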

So, back to why we are getting blue instead of red.

What I had to do under C++ in OpenRGB to fix the problem is this:

When the i2c_smbus_linux interface gets a packet in its i2c_smbus_xfer method that contains some very specific data from how it's being sent by OpenRGB, specifically a packet that:

  • is on address 0x67
  • contains I2C_SMBUS_WRITE for the read_write parameter
  • contains 0x03 for the command parameter
  • has block[0] in the data parameter equal to 3, because all of the OpenRGB packets are 3 bytes of RBG-ordered data

I changed the packet from this: [3, 255, 0, 0]

Into this: [4, 255, 255, 0, 0]

I confirmed that by removing the length check in the ENESMBusController so I could send all 12 bytes at once, so long as I prepended any random byte to the front (between the length and the actual color data), it now works to set the lights correctly, as long as I modify the packet before actually sending it out from the interface to the hardware:

  1. Length stored in data->block[0] has to be incremented from 3 to 4. If not, the color still does not display properly (probably because the rest of data->block was just full of random data after the first 3 bytes set by OpenRGB, which the driver should ignore anyway once it has read the length value's worth of bytes).
  2. I used memmove() to copy the bytes 1 position to the right (the extra 255 in the example above appears only because memmove leaves the original 255 in place and makes a copy of the 3 bytes, shifted 1 position to the right).
  3. The message type in the packet has to be changed from 5 to either 6 or 8 (the constants above). I was confused at first why 6 works, but based on this commit (https://android.googlesource.com/kernel/common/+/4b2643d7d9bdcd776749e17f73c168ddf02e93cb), the block broken type is a fallback to an older protocol handler.
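The three steps above can be sketched as a small Python function. This is an illustration of the described transformation, not the code from the merge request: `apply_workaround` is a hypothetical name, and the choice of type 8 over type 6 is arbitrary here.

```python
I2C_SMBUS_BLOCK_DATA = 5
I2C_SMBUS_I2C_BLOCK_DATA = 8


def apply_workaround(size, block):
    """Return (new_size, new_block) after the three steps listed above."""
    length = block[0]
    if size != I2C_SMBUS_BLOCK_DATA or length == 0:
        return size, list(block)  # nothing to rewrite
    payload = list(block[1:1 + length])
    # Step 2: shift the payload one slot to the right. As with memmove(),
    # the old first data byte stays behind and acts as the filler byte.
    shifted = [payload[0]] + payload
    # Step 1: the length grows by one to cover the filler byte.
    new_block = [length + 1] + shifted
    # Step 3: switch the transaction type from 5 to 8.
    return I2C_SMBUS_I2C_BLOCK_DATA, new_block


print(apply_workaround(I2C_SMBUS_BLOCK_DATA, [3, 255, 0, 0]))
```

The same function handles the longer 12-byte packets mentioned earlier, since only the length byte and the single filler byte change.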

In summary, I'm about 99% certain that in the Nvidia Proprietary Linux driver's i2c handler, the length is being read correctly from the data block, but then the data bytes are being read starting from block[2] instead of block[1]. By inserting a single byte after the length into the packet before sending it to the driver, my LEDs display the proper color, and as far as I can tell the driver just ignores the extra byte. (However if the same bug happens when the driver is sending packets, that could cause other issues)

Sending a single RGB packet was fine in Windows, so that made me think it wasn't a hardware issue. Then, when no matter how many LEDs worth of data I sent, as long as I put a SINGLE byte in the front of them (after the length), all 4 of my lights colored correctly in Linux as well, so I'm fairly confident that this is a bug in the driver code.

If there is anyone from Nvidia who may have read this far:

In the i2c handler, when you get a packet of type I2C_SMBUS_BLOCK_DATA, it should be interpreted as:

length = data->block[0]
copy length bytes starting at data->block[1].

However, the copy starts at data->block[2] by mistake.
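A tiny model of the two readings may make the hypothesis easier to check against a test rig. Both functions are hypothetical names, and the second one encodes only the off-by-one guess as stated in this post, not verified driver behavior.

```python
def expected_payload(block):
    """Payload as the buffer layout defines it: length bytes from block[1]."""
    length = block[0]
    return list(block[1:1 + length])


def hypothesized_payload(block):
    """Payload under the guessed bug: length bytes starting at block[2]."""
    length = block[0]
    return list(block[2:2 + length])


# The padded packet from the workaround, inside a zero-filled 34-byte buffer.
padded = [4, 255, 255, 0, 0] + [0] * 29

print(expected_payload([3, 255, 0, 0]))
print(hypothesized_payload(padded))
```

Under this guess, the padded packet yields the intended three color bytes followed by one extra byte taken from whatever lies past the end of the data.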

Thanks so much for anyone who took the time to read this far, and if you'd like to test the code, or if you have a card that doesn't show up as address 0x67, you can reach out on gitlab or on the OpenRGB discord!

Edit: Typos, formatting.

steventid commented 3 weeks ago

More technical details to what the error seems to be after spending a few more hours testing.

Again TL;DR: Proprietary driver seems to be handling BLOCK data as WORD data internally.

More tech details, putting this here in hopes that someone from Nvidia can check things out on their side to see if this can be tracked down.

The packets OpenRGB is sending are marked as I2C_SMBUS_BLOCK_DATA, which should be interpreted as a length byte + data values.

The kernel driver seems to be interpreting them as I2C_SMBUS_WORD_DATA instead. Upon more testing, it also seems to be interpreting I2C_SMBUS_I2C_BLOCK_DATA as WORD data as well, which explains why in my first workaround, I still had to increase the packet length and shift the data over by 1 byte.

The following code change in OpenRGB can also bypass the issue, in addition to the fix I posted in the merge request on OpenRGB

//shift blue into hi byte, shift red into lo byte
data->word = (data->block[2] << 8) | data->block[1];
ioctl(handle, I2C_SMBUS, &args);

//green goes into the lo byte; the hi byte is left as 0
data->word = data->block[3];
return ioctl(handle, I2C_SMBUS, &args);

I figured this out because with these specific GPUs, when sending a packet with [3, 0, 0, 0] for length of 3 and RBG 0, 0, 0 (to turn the lights off), you actually get a very faint Red color. So on a whim I tried setting the length byte to 255 and everything else to 0, and I got FULL RED!

Further, setting length to 0 and the next byte to 255 gave FULL BLUE.

I literally filled the entire buffer with 255 and nothing changed, just got magenta because the first 2 bytes were for Red and Blue. If I kept the buffer full of 255 and fiddled with indices 0 and 1, I'd get random shades of red, blue, or magenta. Nothing I did could make green show up.
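These observations are what a word write of the first two buffer bytes would produce. The sketch below models that reading under a few assumptions: a little-endian host (so the union's word covers block[0] as its low byte and block[1] as its high byte), the SMBus rule that a word write sends the low byte first, and a device that stores the two bytes in the red and then the blue register. `word_write_bytes` and `observed_channels` are hypothetical names.

```python
def word_write_bytes(block):
    """Data bytes of an SMBus word write built from the first two buffer bytes."""
    word = block[0] | (block[1] << 8)  # little-endian view of block[0:2]
    return [word & 0xFF, (word >> 8) & 0xFF]  # low byte goes on the wire first


def observed_channels(block):
    """Map the two bytes onto the first two channel registers (red, then blue)."""
    red, blue = word_write_bytes(block)
    return {"red": red, "blue": blue}


print(observed_channels([3, 0, 0, 0]))    # "lights off" packet
print(observed_channels([255, 0, 0, 0]))  # length byte set to 255
print(observed_channels([0, 255, 0, 0]))  # length 0, next byte 255
```

In this model the length byte lands in the red channel and the first data byte in the blue channel, and nothing ever reaches green.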

However (I said originally I'm new to i2c protocols), I did learn that sending 2 consecutive writes to the same device will, on many devices, apparently allow you to write consecutive words and will increment the write location automatically, so it'll just go to the next channels on the LED in this case. Sure enough, if I move the value that was originally intended to be Green into the data->word (which effectively moves it to block[0], since data is a union), you then get the green channel to work as intended as well.
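The union aliasing mentioned here can be checked without any hardware. The sketch below mirrors the layout of the kernel's `union i2c_smbus_data` (a byte, a 16-bit word, and a 34-byte block sharing the same memory) using `ctypes`. The class and function names are hypothetical, and which block bytes the word overwrites depends on the host's byte order.

```python
import ctypes
import sys

I2C_SMBUS_BLOCK_MAX = 32  # value from linux/i2c.h


class I2cSmbusData(ctypes.Union):
    """Python mirror of union i2c_smbus_data: all fields share one buffer."""
    _fields_ = [
        ("byte", ctypes.c_uint8),
        ("word", ctypes.c_uint16),
        ("block", ctypes.c_uint8 * (I2C_SMBUS_BLOCK_MAX + 2)),
    ]


def overwrite_word(packet, word):
    """Load packet into block, assign word, and return the resulting bytes."""
    data = I2cSmbusData()
    for index, value in enumerate(packet):
        data.block[index] = value
    data.word = word  # clobbers block[0] and block[1]
    return list(data.block[:len(packet)])


packet = [3, 10, 20, 30]  # length, then red, blue, green
print(sys.byteorder, overwrite_word(packet, packet[3]))
```

On a little-endian machine the green value ends up in block[0] and block[1] becomes 0, while block[2] and block[3] are untouched.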

So effectively, if you send 2 packets and keep the type I2C_SMBUS_BLOCK_DATA, the Nvidia driver will interpret them as:

[Red, Blue] [Green, ?? ] -> If you step through the code line by line, it technically sends the ?? byte to the red channel of the next LED in the line, but by setting data->word to the byte in block[3], it effectively masks out the 4th byte to a 0 so there's no weird flicker. It hasn't seemed to hurt anything on my GPU, and even without making any changes, the Nvidia driver IS possibly still writing 1 additional byte past the end of the LEDs anyway.
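The two-write split can be modelled in Python as follows. This is a sketch of the arithmetic only, assuming the driver really does send each buffer as a word write with the low byte first. `split_into_words` and `bytes_on_wire` are hypothetical names.

```python
def split_into_words(block):
    """Return the two 16-bit words assigned to data->word in the snippet above."""
    red, blue, green = block[1], block[2], block[3]
    first = (blue << 8) | red  # red in the low byte, blue in the high byte
    second = green             # green in the low byte, high byte left at 0
    return first, second


def bytes_on_wire(word):
    """An SMBus word write sends the low byte first, then the high byte."""
    return [word & 0xFF, (word >> 8) & 0xFF]


first, second = split_into_words([3, 10, 20, 30])
print(bytes_on_wire(first), bytes_on_wire(second))
```

With red=10, blue=20, green=30 this gives the [Red, Blue] [Green, 0] sequence described above, where the trailing 0 is the masked-out fourth byte.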

I personally wasn't able to get my card to even show up using nouveau driver, however I didn't try very long because I've already confirmed that the code works fine for Windows on the same hardware. I'll probably take a look at the open kernel drivers here and see if there's anything I can find that might account for this, but again it may be in the proprietary driver and I can't see that code 🤣