Askannz / msi-perkeyrgb

Linux CLI tool to control per-key RGB lighting on MSI laptops.
MIT License

Sending an effect without transitions bricks the backlight #24

Open rodolpheh opened 4 years ago

rodolpheh commented 4 years ago

After trying to create and use an effect through a configuration file, I completely lost my keyboard backlight, following a hidapi error.

After the first reboot, the SteelSeries keyboard wasn't showing up in lsusb. It started showing up again after a full shutdown. I tried updating the UEFI BIOS firmware (following the MSI instructions) to see if it would reset the keyboard, but it did not.

Here is the configuration file that made my keyboard crash and triggered the hidapi error (sorry, I did not log it and I can't make it appear again):

all steady ff0000
arrows steady 0000ff
numpad steady 00ff00

effect test
  start ffffff
  wave 0 0 x 100 out
end

all effect test

My laptop model is a MSI GE65 Raider 9SE

My system is 5.3.7-arch1-1-ARCH x86_64 GNU/Linux

My msi-perkeyrgb version is 1.4_effects_alpha, built from the AUR this morning (which shows a bump to version 2 normally)

The SteelSeries keyboard is showing in lsusb as Bus 001 Device 004: ID 1038:1122 SteelSeries ApS SteelSeries KLC
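
For anyone who wants to script the same check, here is a minimal sketch that looks for that vendor/product ID in sysfs instead of parsing lsusb output. It assumes the standard Linux `/sys/bus/usb/devices` layout; the function name `usb_device_present` is made up for illustration and is not part of msi-perkeyrgb.

```python
import os


def usb_device_present(vendor_id, product_id, sysfs_root="/sys/bus/usb/devices"):
    """Return True if a USB device with the given hex IDs is enumerated.

    Hypothetical helper: it only reads the idVendor/idProduct files that
    the kernel exposes for each USB device under sysfs.
    """
    if not os.path.isdir(sysfs_root):
        return False
    wanted = (vendor_id.lower(), product_id.lower())
    for entry in os.listdir(sysfs_root):
        try:
            with open(os.path.join(sysfs_root, entry, "idVendor")) as f:
                vid = f.read().strip().lower()
            with open(os.path.join(sysfs_root, entry, "idProduct")) as f:
                pid = f.read().strip().lower()
        except OSError:
            # Interface entries (e.g. "1-9:1.0") have no ID files; skip them.
            continue
        if (vid, pid) == wanted:
            return True
    return False
```

Calling `usb_device_present("1038", "1122")` should return True while the controller is enumerated and False in the state where it disappears from lsusb.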

And finally, grepping for SteelSeries in dmesg gives me:

[    2.689366] usb 1-9: Product: SteelSeries KLC
[    2.689367] usb 1-9: Manufacturer: SteelSeries
[    2.690642] hid-generic 0003:1038:1122.0004: hiddev2,hidraw3: USB HID v1.11 Device [SteelSeries SteelSeries KLC] on usb-0000:00:14.0-9/input0
[    2.691132] input: SteelSeries SteelSeries KLC as /devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9:1.1/0003:1038:1122.0005/input/input13
[    2.747859] hid-generic 0003:1038:1122.0005: input,hidraw4: USB HID v1.11 Device [SteelSeries SteelSeries KLC] on usb-0000:00:14.0-9/input1

When I now use msi-perkeyrgb, it doesn't output anything.

Have you ever run into an issue like this? Does anyone know a way to solve it?

Let me know if you need more information

rodolpheh commented 4 years ago

Here is the report from lsusb. Could anyone compare it with theirs and see if there is something wrong in mine (for example the ** UNAVAILABLE **, maybe)?

lsusb_-vs_001:004.log

Askannz commented 4 years ago

I tried your config and the exact same thing happened.

  1. Some Python exception caused by an HID error, and all lights stop working
  2. After the first reboot, more USB errors in dmesg and the controller doesn't appear in lsusb
  3. After a second reboot, the keyboard is back in lsusb but with no lights and doesn't respond to anything.

Fuck. This is pretty serious. I'm going to pull away the new effects features until we can figure out wtf is happening.

Askannz commented 4 years ago

My power button light still works normally, though.

I'm going to try running the SteelSeries software from a windows partition and see if it can still pick the keyboard up.

Askannz commented 4 years ago

Can't fix it from Windows either. SteelSeriesEngine does pick up the keyboard and acts as if everything is normal, but the lights stay off.

@TauAkiou any idea ? Just to be clear, I'm not blaming you (or anyone) for this. But you probably know the HID protocol of SteelSeries better than me at this point. Do you think there's any command that could be sent to the RGB controller to get it out of this broken state ?

TauAkiou commented 4 years ago

I'm going to have to look into this. It looks like the USB device really doesn't like receiving a block that has no transitions. That's bad design on MSI's part if it breaks the entire keyboard. I'm going to look into the code that the program generates using that and take a look to see if there's a way to hard reset the controller.

TauAkiou commented 4 years ago

One thing I would try doing in the meantime is resetting the Embedded Controller, which may reset the keyboard and fix the issue. MSI provides instructions here: https://forum-en.msi.com/index.php?topic=112416.0

rodolpheh commented 4 years ago

I already tried resetting the EC by holding the button under the laptop for up to 30 seconds (with power disconnected), with no effect. I'll try again just in case I did it wrong.

EDIT: I tried again, this time letting the laptop rest for a few minutes. It did not change anything.

saif-ellafi commented 4 years ago

Oh wow. This is terrible! Can I suggest completely removing the 2.0 tag, branches and releases?

Is it possible to get help from official MSI people?

TauAkiou commented 4 years ago

It might be worth reaching out to MSI to secure a solution to this. There has to be a way to factory-reset the lighting controller, and if there isn't, SteelSeries really messed up when it came to the software design of this keyboard.

I believe the issue was caused because the above configuration contains no transitions, and for whatever reason the SteelSeries device completely locks up whenever it receives an effect without one. This is something I missed during testing, as I never loaded a configuration file that had no transitions in it. The fix is simple: make sure that an effect block has at least one transition in it before sending it to the controller.
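
A minimal sketch of that check, assuming the config syntax used in this thread (`effect <name>` ... `end` blocks containing `trans` lines). The function name is hypothetical and this is not the project's actual parser; it only illustrates refusing a config before anything is sent.

```python
def find_transitionless_effects(config_text):
    """Return the names of effect blocks that contain no `trans` line."""
    bad = []
    current = None   # name of the effect block being read, if any
    n_trans = 0      # number of `trans` lines seen in the current block
    for raw_line in config_text.splitlines():
        words = raw_line.split()
        if not words:
            continue
        if current is None:
            # A block opens with exactly "effect <name>"; a key assignment
            # such as "all effect test" starts with a different word.
            if words[0] == "effect" and len(words) == 2:
                current = words[1]
                n_trans = 0
        elif words[0] == "trans":
            n_trans += 1
        elif words[0] == "end":
            if n_trans == 0:
                bad.append(current)
            current = None
    return bad
```

Run on the configuration from the top of this issue, it reports the `test` effect, so the caller can abort instead of talking to the keyboard.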

That also being said, the code broke two keyboards, and I'm still quite embarrassed that I didn't catch that during my own testing.

rodolpheh commented 4 years ago

I do not know anyone from MSI, and I'm rather skeptical about them helping us. Plus, isn't SteelSeries responsible for the firmware and protocol?

Worst case, my laptop is still under warranty, but I'd rather spend one month trying to make the keyboard work than trying to explain to my reseller (a mainstream IT reseller) what happened and why they should exchange it rather than reinstall/format Windows & SteelSeriesEngine (as they will probably suggest, even though I don't have Windows at all). But this doesn't solve @Askannz's issue and, in fact, doesn't solve the issue at all.

rodolpheh commented 4 years ago

If anyone opens a ticket on the MSI website, please add a link to the ticket in this issue so we can track it.

TauAkiou commented 4 years ago

I did a cursory dump of both packets to check for alignment or other issues; it seems as though my theory is correct: An effect with no transitions will break the controller.

To test this, I took the broken configuration and ran it through a slightly modified version of the utility that dumped the packet to a file rather than sending it to the keyboard. There were no significant alignment differences and everything was in the right place. The only difference is that the effect blocks and millisecond fields were entirely empty.
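
For reference, the "dump instead of send" step can be as small as the sketch below. It assumes the packet is already available as a `bytes` object; `dump_packet` is an illustrative name, not the function in the modified utility, and nothing here builds or sends a real packet.

```python
def dump_packet(packet, path):
    """Write `packet` to `path` as a hex dump, 16 bytes per line.

    Each line starts with the offset of its first byte, which makes it
    easy to diff two dumps and spot alignment problems.
    """
    with open(path, "w") as f:
        for offset in range(0, len(packet), 16):
            chunk = packet[offset:offset + 16]
            # bytes.hex() with a separator needs Python 3.8 or newer.
            f.write(f"{offset:04x}  {chunk.hex(' ')}\n")
```

Dumping a known-good packet and the suspect one to two files and diffing them is enough to see which fields are empty.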

I'm going to push a change to my fork that prevents users from pushing transition-less effects to the keyboard, and add a big warning to the documentation to never push a transition-less effect lest this happen.

Our next step should be seeing how we can get @rodolpheh's and @Askannz's keyboards working again.

TauAkiou commented 4 years ago

This is for a standard SteelSeries mouse/keyboard, but these instructions might help, since they are in a sense the same kind of device:

https://linustechtips.com/main/topic/172028-how-to-fix-steelseries-firmware/

rodolpheh commented 4 years ago

I don't have a Windows partition but I will try to mess around with a VM. I'll let you know if I notice any changes.

Askannz commented 4 years ago

FWIW I never could get SteelSeries to detect the keyboard from within a VM.

I tried messing around with the device manager in Windows, but no amount of uninstalling the device made SteelSeries prompt a firmware update.

I also tried going into the hidden MSI BIOS but no "magic reset button" jumped out. There's stuff related to USB but I don't understand half of it.

Askannz commented 4 years ago

Uh, whatever I did on the Windows side seems to have made things worse on Linux: https://pastebin.com/5QmHH6UM

TauAkiou commented 4 years ago

Fucking yikes.

This is a pretty serious firmware issue on SteelSeries' side if a malformed command is completely hosing the controller like this.

I have a trouble ticket with SteelSeries open over how to perform a hard reset or how to reupload the firmware. If there's any information you can suggest I bring up, post it here. I'm still debating over whether or not to link them to this project/issue.

I'm also debating dropping all of the effects code and simply leaving the documentation. If a malformed command to the effects engine can break the interface with no chance at recovery, then there are likely to be more issues that can cause similar problems.

Askannz commented 4 years ago

The USB errors wouldn't go away even after a reboot, but I left the laptop off overnight and now it's back to the state where it's silently accepting HID commands. Weird. Makes me think there may be a timed overcurrent protection tripping off somewhere.

> I have a trouble ticket with SteelSeries open over how to perform a hard reset or how to reupload the firmware. If there's any information you can suggest I bring up, post it here. I'm still debating over whether or not to link them to this project/issue.

Thanks, hopefully they'll respond. I mean, if we can hose the controller from Linux like this, then a borked update to their own software could potentially do it too, so maybe they'll take this seriously.

TauAkiou commented 4 years ago

This is a shot in the dark, but if the device seems like it's accepting commands, at the least, would it be possible to send it a set of fresh effects and reset the internal tables that way?

The effects code assigns effect numbers sequentially; if we send a packet to assign all keys to a new, working effect or force-overwrite effect slot 0 with a fresh E0, would that be enough to force the keyboard back into an operable state?

An effects file like:

all effect slot1

effect slot0
  start ffffff
  trans 000000 250
  trans ffffff 350
end

effect slot1
  start ff0000
  trans 00ff00 500
  trans 0000ff 500
  trans ff0000 650
end

Askannz commented 4 years ago

That's a good idea, but sadly it didn't work. I tried your config but the keyboard just ate it up and nothing changed. Same thing when adding more effect slots.

As a last resort I even tried the original config that caused this mess (can't make it worse, right?), but that didn't do anything either, not even an error like last time.

rodolpheh commented 4 years ago

I also tried everything @Askannz tried, with no effect. Just to know, what is the model of your computer, @Askannz? Also, what are you using to debug the communication? (I saw that there is a script to read from the HID but I'm not sure I'm using it right.)

ErrorErrorError commented 4 years ago

> The effects code assigns effect numbers sequentially; if we send a packet to assign all keys to a new, working effect or force-overwrite effect slot 0 with a fresh E0, would that be enough to force the keyboard back into an operable state?

Are you referring to the commands that SteelSeries sends when starting the app? I noticed that those packets seem to reset the effects packet, but if that were the case, why doesn't the keyboard lock up the way it did for @Askannz and @rodolpheh?

TauAkiou commented 4 years ago

@ErrorErrorError

@Askannz and @rodolpheh ran the configuration file at the top of this thread, which had no transitions in it. This, for whatever reason, appears to be interfering with any commands being sent.

My theory is that the controller firmware doesn't properly handle 0 ms effects or 0 ms transitions. A zero-millisecond effect, or an effect with no transition blocks (0 ms transitions might be a problem as well), could be causing a logic error in the firmware that crashes the controller.
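
If that theory holds, a cautious tool would reject both cases before building a packet. The sketch below assumes an effect has already been parsed into a list of `(hex_color, duration_ms)` tuples, one per `trans` line; that representation and the function name are assumptions for illustration, and the 0 ms case is a precaution based on the unconfirmed theory above, not a known trigger.

```python
def check_effect_timing(transitions):
    """Return a list of human-readable problems for one effect.

    `transitions` is a list of (hex_color, duration_ms) tuples.
    An empty result means none of the suspected triggers were found.
    """
    problems = []
    if not transitions:
        problems.append("effect has no transitions")
    for index, (color, duration_ms) in enumerate(transitions):
        if duration_ms <= 0:
            problems.append(
                f"transition {index} ({color}) lasts {duration_ms} ms"
            )
    return problems
```

Refusing to send whenever the returned list is non-empty errs on the safe side, which seems reasonable given that the failure is not recoverable.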

Since MSI/SteelSeries made the smart decision to store the previously loaded effect/key blocks in NVRAM, the controller will reload the bad NVRAM block and lock up again.

ErrorErrorError commented 4 years ago

Dang. Well, I guess I am still confused as to why SteelSeries Engine sends effects packets with no transitions or anything, except for the 0b command at the beginning and the effect ID.

Do you think SteelSeries is resetting the effects corresponding to the ID? I uploaded my pcap file from when I was testing the effects packets. Everything numbered below 79 is what the Engine sent when it was just starting up. Anything beyond 79 is the effects I messed with, but that part is irrelevant. colorshift-dump.pcap.zip

Edit: Wait, have they tried just setting all the keys to steady and resetting all the effects packets? Or can they literally not set any keys?

Askannz commented 4 years ago

@rodolpheh

> Just to know, what is the model of your computer @Askannz ?

MSI GE63VR-7RE

> Also, what are you using to debug the communication (I saw that there is a script to read from the HID but I'm not sure I'm using it right) ?

I used Wireshark on Windows to dump the binary packets sent by SteelSeriesEngine. If you're referring to https://github.com/Askannz/msi-perkeyrgb/tree/master/documentation/utils, those are just convenience scripts for visualizing the dumped packets, not capturing them.

TauAkiou commented 4 years ago

I'm still communicating with SteelSeries over this; no updates thus far, but I hope we can find a solution and fix the broken keyboards.

I'm going to continue work on the Effects engine, but I'm keeping it in my fork until we can fix these keyboards and make sure it doesn't happen again. I'm still holding hope that we can find a solution to this.

rodolpheh commented 4 years ago

Hello @TauAkiou, did you get any news from SteelSeries so far ?

TauAkiou commented 4 years ago

I'm still in contact with them, but things are going very slowly. I've mentioned the project, and sent them a dump of the broken 0b packet, but still no mention of a fix or firmware patch for this (fairly critical) issue.

I'm going to hopefully keep the dialog open as long as possible, and I'm still hoping something comes of this.

andreasomaini commented 4 years ago

Hi, SteelSeries Apex M800 owner here. Looking at this repository, I think the protocol of MSI's keyboards is similar to mine, but I have not yet looked deeply enough to be sure.

Anyway, the M800 has a special binary command that refreshes it. This is the dump from Wireshark, and you can have a look at my implementation here.

I don't know at all if it can help, but since it broke somebody's keyboard I think it is worth a try.

rodolpheh commented 4 years ago

Thank you @andreasomaini for your insights. Did this special binary command help in recovering your user's keyboard?

I don't know anything about the protocol implemented by msi-perkeyrgb, but maybe @Askannz could whip up a quick version to try this command? I'm willing to try it out if I can get my hands on a version implementing it.

ErrorErrorError commented 4 years ago

> Hi, SteelSeries Apex M800 owner here, looking at this repository I think the protocol of MSI's keyboards and mine is similar, but I still have not looked deeply enough to be sure.

They're similar, but packets for the MSI laptop keyboards are sent with size 524 and packets for your keyboard with size 514. I don't know whether sending a 524 packet to your keyboard, or a 514 packet to an MSI laptop keyboard, would cause problems.
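
Since nobody knows what a wrong-sized packet does, a tool shared between both devices could at least refuse to send one. A small sketch, assuming the two sizes quoted above are in bytes; the constant and function names are hypothetical.

```python
# Sizes reported in this thread, assumed here to be in bytes.
MSI_LAPTOP_PACKET_SIZE = 524
APEX_M800_PACKET_SIZE = 514


def check_packet_size(packet, expected_size):
    """Return `packet` unchanged, or raise if its length is unexpected."""
    if len(packet) != expected_size:
        raise ValueError(
            f"packet is {len(packet)} bytes, expected {expected_size}"
        )
    return packet
```

Wrapping every outgoing packet in `check_packet_size(...)` turns a silent mismatch into an exception on the host side, before anything reaches the controller.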

> Anyway, the M800 has a special binary command that refreshes it,

Looks like the special binary command just resets the current state of the effect to the new effect. Unfortunately that command wouldn't fix the issue with the MSI keyboard being bricked.

andreasomaini commented 4 years ago

> if you send a 524 packet to your keyboard

Mine just happens to ignore everything from the 515th byte on; sending a lot more (I just tried about 40 kB) does nothing.

Anyway, during testing I bricked mine a few times, and the only way I managed to make it work again was unplugging it. I understand this would be a drastic solution, but maybe it is the only solution for those with a bricked keyboard.

ErrorErrorError commented 4 years ago

> Anyway, during testing I bricked mine a few times, and the only way I managed to make it work again was unplugging it. I understand this would be a drastic solution, but maybe it is the only solution for those with a bricked keyboard.

As soon as you bricked your keyboard you would unplug it immediately? Or would you wait a while? I ask because I bricked my keyboard once, but I immediately shut my computer off, and when it restarted it wasn't bricked anymore; it had gone back to the effect it had before getting bricked.

> Since MSI/SteelSeries made the smart decision to store the previously loaded effect/key blocks in NVRAM, the controller will reload the bad NVRAM block and lock up again.

It is most likely related to this, but in my case I shut it off immediately, before it could write to the NVRAM.

andreasomaini commented 4 years ago

> As soon as you bricked your keyboard you would unplug it immediately?

I have never tried waiting; I always unplugged it within a minute (typically within a few seconds), but I guess in my case time doesn't matter.

TauAkiou commented 4 years ago

Perhaps disconnecting the battery for a few minutes might help? You'd have to open the machine to try, but if we need a 'true disconnect', removing the battery is really the only option we have.

rodolpheh commented 4 years ago

Is it time to assume that SteelSeries or MSI just doesn't give a flying f*ck, and will never ever do anything to try to tackle this issue ?

Also, should we then assume that their hardware is just badly designed and that it is not an issue for them as they expect it to run in a fully proprietary environment on which they have full control to avoid triggering their bugs ?

10/10 will avoid in the future then...

TauAkiou commented 4 years ago

I'm certainly not impressed with SteelSeries or MSI; there should definitely be a way to clear the keyboard NVRAM from userspace and restart from scratch.

They (SteelSeries) told me to go contact MSI about finding a solution; I haven't started on that yet since I've run into some personal problems. I'll see about pestering them about it this month.

Honestly, my GS65 has never worked quite right under Linux; I certainly wouldn't recommend MSI laptops if your intention is running Linux on them.

rodolpheh commented 4 years ago

I did not have many compatibility issues in Linux with my MSI. I'm more concerned about the support/community aspect of MSI. Of course I did not expect any openness towards the open-source community, but their support seems to basically follow a format/reinstall pattern due to the obscurity and secrecy surrounding the inner workings of their systems. Again, I did not expect much from a mainstream computer manufacturer, but given the price of their computers, I would love to get more than a black box.

This said, I'd like to thank you for your support, and wish you the best.

jqassar commented 4 years ago

> I'm certainly not impressed with SteelSeries or MSI; there should definitely be a way to clear the keyboard NVRAM from userspace and restart from scratch.
>
> They (SteelSeries) told me to go contact MSI about finding a solution; I haven't started on that yet since I've run into some personal problems. I'll see about pestering them about it this month.
>
> Honestly, my GS65 has never worked quite right under Linux; I certainly wouldn't recommend MSI laptops if your intention is running Linux on them.

Not to hijack, but as I was about to install Linux on precisely that model (and looking at this repo for lighting support), what kinds of issues are you running into? Are they significant enough that it's not worth the trouble to go native Linux?

hugglesfox commented 4 years ago

@jqassar I'm currently writing this on an MSI GS65 Stealth 8se running Ubuntu 19.10. Keyboard lighting control is not an issue thanks to this repo, and I'm using isw to control fan speed (although that is rarely used as I just let it do it automatically). The main issue I have had is that you can't suspend the laptop without it hard-locking the Wi-Fi, and the only way to fix it is a reboot. Personally this isn't too much of an issue thanks to fast boot times and hibernation, but it is something to be aware of. The Arch Wiki has a page detailing all the flaws and potential workarounds for the laptop on Linux.

TauAkiou commented 4 years ago

I recently sent out an e-mail to MSI, and they haven't returned anything to me (not even an acknowledgement, though this could be due to COVID-19), so I'm very skeptical as to whether I'm going to get anything back from them at all. As for SteelSeries, they basically told me that "they would look into it" and told me to contact MSI.

I don't know if we're ever going to reach a conclusion for the broken keyboards, which is beyond frustrating.

I'm personally considering writing my GS65 off and getting something else that has better-known Linux compatibility. I've got plenty of strange issues, from the Wi-Fi no longer working after sleep to (more frustratingly) the trackpad no longer working after sleep. The USB3 ports also don't work on mine anymore, and with the newer NVIDIA drivers I've also lost access to the HDMI and DisplayPort jacks (thanks to the lack of Reverse PRIME).

rodolpheh commented 4 years ago

This is unfortunate and pretty much infuriating... Typical response from level 1 support... @TauAkiou do you have an address for the ticket ? Is it possible that we can bump the ticket and ask for a second look from the customer service ?

As for unplugging the battery, I hesitate on doing so since my laptop is still under warranty (I think), @Askannz did you try it yourself or are you in the same situation as me ?

Askannz commented 4 years ago

@rodolpheh Yes, I remember unplugging the battery and even pressing the reset switch on the back, but to no effect.

TauAkiou commented 4 years ago

I've tried contacting SteelSeries, I've tried sending a message to MSI. They don't seem to be willing to help. If anyone else wants to try and help me pester them, I'd more than appreciate the help, but at this point I'm practically forced to write it off.

ErrorErrorError commented 3 years ago

> I've tried contacting SteelSeries, I've tried sending a message to MSI. They don't seem to be willing to help. If anyone else wants to try and help me pester them, I'd more than appreciate the help, but at this point I'm practically forced to write it off.

Guess your best bet is to replace the keyboard but even then I am not sure if the keyboard is attached to the top panel where the keyboard rests.

rodolpheh commented 3 years ago

I'm pretty sure every message sent to them gets lost in the first layer of helpdesk, between suggestions to reinstall Windows and offers of an RMA. The best bet would be to know someone on the inside.

NovHak-Linux commented 3 years ago

For what it's worth, I found this article on SteelSeries' support page. So in the case of our laptops, it's possible that there is a reset procedure too. Of course, unplugging the keyboard can be tricky, but I suppose holding the keys while the system is rebooting could do the trick (the keyboard lights shut down for a few seconds).

That being said, we still don't know what key combination should be pressed. I've read someone saying that it can be reset from the SteelSeries Engine app, but maybe he's wrong, I don't know (I'm on Linux and haven't installed the app on Windows yet).

Bonfi97 commented 3 years ago

Has anyone tried reflashing the EC firmware using the tool that you can download from the MSI page of your laptop? (I would try it myself, but I haven't bricked my RGB and I'm not willing to try.)

rodolpheh commented 3 years ago

@Bonfi97 Unfortunately MSI only provides Windows software for this. I can maybe try with FreeDOS on a USB stick, but I'm pretty uncomfortable doing this (afraid of more bricking).

I looked at @NovHak-Linux suggestion above and it seems that this only applies to Apex keyboards, I can't find anything about factory reset on integrated laptop keyboards. Of course there is a very high probability that there is a command or a key combination to factory reset the keyboard but neither MSI nor SteelSeries seems to be kind enough to share any of this...

I think my best bet is to pester MSI or SteelSeries, maybe we'll land on another person willing to actually communicate with the engineers and find a solution.

NovHak-Linux commented 3 years ago

@Bonfi97 Reflashing the EC firmware will likely have no effect, since the keyboard seems to maintain its own logic by itself. I've unplugged the battery and reset the EC, and my keyboard still retained its configuration (btw mine isn't bricked either).

@rodolpheh I don't know if all SteelSeries laptop keyboards are the same in this respect, but mine has a SteelSeries icon on the F9 key, which I think is supposed to work on Windows by pressing Fn+F9. Maybe holding those keys while the machine (hence the keyboard) is booting would do something? Just a wild guess... (I'm too lazy to try it myself since, if it works, I would have to reconfigure my keyboard.)