UltimateHackingKeyboard / firmware

Ultimate Hacking Keyboard firmware

Freeze bug #172

Closed mondalaci closed 5 years ago

mondalaci commented 6 years ago

Firmware versions later than 8.2.5 sometimes freeze. When it happens, your UHK suddenly becomes unresponsive. Your UHK may not be affected at all, or it can take hours, days or even weeks for the freeze to hit.

We haven't yet fixed this bug because it happens so rarely, and only on certain UHKs. We need your feedback, so if you're reading this, please update to firmware 8.4.5 via Agent.

If your UHK freezes, run the following script and paste its output to provide diagnostic information, so that we can hopefully figure out what the hell is going on. (You'll need another keyboard to do this if your UHK froze.)

Pull the Agent repo and build it. Then run packages/usb/get-debug-info.js. It'll output something like this:

I2cWatchdog:919988055 | I2cSlave:306651339 | I2cWatch:950237 | I2cRecovery:0 | KeyMatrix:783778471 | UsbReport:157778114 | Time:101366130 | UsbGeneric:237 | UsbBasic:41632 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:919997815 | I2cSlave:306654592 | I2cWatch:950247 | I2cRecovery:0 | KeyMatrix:783783350 | UsbReport:157782993 | Time:101367237 | UsbGeneric:238 | UsbBasic:41632 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920007573 | I2cSlave:306657846 | I2cWatch:950257 | I2cRecovery:0 | KeyMatrix:783788300 | UsbReport:157787816 | Time:101368361 | UsbGeneric:239 | UsbBasic:41640 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920017331 | I2cSlave:306661098 | I2cWatch:950267 | I2cRecovery:0 | KeyMatrix:783793238 | UsbReport:157792640 | Time:101369482 | UsbGeneric:240 | UsbBasic:41644 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920027090 | I2cSlave:306664351 | I2cWatch:950277 | I2cRecovery:0 | KeyMatrix:783798182 | UsbReport:157797462 | Time:101370593 | UsbGeneric:241 | UsbBasic:41644 | UsbMedia:256 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920036849 | I2cSlave:306667604 | I2cWatch:950287 | I2cRecovery:0 | KeyMatrix:783803092 | UsbReport:157802316 | Time:101371711 | UsbGeneric:242 | UsbBasic:41644 | UsbMedia:258 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920046607 | I2cSlave:306670858 | I2cWatch:950297 | I2cRecovery:0 | KeyMatrix:783808105 | UsbReport:157807084 | Time:101372838 | UsbGeneric:243 | UsbBasic:41644 | UsbMedia:269 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920056367 | I2cSlave:306674110 | I2cWatch:950307 | I2cRecovery:0 | KeyMatrix:783812989 | UsbReport:157811962 | Time:101373962 | UsbGeneric:244 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920066125 | I2cSlave:306677364 | I2cWatch:950317 | I2cRecovery:0 | KeyMatrix:783821101 | UsbReport:157814081 | Time:101375047 | UsbGeneric:245 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110531
I2cWatchdog:920075884 | I2cSlave:306680616 | I2cWatch:950327 | I2cRecovery:0 | KeyMatrix:783831414 | UsbReport:157814333 | Time:101376086 | UsbGeneric:246 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110783
I2cWatchdog:920085643 | I2cSlave:306683869 | I2cWatch:950337 | I2cRecovery:0 | KeyMatrix:783838898 | UsbReport:157816997 | Time:101377160 | UsbGeneric:247 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110904

One line will be displayed per second. Most variables should be incrementing all the time automatically, but you'll have to manually increment some. Make sure to increment UsbBasic by typing regular scancodes on your UHK. Make sure to increment UsbMedia by typing media scancodes (for example volume up and down) on your UHK. Make sure to increment UsbMouse by mousing with your UHK. As for UsbSystem, you can increment it by typing system scancodes such as sleep, but this one is not terribly important. Every other variable should be incrementing automatically all the time with the exception of I2cRecovery which may spontaneously increment once in a while.
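Comparing two lines of that output by eye is tedious, so here is a minimal sketch that reports which counters failed to increase between an earlier and a later line. The function names and the default `ignore` list are assumptions made for illustration and are not part of Agent or get-debug-info.js. I2cRecovery and UsbSystem are skipped by default only because the description above says they need not increment.

```python
def parse_debug_line(line):
    """Turn 'Name:value | Name:value | ...' into a dict of integer counters."""
    counters = {}
    for field in line.split("|"):
        name, _, value = field.strip().partition(":")
        counters[name] = int(value)
    return counters


def stalled_counters(first_line, last_line, ignore=("I2cRecovery", "UsbSystem")):
    """Return the sorted names of counters that did not increase between two lines."""
    first = parse_debug_line(first_line)
    last = parse_debug_line(last_line)
    return sorted(
        name for name, value in first.items()
        if name not in ignore and last.get(name, value) <= value
    )


# First and last lines of the sample output shown above.
FIRST = ("I2cWatchdog:919988055 | I2cSlave:306651339 | I2cWatch:950237 | "
         "I2cRecovery:0 | KeyMatrix:783778471 | UsbReport:157778114 | "
         "Time:101366130 | UsbGeneric:237 | UsbBasic:41632 | UsbMedia:252 | "
         "UsbSystem:4 | UsbMouse:110380")
LAST = ("I2cWatchdog:920085643 | I2cSlave:306683869 | I2cWatch:950337 | "
        "I2cRecovery:0 | KeyMatrix:783838898 | UsbReport:157816997 | "
        "Time:101377160 | UsbGeneric:247 | UsbBasic:41644 | UsbMedia:270 | "
        "UsbSystem:4 | UsbMouse:110904")

print(stalled_counters(FIRST, LAST))
```

Any name it prints for a frozen keyboard is a counter worth mentioning in the report. Passing `ignore=()` checks every counter, including the two that are allowed to stand still.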

luteijn commented 6 years ago

Agree with @irwand that the fix is somewhat hacky and a bit contrary to what you stated before: that you don't want to more or less blindly reset the keyboard with a watchdog to recover from freezes.

That being said, I think this is still a great step forward, and a workaround for (a subset of) the freezes is more than welcome as a step 1 - glad you changed your mind on that :). It will hopefully result in less worry about this aspect and free up capacity to work on other things. I hope that it will also lead to eventually finding the underlying reasons for things getting out of sync in the first place, and perhaps finding a more 'soft' way to recover from them. Although, as I mentioned before, sometimes just restarting when a protocol fails is the best approach.

mondalaci commented 6 years ago

Admittedly, my understanding of the core issue is limited. However, given that it only occurs once in a couple of months, I think it's a miracle that I could even implement a reliable workaround. It'd likely be ridiculously expensive to get to the bottom of this, and if my fix turns out to be as good as I expect it to be, we won't delve any deeper.

@joshginter Your freezes clearly differ from my freezes. When your UHK locks up, the USB and I2C peripherals of your right half MCU seem to be down. Chances are your whole right half MCU locks up. I'll look into whether this can be solved in some way. Maybe a watchdog can reset the MCU in such a case, although even that wouldn't be ideal because your UHK would reenumerate over USB, which is a noticeable interruption.

Not sure how to test static discharge, especially in a safe way. We put our fair share of ESD suppressors into the UHK, but one keyboard died during EMC testing regardless. Do the freezes usually happen when you touch the male or female pogo pin connectors, or can you associate the freezes with any specific event?

joshginter commented 6 years ago

The two freezes I've documented on the other thread happened right as I was sitting down (a situation likely to cause static build-up) and typing my password. My hands were on the keys, not touching the pogo pins. I think my thumb tends to sit pretty close to the "edge" of the space and mod keys, which puts it close to the pogo pins, but definitely not touching them.

asgeir commented 6 years ago

Have you ruled race conditions out?

From a quick glance at the documentation it looks like USB callbacks are executed as interrupt handlers. So what happens when, e.g., this handler https://github.com/UltimateHackingKeyboard/firmware/blob/master/right/src/usb_interfaces/usb_interface_system_keyboard.c#L46 preempts the main loop while it's executing something like https://github.com/UltimateHackingKeyboard/firmware/blob/master/right/src/usb_report_updater.c#L470 ?

It seems to me that the final value of UsbReportUpdateSemaphore will be undefined, depending on which instruction got preempted by the interrupt. Assuming that this compiles to LOAD, OR, STORE, an interrupt can happen after the LOAD but before the STORE and invalidate the result of the OR operation.
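To make that interleaving concrete, here is a minimal Python simulation of the suspected lost update. It is a sketch under assumptions, not the firmware code: the bit names are hypothetical, and it assumes UsbReportUpdateSemaphore is a bitmask in which the main loop sets a bit before sending a report and the interrupt handler clears that bit when the report completes.

```python
BASIC_BIT = 1 << 0  # hypothetical bit for one USB interface
MEDIA_BIT = 1 << 1  # hypothetical bit for another USB interface


def run(interrupt_between_load_and_store):
    """Simulate the main loop's 'semaphore |= MEDIA_BIT' as LOAD, OR, STORE."""
    semaphore = BASIC_BIT  # a report on the first interface is already in flight

    loaded = semaphore                 # LOAD
    if interrupt_between_load_and_store:
        semaphore &= ~BASIC_BIT        # handler clears its bit mid-update
    result = loaded | MEDIA_BIT        # OR, computed from the stale value
    semaphore = result                 # STORE overwrites the handler's write

    if not interrupt_between_load_and_store:
        semaphore &= ~BASIC_BIT        # handler runs after the STORE instead

    semaphore &= ~MEDIA_BIT            # the second report completes later
    return semaphore


print(run(False))  # interrupt arrives outside the LOAD..STORE window
print(run(True))   # interrupt arrives inside the LOAD..STORE window
```

In the second call the bit for the first interface is set again by the STORE and is never cleared afterwards, so the returned value stays non-zero. Whether the real firmware can actually hit this window depends on the compiled instructions and interrupt timing, which this sketch does not model.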

Wouldn't it be safer to use atomic reads/writes or use two variables to handle the signaling? One which the interrupt handlers only read and the main loop only writes, and another which the main loop only reads and the interrupt handlers only write.
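The two-variable idea can be sketched as a pair of counters, each with exactly one writer. This is an illustration only: the class and method names are hypothetical, and it assumes that a single read or write of each variable is atomic on the target MCU (in C the variables would also need to be `volatile`).

```python
class ReportSignal:
    """Signaling with one writer per variable (hypothetical names)."""

    def __init__(self):
        self.sent = 0       # written only by the main loop
        self.completed = 0  # written only by the interrupt handler

    def main_loop_send(self):
        self.sent += 1

    def interrupt_complete(self):
        self.completed += 1

    def busy(self):
        # Both contexts may read both variables; neither writes the other's.
        return self.sent != self.completed


def interleaved():
    """Interrupt lands between the main loop's LOAD and STORE of 'sent'."""
    s = ReportSignal()
    s.main_loop_send()       # first report in flight
    loaded = s.sent          # LOAD for the second send
    s.interrupt_complete()   # interrupt fires mid-update, touches 'completed' only
    s.sent = loaded + 1      # STORE: nothing written by the handler is overwritten
    s.interrupt_complete()   # second report completes
    return s.busy()


print(interleaved())
```

Because the handler never writes `sent` and the main loop never writes `completed`, the interrupted read-modify-write cannot discard the other side's update, and the signal ends up idle once both reports have completed.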

mondalaci commented 6 years ago

@asgeir Honestly, I'm not sure. What's your take on this @eltang?

eltang commented 6 years ago

@asgeir I think you might be correct. If a USB report is sent particularly quickly and the interrupt happens at some point before the STORE, then UsbReportUpdateSemaphore will be stuck with a non-zero value. To test this theory, I'll introduce some artificial delays into the code which should induce freezing if a race condition is indeed happening.

eltang commented 6 years ago

I added delays of various lengths but was unable to trigger any freezes. In addition, the symptoms that a race condition would cause are inconsistent with those described in #189. Given these, it's unlikely that a race condition is the problem. My current theory is that the semaphore getting stuck on a non-zero value is merely a symptom of a bug that happens deeper in the USB stack, meaning that 8b69a25 may not have any effect in the event of a real freeze.

kbdh commented 6 years ago

I got my UHK 2 days ago (mini batch 34) and updated the firmware to 8.5.2. My system is Debian 9.

Today there were 2 freezes. In both cases Agent wasn't able to load the keyboard configuration, it showed a message "Cannot write to HID device".

The first time it happened when I was typing on the left half of the UHK. The LED display went black. Pressing a key on either half didn't result in any action. After disconnecting and reconnecting to USB the UHK was working again.

The second time I was mousing when the freeze occurred. The LED display was still showing the selected keymap and mouse layer. Pressing a key on either half didn't result in any action. The LED display didn't change when pressing MOD, Fn or Mouse. After disconnecting and reconnecting the left keyboard half the UHK was working again.

Is there anything else I could try when this happens again?

(I am still reading through all the info and instructions and so far haven't managed to get get-debug-info.js working - it failed with "TypeError: Cannot read property 'write' of null".)

mondalaci commented 6 years ago

Thanks for the detailed report, and sorry for the freezes!

Please try to reproduce the freezes with firmware 8.2.5.

The "cannot read..." and "cannot write" messages were displayed either because the USB generic HID interface of your UHK was frozen, or because you didn't update the udev rule that will be shipped with the upcoming Agent version.

Please keep me in the loop regarding future freezes. It's worth reconnecting the halves first to see if your UHK recovers, then trying Agent to see if it works, and if all else fails, reconnecting USB.

kbdh commented 6 years ago

Thanks for the hint about udev, that was indeed the reason for the "TypeError: Cannot read property 'write' of null".

I had another freeze yesterday, same as the first one after typing on the left keyboard half. This time I reconnected the halves instead of reconnecting USB and the UHK recovered.

Then I had another problem, still with firmware 8.5.2: the LED display was switched off and showed no reaction when pressing MOD, Fn or Mouse. It also didn't show the selected keymap anymore. Apart from that the UHK was still working. I was able to type, mouse or switch keymaps. get-debug-info.js showed the "TypeError: Cannot read property 'write' of null". After reconnecting the halves the LED was on again, but get-debug-info.js still showed the same error. Only after reconnecting USB did get-debug-info.js work again.

Today in the morning I installed firmware 8.2.5 and didn't have any freeze the whole day.

Edit 2018-10-18:

The problem with the LED display switched off (but the UHK working otherwise) occurred several times during the last few days with firmware 8.5.2. I found out that, with or without reconnecting the halves, Agent 1.2.11 from the releases page is able to find the UHK, but the Agent built from source is only able to find the UHK after reconnecting USB (reconnecting the halves is not sufficient). I tested this several times now and it was always the same. I think I was using tree e333022 first and am now using tree a4e3696 to build Agent.

mondalaci commented 6 years ago

Your case is quite strange. The fact that your UHK recovers after reconnecting the halves suggests that the left half freezes, not the right, as is usually the case.

Do you use the stock bridge cable that we provided?

kbdh commented 6 years ago

Yes, I am using the stock bridge cable.

mondalaci commented 6 years ago

Well, let's wait a couple days to see if your UHK freezes with 8.2.5. I don't have a better idea right now.

ManuelLevi commented 6 years ago

Hi everyone.

My keyboard frequently freezes.

Today, after it froze twice within an hour or two, I updated the firmware to 8.5.2, the latest version I could find, to see whether the freezes would stop. They haven't.

This is the best report I can give at the moment:

I was frequently changing between base, mouse and mod modes. I was browsing and I use a lot of browser shortcuts and the mouse feature on the UHK.

The LED indicators are activated when FN, MOD or MOUSE keys are pressed, in either side of the keyboard. That seems to be working.

Joining the keyboard together doesn't change the behavior.

UHK agent shows the message "Loading keyboard configuration...Hang tight!". It doesn't update after a couple minutes.

After disconnecting the keyboard, I got a message on the top right corner saying "could not read from HID device".

After connecting the keyboard again, the agent loads it in a second or two, and everything seems to be working fine again.

Hardware: Using the cable provided with the UHK (the only one I have); Running on a MacBook Pro with OS High Sierra 10.13.6.

mondalaci commented 6 years ago

Hi @ManuelLevi, thank you for your report, and sorry about this issue!

Which firmware version were you running before 8.5.2 with which you also experienced the freezes?

When your keyboard freezes, what happens when you simply reconnect the halves (but don't reconnect USB)? Does it unfreeze your keyboard?

ManuelLevi commented 6 years ago

Hi!

I was using the one it came with, 8.2.5 if I'm not mistaken.

That's what I meant with "Joining the keyboard together doesn't change the behavior.". Nothing changed when I reconnected the halves.

Seeing that both halves have an effect on the LEDs made me feel like it was the OS that decided to ignore the keyboard (maybe because it misbehaved before?).

Could this happen?

(I just had 2 other freezes this morning)

irwand commented 6 years ago

@ManuelLevi, seems that your "freezes" might just be because the USB stopped working, maybe related to https://github.com/UltimateHackingKeyboard/firmware/issues/189 ? I'm on Windows though.

kbdh commented 6 years ago

I have used firmware 8.2.5 for two days without any freeze, but I was missing the macro functionality. So I switched back to firmware 8.5.2 and will see if it freezes again. I will probably also test the UHK with another computer and maybe try different USB ports (if that makes sense?)

mondalaci commented 6 years ago

@irwand I agree that @ManuelLevi is probably also affected by #189. I will follow up there.

@kbdh Testing your UHK with another computer and with different USB ports does make sense. I'm interested in your further findings.

Worst case scenario: We may end up replacing all the affected keyboards. According to all the freeze bug related reports we received, the failure rate is about 0.3% which we can definitely handle.

This is quite an umbrella issue, so I think I will close it in the near future. Both #189 and @kbdh's issue differ from what I have fixed. The root cause may share similarities, but still.

mondalaci commented 6 years ago

@Jopie01 @joshginter @kbdh @irwand Please shoot an email to support@ultimatehackingkeyboard.com guys, and include your GitHub user name and order id. I want to correlate all the available information on GitHub and our manufacturing database, and possibly get one on one with you.

jwr commented 6 years ago

A data point: I had freezes with 8.4.5 about once a week. I upgraded to 8.5.2 6 days ago and just had my first freeze, but this time it seems different: only the left half froze. The right half was working normally.

I was able to unplug and re-plug the left half and things were back to normal.

I find it slightly suspicious that these freezes occur after several days. I should start taking notes of the exact time.

mondalaci commented 5 years ago

@jwr Damn! Your report made me realize that I screwed up the I2C watchdog of the left keyboard half starting from firmware 8.4.3. Thank you so much!

Anyone who has encountered freezes that were resolved by reconnecting the keyboard halves, please upgrade to the newly released firmware 8.5.3. Chances are it'll fix your UHK.

kbdh commented 5 years ago

So far the only problem which still occurs with my UHK on firmware 8.5.3 is the one I mentioned earlier where the LED display turns off. I have now created a separate issue for this.

Apart from this, no more freezes with my UHK since upgrading to firmware 8.5.3 five days ago.

rbrt86 commented 5 years ago

I'm afraid I have also had the freeze bug since the beginning. Updating to 8.5.3 stopped it for a day, yesterday it came back once, and now I have it again almost every 10 minutes. This also happened after upgrading to 8.4.5: the first day was fine, but after that there were more issues.

Unplugging and replugging the keyboard helps. I just plug it into the front USB port, and about every 10 minutes when it stops responding I have to re-insert the USB cable.

Note: I'm using it for work, connected to the client pc (w10) on which I do not have Admin access. Not sure if it's related to the client, maybe some USB scanning tool interferes with the keyboard or something?

irwand commented 5 years ago

@rbrt86: when it freezes, do any of the LEDs respond to keypresses? Or is it completely dead?

mondalaci commented 5 years ago

@rbrt86 Very sorry to hear this! What do you mean by "USB scanning tool"? Do you have a specific hardware attached or software running?

rbrt86 commented 5 years ago

@irwand I'm having the freeze bug as we speak. I can press Mod, Mouse or Fn and all those LEDs light up. The keystrokes, however, are not registered on the PC; I can type whatever I want and nothing happens. I attached a dumb keyboard to the second front USB port and that works fine. Putting the two halves together doesn't fix the issue; reconnecting the USB temporarily is the only workaround I have right now.

@mondalaci I connected the UHK to a workstation from the client which handles sensitive data. I do not have Admin access and I do not know if there are security measures running in the background to check for USB storage devices; I can imagine they run some sort of audit tooling somehow. No specific hardware is attached, just the keyboard and an optical mouse. I've checked Windows Device Manager; it just shows up as an HID keyboard in there and says it is functioning properly. Reconnecting the keyboard doesn't show any changes in Device Manager (it disappears, obviously, and then shows up again the same). Is there any additional debugging I can do?

mondalaci commented 5 years ago

@rbrt86 After the freeze kicks in, what happens if you reconnect the halves? Does it unfreeze your UHK?

irwand commented 5 years ago

@rbrt86 it sounds like your problem is different from this freeze issue. Your problem seems similar to the issue I raised here: https://github.com/UltimateHackingKeyboard/firmware/issues/189 You might want to try plugging the keyboard into a USB hub instead of straight into the computer, and see if that helps any. It'll be a good data point.

jwr commented 5 years ago

Just wanted to report that since the upgrade to 8.5.3 on 2018-10-19 I did not see any freezes or issues (writing this on 2018-10-31). The problem that affected me seems to have been fixed. :-)

mondalaci commented 5 years ago

@jwr Very glad to hear it! :) Thanks for your feedback!

jfieber commented 5 years ago

Installed 8.5.3 when it was posted. Nothing that could be construed as a freeze either before or after the update. (Order 39566 for reference; is there a serial number on these that I haven't noticed?)

mondalaci commented 5 years ago

@jfieber There's a serial number label on the PCB, but I don't advise disassembling any UHKs unless needed.

richrd commented 5 years ago

Today I encountered my first issue with the UHK, the left half froze, but the right half still works fine as we speak. I've had my keyboard for something like two months now.

Unfortunately the debug script didn't find my UHK and returned null instead. This is the output I got:

dev 835 / 8501 : usage_page:b1f8, usage:17f
dev 45e / 39 : usage_page:2e32, usage:2f31
dev 1d50 / 6122 : usage_page:2d31, usage:2e32
dev 1d50 / 6122 : usage_page:2d31, usage:2e32
dev 1d50 / 6122 : usage_page:2d31, usage:2e32
dev 1d50 / 6122 : usage_page:2e32, usage:2f34
dev 835 / 8502 : usage_page:2d31, usage:2e32
dev 5ac / 8290 : usage_page:2d31, usage:2e32
dev 5ac / 8290 : usage_page:2e32, usage:2f35
dev 5ac / 273 : usage_page:2e32, usage:2f35
dev 5ac / 273 : usage_page:eca0, usage:310
dev 5ac / 273 : usage_page:3a35, usage:2e31
dev 5ac / 273 : usage_page:3a35, usage:2e31
/home/richard/code/oss/agent/packages/usb/get-debug-info.js:20
    device.write(uhk.getTransferData(payload));
           ^

TypeError: Cannot read property 'write' of null
    at getDebugInfo (/home/richard/code/oss/agent/packages/usb/get-debug-info.js:20:12)
    at Object.<anonymous> (/home/richard/code/oss/agent/packages/usb/get-debug-info.js:66:1)
    at Module._compile (module.js:653:30)
    at Object.Module._extensions..js (module.js:664:10)
    at Module.load (module.js:566:32)
    at tryModuleLoad (module.js:506:12)
    at Function.Module._load (module.js:498:3)
    at Function.Module.runMain (module.js:694:10)
    at startup (bootstrap_node.js:204:16)
    at bootstrap_node.js:625:3

Let me know if there's any other info I can provide.

mondalaci commented 5 years ago

@richrd Please update your UHK to the latest 8.5.3 firmware which should fix the issue.

Nnarol commented 5 years ago

Hi!

I have some sort of freezing bug as well. It started 2 days after getting the UHK. I used it with 2 laptops successfully, but when I tried to use it with a 3rd one, it froze right away.

The firmware version for both halves of the keyboard is 8.5.3 according to the UHK Agent. The laptop it does not work with is a Lenovo y510p.

The keyboard was plugged in right before the PC booted, and the LEDs showed garbled characters during the GRUB bootloader screen. The CapsLock indicator and the triangle (whatever it stands for) were lit up. I tried to use the arrow keys in the bootloader menu to select Windows 8.1. The menu has a countdown, which stopped immediately after I pressed Mod + j (my "Down" arrow), meaning that the PC recognized the fact that a key was pressed, although nothing else happened; the GRUB menu selection didn't move.

After bootup, no key seemed to work. The LEDs were still garbled, and turned blank after a short while. I plugged the UHK back into the laptop it previously worked with, but the same thing happened. Then I pulled the cable out and plugged it back again after rebooting the laptop, and ever since it works again with that laptop, but never the one I encountered the issue with.

Plugging the keyboard back into the problematic laptop, the LEDs were always completely dark from that moment on, and sometimes only the right keyboard half worked, but mostly neither. The exact symptom was that at the time of plugging in, the LEDs correctly showed the default key map, and turned dark after pressing the first key.

All of the above observations were made with the keyboard in split mode. With its halves joined, the LEDs light up when plugged into the problematic laptop and both halves work, although individual key presses sometimes don't register. The LEDs show anomalies when typing: segments other than those making up the letters flash in and out, and all of them turn black after a while. While joined, the two halves were still connected by the cable.

The cable I used was not the original one, but a 3-meter-long one I bought afterwards, since I found the original way too short. I only used the original cable for about 1-2 hours before one of my colleagues took pity on me and lent me a longer one; and on most machines, the keyboard has been working fine.

An issue that may or may not be related, though, was that the mouse layer sometimes turned into the Fn layer for short bursts and then back again when I used a 5m long cable of the same sort. I concluded that it must have been too long for the signal to arrive between the halves with proper intensity.

mondalaci commented 5 years ago

@Nnarol Please try the original bridge cable that we provided, with the problematic laptop, and let me know how it works.

Nnarol commented 5 years ago

@mondalaci I could not reproduce the issue with either the original or the 3m long cable. The only difference I can think of is that this time the laptop was not plugged into the wall, and was running off a battery. It does cause a difference in how the touchpad makes the pointer move with this model, so it might not be irrelevant. The UHK is at my workplace now; I'll have to bring the charger of the laptop next time.

mondalaci commented 5 years ago

@nnarol Thanks for the follow up! Noisy power supply seems to be the cause of your issue.

We've been flashing firmware 8.5.3 to UHKs for a while, and haven't received any reports that are due to this bug, so I consider this to be fixed, and am closing it.

rbrt86 commented 5 years ago

> @rbrt86 it sounds like your problem is different from this freeze issue. Your problem seems similar to issue I raised here: #189 You might want to try plugging the keyboard to a usb hub instead of straight to the computer, and see if that helps any. It'll be a good data point.

Sorry for the late response. I recently got myself a powered USB hub and can confirm that the freezes no longer occur. Apparently it was not related to the firmware!