UltimateHackingKeyboard / firmware

Ultimate Hacking Keyboard firmware
Other
418 stars 66 forks source link

Freeze bug #172

Closed mondalaci closed 5 years ago

mondalaci commented 6 years ago

Firmware versions larger than 8.2.5 sometimes freeze. When it happens, your UHK gets unresponsive all of a sudden. Your UHK may not be affected at all, or it can take hours, days or even weeks for the freeze to hit.

The reason we haven't yet fixed this bug is because it happens so rarely, and only on certain UHKs. We need your feedback, so if you're reading this please update to firmware 8.4.5 via Agent.

If your UHK freezes, run the following script and paste its output to provide diagnostics information, so that we can hopefully figure out what the hell is going on. (You'll need another keyboard to do this if your UHK froze).

Pull the Agent repo and build it. Then run packages/usb/get-debug-info.js. It'll output something like this:

I2cWatchdog:919988055 | I2cSlave:306651339 | I2cWatch:950237 | I2cRecovery:0 | KeyMatrix:783778471 | UsbReport:157778114 | Time:101366130 | UsbGeneric:237 | UsbBasic:41632 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:919997815 | I2cSlave:306654592 | I2cWatch:950247 | I2cRecovery:0 | KeyMatrix:783783350 | UsbReport:157782993 | Time:101367237 | UsbGeneric:238 | UsbBasic:41632 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920007573 | I2cSlave:306657846 | I2cWatch:950257 | I2cRecovery:0 | KeyMatrix:783788300 | UsbReport:157787816 | Time:101368361 | UsbGeneric:239 | UsbBasic:41640 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920017331 | I2cSlave:306661098 | I2cWatch:950267 | I2cRecovery:0 | KeyMatrix:783793238 | UsbReport:157792640 | Time:101369482 | UsbGeneric:240 | UsbBasic:41644 | UsbMedia:252 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920027090 | I2cSlave:306664351 | I2cWatch:950277 | I2cRecovery:0 | KeyMatrix:783798182 | UsbReport:157797462 | Time:101370593 | UsbGeneric:241 | UsbBasic:41644 | UsbMedia:256 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920036849 | I2cSlave:306667604 | I2cWatch:950287 | I2cRecovery:0 | KeyMatrix:783803092 | UsbReport:157802316 | Time:101371711 | UsbGeneric:242 | UsbBasic:41644 | UsbMedia:258 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920046607 | I2cSlave:306670858 | I2cWatch:950297 | I2cRecovery:0 | KeyMatrix:783808105 | UsbReport:157807084 | Time:101372838 | UsbGeneric:243 | UsbBasic:41644 | UsbMedia:269 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920056367 | I2cSlave:306674110 | I2cWatch:950307 | I2cRecovery:0 | KeyMatrix:783812989 | UsbReport:157811962 | Time:101373962 | UsbGeneric:244 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110380
I2cWatchdog:920066125 | I2cSlave:306677364 | I2cWatch:950317 | I2cRecovery:0 | KeyMatrix:783821101 | UsbReport:157814081 | Time:101375047 | UsbGeneric:245 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110531
I2cWatchdog:920075884 | I2cSlave:306680616 | I2cWatch:950327 | I2cRecovery:0 | KeyMatrix:783831414 | UsbReport:157814333 | Time:101376086 | UsbGeneric:246 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110783
I2cWatchdog:920085643 | I2cSlave:306683869 | I2cWatch:950337 | I2cRecovery:0 | KeyMatrix:783838898 | UsbReport:157816997 | Time:101377160 | UsbGeneric:247 | UsbBasic:41644 | UsbMedia:270 | UsbSystem:4 | UsbMouse:110904

One line will be displayed per second. Most variables should be incrementing all the time automatically, but you'll have to manually increment some. Make sure to increment UsbBasic by typing regular scancodes on your UHK. Make sure to increment UsbMedia by typing media scancodes (for example volume up and down) on your UHK. Make sure to increment UsbMouse by mousing with your UHK. As for UsbSystem, you can increment it by typing system scancodes such as sleep, but this one is not terribly important. Every other variable should be incrementing automatically all the time with the exception of I2cRecovery which may spontaneously increment once in a while.

joshginter commented 6 years ago

My freezes up to this point show no output from that script. It seems like the whole HID communication gets locked up. Do you expect that to be different for other freezes?

mondalaci commented 6 years ago

I'm not sure, but I hope so. Otherwise folks won't be able to supply us information of what goes on inside of their UHKs without a debug probe.

tigerkeys commented 6 years ago

Is there a way to run the debug script under windows? Agent under windows says Cannot write to HID device.

mondalaci commented 6 years ago

The debug script should work under every OS. Agent or the script says "Cannot write to HID device"?

tuxlifan commented 6 years ago

Could you provide the debug script as AppImage? To install npm just to provide debug info seems a bit inconvenient (not that I have yet experienced a freeze yet, but I only just upgraded the firmware...). Thanks!

joshginter commented 6 years ago

Agent gets a little red error "toast" in the app that has that message. It happened unexpectedly otherwise I'd have tried to take a screen shot.

mondalaci commented 6 years ago

@tuxlifan Thanks for the idea! We'll look into it.

@ert78gb Is there a way to add the agent/packages/usb/get-debug-info.js script to the AppImage, so that users can easily execute it without building Agent? I assume it's problematic because it depends on node and libraries, so I don't think there's an easy way.

@joshginter When does this toast error happens? Totally random or upon freezes? Do you remember the message?

ert78gb commented 6 years ago

@mondalaci We could add agent/packages/usb/get-debug-info.js to the AppImage or any package, but this js file is not enough. Need the agent/packages/usb/uhk.js and the whole node_modules folder.

The whole node-hid package and the usb handler codes deployed in the app.asar file. I think duplicate everything is not a good solution.

I suggest to create a command line argument --freeze-debug-info=true and when it passed to the agent app it will start to write the debug info into the console instead of start the ui.

mondalaci commented 6 years ago

@ert78gb Great idea! I think we should rather wait until the next monthly update though. I'll ask people to share their experience regarding the freeze. It may turn out that it's a non-issue on every but a handful of UHKs which are faulty in some ways.

joshginter commented 6 years ago

Sorry. I confused this case and the old one. I originally thought you were asking me about the error because I mentioned in in my previous report on my most recent failure. I only saw it when I unplugged the frozen UHK. Sorry for the confusion.

mondalaci commented 6 years ago

@joshginter No problem, appreciate your feedback!

Jopie01 commented 6 years ago

Got a freeze yet, but during an update of my Fedora Linux. The dmesg shows that the USB disconnected: usb 4-1: USB disconnect, device number 4

Then messages show up:

[ 8332.839477] usb 4-1: new full-speed USB device number 5 using ohci-pci
[ 8332.979473] usb 4-1: device descriptor read/64, error -62
[ 8333.234446] usb 4-1: device descriptor read/64, error -62
[ 8333.477542] usb 4-1: new full-speed USB device number 6 using ohci-pci
[ 8333.624464] usb 4-1: device descriptor read/64, error -62
[ 8333.871453] usb 4-1: device descriptor read/64, error -62
[ 8333.979526] usb usb4-port1: attempt power cycle
[ 8334.427439] usb 4-1: new full-speed USB device number 7 using ohci-pci
[ 8334.843416] usb 4-1: device not accepting address 7, error -62
[ 8334.979391] usb 4-1: new full-speed USB device number 8 using ohci-pci
[ 8335.396418] usb 4-1: device not accepting address 8, error -62
[ 8335.396474] usb usb4-port1: unable to enumerate USB device
[ 8337.580348] usb 4-1: new full-speed USB device number 9 using ohci-pci
[ 8337.721376] usb 4-1: device descriptor read/64, error -62
[ 8337.969353] usb 4-1: device descriptor read/64, error -62
[ 8338.211351] usb 4-1: new full-speed USB device number 10 using ohci-pci
[ 8338.352406] usb 4-1: device descriptor read/64, error -62
[ 8338.600371] usb 4-1: device descriptor read/64, error -62
[ 8338.707402] usb usb4-port1: attempt power cycle
[ 8339.155356] usb 4-1: new full-speed USB device number 11 using ohci-pci
[ 8339.571340] usb 4-1: device not accepting address 11, error -62
[ 8339.707345] usb 4-1: new full-speed USB device number 12 using ohci-pci
[ 8340.123351] usb 4-1: device not accepting address 12, error -62
[ 8340.123436] usb usb4-port1: unable to enumerate USB device
[ 8348.875138] usb 4-1: new full-speed USB device number 13 using ohci-pci

I disconnected the keyboard, reconnected it and is worked again.

mondalaci commented 6 years ago

@Jopie01 Which firmware, and since how long have you been running >8.2.5 firmwares?

Jopie01 commented 6 years ago

@mondalaci I'm using 8.4.5 since 30th of August and before that I have used 8.4.2. When I'm not using my system, I switch it completely off. This freeze happened during a normal system-update of Fedora. But I was also switching USB from one Arduino to another back and forth. I will keep an eye on it.

mondalaci commented 6 years ago

@Jopie01 Thanks! How often do these freezes generally occur?

@joshginter Do you happen to use a longer/different bridge cable than the one that was provided with your UHK?

joshginter commented 6 years ago

No. I use the included bridge cable.

mondalaci commented 6 years ago

@joshginter Thanks! Would you please shoot a picture of your usual setup? I'd like to see how much the cable is stretched and its close surroundings. Also, have you ever experienced chatter? Chatter and freezes might be related.

Jopie01 commented 6 years ago

@Jopie01 Thanks! How often do these freezes generally occur?

@mondalaci This was the first time in months. I can't say anything about the earlier freezes because I don't know the situation anymore :smile:

joshginter commented 6 years ago

image That is my setup at work. Sometimes my phone will be between the keyboard halves. My home setup is very similar but the keyboard halves are usually connected as I find "connected" to be preferred for gaming.

Since you're worried about the bridge cable I figured I'd include a picture of the bot I carry the keyboard in. It's a snug fit and pushes the cable up a bit but certainly not in a way that should be a problem for that kind of coiled cable. image

mondalaci commented 6 years ago

@Jopie01 A single freeze in months... That's exactly why it's been mission impossible to fix this bug so far. :)

@joshginter Your setup looks fine to me, and I don't think it should make freezes frequent, however I noticed you mentioned that sometimes your phone is in the middle of the halves. Is there a correlation between your phone being between the halves and the freezes? Also, how frequently do you experience key chatter, if ever?

joshginter commented 6 years ago

I don't think there is a correlation. About half of my freezes have come at home with the keyboard connected not split. The most recent two at work came when I was signing in so my phone probably wasn't there.

I've not experienced chatter (to my knowledge) but I'm a pretty haphazard typer. So it's possible its happened rarely and I just attributed it to my shitty typing. Definitely not as common a problem as some people see in the other thread.

mondalaci commented 6 years ago

@joshginter When you use your keyboard merged, is the bridge cable connected? If so, would you disconnect the bridge cable while using your keyboard merged and watch for freezes?

joshginter commented 6 years ago

Ya I keep it connected. I can disconnect see how it goes. Though I havent been using my Pc at home a ton the last few weeks.

mondalaci commented 6 years ago

@joshginter Thank you! Appreciate very much the loads of feedback you've been providing.

irwand commented 6 years ago

fwiw, I'm still on 8.2.5 firmware, and I've seen it freezes. It's random and rare, and I don't have any insight into pattern or anything. But I'm just saying that potentially this bug is older than 8.2.5.

mondalaci commented 6 years ago

@irwand How rare?

I've been testing the freeze, and I find it increasingly probable that it may not be an issue at all. So far, I reproduced it about once over months. This might be a hardware issue. For example, if a user touches the pogo pins and a static discharge happens, a voltage spike may freeze the CPU. That kind of stuff.

According to my knowledge, @joshginter is the only person who experiences the freeze frequently (not sure how often).

irwand commented 6 years ago

good question. It's hard to say.. although I haven't had it for the past 2 weeks.. Maybe more. So it's rare enough for me, that it's not quite yet a big issue. I wanted to keep watching my issue, but I thought I should say something to this thread. If you're pretty sure it's HW voltage spike issue, why not release the new shiny firmware builds? Then more people will try them out. :) That said, I'll install the new firmware, and will watch the behavior some more.

joshginter commented 6 years ago

I would not describe my freezes as "frequent", just more common than anyone else... :p. Initially they were more common, but have been getting rarer and rarer. It is not inconceivable that all of my freezes may have been a result of static electricity, I get up and sit down pretty often. Though, I don't ever really touch the pogo pins nor have I ever noticed or felt a "spark" of static discharge. So it couldn't have been a lot of static discharge ever causing an issue for me.

mondalaci commented 6 years ago

@irwand The voltage spike is just a wild guess on my part. We'll definitely be releasing new firmware builds in the future, although improvement in this department is not expected until reproducing the issue gets possible.

@joshginter Thanks for elaborating!

Frankly, I feel embarassed about these freezes, even if they are rare, but to this date I wasn't able to reproduce them more often than once in a couple of months which makes them fixing practically impossible. I'll keep trying nonetheless.

irwand commented 6 years ago

no worries man. As engineers we know how hard it is to debug intermittent issues. :) One suggestion I have is to always have debug mode that we can turn on from the release, without needing to build anything. This will lower the bar for people to get more diagnostics data for hard to reproduce bugs. Like the --freeze-debug-info=true idea, except change that to --debug-info=true, and always include that in all future builds.

mondalaci commented 6 years ago

@irwand Yeah, but the thing is, UHKs seem to be unresponsive via USB upon freezes, in which case Agent is unable to query debug data via USB, so it may be a futile effort to implement debug capabilities in Agent itself.

irwand commented 6 years ago

@mondalaci Sure, but I always appreciate debugging capabilities anyway in release code, so customers are more empowered to help debug issues. Another idea is to use Fn+some-key to trigger capture of firmware state and store that state somewhere to be pulled out later. Not sure if you have a writable storage you can use or not for this purpose. Or maybe use the keymap LED to communicate data to user, although admittedly that might be kind of painful, since there's only 3 chars.. :) Do we know at this point whether it's just USB that's down or the whole firmware locked up? Next time it freezes I should play with changing the keymaps to see if firmware is somewhat functional.

millerjl2 commented 6 years ago

When I've experienced the freezes the display was completely non-responsive. Nothing when I hit mod, fn, mouse, caps lock, etc. Once it locked when I was holding FN down and the fn LED was lit, but didn't respond upon being released, tapped again, or any other mod keys being pressed.

mondalaci commented 6 years ago

Please let me know guys whether you can connect to your UHK with Agent after it frooze. If so, the debug capability may be worthwhile.

joshginter commented 6 years ago

When the freeze happened for me agent gets stuck on the "loading configuration" section. Significantly more details can be found in the older issue.

mondalaci commented 6 years ago

@joshginter Thanks, but this should be enough. The USB communication got screwed in this case.

luteijn commented 6 years ago

Luckily, there are several secondary communication channels out of the keyboard, besides the usb. These could be quite useful to help debug crashed keyboard. If e.g. the Fn-led still reacts to the left Fn-key, then at least one knows the microprocessors on both halves are still running and i2c is still working. (left to read the key, send to right half, and then back to the left side to actually switch the led on. A simple script for people to follow to determine how frozen the keyboard is, would be useful too.

Ideally, I think there should be a special debug 'macro' bound to an unused Fn-layer key (on right half, so it can work without left side) and when a crash occurs, users can run the macro by pushing the key. Information could be dumped to the i2c (in a way easy to read out with a spare arduino or converted to rs232 with a board from aliexpress so people can make a set-up at home) or stored on keyboard PROM to be later read out by agent.

Once you have a documented and easy/offficial way to send (and maybe receive) internal state stuff out to i2c integrated for debug purposes with the keyboard, you also open up the hackability for people from maker/hackerspaces etc. to make their own hard and software expansions for the keyboard, freeing up capacity on your end to work on the core firmware and agent software, the official trackball etc. expansions and new hardware variants like a non-staggered layout, or more than 60% setups...

joshginter commented 6 years ago

In my freezes so far nothing works. None of the layer keys result in changes to the lcd. So my money would be on such an onboard debugging system not being useful for this particular problem.

luteijn commented 6 years ago

If 'nothing works' after a crash, then it would be even more important to be able to just continuously dump out debug state over a secondary channel and just log it all for later forensics. There seems to be an unused uart on the first two pins of the K22F. Would have been nice if the first 4 pins were broken out to a debug header, but maybe that is not too hard to hack in. Of course, that would just cover the HW side, still would need some software support in the form of some sort of monitor task...

mondalaci commented 6 years ago

I think I fixed the freeze bug! Please read on.

A freeze happened to me yesterday. See freeze-log.txt

I was mousing while the freeze kicked in. The LED display frooze. Pressing the keys didn't result in any actions. Reconnecting the halves didn't make a difference. Suspending and resuming my PC didn't make a difference either.

The keyboard and mouse USB interfaces were not working, but the generic USB interface was working, allowing me to capture diagnostics data. I2C communication, keyboard matrix scanning, and the timer were also working.

Based on the debug variables, keyboard matrix scanning was working which meant that the main loop was spinning. It also meant that UpdateUsbReports() got called from time to time, but UsbReportUpdateCounter wasn't incrementing which could only happen because UsbReportUpdateSemaphore was 0 which could only happen because one of the USB interfaces skipped a beat for some reason.

Then I exposed UsbReportUpdateSemaphore as an USB variable in firmware 8.5.0 via {get,set}-variable.js which made me able to artifically reproduce the bug by running set-variable.js usbReportSemaphore 1 and resolve it with ./set-variable.js usbReportSemaphore 0 without reconnecting my UHK.

At that point, despite of being able to resolve the bug via the command line, the firmware was still affected, so I made UsbReportUpdateSemaphore reset if it gets stuck for 100ms in firmware 8.5.1. Then I tried to artifically reproduce the freeze again by running set-variable.js usbReportSemaphore 1 and just as expected, the firmware recovered virtually immediately, so I couldn't reproduce the bug anymore.

The fact that I was able to access my UHK via USB and some other users weren't suggests me that there are different situations involved. Some UHKs might hard lock which may be due to a physical issue like a static discharge. I expect such issues to be unusually rare and only constrained to various environments.

Please give firmware 8.5.1 a try, guys! Very interested about your feedback.

richrd commented 6 years ago

That sounds like brilliant news! I'm haven't experienced the freeze bug, but I've followed it closely here on github. This made me wonder what usbReportSemaphore actually is? Anyway, great that the culprit seems to be have been found! :)

mondalaci commented 6 years ago

My understanding about usbReportSemaphore is somewhat limited, but according to my knowledge it's responsible for keeping the polling of the USB interfaces in sync. Without it, mouse movement would be far from smooth. I think it's also responsible to resolving a previous freeze bug. I really do hope that this time we nailed it. :)

joshginter commented 6 years ago

@mondalaci How would I go about testing your new functions? I want to go to version 8.5.0 where the bug is still present. And play around with recreating the freeze. I want to see if the behavior is consistent with the freezes I've seen in the past. I'll note if there are any major differences. Then I'll move to 8.5.1 and see if anything new happens.

mondalaci commented 6 years ago

@joshginter Sounds like a plan!

joshginter commented 6 years ago

I was hoping for some further instructions. Do I need a recent version of agent to use the .js files you mention? Do I need to do anything special to reach the "generic USB" interface?

mondalaci commented 6 years ago

You can use the aforementioned $agentDir/packages/usb/set-variable.js script after building the most recent Agent repo to artifically reproduce and resolve the freeze. But it shouldn't be required. The generic USB interface is working if Agent can talk to your UHK.

joshginter commented 6 years ago

Ok. I'll give it a try and let you know results/problems. Maybe tonight, but probably not until the weekend. Great work!

mondalaci commented 6 years ago

Thanks for your help! Really appreciated!

irwand commented 6 years ago

but UsbReportUpdateCounter wasn't incrementing which could only happen because UsbReportUpdateSemaphore was 0 which could only happen because one of the USB interfaces skipped a beat for some reason.

@mondalaci I really like that you found something there.. but it seems like the fix is a bit "hacky"? It seems that we still need to figure out why "one of the USB interfaces skipped a beat for some reason"? Again, I'm grateful for the fix, but it seems like there's something else still lurking down there.. :) I'm running 8.5.1 now, I'll watch for this bug. Thanks man!

joshginter commented 6 years ago

So I did what we talked about. When the semaphore gets set to 1, the apparent behavior of the keyboard does seem to be like the freezes I've seen in the past. The usb-debug-info.js reporting information is decidedly unlike the natural freezes I've seen though. The other thing I notice is that with this forced freeze when I disconnect and and then reconnect the left keyboard half the display turns on and all 3 layer leds turn on (mod, fn, and mouse). In my natural freezes before that display would not turn back on.

The 3 diplay and 3 LEDs being on is consistent with the original failure description @mondalaci included in issue 148. So this appears to be a solution/work-around to the original issue that was described but not for the other form of issue reported by me.

Static seems to be the most likely cause of my problem. Anyone have any idea on how to safely test the idea that static discharge is causing my version of the issue?