Open xkonni opened 1 month ago
Thank you for the information. I have received some reports, but I have not yet been able to identify what the cause is. I will review some of the design policies and try to improve them.
if you need any further information or have an idea how to fix existing pcbs I'm all ears!
@xkonni can you let us know the distro and kernel versions please?
sure, here are my 3 test computers
Is yours cherry or chocolate? How about the communication between the left and right sides? Is only the USB connection unstable?
I got two 4.1 here, one cherry, one choc. they behave exactly the same. On a usual day they work. Does not matter which one I use.
Then after a while the usb issues appear. Changing usb from left to right does not help, switching cherry to choc does not help. My left side is normally plugged in via usb, right via trrs. Sometimes the left side still works, right does not. But then replugging the left or switching to the right just leads to more usb errors in the kernel log.
A cold boot sometimes helps.
Thank you for sharing the details!
I’m also experiencing some disconnect issues with my Core v4.1. When I plug it in and use it to practice on keybr.com, it works well. However, if I set it aside (while it’s still plugged in), switch to browsing, or use my Mac keyboard, the Core v4.1 stops responding when I try to use it again, even though the LED remains on. Tell me if I can help somehow troubleshooting...
I'm having similar issues as well. It seems that the non-plugged side disconnects most often, but sometimes I get disconnects on the plugged-in side as well. I saw the LEDs flicker when this happened, which isn't surprising, but it was a series of very short bursts of flickers, which makes me think it's a power-related issue.
This is just a guess, but from some reports I've heard it seems to be a power supply issue. There are some parts that are not very well designed, and some of them may be defective.
I'd like to isolate some of the root causes, and I'd appreciate any information you could give me.
* Does the problem happen on a different PC?
* Does the problem happen on another Corne v4 (if you have one)?
I've had the issue on a Mac. I do have a spare PC that I can test with later this weekend and report back.
No spare Corne v4.1 keyboards assembled to test with easily.
Update:
Set up:
I used the Corne V4.1 all day today, about 10 hours of use during work. I experienced a total of about 10 losses of function, some back to back, with varying amount of time between them.
Left hand (slave side) had about 6-7 losses of function. Right hand (master side) had about 3-4 losses of function. On one occasion, losses of function occurred every 30 seconds and required the keyboard to be reflashed.
Each time there was loss of function, it was preceded by LED flickering.
Hope this helps! I'm happy to set up more Corne V4.1s to test in varying conditions.
Another update:
I swapped my keyboard out with another Corne v4.1 PCB this evening. I did this to verify that there were no hardware issues with the first PCB. I also used the same firmware to avoid SW variation.
I confirmed the same behavior with keyboard lockup on one side resulting in requiring a power cycle to recover.
This is a big issue! Right now I can't use my (5) Corne v4.1's nor can I use a v4.1 as my daily driver with these reliability issues.
@foostan have you looked into this any further?
Thank you for your confirmation. Unfortunately, this problem does not occur in my environment, so I cannot investigate further.
Hi, i can report i have the same issue. The keyboard locks up so much it's impossible to use. I tested the keyboard with two different sets of pcbs (both chocolate), with multiple computers (mostly linux, and a windows one out of desperation). I also tried flashing a custom KMKfw one-side-only setup and, later, a custom QMK firmware. Multiple USB cables, hubs, no hubs, a HID remapper in front of the keyboard. Same result. The two sets of pcbs were sourced from different vendors in Europe, and i tested both.
How can we help you further investigate this issue?
edit: i forgot to add, the keyboard seems to lock up less with QMK.
FYI the second USB port will work (opposite hand), however your special keybinds may behave differently from your custom layout; I found this to be a pleasant surprise, considering a USB port joint was damaged on mine and the opposite hand allows me to work around the one broken USB jack. Not sure if this will solve your problem, but it's worth a shot and worth knowing that it appears to be different from older branches in that way.
Another possibility is that the PCB is simply damaged. Please also contact your supplier for further information.
@foostan 4 different PCBs from 2 different vendors show the exact same issue, both used as a pair and as a single unit (with a custom firmware). The same issue is reported by other users in this thread. I assembled the keyboard myself, and inspected the second set of PCBs very carefully when i received them: the only reason I bought a second set was to test if my unit was the issue. The custom software was tested on a generic RP2040 to check its stability: no issues for days, while the same (KMKfw) software running on the corne has usb issues after a few minutes. I can reproduce this with all my 4 units (2 left and 2 right ones) and it works fine on any other RP2040 i tested.
I have spent quite a lot of time trying to debug this and i'm 100% positive it's not a single unit, and it's not my computer, the usb cable or similar.
What i'm hoping to get here is some help in further debugging what is an issue with the USB on the keyboard, and hopefully finding a solution/workaround to help the other users that may have the same issue.
So, in that light, is there any other info i can provide?
@chadhakala no, the usb port is not damaged at all.
Thank you for sharing the details. I'm glad you're being helpful.
So what you're reporting means that the issue is more likely to occur with KMKfw than with QMK? I'll give KMKfw a try. Thanks again.
@foostan I don't think he's saying the KMKfw is worse, but rather by testing the same firmware on a generic RP2040 and on the Corne v4.1 board, the issue is only present on the Corne v4.1. This eliminates as many noise factors as conveniently possible.
The help needed is some debugging on the Corne v4.1 USB HW design to understand what's unique to the design that's causing the issue.
Please let us know if you need data. I am fully willing to support as needed. I would love to help solve this.
I'm sorry, of course. I didn't mean KMKfw is worse. I would like to isolate the problem and investigate the cause in detail.
Thank you for your cooperation. Let's share information on this issue.
@alessiocurri My apologies, I didn't realize this thread was all about the lockup. I have faced this issue, as well as other unique issues for which I do not have systematic evidence of a USB fault.
The last time I used the corne I did face this exact lockup issue and stopped using it completely for that reason. I am following all these threads, so my apologies; I'm a little embarrassed for chiming in without even reading the full thread. I'm pretty sure I meant to respond to a different comment and was unaware there was even an issue for the lockup.
@dahmwern exactly what i meant ;)
@foostan here https://gist.github.com/alessiocurri/18e6b0c48a74c37dee766a71a22ac62a you can find my config for a left-only corne 4.1, no TRRS cable nor right side necessary. This script will run fine on any circuitpython, i tried on versions 8.x, 9.1 and 9.2 (no changes).
To install KMKfw I just copied the KMKfw files from the github repo, added the neopixel.py library (you can also use the .pyc, it should be the same) and my code.py and boot.py (the latter is not strictly necessary). Please note the default layer is empty. To test you need to switch to another layer; the leds will highlight the active keys.
In this config i can replicate the usb lockup in 20 minutes on average, using all 4 boards (tested with and without the switches, no difference). The easiest way to check the status is to open a serial console on the exposed virtual COM port. There you can find the python REPL. That virtual serial port will disappear when the usb issue presents itself.
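Since the virtual serial port vanishing is the failure signal, the time to failure can be measured from the host. The following is a minimal sketch, assuming a Linux host where the port shows up as a device node; the path `/dev/ttyACM0` and the function name are assumptions for illustration, not something taken from the gist.

```python
import os
import time

SERIAL_PATH = "/dev/ttyACM0"  # assumption: adjust to wherever the REPL port appears


def seconds_until_gone(path=SERIAL_PATH, poll_seconds=1.0, exists=os.path.exists,
                       clock=time.monotonic, sleep=time.sleep, timeout=None):
    """Block until `path` disappears and return how long it stayed present.

    Returns None if the path is absent at the start, or if `timeout`
    seconds pass while the path is still present.
    """
    if not exists(path):
        return None
    start = clock()
    while exists(path):
        if timeout is not None and clock() - start >= timeout:
            return None
        sleep(poll_seconds)
    return clock() - start
```

Calling `seconds_until_gone()` right after plugging the board in would give one data point per run, which makes the "20 minutes on average" figure easy to compare between boards and environments.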
@ChadHacksaLot no prob at all, it's probably me owing an apology... in my reply i have been very blunt, probably a tad too much :)
Just wanted to add that I'm experiencing what sounds like the exact same issue with my corne v4.1.
Same behaviour everyone above is describing - sometimes one side becomes unresponsive, sometimes both, sometimes it works nearly all day, sometimes it's only seconds or minutes until the next lockup after disconnecting and reconnecting the USB cable.
One time, the right side even changed colour to the pattern pictured below:
Same behaviour when plugged into:
* Desktop PC running Windows 10
* The same PC running Linux Mint 22 Cinnamon
* Pixel 6 phone (Android 14)
lsusb
during failure, both sides, Linux Mint
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0665:5161 Cypress Semiconductor USB to Serial
Bus 001 Device 004: ID 3434:d030 Keychron Keychron Link
Bus 001 Device 005: ID 0b05:18a3 ASUSTek Computer, Inc. AURA MOTHERBOARD
Bus 001 Device 006: ID 8087:0aaa Intel Corp. Bluetooth 9460/9560 Jefferson Peak (JfP)
Bus 001 Device 011: ID 1532:008f Razer USA, Ltd Razer Naga Pro
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
lsusb
after disconnect/reconnect
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0665:5161 Cypress Semiconductor USB to Serial
Bus 001 Device 004: ID 3434:d030 Keychron Keychron Link
Bus 001 Device 005: ID 0b05:18a3 ASUSTek Computer, Inc. AURA MOTHERBOARD
Bus 001 Device 006: ID 8087:0aaa Intel Corp. Bluetooth 9460/9560 Jefferson Peak (JfP)
Bus 001 Device 011: ID 1532:008f Razer USA, Ltd Razer Naga Pro
Bus 001 Device 023: ID 4653:0004 foostan Corne v4
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
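The two listings differ only in the Corne entry, which is absent during the failure. A small Python sketch of that comparison, assuming the usual `lsusb` line format shown in the listings; the function names are made up for illustration, and devices sharing the same vendor:product ID would collapse into one entry.

```python
import re

# Matches the "Bus BBB Device DDD: ID vvvv:pppp description" lines printed by lsusb.
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})\s*(.*)$"
)


def parse_lsusb(text):
    """Return {usb_id: description} for every device line in lsusb output."""
    devices = {}
    for line in text.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match:
            devices[match.group(3)] = match.group(4)
    return devices


def missing_devices(during_failure, after_replug):
    """Devices listed after replugging that were absent during the failure."""
    before = parse_lsusb(during_failure)
    after = parse_lsusb(after_replug)
    return {usb_id: name for usb_id, name in after.items() if usb_id not in before}
```

Feeding it the "during failure" and "after disconnect/reconnect" listings would leave a single entry, `4653:0004` (foostan Corne v4).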
I absolutely adore this keyboard when it works, I would love to assist in some way. I have career experience in PCB design and experience in micro-controller firmware programming - let me know what I can do to help or point me in a direction please :)
@viscount-monty I've experienced that same LED pattern during lockup. Your description is consistent with my experience.
What information do you need to analyze the PCB design for potential USB related comms issues?
It seems that it may or may not occur depending on the environment. I don't know under what conditions it occurs, but has anyone noticed any electrical abnormalities when the problem occurs, such as a short interruption or a significant drop in voltage or current?
@foostan I have not noticed any abnormalities. I have experienced this at both work and at home. All my other peripherals continue to function normally (USB C mouse, USB C dongles, Bluetooth connected devices).
Thinking back, the occurrences seem completely random, there's no consistent environmental trigger.
That makes me think the USB connection itself is on the border of instability always and when slight fluctuations on the bus occur, the connection drops.
@foostan i've checked both with a cheap usb analyzer in front of the keyboard and by adding a usb-c power injector in series, to make sure this is not an issue with the 5V rail. I think it's environmental though, specifically EM: I think the issue happens more often when my mobile is close to the keyboard (which is most of the time). But i have no hard data to show.
To test if this is an issue with the USB traces, as hinted by @fabianmuehlberger in #229, I bought a usb isolator to see if going usb full-speed instead of high speed makes any difference. I went the isolator way because i have not found a config to force a USB port to go below high speed. I'm also planning to use a usb-c power injector fed from a battery to overcome the 200mA limit of the (very old...) isolator. It will arrive on Monday, i will test then if this changes anything.
@dahmwern I've got the PCB design files open in KiCad and was looking through them when I had a minor breakthrough - hopefully others can confirm/replicate. As I'm sure @foostan can attest, debugging a PCB for noise issues can be a potentially time, labour, cost, and equipment intensive process, so ideally the scope is narrowed as much as possible prior to beginning the process.
Thanks in part to @alessiocurri for giving me the idea of EM interference from a phone which I was able to investigate more thoroughly.
I put together a quick and dirty bash script (gist) to monitor the connection state of my corne keyboard, logging and notifying me of every time it disconnects.
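The gist itself is not reproduced here. For anyone who prefers Python, a rough equivalent could look like the sketch below. It is not the gist's code: it assumes a Linux host with `lsusb` installed and the `4653:0004` ID from the listings posted earlier in this thread, and the function names are hypothetical.

```python
import subprocess
import time
from datetime import datetime

DEVICE_ID = "4653:0004"  # Corne v4 ID as it appears in the lsusb output above


def corne_connected(device_id=DEVICE_ID):
    """Return True if lsusb currently lists the given vendor:product ID."""
    result = subprocess.run(["lsusb"], capture_output=True, text=True)
    return f"ID {device_id}" in result.stdout


def watch(probe=corne_connected, poll_seconds=1.0, max_polls=None, sleep=time.sleep):
    """Poll `probe` and log each connect/disconnect transition.

    Runs until interrupted when max_polls is None; returns the logged events.
    """
    events = []
    previous = probe()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        current = probe()
        polls += 1
        if current != previous:
            state = "connected" if current else "disconnected"
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} keyboard {state}")
            events.append((stamp, state))
            previous = current
    return events
```

Calling `watch()` with no arguments polls once per second until interrupted. Like the bash script, it only sees the USB-connected half, so a lockup of the other half would go unnoticed.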
I found that the keyboard would remain connected without issues all night while I was sleeping (in another room) with my phone also out of the room.
I then found that I could reliably cause a disconnection by placing my phone on or near either side of the keyboard. Note that placing it near the non-USB-connected side causes a lockup of that side only; the other side (USB-connected, not in close proximity to the phone) stays connected, so the lockup is undetectable to the bash script. When you're using the keyboard you certainly notice one side stop working!
At this stage it seems very much confirmed that the disconnects are caused by EMI, a common source being a smartphone, but that is not a great sample size!
First step: could everyone please try to replicate and confirm the issue? We need to confirm the following to move forward with this theory:
Thanks to everyone's input so far, and thanks in advance for any attempts to replicate the issue and gather the data/logs to back it up!
Thank you for sharing! That script looks good 👍 I'll try to replicate and confirm the issue.
@viscount-monty great work! I'll test this today as well.
Thanks! And my apologies - I forgot to change the specific device ID in the lsusb command and just grep for corne.
Let me know if anyone needs a hand getting the script up and running 🤔
"Good" news/bad news situation:
"Good" news: now that we "know" EMI is the probable issue, i can reproduce it in 20 seconds by moving my mobile closer and switching off the wifi. This (i assume) makes the 4G/5G modem talk, and the keyboard is in a non-functional state in seconds. By closer i mean around 20cm from the usb connected side (the left, in my case).
Bad news: the isolator makes no difference at all, probably because it's a USB 2.0 device and not 1.1 as described on the amazon page 😡.
Btw, this sheds a bit of light on why (in my experience) QMK was more "stable": i was testing it on another machine during the night, while not there.
I still would like to test forcing USB 1.1 (mostly to find a use for the keyboard and not have to just recycle the switches in another project), but have no idea how. If anybody has a suggestion, or has an idea on how to try and shield the USB traces, it would be more than welcome.
@alessiocurri you hit the nail on the head! I can trigger the failure by following the identical procedure!
Phone with WiFi on: no impact.
WiFi turned off: BOTH my test Corne v4.1s locked up immediately, at the same time.
I'm going to run the script over night with both Corne v4.1's connected, far away from any connected devices and see if they are stable for a 24 hour period.
@viscount-monty can you modify the script to run on Windows? My work machine is a mac but I'm running this test on my SurfacePro spare computer. I'm not really a linux user.
I turned off the Wi-Fi on my iPhone and moved it closer to the Corne while keeping data communication going, and reproduced the situation where the closer side locks up. I have not yet been able to reproduce the problem reliably, so I will keep looking for a way to do so for a while.
@foostan I have a theory there... That could be down to 5G band variation internationally. Google doesn't seem to bring up much easy consensus on bands in Japan quickly.
I will need to research 5G band variations globally to understand if that's a reasonable limitation on testing repeatability. At the very least, it should be considered an environmental factor that may impact individual testing.
@alessiocurri nice work - disabling the WiFi on my phone allowed me to more reliably replicate the issue by placing my phone near the corne.
To everyone attempting to replicate - does switching off the WiFi AND starting a data-intensive process (like playing a YouTube video) replicate the issue more reliably? This should trigger a whole bunch of cellular transmissions. Without this they would be potentially very brief and sporadic.
And, inversely, when placing the phone in 'aeroplane mode' so that wireless features are turned off, does this allow you to place the phone on or near the corne for an extended period without experiencing a disconnect? While initial brief tests seem to indicate that this is the case, unfortunately I can't test this for an extended period until later in the evening as I am expecting SMSs and a phone call 🤣️
@foostan just to confirm, the computer cannot detect a lockup of the non-USB connected side, only the user can notice either no input at all, or a key getting 'stuck' and continuing to repeat without user input (I'm guessing the last command received from the non-USB connected side must have been a 'key down').
@dahmwern unfortunately I know very little about windows powershell/batch scripts. Would you be able to run a python script? I could probably throw something together quickly. Also, excellent point about the variation in 5G bands: the wikipedia article lists 5G bands from ~450 MHz through to multiple GHz... Wild.
I did some testing with airplane mode -
Having my phone next to the keyboard and turning on airplane mode immediately causes the secondary half to become unresponsive (note to be clear my airplane mode does not affect wireless or bluetooth). N=2
Having my phone on airplane mode next to the keyboard does not appear to have any impact, at least over a short duration
Turning airplane mode off does not immediately impact the keyboard, but makes it unresponsive after a few seconds. N=2
If you are typing at the time the unresponsiveness is triggered, the key gets stuck, e.g. you type ttttttttttttttttttttttttttttt
Location/ 5G bands are Australia
To find a workaround, yesterday i tried some aluminium foil (the food safe one i got from the kitchen). I wrapped the keyboard completely (including the keys...) in 2 or 3 layers and ran some simple python code (print i++, in a nutshell) on the rp2040. The keyboard did not get stuck while i tried my worst with my mobile. While this works, it's not the most practical solution (plus your partner may think you're a bit crazy...). I ordered some conductive filament and i will try to reprint the case and the top plate with that. I'm not sure the filament (with a resistance of 1.5 ohm per cm) will work as well as the foil, nor if the (very necessary) gaps for the switches are an issue. The alternative is to order a full metal case, but that looks expensive for a test, around 70 euros per side. Hopefully i will have some results with the conductive filament, good or bad, in around a week.
Hey @alessiocurri since you mentioned me and the issue I brought up: this really looks like an EMI problem with the USB differential pair. To be clear, I am not a high speed signal or EMI engineer, I am just a hobbyist. So I could not test your hypothesis, since I am missing the expensive testing equipment and the knowledge. Not sure how to debug this hardware issue without the proper tools. Usually a test signal would be sent through the lines and an eye pattern would be measured; a high speed oscilloscope would be needed, as well as an isolated environment. Maybe the easier option is to make a test PCB: move the IC to a suitable place with proper USB routing, and build it.
@fabianmuehlberger you are absolutely right, the right thing to do would be to move the IC closer to the USB port, or in general re-route to maintain signal integrity all the way to the micro. The tests (with the filament) are more for the people (like me) who will not get a new board and want to try and salvage the situation.
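The "print i++" test loop is not posted in the thread; a guess at its general shape is sketched below. It only relies on `time.sleep` and `print`, which CircuitPython also provides, and the `limit`, `sleep` and `emit` parameters are additions made here so the loop can be exercised on a desktop.

```python
import time


def heartbeat(limit=None, interval=1.0, sleep=time.sleep, emit=print):
    """Print an increasing counter forever (or `limit` times).

    Returns the last value printed. On the board, a counter that stops
    advancing on the serial console means the USB link has dropped.
    """
    i = 0
    while limit is None or i < limit:
        i += 1
        emit(i)
        sleep(interval)
    return i
```

On the keyboard, `heartbeat()` would be called from code.py and watched over the serial console while the phone is moved around.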
ps Thank you for bringing up the issue in the first place, it helped a lot when trying to find the issue.
Just wild speculation on my part. I don't own a corne-v4. But @viscount-monty suggested that the slave half was similarly affected, which suggests maybe the USB thing is a red herring.
The crystal used is not a part I have seen in an RP2040 design before (although perhaps it is well used in Japanese keyboard scene?). The RP2040 hardware design guide is very explicit that deviating from the recommended crystal may lead to instability. The selected part has quite different load capacitance to the Abracon ABM8-272-T3, so perhaps the values of the load capacitors and series resistor need tweaking? Maybe shielding just the crystal might be an interesting experiment?
Thank you for all your various suggestions, they are accommodating and informative. It seems like a good idea to check the basics, such as moving the IC closer to the USB, paying attention to the length and thickness of the wires, and reviewing the crystal parts. I'm open to any suggestions here and will make a prototype. Thank you again for your cooperation.
Considering the current case, the distance between the IC and the USB doesn't seem to matter much, because this issue also occurs on the side that is not connected via USB.
If you are familiar with the SWD interface and can reproduce the failure, you may be able to power the board via the OLED header, connect to SWD and see if you can get the MCU to fail without any USB connection present at all.
@george-norton i tested this with circuitpython and a simple program that lights the leds in sequence: when the USB goes down, the mcu is still running fine. I think the processor itself is not affected, but the software is: when KMKfw tries to send data via USB, the python core crashes after a few seconds. I assume QMK has a similar behavior, but i have not tested it.
@foostan I'm a hobbyist in this field (electronics), so this is a bit of guesswork. I think the secondary unit is stuck because it cannot sync with the primary, which in turn is stuck trying to write to the USB (at least with the default qmk firmware). I have verified that the MCUs of both the primary and the secondary unit are still alive (with the aforementioned script) and will talk via the interconnecting cable (in my example i'm flipping the gpio connected to the TRS cable up and down at ~1Hz on one side, and turning on a led on the other). This is to say that i would consider the other side being stuck only a side effect of not being able to talk to the primary side.
@george-norton You are right. It would be beneficial to rule out other causes. I assume this could be tested by running the RP2040 with some testing C or Python code, just logging the behavior via the debug serial header. For this test, the USB lines should be fully deactivated (disabled GPIOs) to mitigate interference.
I also assume the error codes could indicate the problem we are facing here. The question is: What USB connection issues and logs are present in the following cases? And are there other signs indicating the root cause.
The main reason I mentioned this was, that the design is not within the specification for routing USB lines. This does not say it has to be a problem, but is certainly one of the first areas to look at when problems like this occur.
I tested the case printed with conductive filament but, as expected tbh, there was no change. At this point I'm out of tests I can do.
@foostan while you investigate, i think you should add a warning in the readme.md file about this issue.
Add a notice about this issue on README https://github.com/foostan/crkbd?tab=readme-ov-file#notice
got 2 crkbd rev 4.1, love them, typing on my old 60% is a pain now.
but for some reason the usb connections on both devices are rather unstable on my machines (linux pc, 2 dell laptops with linux). first I thought it was a hw issue, but the second (one from a diy store in germany, one from aliexpress) shows the exact same issues.
using your firmware with the vial keymap. tried some options (remove USB_SUSPEND_WAKEUP_DELAY, increase it, ...) but the devices remain unstable. sometimes they run for hours, then they fail every few seconds. Could this be related to https://github.com/foostan/crkbd/issues/229 ?
any help is highly appreciated!
logs: