Closed SolidHal closed 6 years ago
I think this could be related to board revisions (although we're probably 90% right in assuming this is a kernel issue): I know the RK3288 and the USB IP in it have hardware revisions, so maybe later C201s are different, and that affects the ability to reproduce this issue.
That's very possible. I recall seeing revision numbers printed on the USB breakout board inside the C201. I'll compare the revisions of the three I have.
Quoting what I said in #31 for easier future reference.
I pulled out my other C201, a 2GB model, and my second AR9271 and managed to recreate this issue while booting off of USB. It seems to lock up completely, and eventually the kernel crashes.
If I plug in the AR9271 after boot, I get errors just like in this issue qca/open-ath9k-htc-firmware#136. Do you as well @dimkr ?
I realize now why I'm not seeing this issue on my main 4GB C201: I have my AR9271 soldered in to the webcam wiring. (Or maybe it's a 2GB vs 4GB issue, but I doubt it?)
Things I'll test:
* Running off of usb vs emmc
* 4gb vs 2gb models
* libreboot vs stock coreboot
The issue I linked above also mentions not having this problem when using a usb hub, but doesn't mention if it is powered or unpowered. I'll test with one of each.
It's sounding more and more like a kernel issue, especially if everything works fine with the Chrome OS 3.14 kernel. After testing the above things I'll start digging into the kernel. If using an unpowered usb hub fixes this, a pretty nasty usb bug is hiding somewhere.
Here are my exact steps to recreate this issue:
wpa_passphrase <wifinetwork_name> <wifinetwork_passphrase> > wpa.conf
wpa_supplicant -D wext -i wlan0 -c wpa.conf
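For anyone repeating these steps, here is a minimal sketch that wraps them in two shell functions and bails out early when the interface never appeared. The names `iface_exists` and `repro` and the `SYSFS_NET` override are made up for illustration; only the two commands above come from this thread.

```shell
# Return success if the named network interface exists.
# SYSFS_NET is a hypothetical override, used only to make the check testable.
iface_exists() {
    [ -d "${SYSFS_NET:-/sys/class/net}/$1" ]
}

# Run the two reproduction commands for one SSID/passphrase pair.
# $1 = SSID, $2 = passphrase, $3 = interface (defaults to wlan0).
# wpa_supplicant stays in the foreground so its log remains visible.
repro() {
    ssid=$1 pass=$2 iface=${3:-wlan0}
    if ! iface_exists "$iface"; then
        echo "$iface missing: check dmesg for ath9k_htc errors" >&2
        return 1
    fi
    wpa_passphrase "$ssid" "$pass" > wpa.conf || return 1
    wpa_supplicant -D wext -i "$iface" -c wpa.conf
}
```

Example call, with placeholder values: `repro MyNetwork 'my passphrase'`.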
I was initially testing on my 2gb c201, unfortunately a refurb so no manufacturing info on the sticker.
On the 2gb model, I get the following in dmesg at boot:
[ 6.297824] usb 1-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[ 6.298086] usbcore: registered new interface driver ath9k_htc
[ 6.584679] usb 1-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[ 6.837106] ath9k_htc 1-1:1.0: ath9k_htc: HTC initialized with 33 credits
[ 6.944168] ath: phy0: Mac Chip Rev 0x0f.3 is not supported by this driver
[ 6.944889] ath: phy0: Unable to initialize hardware; initialization status: -95
[ 6.945723] ath: phy0: Unable to initialize hardware; initialization status: -95
[ 6.946592] ath9k_htc: Failed to initialize the device
[ 6.958691] usb 1-1: ath9k_htc: USB layer deinitialized
wlan0 doesn't exist, so wpa_supplicant can't even be run.
I unplugged the wifi dongle, plugged it back in and got the same message as above. I plugged in the dongle, and force powered down with the power button. I then got this in dmesg on reboot:
[ 6.681174] usb 1-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[ 6.681409] usbcore: registered new interface driver ath9k_htc
[ 6.971747] usb 1-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[ 7.223404] ath9k_htc 1-1:1.0: ath9k_htc: HTC initialized with 33 credits
[ 7.489004] ath9k_htc 1-1:1.0: ath9k_htc: FW Version: 1.4
[ 7.489228] ath9k_htc 1-1:1.0: FW RMW support: On
[ 7.489385] ath: EEPROM regdomain: 0x65
[ 7.489392] ath: EEPROM indicates we should expect a direct regpair map
[ 7.489403] ath: Country alpha2 being used: 00
[ 7.489409] ath: Regpair used: 0x65
[ 7.500742] ieee80211 phy0: Atheros AR9271 Rev:1
So I ran the commands as laid out at the top of this post and got:
Successfully initialized wpa_supplicant
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready #This message seems to be irrelevant
ioctl[SIOCSIWENCODEEXT]: Invalid argument
ioctl[SIOCSIWENCODEEXT]: Invalid argument
wlan0: Trying to associate with <MAC addr> (SSID=<ssid> freq=2462 MHz)
wlan0: authenticate with <MAC addr>
wlan0: send auth to <MAC addr> (try 1/3)
wlan0: authenticated
wlan0: associate with <MAC addr> (try 1/3)
wlan0: RX AssocResp from <MAC addr> (capab=0x431 status=0 aid=2)
wlan0: associated
wlan0: associated with <Mac addr>
wlan0: WPA: Key negotiation completed with <MAC addr> [PTK=CCMP GTK=CCMP]
wlan0: CTRL-EVENT-CONNECTED - Connection to <MAC addr> completed [id=0 id_str=] #id str is empty?
IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wpa_supplicant seems to run continuously after that, so in another tty (alt+ctrl+a function key) I then ran dhclient and could then run apt update successfully.
Rebooted, saw the firmware was installed properly, but wpa_supplicant hangs indefinitely after printing:
Successfully initialized wpa_supplicant
I force shutdown by holding the power button, booted again, firmware was successfully installed, and it could connect to the internet.
I force shutdown one final time, and got a new message at bootup:
usb 1-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
usbcore: registered new interface driver ath9k_htc
usb 1-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
ath9k_htc 1-1:1.0: ath9k_htc: Target is unresponsive
ath9k_htc: Failed to initialize the device
usb 1-1: ath9k_htc: USB layer deinitialized
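The three boot outcomes seen so far (unsupported chip revision, unresponsive target, clean init) could be told apart with a small filter over dmesg. This is only a sketch: the function name and the labels it prints are hypothetical, and the match strings are copied from the logs above.

```shell
# Classify an ath9k_htc boot log read from stdin.
# The most specific failure messages are checked first, because the
# generic "Failed to initialize the device" line follows both of them.
classify_ath9k_boot() {
    log=$(cat)
    case $log in
        *"Target is unresponsive"*)          echo unresponsive ;;
        *"Mac Chip Rev"*"not supported"*)    echo unsupported-rev ;;
        *"Failed to initialize the device"*) echo init-failed ;;
        *"Atheros AR9271"*)                  echo ok ;;
        *)                                   echo no-ath9k-messages ;;
    esac
}
```

It would be used as `dmesg | classify_ath9k_boot` after each boot, which makes it easier to tally how often each outcome occurs.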
I then moved to my 4GB C201 from May of 2016.
On the first boot I got:
[ 6.297824] usb 1-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[ 6.298086] usbcore: registered new interface driver ath9k_htc
[ 6.584679] usb 1-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[ 6.837106] ath9k_htc 1-1:1.0: ath9k_htc: HTC initialized with 33 credits
[ 6.944168] ath: phy0: Mac Chip Rev 0x0f.3 is not supported by this driver
[ 6.944889] ath: phy0: Unable to initialize hardware; initialization status: -95
[ 6.945723] ath: phy0: Unable to initialize hardware; initialization status: -95
[ 6.946592] ath9k_htc: Failed to initialize the device
[ 6.958691] usb 1-1: ath9k_htc: USB layer deinitialized
Just like on the 2GB model
But on the second boot, it initialized properly, wrote the firmware properly, and wlan0 existed, so I ran the commands and got:
Successfully initialized wpa_supplicant
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready #This message seems to be irrelevant
ioctl[SIOCSIWENCODEEXT]: Invalid argument
ioctl[SIOCSIWENCODEEXT]: Invalid argument
wlan0: Trying to associate with <MAC addr> (SSID=<ssid> freq=2462 MHz)
It then hangs......
This seems to only be intermittent though, as the next boot I successfully connected:
Successfully initialized wpa_supplicant
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready #This message seems to be irrelevant
ioctl[SIOCSIWENCODEEXT]: Invalid argument
ioctl[SIOCSIWENCODEEXT]: Invalid argument
wlan0: Trying to associate with <MAC addr> (SSID=<ssid> freq=2462 MHz)
wlan0: authenticate with <MAC addr>
wlan0: send auth to <MAC addr> (try 1/3)
wlan0: authenticated
wlan0: associate with <MAC addr> (try 1/3)
wlan0: RX AssocResp from <MAC addr> (capab=0x431 status=0 aid=2)
wlan0: associated
wlan0: associated with <Mac addr>
wlan0: WPA: Key negotiation completed with <MAC addr> [PTK=CCMP GTK=CCMP]
wlan0: CTRL-EVENT-CONNECTED - Connection to <MAC addr> completed [id=0 id_str=] #id str is empty?
IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wpa_supplicant seems to run continuously after that, so in another tty (alt+ctrl+a function key) I then ran dhclient and could then run apt update successfully.
I had to pause debugging here unfortunately. Will edit this later.
Further testing on both my 2gb and 4gb devices running stock coreboot.
Throughout all of this, I am getting the same exact errors on both the 4gb and 2gb model I have. The only difference between these two c201's and my main one is that my main one has the wifi chip soldered in so that it uses the ground and usb data lines of the webcam connector, and the 5v line of the usb ports. On my main c201, I do NOT get this issue with the soldered in wifi dongle. I do however see the same errors as above if I plug the wifi dongle into one of its two usb ports.
Based on this, it seems like the bug is located either in the wider usb driver or maybe the device tree. I'm going to start by looking at the differences between how the webcam usb is handled vs the main usb ports.
TODO:
@samdima (@dimkr ?) Could you provide some insight regarding the line
export WIFIVERSION=-3.8
in https://github.com/dimkr/devsus/blob/master/devsus.sh when you get a chance?
This line
complex-usb-min-frequency = <1200000>;
is present in https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.14/arch/arm/boot/dts/rk3288-veyron.dtsi but not in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/rk3288-veyron.dtsi, which is interesting.
Tested disabling power saving for wireless in the kernel config, as well as removing net.ifnames=0 from the kernel command line, neither of which improved anything.
This line
complex-usb-min-frequency = <1200000>;
is used in drivers/cpufreq/cpufreq.c in the chromeos kernel like so:
In function static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
    /* some usb device request cpu frequency larger than specific value */
    of_property_read_u32(dev->of_node,
            "complex-usb-min-frequency", &policy->complexusb_minfreq);
and is referred to by:
    /**
     * cpufreq_start_complex_usb
     * @cpu: CPU number
     *
     * set cpu min frequency to complex-usb-min-frequency
     * which define in dts
     */
    void cpufreq_start_complex_usb(void)
    {
        struct cpufreq_policy *policy = cpufreq_cpu_get(0);
        down_write(&policy->rwsem);
        policy->complexusb_cnt++;
        if (policy->complexusb_cnt > 1) {
            up_write(&policy->rwsem);
            cpufreq_cpu_put(policy);
            return;
        }
        up_write(&policy->rwsem);
        cpufreq_cpu_put(policy);
        cpufreq_update_policy(0);
    }
The mainline kernel doesn't have either of those functions in cpufreq.c, and its policy struct has no mention of complexusb: https://elixir.bootlin.com/linux/latest/source/include/linux/cpufreq.h#L65
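A quick way to double-check this kind of "present in one tree, absent in the other" claim is to grep both checkouts for the identifier. The helper below is a rough sketch; `tree_mentions` is a made-up name, and the directory names in the usage line are assumed local checkouts, not paths from this thread.

```shell
# Succeed if any file under the given paths mentions the identifier.
# $1 = identifier to look for, remaining args = files or directories.
tree_mentions() {
    ident=$1
    shift
    # -r: recurse, -q: exit status only, -F: treat identifier as a fixed string
    grep -rqF -- "$ident" "$@" 2>/dev/null
}
```

For example, `tree_mentions complexusb chromeos-3.14/drivers/cpufreq && echo present` run against each checkout shows whether the complex-usb policy code exists there.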
TODO: Testing the reversal of this commit https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b0a35a3f72c38a979e1efa2c167f461c37466a29 on the chromeos 3.14 kernel should be sufficient to see if this fixes it.
TODO: Check out differences in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath vs chromeos ath
> @samdima (@dimkr ?) Could you provide some insight regarding the line export WIFIVERSION=-3.8
The Chrome OS 3.14 kernel uses 80211 and ath9k_htc from 3.8. If you take a close look at the git log you'll see they were taken straight off some minor version of 3.8 (don't remember which) and never touched again.
I spent the weekend before the weekend that just ended, trying to do the-opposite-of-backporting to get the 3.8 code in the Chrome OS kernel working with mainline 4.9.x, but it's way too complex due to all the changes between 3.14 and 4.9.
> Tested disabling power saving for wireless in the kernel config, as well as removing net.ifnames=0 from the kernel command line, neither of which improved anything.
I can confirm that disabling USB power management and USB auto-suspend doesn't change anything.
@SolidHal: I'm trying to apply b0a35a3f72c38a979e1efa2c167f461c37466a29 to 4.9.x, let's see how this goes
> The Chrome OS 3.14 kernel uses 80211 and ath9k_htc from 3.8. If you take a close look at the git log you'll see they were taken straight off some minor version of 3.8 (don't remember which) and never touched again.
I see, that complicates things for sure.
> I spent the weekend before the weekend that just ended, trying to do the-opposite-of-backporting to get the 3.8 code in the Chrome OS kernel working with mainline 4.9.x, but it's way too complex due to all the changes between 3.14 and 4.9.
That sounds monumentally challenging. If you get it working, though, that would be awesome.
I'm going to see if I can identify a regression.
hif_usb.c seems like a decent place to begin testing.
For reference, this is the last commit cros 3.8 and mainline share: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/wireless/ath/ath9k/hif_usb.c?id=763cbac07674a648f1377b21ca66f577c103fa9a
Which leaves these commits as prime targets for regression testing:
| Date | Commit | Author | Files | Lines |
|---|---|---|---|---|
| 2018-06-29 | ath9k: use irqsave() in USB's complete callback | Sebastian Andrzej Siewior | 1 | -3/+4 |
| 2018-02-07 | ath9k_htc: add Altai WA1011N-GU | Oleksij Rempel | 1 | -0/+1 |
| 2017-09-25 | ath9k: remove cast to void pointer | Himanshu Jha | 1 | -4/+4 |
| 2017-08-11 | ath9k: constify usb_device_id | Arvind Yadav | 1 | -1/+1 |
| 2017-06-16 | networking: make skb_push & __skb_push return void pointers | Johannes Berg | 1 | -1/+1 |
| 2017-04-05 | ath9k_htc: fix NULL-deref at probe | Johan Hovold | 1 | -0/+3 |
| 2017-03-09 | ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device | Dmitry Tunin | 1 | -0/+1 |
| 2016-12-01 | ath9k_htc: don't use HZ for usb msg timeouts | Anthony Romano | 1 | -4/+5 |
| 2016-04-07 | ath9k_htc: Delete unnecessary variable initialisation | Markus Elfring | 1 | -1/+1 |
| 2016-01-26 | ath9k_htc: add device ID for Toshiba WLM-20U2/GN-1080 | Alexander Tsoy | 1 | -0/+2 |
| 2015-09-18 | ath9k_htc: introduce support for different fw versions | Oleksij Rempel | 1 | -25/+81 |
| 2015-02-26 | ath9k_htc: Add new USB ID | Leon Nardella | 1 | -0/+1 |
| 2014-02-12 | ath9k_htc: Add device ID for Buffalo WLI-UV-AG300P | Masaki TAGAWA | 1 | -0/+2 |
| 2013-08-15 | ath9k_htc: do not use bulk on EP3 and EP4 | Oleksij Rempel | 1 | -27/+11 |
| 2013-07-22 | ath9k_htc: reboot firmware if it was loaded | Oleksij Rempel | 1 | -1/+3 |
| 2013-07-17 | ath9k_htc: fix data race between request_firmware_nowait() callback and suspe... | Alexey Khoroshilov | 1 | -3/+6 |
| 2013-06-24 | ath9k_htc: Add ethtool stats support. | | | |
Commit log for reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/net/wireless/ath/ath9k/hif_usb.c
"ath9k_htc: reboot firmware if it was loaded" seems like a decent first commit to test.
"don't use HZ for usb msg timeouts" also seems fishy.
Will report back later with results
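One way to work through candidates like these is to list everything that touched the file since the last shared commit, then revert one commit at a time without committing and rebuild. This is a sketch under assumptions: the function names are invented here, and it expects to run inside a mainline kernel git checkout.

```shell
# List commits that touched one file after a known-good base, oldest first.
# $1 = base commit, $2 = file (defaults to the ath9k USB glue code).
list_candidates() {
    base=$1
    file=${2:-drivers/net/wireless/ath/ath9k/hif_usb.c}
    git log --reverse --date=short --format='%h %ad %s' "$base..HEAD" -- "$file"
}

# Revert a single candidate in the working tree without committing,
# so the kernel can be rebuilt and tested before keeping the revert.
try_revert() {
    git revert --no-commit "$1"
}
```

With the shared commit mentioned above, that would be `list_candidates 763cbac07674a648f1377b21ca66f577c103fa9a`, followed by `try_revert <hash>` for each suspect and `git revert --abort` to undo a revert that made no difference.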
@SolidHal: take a look at https://chromium.googlesource.com/chromiumos/third_party/kernel/+/be77c1d1e3cc168ded1c40f9ac1d107a8335219a%5E%21/#F0, maybe b0a35a3f72c38a979e1efa2c167f461c37466a29 was required to fix a different issue
UPDATE: took some time but now I have b0a35a3f72c38a979e1efa2c167f461c37466a29 plus the DTS change applied to 4.9.127, building
Another debugging direction: it seems drivers/devfreq/rockchip/rk3288_dmc.c is missing in mainline! There's a rockchip_dmc_disable() right near the fix for the USB speaker thingy bug.
> @SolidHal: take a look at https://chromium.googlesource.com/chromiumos/third_party/kernel/+/be77c1d1e3cc168ded1c40f9ac1d107a8335219a%5E%21/#F0, maybe b0a35a3f72c38a979e1efa2c167f461c37466a29 was required to fix a different issue
> UPDATE: took some time but now I have b0a35a3f72c38a979e1efa2c167f461c37466a29 plus the DTS change applied to 4.9.127, building
Let me know if that changes anything. I looked into it a bit, but the whole idea of a complex usb policy for cpu freq doesn't exist in mainline.
> it seems drivers/devfreq/rockchip/rk3288_dmc.c is missing in mainline!
Huh, this is what it seems to be used for.
> This adds the DEVFREQ driver for the RK3288 dmc. It sets the frequency for the memory controller and reads the usage counts from hardware.

I don't see this directly affecting usb, but I also have very little knowledge of how dwc2 works, so maybe it is picky about the memory controller for some reason?
From the comments in the discussion here, https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/231479, it looks like the code was provided to the cros team by Rockchip. It's associated with a bug, but unfortunately it doesn't look like the public has access to the bug tracker, so all we can do is guess what the fix is for :/ I asked on the #chromium-os irc, so maybe someone will help us out.
I think I found the fix, doing some final testing now.
Can't wait to hear what it is :scream:
Just committed the fix. The patch can be found here: resources/BuildResources/patches-tested/kernel/reverse-do-not-use-bulk-on-EP3-and-EP4.patch
Will look into why dwc2 doesn't like the original commit later, but for now reverting the commit will suffice.
EDIT: will also edit the patch description later to be more precise, but wanted to get this out quick.
I can confirm that reverting this commit on a vanilla 4.9.127 (without any dwc2 changes) makes the problematic AR9271 stable, on my older C201 that's more sensitive to this bug for some reason.
@dimkr awesome. Out of curiosity, is there a reason you stick with 4.9?
@SolidHal It's a longterm kernel, that's the only reason why; according to https://www.kernel.org/category/releases.html it's maintained until 2023, after all other current stable/longterm branches
@dimkr Ahhhh, that makes sense. Thanks!
As I guessed, reverting that commit isn't a perfect fix. An author of the ath9k firmware made that clear here. https://github.com/qca/open-ath9k-htc-firmware/issues/124#issuecomment-423255235
I'm hoping to find a clean change, but for now reverting that commit seems to be working pretty well for me so it seems to be an alright workaround.
@dimkr Did you ever get this working?
> @SolidHal: take a look at https://chromium.googlesource.com/chromiumos/third_party/kernel/+/be77c1d1e3cc168ded1c40f9ac1d107a8335219a%5E%21/#F0, maybe b0a35a3f72c38a979e1efa2c167f461c37466a29 was required to fix a different issue
> UPDATE: took some time but now I have b0a35a3f72c38a979e1efa2c167f461c37466a29 plus the DTS change applied to 4.9.127, building
It wouldn't be ridiculous to think that if the cpu freq is too low, a bunch of usb interrupts take too long or get dropped or something weird. That would explain why using bulk endpoints instead of interrupt endpoints would fix it, as bulk transfers probably use less cpu.
@SolidHal Nope, I had breakage all over due to differences from 3.14 and gave up on the idea ... other Rockchip drivers are missing anyway
olerem gave me a great breakdown of the history and goals of the original patch
@SolidHal
* usb device is powered and detected by the host
* host is requesting the usb descriptor and adapter is sending one of available in ROM (EP3/4 are Int in this case)
* host is using provided descriptor to properly configure host controller driver and actual adapter driver.

So, atheros devs noticed some performance issues and tried to add this workaround.

1. try. patched firmware to provide different descriptor - it is not working, because adapter should trigger reinit of this interface. Suddenly some host controller will power cycle the adapter, so patched firmware will be lost
2. try. patch endp->bmAttributes on the host. Didn't work well. Maybe at some point, but this way is just a hack and was not expected to work long
3. try. don't patch, just use usb_bulk_msg... it seems to work on some systems, but terribly breaks on others.
4. try. remove workaround and make sure it is not violating specifications, breaks dwc2.
So the original "problem" patch seems to revert both 2 and 3. I'm isolating the two sections of the PrawnOS patch and testing both to see which one conflicts with dwc2.
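To see which transfer type the host currently has for each endpoint, the low two bits of bmAttributes can be decoded (0 control, 1 isochronous, 2 bulk, 3 interrupt, per the USB spec). The sketch below reads the per-endpoint bmAttributes files that Linux exposes in sysfs. The function names are made up, and the interface path in the usage note is only a guess based on the `1-1:1.0` seen in dmesg.

```shell
# Decode the transfer type from a bmAttributes value (bits 1..0).
# $1 = numeric value, e.g. 0x03 or 2.
ep_transfer_type() {
    case $(( $1 & 3 )) in
        0) echo control ;;
        1) echo isochronous ;;
        2) echo bulk ;;
        3) echo interrupt ;;
    esac
}

# Print "<endpoint dir> <transfer type>" for each endpoint of one interface.
# $1 = interface directory containing ep_XX subdirectories.
list_endpoints() {
    for ep in "$1"/ep_*; do
        [ -r "$ep/bmAttributes" ] || continue
        # sysfs stores the value as bare hex, so prefix it with 0x
        printf '%s %s\n' "${ep##*/}" "$(ep_transfer_type "0x$(cat "$ep/bmAttributes")")"
    done
}
```

Running `list_endpoints /sys/bus/usb/devices/1-1:1.0` with and without the revert applied would show whether EP3/EP4 are being treated as interrupt or bulk.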
Unable to isolate any changes and keep a functioning system.
Looking into usb_sndintpipe and usb_interrupt_msg to see if the implementation in dwc2 is incorrect for some reason.
Closing this for now, as the patch seems to work without errors despite the potential issues. If new issues arise, I'll reopen this
I recently tried the October ISO release and wifi via my AR9271 usb adapter does not seem to be working. I set up my network with wpa_passphrase and connected with wpa_supplicant -B -D wext -i wlan0 -c path/to/file/from/wpa_passphrase, then ran dhclient wlan0. I didn't see anything alarming in dmesg, maybe I am doing something wrong.
I attached my dmesg. dmesg.txt
Try removing the -D wext to use nl80211 instead. I have zero issues.
Also try running ifconfig to see if you have an IP address assigned.
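As a rough sketch of that suggestion: select the nl80211 driver explicitly, request a lease, then check for an IPv4 address. The function names are invented for illustration, `wpa.conf` is assumed to be the file produced by wpa_passphrase, and `ip` is used here in place of ifconfig.

```shell
# Succeed if stdin (output of `ip -4 -o addr show`) contains an IPv4 address.
has_ipv4() {
    grep -q 'inet [0-9]'
}

# Connect with the nl80211 driver instead of wext, then verify an address.
# $1 = interface (default wlan0), $2 = wpa_supplicant config (default wpa.conf).
connect_nl80211() {
    iface=${1:-wlan0}
    conf=${2:-wpa.conf}
    wpa_supplicant -B -D nl80211 -i "$iface" -c "$conf" || return 1
    dhclient "$iface" || return 1
    ip -4 -o addr show dev "$iface" | has_ipv4
}
```

`connect_nl80211 wlan0 wpa.conf && echo connected` prints `connected` only when the interface ends up with an address.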
No luck. I don't get an IP address. The AR9271 adapter I am using is this one: https://www.thinkpenguin.com/gnu-linux/penguin-wireless-n-usb-adapter-gnu-linux-tpe-n150usb.
I also tried connecting to the internet via a USB ethernet adapter, but it does not seem to recognize it or I am not configuring something properly (I don't see eth0). The same adapter works for my Debian install with kernel 4.17 I have on another sdcard. Also, chrooting in with a working internet connection did not allow me to use the internet within PrawnOS. Maybe I am forgetting to configure something?
Your posted dmesg looks correct, the adapter is loading the firmware properly. Could you try using the xfce gui wifi menu and tell me if it works there? I have that same adapter and it is working for me with the October image both from the commandline and the gui.
Before I was just running the basic PrawnOS system without running any scripts.
I had to install to a USB drive instead of a SD card because I got a permission denied error when running ExpandExternalInstall.sh on the line that ran resizefs. I installed to a USB and ran InstallPackages.sh, then ran the xfce gui to connect to a network. It seems to detect networks just fine, but when I attempt to connect to my network with a hidden ESSID, the operating system freezes and I have to force shutdown. Maybe this would work for another network without a hidden ESSID; I'll see if I can test this soon.
As partially discussed in #31, the AR9271 usb wifi seems to not be working. This was reported while booting off of a usb device, with the wifi dongle plugged in at boot.
Further testing: after installing wicd-curses, attempting to run it hangs, with dbus eventually reporting a NoReply. If the wifi dongle is then removed, the following message is spammed many times to dmesg:
which then gives way to:
This implies to me that the hang is happening somewhere in the dwc2 driver.