OvermindDL1 opened this issue 3 years ago
Hello @OvermindDL1, it might be interesting to check if you have libnss-resolve:i386 and try adding it if the system doesn't have it already. (https://github.com/ValveSoftware/steam-for-linux/issues/4378#issuecomment-620800441)
Separately, when you tested the workaround with nscd, did you also start the nscd daemon? (https://github.com/ValveSoftware/steam-for-linux/issues/7766#issuecomment-886843078)
> Hello @OvermindDL1, it might be interesting to check if you have libnss-resolve:i386 and try adding it if the system doesn't have it already. (#4378 (comment))
Greetings! Yes, I do have it. It was one of the many packages I installed, and I just checked that it is still installed:
$ sudo apt install libnss-resolve:i386
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnss-resolve:i386 is already the newest version (247.3-3ubuntu3.4).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
> Separately, when you tested the workaround with nscd, did you also start the nscd daemon? (#7766 (comment))
Yes, I confirmed it was running. I also stopped and restarted it and ran a direct query against it to confirm it was working. Here is its current status; the system has also been rebooted multiple times since it was installed (and between the many steam/steamcmd attempts):
$ systemctl status nscd.service
● nscd.service - Name Service Cache Daemon
Loaded: loaded (/lib/systemd/system/nscd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2121-08-15 16:33:45 MDT; 99 years 11 months left
Process: 790 ExecStart=/usr/sbin/nscd (code=exited, status=0/SUCCESS)
Main PID: 795 (nscd)
Tasks: 13 (limit: 38313)
Memory: 2.5M
CGroup: /system.slice/nscd.service
└─795 /usr/sbin/nscd
And, for completeness, here are the library files themselves:
$ find /usr -name 'libnss_resolve.so*' 2>/dev/null
/usr/share/man/man8/libnss_resolve.so.2.8.gz
/usr/lib/x86_64-linux-gnu/libnss_resolve.so.2
/usr/lib/i386-linux-gnu/libnss_resolve.so.2
After trying oh so very much, including injecting into the process to trace what was going on, one thing accidentally got it working on the (not-so-fresh-anymore) fresh install of Kubuntu 21.04. In short, by default Ubuntu symlinks /etc/resolv.conf to ../run/systemd/resolve/stub-resolv.conf, and that file contains (large block of comments elided):
nameserver 127.0.0.53
options edns0 trust-ad
search .
That makes sense, and I can even query the local systemd-resolved DNS server directly:
$ dig steampowered.com @127.0.0.53
; <<>> DiG 9.16.8-Ubuntu <<>> steampowered.com @127.0.0.53
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44184
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;steampowered.com. IN A
;; ANSWER SECTION:
steampowered.com. 14 IN A 104.92.238.240
;; Query time: 27 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Thu Aug 19 08:26:57 MDT 2021
;; MSG SIZE rcvd: 61
However, one of the last things I kept seeing the steam client do before it ended up just waiting on a futex was reading /etc/resolv.conf, so I experimented with it. The first thing I tried was changing its symlink from ../run/systemd/resolve/stub-resolv.conf to ../run/systemd/resolve/resolv.conf, which contains (comments elided again; and yes, there are so many entries because I was messing with NetworkManager settings earlier, before I realized DNS wasn't being queried at all):
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 8.8.4.4
search .
Aaaaand both steam and steamcmd started working! However, this is the new install on a fresh drive. I want to try the same change on the old install, but first I want to reinstall the OS from scratch again to find the minimal steps that correct the issue (I might be able to do that tonight or tomorrow, depending on how much time the little toddler gives me, lol).
However, this makes me wonder what steam/steamcmd does after it reads that file. Is it looking for more nameservers? Does it refuse to use a localhost nameserver? Does it dislike one of the options (edns0, trust-ad), or does their mere presence break its parsing somehow? Sadly steam/steamcmd are not open source, so I can't check that way, though I might attach gdb to see what's going on. My initial gut guess is an out-of-date library that steam uses, perhaps?
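Since these hypotheses hinge on what a resolver makes of each resolv.conf directive, here is a minimal std-only Rust sketch that parses the nameserver, options and search lines described in resolv.conf(5) and reports whether every nameserver is a loopback address and whether trust-ad is set. It is only a diagnostic aid for comparing the files quoted in this thread; it says nothing about how Steam's own resolver parses the file, which is unknown here. The type and function names are hypothetical.

```rust
use std::net::IpAddr;

/// The resolv.conf(5) directives relevant to this comparison.
#[derive(Debug, Default, PartialEq)]
struct ResolvConf {
    nameservers: Vec<IpAddr>,
    options: Vec<String>,
    search: Vec<String>,
}

/// Parses resolv.conf text. Comment lines ('#' or ';') and other directives
/// are skipped; nameserver entries that are not valid IP addresses are ignored.
fn parse_resolv_conf(text: &str) -> ResolvConf {
    let mut conf = ResolvConf::default();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        let mut fields = line.split_whitespace();
        match fields.next() {
            Some("nameserver") => {
                if let Some(addr) = fields.next().and_then(|a| a.parse::<IpAddr>().ok()) {
                    conf.nameservers.push(addr);
                }
            }
            Some("options") => conf.options.extend(fields.map(str::to_owned)),
            // resolv.conf(5): if "search" appears more than once, the last one wins.
            Some("search") => conf.search = fields.map(str::to_owned).collect(),
            _ => {}
        }
    }
    conf
}

fn main() {
    // The systemd-resolved stub file quoted above (comments elided).
    let stub = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch .\n";
    let conf = parse_resolv_conf(stub);
    let only_loopback = conf.nameservers.iter().all(|ip| ip.is_loopback());
    let trust_ad = conf.options.iter().any(|o| o == "trust-ad");
    println!("{:?}", conf);
    println!("only loopback nameservers: {}, trust-ad: {}", only_loopback, trust_ad);
}
```

Given the stub file, it reports 127.0.0.53 as the only (loopback) nameserver with edns0 and trust-ad set; given the ../run/systemd/resolve/resolv.conf contents, it reports only public nameservers and no options.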
I also want to build a small native 32-bit app shortly that calls getaddrinfo, to see whether it errors with the symlink change reverted. I'm not sure getaddrinfo is what steam calls for DNS, though; it seems like it might be doing something else, at least before that.
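On Unix targets, Rust's standard library resolves host names through the platform's getaddrinfo, so a 32-bit resolver test along these lines needs no external crates and can be built with `cargo run --target=i686-unknown-linux-gnu` (assuming the 32-bit target and toolchain are installed). This is a minimal sketch, not the program used later in this thread (that one used the dns-lookup crate); the default host is an assumption taken from the Steam log.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolves `host:port` through the platform resolver (getaddrinfo on Unix)
/// and returns every address it reports.
fn resolve(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}

fn main() {
    // Host to look up: first CLI argument, else the manifest host from Steam's log.
    let host = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "media.steampowered.com".to_string());
    match resolve(&host, 80) {
        Ok(addrs) => {
            for addr in addrs {
                println!("{}", addr);
            }
        }
        Err(e) => {
            eprintln!("lookup of {} failed: {}", host, e);
            std::process::exit(1);
        }
    }
}
```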
On one of the computers that steam still works on (running Kubuntu 20.10, but upgraded year after year since Ubuntu 06.04, yes really, lol), /etc/resolv.conf is (comments included, since they are short):
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.1
And on the third computer (apparently on Kubuntu 19.10; wow, I need to update it...), /etc/resolv.conf is:
nameserver 127.0.0.53
options edns0
So that makes me wonder whether trust-ad is somehow breaking steam's DNS access, though I'm unsure how. I need to check whether the steam domains it accesses are properly DNSSEC-signed and whether that has anything to do with it; that will be my next task on this adventure to get everything working again.
I wish I could post updates more often, but, toddler. I'm still trying so very many things to narrow this down to the actual specific issue, so please stand by... ^.^
Just a quick check until I can do more testing on the box itself: media.steampowered.com is indeed not DNSSEC-signed (https://dnssec-debugger.verisignlabs.com/media.steampowered.com), so I wonder if this is related. I'll dig into it more later, though; it could still be a red herring. Next I want to reinstall the OS and try to reduce the steps needed.
So, a few more test cases:
- Ubuntu 20.04 on bare metal: works
- Ubuntu 21.04 on bare metal: does not work
- Ubuntu 21.04 in a QEMU VM: works
- Manjaro in a QEMU VM: works
- Manjaro on bare metal: does not work
Note also that Manjaro doesn't use (or even have installed) systemd-resolved, and messing around with its resolv.conf hasn't borne fruit yet, but then again I'm not even sure the system uses it?
Every other internet thing, from browsers to Discord to dig, drill, curl, netcat and everything else I can think of, has no issues with the network connection or with any DNS lookups on any of the above installs.
I have configured the network every way from fully DHCP to fully manual, with no change (plenty of reboots, of course), and tried a variety of DNS servers, from my ISP's to Cloudflare's to Google's.
So what do Ubuntu 21.04 and Manjaro have in common that distinguishes them from Ubuntu 20.04? I don't have another system to test on, and I still don't want to upgrade either of the other two desktops from 20.04 in case 21.04 breaks them as well...
This system is the newest of them all (2-3 years old or so?). An EFI system wouldn't be doing something weird with networking that I don't know about, would it? What's especially weird is why it works in a VM but not on bare metal, so perhaps something about the hardware, but I'm unsure what that could be, especially since it works on older Ubuntu releases, or when swapping the resolv.conf on the newer Ubuntu (on every reboot... which isn't an option on Manjaro, since it doesn't even have systemd-resolved to bypass in the first place)...
So I whipped up a quick Rust program:
I then ran it like this (Manjaro has quite the powerline prompt, so excuse the noisy Unicode it uses):
~/rust/getadd master ?4 cargo run ✔ 1m 55s
Compiling libc v0.2.103
Compiling socket2 v0.4.2
Compiling dns-lookup v1.0.8
Compiling getadd v0.1.0 (/home/sarah/rust/getadd)
Finished dev [unoptimized + debuginfo] target(s) in 2.09s
Running `target/debug/getadd`
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.49.38:80, canonname: None, flags: 0 }
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.49.58:80, canonname: None, flags: 0 }
It ran fine and returned the expected results, so I then ran two 32-bit builds of it, one against glibc (gnu) and one against musl:
~/rust/getadd master ?4 cargo run --target=i686-unknown-linux-gnu ✔
Compiling cfg-if v1.0.0
Compiling libc v0.2.103
Compiling socket2 v0.4.2
Compiling dns-lookup v1.0.8
Compiling getadd v0.1.0 (/home/sarah/rust/getadd)
Finished dev [unoptimized + debuginfo] target(s) in 1.93s
Running `target/i686-unknown-linux-gnu/debug/getadd`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: LookupError { kind: Service, err_num: -8, inner: Custom { kind: Other, error: "failed to lookup address information: Servname not supported for ai_socktype" } }', src/main.rs:12:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
~/rust/getadd master ?4 cargo run --target=i686-unknown-linux-musl 101 ✘
Compiling cfg-if v1.0.0
Compiling libc v0.2.103
Compiling socket2 v0.4.2
Compiling dns-lookup v1.0.8
Compiling getadd v0.1.0 (/home/sarah/rust/getadd)
Finished dev [unoptimized + debuginfo] target(s) in 2.00s
Running `target/i686-unknown-linux-musl/debug/getadd`
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.49.58:80, canonname: Some("a1843.b.akamai.net"), flags: 0 }
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.49.38:80, canonname: Some("a1843.b.akamai.net"), flags: 0 }
So 32-bit musl worked and 32-bit glibc did not, which gives me my first reproducible test case outside of steam! I'm out of time for tonight, but at least this gives me something I can properly debug, so I'll whip out gdb next time. Or if someone beats me to it... :-)
But it makes me wonder what changed in recent 32-bit glibc versions (or the kernel?)... especially since musl works, hmm...
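The failing 32-bit glibc run reports err_num -8 with the message "Servname not supported for ai_socktype", which is glibc's gai_strerror text for EAI_SERVICE. That code means getaddrinfo failed to turn the service name into a port (via the NSS services database, usually /etc/services), not that the DNS query itself failed. Assuming the program passed a named service such as "http" (the successful runs all report port 80), one way to separate the two is to call getaddrinfo for the same host once with a numeric port and once with the service name. Below is a minimal sketch with hand-written FFI declarations so no crates are needed. It assumes the Linux struct addrinfo layout and SOCK_STREAM = 1, and it takes the host from the Steam log.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

// Hand-written mirror of Linux `struct addrinfo` (glibc and musl both put
// ai_addr before ai_canonname). Only some fields are touched here.
#[allow(dead_code)]
#[repr(C)]
struct AddrInfo {
    ai_flags: c_int,
    ai_family: c_int,
    ai_socktype: c_int,
    ai_protocol: c_int,
    ai_addrlen: u32, // socklen_t
    ai_addr: *mut c_void,
    ai_canonname: *mut c_char,
    ai_next: *mut AddrInfo,
}

unsafe extern "C" {
    fn getaddrinfo(
        node: *const c_char,
        service: *const c_char,
        hints: *const AddrInfo,
        res: *mut *mut AddrInfo,
    ) -> c_int;
    fn freeaddrinfo(res: *mut AddrInfo);
    fn gai_strerror(code: c_int) -> *const c_char;
}

const SOCK_STREAM: c_int = 1; // Linux value on x86/x86_64 (differs on MIPS)

/// Calls getaddrinfo(node, service) restricted to stream sockets and returns
/// the raw return code (0 on success, a negative EAI_* value on failure)
/// together with the number of entries in the result list.
fn lookup(node: &str, service: &str) -> (c_int, usize) {
    let node = CString::new(node).expect("no interior NUL");
    let service = CString::new(service).expect("no interior NUL");
    // All-zero hints: no flags, any address family, protocol chosen by libc.
    let mut hints: AddrInfo = unsafe { std::mem::zeroed() };
    hints.ai_socktype = SOCK_STREAM;
    let mut res: *mut AddrInfo = ptr::null_mut();
    let rc = unsafe { getaddrinfo(node.as_ptr(), service.as_ptr(), &hints, &mut res) };
    let mut count = 0;
    if rc == 0 {
        let mut cur = res;
        while !cur.is_null() {
            count += 1;
            cur = unsafe { (*cur).ai_next };
        }
        unsafe { freeaddrinfo(res) };
    }
    (rc, count)
}

/// Human-readable text for a getaddrinfo return code.
fn describe(rc: c_int) -> String {
    unsafe { CStr::from_ptr(gai_strerror(rc)) }
        .to_string_lossy()
        .into_owned()
}

fn main() {
    let host = "media.steampowered.com";
    // Same host: numeric port first, then the named service.
    for service in ["80", "http"] {
        let (rc, n) = lookup(host, service);
        println!("service {:>4}: rc={} ({}), {} result(s)", service, rc, describe(rc), n);
    }
}
```

Built with `--target=i686-unknown-linux-gnu`, a failure only on the "http" row would point at 32-bit service-name lookup rather than at DNS.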
So, back on the Kubuntu 21.04 install, steam still doesn't work without the /etc/resolv.conf remap away from systemd-resolved's stub, yet the above Rust program works without that remap:
sarah@sarah-desktop:~/rust/getadd$ DISPLAY=:0 steam
// snip log
[2021-10-01 15:52:02] Downloading manifest: http://media.steampowered.com/client/steam_client_ubuntu12
[2021-10-01 15:52:02] Download failed: http error 0 (media.steampowered.com/client/steam_client_ubuntu12)
[2021-10-01 15:52:02] DownloadManifest - exhausted list of download hosts
[2021-10-01 15:52:02] failed to load manifest from buffer.
[2021-10-01 15:52:02] Failed to load manifest
[2021-10-01 15:52:02] Error: Download failed: http error 0
// snip more log
sarah@sarah-desktop:~/rust/getadd$ cargo run --target=i686-unknown-linux-gnu
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/i686-unknown-linux-gnu/debug/getadd`
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.38.188.186:80, canonname: None, flags: 0 }
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.38.188.200:80, canonname: None, flags: 0 }
sarah@sarah-desktop:~/rust/getadd$ cargo run --target=i686-unknown-linux-musl
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/i686-unknown-linux-musl/debug/getadd`
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.51.19:80, canonname: Some("a1843.b.akamai.net"), flags: 0 }
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.51.18:80, canonname: Some("a1843.b.akamai.net"), flags: 0 }
sarah@sarah-desktop:~/rust/getadd$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/getadd`
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.51.18:80, canonname: None, flags: 0 }
AddrInfo { socktype: 1, protocol: 6, address: 2, sockaddr: 23.47.51.19:80, canonname: None, flags: 0 }
Sooo, that fizzled out. I'm guessing steam isn't using getaddrinfo, then? Can anyone say how it resolves DNS when grabbing its manifest URL?
Kubuntu 21.10 "just worked". I'm thinking it's related to the version of "something" from around the 21.04 time frame, especially as it also affected Manjaro (a rolling release), and it at least now seems fixed in Kubuntu 21.10 (confirmed through multiple installs). I'd like to install Manjaro once more to give it a try, but my wife may have taken her computer back with that new 21.10 partition... ^.^;
@OvermindDL1, your DNS server additions fixed this issue on Arch as well.
> @OvermindDL1, your DNS server additions fixed this issue on Arch as well.
Oh that's very good to know! Thanks for the follow-up!
It does worry me that it's still happening, especially on a modern Arch system.
Can confirm the symlink change fixed the "Could not connect to Steam network" for me on Ubuntu 20.04:
dimvoly@computer:/etc$ sudo ln -fs ../run/systemd/resolve/resolv.conf resolv.conf
Thanks, @OvermindDL1!
I used these instructions to reinstall Steam before doing the above; not sure if it makes a difference. It still complains about a bunch of missing i386 packages, but Steam eventually runs and I can install and run games from it.
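Before running the `ln -fs` above, it can help to confirm where /etc/resolv.conf currently points. Below is a minimal read-only Rust sketch using only the standard library. The helper names are hypothetical, and the file names assume the Ubuntu systemd-resolved layout quoted in this thread (stub-resolv.conf vs resolv.conf under /run/systemd/resolve).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name of systemd-resolved's stub file (the one listing 127.0.0.53).
const STUB_NAME: &str = "stub-resolv.conf";

/// Returns the symlink target of `path` as stored (without following it),
/// or None if `path` is not a symlink.
fn resolv_link_target(path: &Path) -> io::Result<Option<PathBuf>> {
    if fs::symlink_metadata(path)?.file_type().is_symlink() {
        Ok(Some(fs::read_link(path)?))
    } else {
        Ok(None)
    }
}

/// True if a symlink target names resolved's stub file.
fn points_at_stub(target: &Path) -> bool {
    target.file_name().map_or(false, |name| name == STUB_NAME)
}

fn main() -> io::Result<()> {
    let path = Path::new("/etc/resolv.conf");
    match resolv_link_target(path)? {
        Some(target) if points_at_stub(&target) => {
            println!("{} -> {} (systemd-resolved stub)", path.display(), target.display())
        }
        Some(target) => println!("{} -> {}", path.display(), target.display()),
        None => println!("{} is not a symlink", path.display()),
    }
    Ok(())
}
```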
Your system information
Please describe your issue in as much detail as possible:
My wife's computer was updated to Kubuntu 21.04, and Steam stopped connecting to the network and would no longer load. The error it gave was the well-known "unable to connect to network", but its output log says "http error 0" when trying to access a URL at media.steampowered.com, even though I could curl that URL just fine, no issues. There is no proxy or VPN or anything.
Initial googling turned up a few things to try, like installing the 32-bit version of nscd among other 32-bit packages. All were either already installed or I installed them (nscd was the main one that wasn't already installed, though I'm unsure why I would want it), all to no avail.
Tried steam -tcp, no change.
Tried steam --reset, no change.
No blocks in the firewall at all for anything, and interestingly nothing in the firewall log even shows it being hit for media.steampowered.com.
Wireshark shows no network attempt at all for anything related to steam, so that is weird...
I hooked strace up to steam to see what was going on. It wasn't doing anything with sockets at all: it would print the message that it was attempting an update, spawn a new thread, share a lock between the two threads, poll a dozen or so times, then release the lock and print the failure message. There was no socket access or anything else going on at all.
On the other two computers on the same subnet, steam works perfectly.
Tried a dozen other things until...
I then installed the flatpak version of steam, and it worked fine (well, as fine as the flatpak version works, i.e. it doesn't play well with multi-user audio for some reason, but otherwise...). I strace'd it, and it definitely did a lot of socket access to fetch the files at the same URL without issue. It's not really a great replacement, though, due to the flatpak-related steam issues.
The system had a blank drive in it, so I grabbed a Kubuntu 21.04 installer and did a fresh install onto that drive. Everything seemed to work fine on the new installation, and I fully updated the system. I then grabbed the deb installer for steam from the Steam website and installed it: same network error, still no socket access at all. I installed the flatpak version, and it worked.
I'm guessing some 32-bit library is missing or something, but I can't find any steam file trying to load something that is missing, there are no error messages, and strace isn't showing any attempt to access a missing file.
For more testing I also tried steamcmd on both the original and the new/fresh OS installs, and it shows the same issue (although it's much faster to test with and produces less strace noise; I wish I'd thought of testing with it earlier). Here's the complete strace output of steamcmd that shows the issue. You'll notice there are no socket calls in it at all, whereas there are in the flatpak version and on one of the other computers (which are still running Kubuntu 20.10; I stopped their upgrade because of this issue), so for some reason steam/steamcmd never even attempts a socket connection. Here's a relevant snippet from the gist showing one of the attempts:
As you can see, it spawns a new thread, the two synchronize on a futex, it reads /etc/resolv.conf (which just uses systemd-resolved, the same as it did on Kubuntu 20.10; in fact the contents are the same across all the computers), then the main process gets its thread ID a number of times and sleeps for 0.024 seconds a number of times, and then the futex wait times out and it prints the error. This pattern repeats for each attempt. And yes, the host names in the printed URLs resolve fine with dig, and the URLs download fine via curl. I did try a few different DNS servers just for testing, but no change, not that it was accessing the network at all, as you can see (I tried changing DNS before I noticed the complete lack of network activity).
What's really odd is: why does the flatpak version work, and what is it doing differently from the host?
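The strace pattern described above (spawn a thread, synchronize on a futex, wait with a timeout, then report failure without ever opening a socket) is consistent with a design where name resolution runs on a worker thread and the caller gives up after a deadline, for example if the worker stalls before it gets as far as a DNS query. Whether Steam works this way is an assumption, not something this thread establishes. Below is a minimal Rust sketch of that general shape, using a hypothetical lookup_with_deadline helper. On Linux, a thread blocked in recv_timeout typically shows up in strace as futex waits with a timeout.

```rust
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `lookup` on a worker thread and waits at most `deadline` for its answer.
/// Returns None on timeout; the worker is left to finish (or hang) on its own.
fn lookup_with_deadline<F>(lookup: F, deadline: Duration) -> Option<Vec<SocketAddr>>
where
    F: FnOnce() -> Vec<SocketAddr> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(lookup());
    });
    rx.recv_timeout(deadline).ok()
}

fn main() {
    let host = "media.steampowered.com"; // host from the Steam log
    let result = lookup_with_deadline(
        // In this sketch a failed lookup is reported as an empty list.
        move || {
            (host, 80)
                .to_socket_addrs()
                .map(|it| it.collect::<Vec<_>>())
                .unwrap_or_default()
        },
        Duration::from_secs(5),
    );
    match result {
        Some(addrs) => println!("resolved: {:?}", addrs),
        None => println!("timed out waiting for the resolver thread"),
    }
}
```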
The network itself is very simple (although I don't "think" it's relevant, since steam/steamcmd aren't trying to access the network at all): 3 desktop computers with wired ethernet to a central router that connects to the Internet gateway via NAT. I did try with the computer in the DMZ and as normal, but there's nothing special about the network really.
I'm at work currently, but I can gather any further information this evening, or maybe sooner depending on what is needed (I set up remote SSH access through an SSH tunnel via my home 20.10 desktop, though with no GUI access; I guess I can tunnel VNC or similar if needed).
Steps for reproducing this issue: