bedrocklinux / bedrocklinux-userland

This tracks development for things such as scripts and (defaults for) config files for Bedrock Linux
https://bedrocklinux.org
GNU General Public License v2.0

Some programs won't resolve DNS on specific networks #165

Open: SeerLite opened this issue 4 years ago

SeerLite commented 4 years ago

Now this is a funny one. On my main Wi-Fi network (let's call it Wi-Fi Network A), everything works completely fine and DNS entries are resolved correctly. However, there's a specific network that I also connect to (let's call it Wi-Fi Network B) where programs have trouble resolving the domain names of more obscure pages. As an example I'll use https://bedrocklinux.org/: it works completely fine on Network A, but returns the Chrome error DNS_PROBE_FINISHED_NXDOMAIN on Network B.

Some programs don't seem to have this issue: both nslookup and dig return the IP address for bedrocklinux.org correctly. Yet even after running them, opening the page in a browser fails.
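A likely reason for this split: dig and nslookup send their queries straight to the nameservers listed in /etc/resolv.conf and bypass glibc's Name Service Switch (NSS), while most other programs (probably including these browsers) resolve names through getaddrinfo(), which walks the hosts: line of /etc/nsswitch.conf. A minimal Python sketch that exercises that NSS path, roughly like `getent ahosts bedrocklinux.org` does; the helper name `nss_lookup` is hypothetical:

```python
import socket


def nss_lookup(host):
    """Resolve `host` with getaddrinfo(), i.e. through glibc's NSS "hosts:" chain.

    Returns a sorted list of unique address strings, or None if resolution fails.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        # A failure here is the kind of error a browser surfaces as a DNS error.
        print(f"{host}: {exc}")
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

For example, calling `nss_lookup("bedrocklinux.org")` on Network B right after `dig bedrocklinux.org` succeeds would show whether the failure lives in the NSS chain rather than in the network's DNS server.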

This seems to be browser-agnostic: Vivaldi, Chromium, Firefox and even w3m all have this issue. Time zone doesn't seem to be related either, as I ran the same tests in multiple time zones, so I think whatever is causing #161 is unrelated to this issue (just connecting the dots because that one seems to be browser-related too).

I have no idea how to even begin debugging this. It's an area where I'm almost completely lost. I'm using NetworkManager, so I guess that can be used as a starting point. Also, both Wi-Fi networks are Android hotspots, each of them using a different ISP, with cellular data for internet. I don't think the ISP is too relevant though. After all, this issue doesn't happen on Arch.

Right now I don't have the opportunity to test more Wi-Fi networks, but I'll report any news about this here as soon as I do. I'm also going to look for relevant information about each network (nmcli connection show) tomorrow. I'm posting this here first because I wanted to clear my mind a bit, and also in case anyone else comes along with a similar issue. (Not likely, considering that I haven't seen other issues about this, and that my setup is kind of weird and specific anyway.)

paradigm commented 4 years ago

Different distros / networking systems have different expectations around /etc/resolv.conf. For example, some expect it to be a symlink pointing to one temporary place, and others expect it to be pointed to another place. Often they won't change /etc/resolv.conf to fit their expectations, but continue on confused. To ensure things work consistently across distros, Bedrock removes /etc/resolv.conf at boot and configures networking setups to (re)create it to their expectations.
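To see which networking setup (re)created /etc/resolv.conf after boot, a quick check is whether it is a symlink, where it points, and which nameservers it lists. With systemd-resolved, for instance, it commonly points at /run/systemd/resolve/stub-resolv.conf and lists nameserver 127.0.0.53. A minimal Python sketch; `describe_resolv_conf` is a hypothetical helper, not a Bedrock tool:

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Return (kind, symlink_target, nameservers) for a resolv.conf file.

    kind is "missing", "dangling symlink", "symlink" or "file".
    """
    if not os.path.lexists(path):
        return ("missing", None, [])
    target = os.readlink(path) if os.path.islink(path) else None
    if target is not None and not os.path.exists(path):
        # The link exists, but whatever it points to has not been created (yet).
        return ("dangling symlink", target, [])
    nameservers = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Lines look like "nameserver 127.0.0.53"; comment lines are skipped.
            if len(fields) >= 2 and fields[0] == "nameserver":
                nameservers.append(fields[1])
    return ("symlink" if target is not None else "file", target, nameservers)
```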

Given nslookup works, you likely have a good resolv.conf file, and this probably isn't relevant.

That's the only networking specific thing I can think of that Bedrock does. I don't have any other ideas for how Bedrock could come into play here. I can't reproduce the issue. I also don't recall anyone else reporting such issues.

SeerLite commented 4 years ago

I don't have any other ideas for how Bedrock could come into play here. I can't reproduce the issue. I also don't recall anyone else reporting such issues.

Yup, this issue seems pretty specific to my machine. Not sure why, though. Also, I'm sure it has to do with Bedrock: I can't reproduce this on Arch at all.

I'm gonna try to look up what these DNS errors mean, and maybe run some debugging tools on the browsers to see what's up. The issue doesn't seem to be network-specific anymore, either; it happens on both networks at random.

SeerLite commented 4 years ago

I think I might have (temporarily) fixed this issue by editing /etc/nsswitch.conf. I changed the hosts: line from

hosts: files mymachines myhostname resolve [!UNAVAIL=return] dns

to

hosts: dns files mymachines myhostname resolve [!UNAVAIL=return]

I tried this because this specific line was referenced in a lot of Ubuntu issues similar to this one (example).

I don't know if this is a great solution or whether it might have side effects for some applications, and I still don't know why I experience this problem under Bedrock in the first place, but at least I can use this workaround(?) for now.

I might look into this again at some point, maybe this leads somewhere. I'm leaving this open for now because a workaround was necessary.

nift4 commented 4 years ago

I think you should change the line to hosts: files mymachines myhostname resolve dns [!UNAVAIL=return]

SeerLite commented 4 years ago

@nift4 Thank you. That seems to work too. What is the [!UNAVAIL=return] part about and why would reordering this line fix this (most of the time)?

I still come across DNS problems in Bedrock almost daily. Sometimes I have to refresh/reconnect for about 5 minutes, waiting for it to sort itself out randomly. It's not as frequent as before, but it definitely happens. I just live with it because I'm too lazy to actually do anything about it, mostly because no one else seems to be affected and I feel like I'm at a dead end. Hopefully someday I'll get to fix this.

nift4 commented 4 years ago

I guess [!UNAVAIL=return] is almost self-explanatory, but it means the following: if the hostname cannot be found (it looks in all the places before [!UNAVAIL=return]), you get an "Address unreachable" error. But as it was configured to never use DNS (I don't understand why, that makes no sense, and I also don't understand why it happened randomly and not always), you found out that enabling DNS fixes it. But you made DNS the most important thing, overriding any other hostname resolution (mostly for your local Wi-Fi), so I suggested enabling DNS but keeping local hostnames too.

SeerLite commented 4 years ago

Ah, I understand now. Thanks!

it was configured to never use DNS (I don't understand why, that makes no sense, I too don't understand why it happened randomly and not always)

This is default Arch configuration though, and it works fine on a default Arch installation. Are you sure that by having dns as the last thing, it gets disabled?

nift4 commented 4 years ago

No ^^ (but it seems so)

nift4 commented 4 years ago

Oh no, I think I know what the problem is: resolve doesn't work. (It seems similar to DNS and is enabled by default. I have no idea what it does. Good luck googling ;D)

SeerLite commented 4 years ago

Alright, now I can say for sure that I've never seen something as confusing as nsswitch.conf and the DNS stuff that's related to it.

I think I finally figured something out with this post and with nss-resolve(8).

So basically, resolve [!UNAVAIL=return] dns ensures that resolve is used instead of dns, but only if resolve is available (resolve is the NSS module for systemd-resolved).
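The action-item semantics can be made concrete with a small simulator. This is a simplified sketch of the rules in nsswitch.conf(5), not glibc's actual code: each service returns SUCCESS, NOTFOUND, UNAVAIL or TRYAGAIN; by default the lookup stops only on SUCCESS; and `[!UNAVAIL=return]` means "stop on every status except UNAVAIL". The helper names and the per-service statuses fed to it are hypothetical:

```python
import re

STATUSES = ("SUCCESS", "NOTFOUND", "UNAVAIL", "TRYAGAIN")
# Default actions per nsswitch.conf(5): stop on SUCCESS, otherwise try the next service.
DEFAULTS = {"SUCCESS": "return", "NOTFOUND": "continue",
            "UNAVAIL": "continue", "TRYAGAIN": "continue"}


def parse_hosts_line(line):
    """Parse 'hosts: a b [STATUS=action ...] c' into [(service, actions), ...]."""
    _, _, rest = line.partition(":")
    chain = []
    # A token is either a bracketed action item or a bare service name.
    for token in re.findall(r"\[[^\]]*\]|\S+", rest):
        if not token.startswith("["):
            chain.append((token, dict(DEFAULTS)))
            continue
        if not chain:
            raise ValueError(f"action item {token} has no preceding service")
        actions = chain[-1][1]  # an action item modifies the service before it
        for item in token[1:-1].split():
            status, _, action = item.partition("=")
            status, action = status.upper(), action.lower()
            if status.startswith("!"):
                # "!STATUS=action" sets the action for every status except STATUS
                for other in STATUSES:
                    if other != status[1:]:
                        actions[other] = action
            else:
                actions[status] = action
    return chain


def lookup(line, results):
    """Simulate a hosts lookup.

    `results` maps each service to the status it would return for the name;
    services missing from it are treated as UNAVAIL. Any action other than
    "return" (such as glibc's "merge") is treated like "continue".
    Returns (last service consulted, its status).
    """
    service, status = None, "UNAVAIL"
    for service, actions in parse_hosts_line(line):
        status = results.get(service, "UNAVAIL")
        if actions.get(status) == "return":
            break
    return service, status
```

With the default line, if systemd-resolved is running but cannot answer (say resolve reports TRYAGAIN), the lookup stops at resolve and dns is never tried; if resolve is UNAVAIL, dns gets its turn. Both reordered lines from earlier in this thread reach dns in the first case.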

@nift4 Thank you, this might indeed be related: I just realized that I had systemd-resolved enabled on my Bedrock installation, but not on Arch. I don't recall whether enabling it was something I did to troubleshoot this issue in the first place (and Bedrock should work no matter which NSS service is used anyway), so maybe it's not related at all. Still, it's worth a try.

I've disabled systemd-resolved and reset nsswitch.conf back to the default. I'll report here if I still get issues.

SeerLite commented 4 years ago

Update: Yup, I get issues with that setup. :/

Something must be going wrong with resolve. Maybe programs (or glibc?) think it's available when it's not, so they don't fall back to dns and the name doesn't get resolved. (I hope that doesn't sound confusing.)
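Testing this hypothesis requires knowing whether systemd-resolved is actually running at the moment a lookup fails: nss-resolve(8) describes the `resolve [!UNAVAIL=return] dns` setup as falling back to dns only when systemd-resolved is not running. A minimal sketch, assuming a systemd init and the standard `systemctl is-active` command; `unit_state` is a hypothetical helper whose `run` parameter exists only so it can be tested without systemd:

```python
import subprocess


def unit_state(unit="systemd-resolved.service", run=subprocess.run):
    """Return the state `systemctl is-active` reports for `unit` ("active", "inactive", ...).

    Returns "unknown" if systemctl is not installed (e.g. under a non-systemd init).
    """
    try:
        proc = run(["systemctl", "is-active", unit], capture_output=True, text=True)
    except FileNotFoundError:
        return "unknown"
    # is-active prints the state on stdout and exits non-zero unless the unit is active.
    return proc.stdout.strip() or "unknown"
```

Running it (and, if the unit is active, `resolvectl query bedrocklinux.org`) right when a page fails to load would help tell apart "resolve was consulted and answered badly" from "resolve was unavailable and dns itself failed".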