OpenELEC / OpenELEC.tv

OpenELEC - The living room PC for everyone
http://openelec.tv

DNS issues: Covers, Artwork, TV Series image, rss feed fail to load (most of them) #2319

Closed x-cimo closed 11 years ago

x-cimo commented 11 years ago

I have been struggling with OpenELEC on all my PCs. I am using the Generic build. The issue is that anything that needs to be fetched from tvdb etc. fails to fetch.

My installs are clean OpenELEC 3.0.3 Generic builds.

Out of about 100 movies, maybe 5-10 fetched their content.

I have been documenting this in a thread here: http://openelec.tv/forum/72-xbmc/64460-most-movie-cover-and-artwork-background-don-t-load#74575

I have found out that it's related to DNS. DNS lookups are slow to resolve, even with 8.8.8.8.

ping google.ca takes 10 seconds before starting; however, ping -4 google.ca works perfectly.
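
One way to put numbers on that difference is to time each lookup separately. The helper below is only a sketch: `elapsed` is a made-up name, and it assumes a shell whose `date` supports `+%s` (whole seconds), which is coarse but enough to spot a multi-second resolver timeout.

```shell
# elapsed CMD [ARGS...]: run a command and print how many whole
# seconds it took. Hypothetical helper; the command's own output
# is discarded so only the duration is printed.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

Usage would be `elapsed ping -c 1 google.ca` followed by `elapsed ping -4 -c 1 google.ca`; on an affected machine the first should report noticeably more seconds than the second.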

I have disabled IPv6, tried static IPs, and set Google DNS; nothing worked.

As a last resort, I added the IPs for the movie db and the other image providers that XBMC uses to my .config/hosts.conf, and IT WORKED.

All my covers, previews, and artwork started to load right away.

Here is what I added in hosts.conf:

204.246.169.111 cf2.imgobject.com
204.246.169.231 cf1.imgobject.com
204.246.169.83 cf3.imgobject.com
190.93.253.95 thetvdb.com

Obviously hardcoding IPs is bad... Does anyone have an idea what could be going wrong?

vslavik commented 11 years ago

In my case, the DNS server used doesn’t have much of an impact:

$ cat /etc/resolv.conf
# Generated by Connection Manager
nameserver 8.8.8.8

But I tried modifying /var/cache/resolv.conf to use my ISP’s DNS servers, Google’s (above) and OpenDNS, it’s all the same.

But — apparently, this is something of a longstanding issue since glibc-2.10 when built with dual-stack support. See a discussion of this issue at Arch Linux forums, including further links and workarounds: https://bbs.archlinux.org/viewtopic.php?id=75770

Drepper dismisses this as broken DNS servers or firewalls (http://udrepper.livejournal.com/20948.html), but it seems to be more nuanced than that (https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG or https://bugzilla.redhat.com/show_bug.cgi?id=505105 or any number of similar reports for Ubuntu and others). I’m seeing this behind RouterOS routers (which are usually of higher quality than your usual Linksys &c junk), and I’m not aware of any firewalling or filtering by the ISP, but of course that doesn’t say much, does it. So far I haven’t had any luck getting Wireshark on another machine to capture this traffic in promiscuous mode to check.

Supposedly what happens is that glibc sends the A and AAAA requests in parallel over the same socket, and some broken IPv6-unaware DNS servers don’t send any reply back. (I’m not sure about the firewall-eats-it argument, and I think the likelihood of OpenDNS and Google being broken is quite low.) glibc handles this with a timeout, hence the delay. The timeout should only happen once, after which glibc knows the DNS server is problematic and won’t do this optimization again, but that is once per process. So if thumbnails are fetched by dedicated processes launched for each thumbnail, that would explain the delays; it certainly explains the ping google.com delays.

The above thread suggests the following workarounds (if you don’t want to patch glibc):

  1. Add options single-request to /etc/resolv.conf. I verified that this does help in my case.
  2. Use nscd. This would help, because all DNS queries would pass through the single caching daemon process and only the first query would incur the timeout. This sounds like a generally good idea to do to me, regardless of this issue.
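
For reference, workaround 1 amounts to one extra line in the resolver configuration. The sketch below appends it only if it is missing. `add_single_request` is a hypothetical helper name, and the file path is passed in as an argument because resolv.conf here is generated by Connection Manager, so a manual edit may well be overwritten.

```shell
# add_single_request FILE: append "options single-request" to a
# resolv.conf-style file unless an options line already carries that
# exact option (single-request-reopen does not count as a match).
# Hypothetical helper, not part of OpenELEC or glibc.
add_single_request() {
    file=$1
    if ! grep -Eq '^[[:space:]]*options[[:space:]].*single-request([[:space:]]|$)' "$file"; then
        printf 'options single-request\n' >> "$file"
    fi
}
```

It would be called as, for example, `add_single_request /var/cache/resolv.conf`. Running it twice leaves a single `options single-request` line in the file.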

dukeczech commented 11 years ago

I confirm that adding options single-request works well (ping + loading fanart images)! Thanks for solving this nasty glibc bug. And yes, I use a RouterOS router like @vslavik does, but I don't think it makes any difference.

rezou commented 11 years ago

And I use a Juniper SSG-5 for my router, not a crappy off the shelf piece of equipment for sure. I will try the single-request if I actually decide to dump XBMCbuntu for OpenELEC.

jenkins101 commented 11 years ago

@stefansaraev ?

stefansaraev commented 11 years ago

single-request is not an option for now.. 53b7c6000a could be a possible fix. please test

stefansaraev commented 11 years ago

@dukeczech @vslavik if gai.conf does not fix the issue for you, please also test 0c5caa599f5982, or ping @sraue to make a test build, but I believe just patching gai.conf should be enough

for a quick gai.conf test without rebuilding (changes are lost on reboot):

cp -a /etc/ /tmp/
mount --bind /tmp/etc/ /etc/

edit /etc/gai.conf and add this:

# this is likely already present
precedence ::ffff:0:0/96  100

# this is important
scopev4 ::ffff:169.254.0.0/112  2
scopev4 ::ffff:127.0.0.0/104    2
scopev4 ::ffff:0.0.0.0/96       14

make sure to remove options single-request from resolv.conf while testing

and btw. nscd is not an option too

vslavik commented 11 years ago

> 53b7c60 could be a possible fix

Sorry, no, like any other time, shotgun debugging doesn’t work this time either. gai.conf rules describe ordering of the getaddrinfo() result set, while the issue is with receiving the results before that.

> please also test 0c5caa5.

…unlike this, which does touch the relevant code. But yes, a build to test would be handy (apparently the nightlies are not built anymore?).

> single-request is not an option ... and btw. nscd is not an option too

Thanks for that enlightening explanation.

dukeczech commented 11 years ago

stefansaraev commented 11 years ago

FYI with 0c5caa5, if you have ONLY a link-local address, there will be NO AAAA query at all. if you live in a real v6 environment and your host is (supposed to be) reachable from the net - this does not apply to you, and well.. you have a problem :)
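
To make the intended behaviour concrete: the idea is that only a non-loopback, non-link-local IPv6 address should count as "having IPv6" when deciding whether to send AAAA queries. A rough shell model of that decision follows. It is not the glibc code, which works on binary addresses. `wants_aaaa` is a made-up name, and the sketch assumes addresses arrive as text, one per argument, with loopback written as `::1`.

```shell
# wants_aaaa ADDR...: succeed (exit 0) if any of the given IPv6
# addresses is neither loopback (::1) nor link-local (fe80::/10).
# Illustrative model only; an optional /prefix suffix is ignored.
wants_aaaa() {
    for addr in "$@"; do
        addr=${addr%%/*}                      # drop "/64" etc.
        case $addr in
            ::1) ;;                           # loopback: ignore
            [fF][eE][89aAbB][0-9a-fA-F]:*) ;; # fe80::/10 link-local: ignore
            *:*) return 0 ;;                  # any other IPv6 address
        esac
    done
    return 1
}
```

In this model `wants_aaaa ::1/128 fe80::1/64` fails (no AAAA query expected), while `wants_aaaa fe80::1/64 2001:db8::1/64` succeeds; 2001:db8::/32 is the documentation prefix, used here purely as a stand-in for a global address.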

vslavik commented 11 years ago

> FYI with 0c5caa5, if you have ONLY a link-local address, there will be NO AAAA query at all.

I didn’t try it (unlike @dukeczech, who said above this does not help), but I very much doubt this patch helps either. In my configuration, I don’t even have link-local IPv6. The only IPv6-capable interface is the loopback:

# ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:01:2E:23:52:14  
          inet addr:192.168.11.101  Bcast:192.168.11.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1707084 errors:0 dropped:9 overruns:0 frame:0
          TX packets:349584 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2503820015 (2.3 GiB)  TX bytes:30840773 (29.4 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:288 (288.0 B)  TX bytes:288 (288.0 B)

#

And you’ll notice that the code touched by this patch already used IN6_IS_ADDR_LOOPBACK check before — in other words, the logic stays the same in my network’s configuration. Yet AAAA queries are sent.

vslavik commented 11 years ago

And I suppose options single-request-reopen “is not an option” either?

stefansaraev commented 11 years ago

http://openelec.tv/news/20-project/102-new-unofficial-addon-repo here we have a "tcpdump" addon. test with / without 0c5caa5 and / or with ipv6 fully disabled, either via extlinux.conf (append ipv6.disable=1) or via sysctl (net.ipv6.conf.*.disable_ipv6 = 1)

tcpdump -nNpvi eth0 port 53
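
When comparing captures it can help to count how many A and AAAA questions were actually sent. The sketch below assumes the tcpdump text output has been saved to a file, that ports are numeric (the -n flag above), and that outgoing queries show up in tcpdump's usual `> server.53: ... A? name.` form. `count_queries` is a hypothetical helper, not part of the addon.

```shell
# count_queries FILE: print how many lines in saved tcpdump text
# output are A and AAAA questions sent to port 53. Replies (which
# come *from* port 53) are not counted. Hypothetical helper.
count_queries() {
    a=$(grep -c '\.53: .* A? ' "$1" || true)
    aaaa=$(grep -c '\.53: .* AAAA? ' "$1" || true)
    echo "A=$a AAAA=$aaaa"
}
```

For example, `tcpdump -nNpvi eth0 port 53 > /tmp/dns.txt` in one shell, `ping -c 1 google.com` in another, then `count_queries /tmp/dns.txt` once the capture is stopped.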

to add any options to resolv.conf we have to patch connman. I would like to avoid this but make ipv6 support optional and disabled by default instead.

dukeczech commented 11 years ago

  1. with 0c5caa5 & with broken router (Mikrotik RB150 with RouterOS)
  2. with 0c5caa5 & with router (d-link DIR655)
  3. without 0c5caa5 & with broken router (Mikrotik RB150 with RouterOS)
  4. without 0c5caa5 & with router (d-link DIR655)

OpenELEC:~/.xbmc/userdata # ifconfig 
eth0      Link encap:Ethernet  HWaddr 80:EE:73:07:79:3B  
          inet addr:192.168.1.35  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1291 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:517852 (505.7 KiB)  TX bytes:188761 (184.3 KiB)
          Interrupt:44 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:140 errors:0 dropped:0 overruns:0 frame:0
          TX packets:140 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15208 (14.8 KiB)  TX bytes:15208 (14.8 KiB)

stefansaraev commented 11 years ago

uh. huh. and you are sure it is built with 0c5caa5. the result here is a bit different:

no v6: http://sprunge.us/bJgc
v6 link-local: http://sprunge.us/AcNL
v6: http://sprunge.us/IZKU

EDIT: well. I am on master/glibc 2.18. will switch now to 3.2 branch and rebuild. for eglibc 2.17 the patch might be a bit different. will let you know when I am done and will provide a generic64 build for testing if it's ok for you

dukeczech commented 11 years ago

stefansaraev commented 11 years ago

rebuilding eglibc only is enough

      APPLY PATCH (common):   /home/duke/openelec/openelec-3.2/packages/toolchain/devel/eglibc/patches/eglibc-fix-dns-with-broken-routers.patch

^^ this looks fine but you must do "PROJECT=xxx ARCH=yyy ./scripts/clean eglibc" before make release.

EDIT: oops. I didn't notice the "> build.log" redirect. it is okay.

dukeczech commented 11 years ago

duke@intel-i7:~/openelec/openelec-3.2/build.OpenELEC-ION.x86_64-devel/eglibc-2.17-22321/sysdeps/unix/sysv/linux$ diff check_pf.c check_pf.c.orig 
236,237c236
<             if (!IN6_IS_ADDR_LOOPBACK (address) &&
<             !IN6_IS_ADDR_LINKLOCAL (address))
---
>             if (!IN6_IS_ADDR_LOOPBACK (address))

stefansaraev commented 11 years ago

doing a clean build (x86_64 generic but will work on your box). it takes some time. will let you know when it's ready. thanks for your time and effort

stefansaraev commented 11 years ago

@dukeczech can you please join irc and ping @sraue for a testing build?

dukeczech commented 11 years ago

stefansaraev commented 11 years ago

can you please do

curl google.com

and watch tcpdump output.

dukeczech commented 11 years ago

it takes ~5 sec to start curl exactly like ping: tcpdump

stefansaraev commented 11 years ago

good ;) thank you for your time and testing. now I know for sure what is wrong and I can reproduce it here. a proper fix could take some time.

stefansaraev commented 11 years ago

patch from 67cf9779104 needs testing. expected behaviour:

v4 only: http://sprunge.us/MQhZ
v4 + v6 link-local only: http://sprunge.us/XHGU
v6 + v6: http://sprunge.us/MDOe

dukeczech commented 11 years ago

The proper function of AI_ADDRCONFIG requires that: 
 1. The usual processing of all node-local and link-local names and addresses is preserved as long as the respective addresses are present. 
 2. The global name resolution is not affected by the existence or non-existence of node-local and link-local addresses. 
 3. IN AAAA DNS queries should not be transmitted from a node with no global IPv6 address, and vice versa: IN A queries should not be transmitted from a node with no global IPv4 address. 

Unfortunately, the current implementation of getaddrinfo() mostly follows the informational RFC 3493, which fails in #1 and #2, and partially in #3.

EDIT: I set up a 6to4 tunnel to my LAN using the Hurricane Electric tunnel broker service:

stefansaraev commented 11 years ago

this fix is in 3.2.2 now. thanks for reporting and testing.