djp952 / pvr.hdhomerundvr

Unofficial Kodi HDHomeRun DVR PVR Client
https://github.com/djp952/pvr.hdhomerundvr/wiki
Other
60 stars 9 forks source link

Excessive DNS requests (over 200k per day) #49

Closed lurkermike closed 5 years ago

lurkermike commented 5 years ago

While troubleshooting an unrelated issue, I discovered that every 5 minutes, my Kodi install is doing about 1000 DNS lookups for the A and AAAA records of api.hdhomerun.com.

My Kodi install is fairly basic. The only HDHR-related addon I use is the pvr.hdhomerundvr addon.

Any idea what's going on? Is there some DNS lookup that's happening per-channel every 5 minutes?

Currently this is resulting in over 200,000 DNS lookups per day from my Kodi install, looking for api.hdhomerun.com.

I appear to be using version 1.2.6 of pvr.hdhomerundvr on Kodi 15.2 on Ubuntu Linux.

djp952 commented 5 years ago

I agree, that seems extremely excessive! The EPG updates are very HTTP request heavy, as I can only get around 8 hours at a time for any given channel, but even that wouldn't come close to what you're seeing here, certainly not every 5 minutes.

What does happen every 5 minutes (exactly every 5 minutes, as a matter of fact) is the device discovery. This goes out to api.hdhomerun.com to get your device list. It's only one HTTP query, though, I would expect only one DNS lookup to be required, certainly not 1000 of them. That said, version 1.2.6 is quite old at this point and could suffer from some malady that no longer exists. In version 1.2.10 I enabled a feature of the HTTP library to cache and share DNS lookups, which greatly improved performance (perhaps this is why!), and there are of course a lot of other fixes that were applied as time went on.

I think we can alleviate the bulk of your problem by switching you over to UDP discovery. This avoids hitting api.hdhomerun.com every 5 minutes like that. If you go into the addon settings, in the General tab the first option will be "Use network broadcast for device discovery". I'm betting that will settle things down for you immediately. If the root cause is in that version of the HTTP library, you will probably still see spikes in DNS activity for the other api.hdhomerun.com actions.

If I can get some additional information from you, I can set up a test here with the exact versions of everything and see if this is still a problem in the current versions as well:

You can also try updating to the latest release, v1.3.9, to see if that helps anything. If you can wait on that, though, I would like to be able to identify the root cause with your help. If this is still happening, it's probably happening to lots of people!

lurkermike commented 5 years ago

Thanks for your response!

This issue is happening every 5 minutes exactly, at 00, 05, 10, 15 ...

The exact Ubuntu release is Ubuntu 16.04.5 LTS. It is a 64-bit version.

I discovered this via another internet issue I was having (duplicate IP) that manifested itself in timed out connections, including DNS. I noticed that there was a large volume of DNS requests coming from my Kodi device trying to look up api.hdhomerun.com. I confirmed the traffic with tcpdump and then opened the issue here.

The Kodi version was my mistake. The version reported back from Kodi is 17.6 Git:20171114-a9a7a20, I installed this from an Ubuntu repository.

Please let me know if there is more information I can provide. I can even figure out how to get you access to the machine specifically if it would be of use.

djp952 commented 5 years ago

This is perfect, thanks! I can start getting this set up and will let you know what I find out. Access to the machine shouldn't be necessary, but I appreciate it! Hopefully it won't take too long to get to the cause (and hopefully something that's already fixed - lol)

djp952 commented 5 years ago

Well, I'm not sure if this is good or bad news, but I was able to get everything set up and I … don't see anything wrong here. I'm getting exactly one A and one AAAA lookup every 5 minutes during device discovery:

22:22:04.782360 IP UBUNTU-1604LTS.mshome.net.54577 > BREHMM-W10.mshome.net.domain: 2031+ A? api.hdhomerun.com. (35)
22:22:04.782382 IP UBUNTU-1604LTS.mshome.net.54577 > BREHMM-W10.mshome.net.domain: 50925+ AAAA? api.hdhomerun.com. (35)
22:22:04.803537 IP BREHMM-W10.mshome.net.domain > UBUNTU-1604LTS.mshome.net.54577: 2031- 2/0/0 CNAME api.silicondust.net., A 216.218.230.214 (120)
22:22:04.865189 IP BREHMM-W10.mshome.net.domain > UBUNTU-1604LTS.mshome.net.54577: 50925- 1/0/0 CNAME api.silicondust.net. (85)

I'm unsure what would be causing the problem. Can you try starting tcpdump and then executing 'nslookup api.hdhomerun.com' from a Terminal? Here I see a single A record lookup for that operation. I'm wondering if we are looking at a systemic DNS lookup issue as opposed to a Kodi thing.

A couple follow-up questions - are you using your ISP's DNS or one of the public ones, like Google's 8.8.8.8/8.8.4.4? In your tcpdump, are you getting the same values that I do for the length of the A/AAAA results - the "(35)" above, Are those values always the same or perhaps maybe the last A record right before the CNAME entrie(s) pointing to api.silicondust.net (216.218.230.214) is a different length?

I ran both the version you are running, 1.2.6 as well as the latest, 1.3.9, and saw the same general results. Each A / AAAA record entry only appears once before the name is resolved successfully.

Let me know! I'll play around with things like putting a fake proxy server in Kodi or messing with the IPv4/IPv6 interfaces to see if I can get anything similar to occur.

lurkermike commented 5 years ago

So, it may be a more subtle bug. I restarted Kodi a bit ago after responding to you and I see several lookups (maybe 10) when Kodi starts, but then only a few on a regular basis now.

This is the tcpdump from "nslookup api.hdhomerun.com" that you asked for:

20:20:41.448574 IP (tos 0x0, ttl 64, id 2500, offset 0, flags [DF], proto UDP (17), length 63) 10.255.255.21.34323 > 10.255.255.3.53: [bad udp cksum 0x1454 -> 0x4349!] 32820+ A? api.hdhomerun.com. (35)

20:20:41.448853 IP (tos 0x0, ttl 64, id 53697, offset 0, flags [DF], proto UDP (17), length 112) 10.255.255.3.53 > 10.255.255.21.34323: [udp sum ok] 32820 q: A? api.hdhomerun.com. 2/0/0 api.hdhomerun.com. [44s] CNAME api.silicondust.net., api.silicondust.net. [7m8s] A 216.218.230.214 (84)

I uploaded the tcpdump log from when it was happening regularly here: https://transfer.sh/9nTqr/hdhrq.txt

10.255.255.3 is the DNS server and 10.255.255.21 is the Kodi device.

djp952 commented 5 years ago

Thanks! I'll grab it and see what I can learn for us!

lurkermike commented 5 years ago

I'm not sure it'll teach you much, it's just several thousand DNS queries, but glad to provide what I have :)

lurkermike commented 5 years ago

So, I left my tcpdump going and it seems to be doing the requests about every 2 hours now rather than every 5 minutes. See: https://imgur.com/a/QrGDaOs

The rightmost spike is ~10k queries. Smaller ones are ~5k queries.

Is there some debugging I can enable over night that might be useful?

djp952 commented 5 years ago

That's quite alright! It's actually telling in a very important way - the actual HTTP query being executed against api.hdhomerun.com is completely absent from the log. This means the PVR code isn't directly causing it -- it's all taking place from cURL (HTTP library). The PVR is asking cURL to go fetch the data from a URL and waiting until it comes back. The loop is happening somewhere in there.

Why cURL is doing that is of course the question that needs to be answered here, and I've been researching documented cURL bugs to see if anything stands out. Not much so far, but I did find an option I can set to not allow IPv6 DNS lookups that I didn't know about previously. There is no IPv6 DNS name for api.hdhomerun.com, so it's likely I can set that option and save a few more milliseconds here and there (they do add up!). This should eliminate the AAAA lookups, but not necessarily truly solve anything.

lurkermike commented 5 years ago

That's quite alright! It's actually telling in a very important way - the actual HTTP query being executed against api.hdhomerun.com is completely absent from the log.

Ah, this may be intentional. I realized as I was reading what you said that I was only capturing traffic to/from the DNS server, not all traffic, which would exclude any subsequent HTTP requests. I will capture all traffic and see if I can catch it when it happens next.

djp952 commented 5 years ago

So, I left my tcpdump going and it seems to be doing the requests about every 2 hours now rather than every 5 minutes. See: https://imgur.com/a/QrGDaOs The rightmost spike is ~10k queries. Smaller ones are ~5k queries. Is there some debugging I can enable over night that might be useful?

Being able to correlate the Kodi log timestamps with the tcpdump log timestamps might be useful. Nothing that hits api.hdhomerun.com defaults to a 20 minute cycle. Hitting the tuners and the DVR engine defaults to every 10 minutes but shouldn't need a DNS lookup since it's done specifically by IP address.

Ah, this may be intentional. I realized as I was reading what you said that I was only capturing traffic to/from the DNS server, not all traffic, which would exclude any subsequent HTTP requests. I will capture all traffic and see if I can catch it when it happens next.

Perfect - thank you! If you find HTTP traffic between the requests that would absolutely move the problem back into the PVR code. Perhaps a problem with the internal task scheduler on the platform. The scheduler uses a lot of "Standard Library" stuff as opposed to OS specific stuff, and it's been the source of more than one problem in the past. I'm not seeing any issues (yet), but if we can confirm the HTTP requests are going through each time it will actually be much easier to fix :)

lurkermike commented 5 years ago

Will report back in the morning. Thank you very much for your help thus far!

lurkermike commented 5 years ago

So, it looks like the DNS requests are related to the guide updates.

I see a DNS lookup (two actually, A+AAAA), followed by an HTTP request to 216.218.230.214 (api.hdhomerun.com) which is looking for:

GET /api/guide?DeviceAuth=[removed]&Channel=2&Start=1548471097

Followed by another DNS lookup and another HTTP request:

GET /api/guide?DeviceAuth=[removed]&Channel=2&Start=1548500400

Followed by another DNS lookup and another HTTP request:

GET /api/guide?DeviceAuth=[removed]&Channel=2&Start=1548529200

Followed by another DNS lookup and another HTTP request:

GET /api/guide?DeviceAuth=[removed]&Channel=2&Start=1548558000

It is doing this now every 2 hours or so. It did 10356 lookups at 22:58 last night (half A/half AAAA)

I can provide you the tcpdump logs if that would be useful to you.

djp952 commented 5 years ago

That seems right for the older versions, actually. Until 1.2.10 it would do a DNS lookup for each and every EPG request. Telling cURL to share/cache DNS made EPG much faster, and you won't see nearly as many A/AAAA requests.

I left tcpdump and Kodi running all night and didn't find anything out of the ordinary for that version either. Other than an absolutely massive dump file, of course :)

The ton of HTTP queries for EPG is currently unavoidable. The HDHomeRun back end will only provide ~8 hours of data for any given channel per request. It's designed for the SiliconDust app, with their 'slice' EPG as opposed to a grid EPG like Kodi. On my connection, each EPG request takes on average 75ms to complete and process, which is equivalent to the performance of the SD app (verified with a sniffer). So a bit of math can illustrate how this snowballs quickly …

Let's say you have 300 channels and have your Kodi guide set for the full 2 weeks. In order to get a complete EPG load that ends up roughly as 300 3 14 * 75 = 945000ms, or 15.75 minutes. Add in each of the DNS lookups on the older versions, and it works around a little over 20 minutes.

Here's what I recommend. Update to the most recent PVR version (1.3.9) from here: https://github.com/djp952/pvr.hdhomerundvr/wiki/Downloads-for-Kodi-Krypton. You want the GNU Linux x86_64 .zip file. There are a couple things I think you'll gain from this. First, the DNS won't get slammed nearly as hard for the EPG, and it really speeds things up. Second, at some point (I don't recall exactly when) SiliconDust fixed a bug on their end that was allowing me to load EPG data from the past. When they fixed that, the PVR couldn't satisfy Kodi's request for that data and returned an error. For some people this made Kodi ask for EPG data much more often than it needed to.

If for some reason 1.3.9 doesn't work well for you or causes you some issue, you can revert back to 1.2.x anytime, Kodi will let you install older versions. Your original PVR database file will still be there, it will simply pick up where it left off. If you revert, however, I suggest using version 1.2.10 (https://github.com/djp952/pvr.hdhomerundvr/releases/tag/krypton-1.2.10) instead of 1.2.6. That was the last 1.2.x release and had the DNS changes in it.

I'll keep watching the dumps to see if I can replicate the DNS storm every 5 minutes like you encountered, and I would be very interested to know if you find the network activity from the newer PVR version(s) more reasonable. I wish there was something I could do about the EPG but until/unless SD changes the backend to allow more data to be retrieved at one time we are kind of stuck there.

lurkermike commented 5 years ago

Okay, I've upgraded to 1.3.9 and will continue to monitor.

One thing that occurs to me is that if there was some way to make the guide requests using HTTP/1.1 and a persistent connection, rather than an individual cURL request per guide request, that would cut down on the DNS requests and HTTP requests to the API.

Not sure if this would require a significant architectural change of how you do guide updates or if that may already be how the more recent version works. Just a thought!

djp952 commented 5 years ago

All requests go out as HTTP 1.1 with keep-alive support already, and cURL is configured to reuse connections when possible, but there is no pipelining support right now. The way Kodi asks for the data and the way HDHomeRun API returns it doesn't easily lend itself to doing more than one request at a time. Kodi asks for a single channel and a time frame, so I send the channel and start time into the API and parse what it sends back. I use the last end time I got as the next start time for any subsequently required requests until the time frame is satisfied.

Knowing that the API typically returns 8 hours of data at a time would allow me to kick off multiple requests for that single channel by introducing some reasonable overlap, but in testing that the gains were pretty trivial. I didn't go so far as to see how Kodi would handle semi-random guide data being thrown at it by processing the results in parallel, at least not yet.

It's definitely something that's on the table and has been for a while, the hold up for me is that I want to ask SiliconDust for their permission first. Their backend isn't designed for these mass queries, I want to be sure it wouldn't cause them any more grief than I'm already causing them.

To me, the ideal scenario would be for the backend to provide all the data that's asked for instead of forcing it into chunks. Based on the rather tiny size of the payload itself, the lionshare of time spent must be the backend database query that's taking place on their end. I'd bet you money that getting all the data requested for a channel would only add a few milliseconds as opposed to having to ask over and over again along with thousands of other people doing the same thing :)

FWIW, I am about to test a change that disables the IPv6 AAAA record lookups altogether. The HDHomeRun services are IPv4 only so it's pointless to try. I'd never even thought about that! Props to you for your investigations here, and I truly appreciate all suggestions. Every time somebody opens an Issue, I either learn something new or make somebody's life a little bit better. It's what makes this project stay fun :)

lurkermike commented 5 years ago

After upgrading to 1.3.9, the DNS requests went from ~120k/day to ~1k/day.

I should have tried upgrading before I raised the issue, but at least it was a simple fix.

Thanks again for your help!

djp952 commented 5 years ago

Anytime! Thanks to you the DNS requests in the next version should be cut roughly in half again too! The AAAA record lookups will be gone - confirmed. I also reached out to Nick at SiliconDust about parallelizing the EPG requests and if that would cause them any grief.

Always a pleasure to hear from folks. Like I said I either get to learn new things or solve something. This time it was both! Bonus. :-)

djp952 commented 5 years ago

Hi again -- posting to a closed issue but wanted to let you know that I did release a version with pipelined EPG requests today (1.3.11) and the improvement can be jaw-dropping. I really appreciate you pointing out that it was an option for HTTP 1.1 servers. The way I did it should also support HTTP/2 multiplexing if they ever upgrade to that. I honestly never thought about pipelining the requests, I've always approached it as a serial set of operations.

I made sure to give you props on the post I made at SiliconDust as well. It's not a beer or a pizza, but at least credit where credit is due!!

lurkermike commented 5 years ago

Nice. I will check it out. Thanks!