Open agneevX opened 2 years ago
I also would like an option to configure the upstream policy.
Maybe we could implement a configuration enum like:
This would mirror the first two options in AdGuard Home. (I don't get the third option and never used it 😅)
Current implementation was designed to combine privacy with performance:
We can provide additional "strategies", like strict, random or random weighted based on upstream resolver response time.
Maybe we can also implement a "hyperlocal" mode, where Blocky works as a recursive resolver and doesn't rely on any upstream resolver? That means blocky would recursively ask the corresponding name servers and cache the results. This would significantly improve privacy, but would probably be slow for queries with many subdomains.
Any thoughts?
What do you mean by "corresponding name server"? Do you mean something like Unbound?
Yes, like unbound, but in blocky. In this case we can reuse blocky's cache and provide additional prometheus metrics.
A few things I've observed when I used to use Unbound to query root name servers:
I don't think that it would be feasible to include a recursive dns server option. Most users won't use it as forward dns servers are more common. Therefore it would most likely just increase binary size.
In my setup there are multiple unbound instances as upstream resolvers for blocky. Even I wouldn't use an internal recursive option, as this would reduce my fault tolerance and configuration options.
I'm currently trying to migrate from Pi-Hole to Blocky, since it is much better suited for running on K8s, but this issue is currently blocking me from doing so, unless I'm missing another option. I want the LanCache DNS server to always be preferred if it is available.
My current Setup, with Pi-Hole using strict order, looks like
Router --- Pi-Hole --- LanCache --- Unbound
              \_______________________/
With Blocky, I think currently the only options would be
Router --- LanCache --- Blocky --- Unbound
or
Router --- Blocky --- LanCache --- Unbound
with LanCache being a SPOF since both Blocky and Unbound have multiple replicas.
@reitermarkus is that the Steam LAN thing?
If so, it should not be a problem if LC answers queries faster than your other upstreams.
Yes, it's for caching Steam games, among other things.
Well, my other upstream is Unbound running in the same cluster, so it's quite likely that LanCache will not be significantly faster, if at all.
I'm not sure about blocky, but I know AdGuard Home has a Fastest IP feature that does exactly what you want.
Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work if you can figure out which DNS names are used (for example steamcontent.com for Steam, and maybe others). Did you try this approach?
> AdGuard Home has a Fastest IP feature that does exactly what you want.
I had a look at AdGuard Home before finding Blocky, but it has the same issue as Pi-Hole: No easy way to have multiple replicas.
> Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work
That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?
> That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?
No, blocky will ask your LanCache instance, and if it returns NXDOMAIN, there is no fallback. Isn't that the desired behaviour, since LanCache will return either the IP of the local cache or the origin IP?
The problem arises when LanCache is down: then I cannot resolve any cached domains. Basically, I want to be able to download game updates even if LanCache is down for whatever reason.
Currently, this works by having LanCache as first DNS server, and if it is down, fall back to the next, i.e. downloads fall back to using the uncached upstream IP.
Is this how Pi-hole works? If one upstream DNS server is down, does it try the second one (rather than round-robin)? That means, if you query for example for "google.com", Pi-hole will ask your LanCache instance first. Does LanCache return NXDOMAIN, or will it resolve this query properly (by using some external resolver)?
> Is this the way how pihole works?
Not by default, but since it uses DNSmasq, I can configure it to use strict order.
> the pihole will ask your lancache instance first, does lancache return NXDOMAIN or will it resolve this query properly (by using some external resolver)?
LanCache will resolve it, using Unbound as upstream. And the same Unbound server acts as the fallback DNS server in Pi-Hole.
So in case LanCache is running:
Pi-Hole -> LanCache -> Unbound
In case LanCache is down:
Pi-Hole -> Unbound
ok, got it. The requested "strict order resolution" will solve this challenge. With conditional mapping, you won't get the fallback resolution.
Hi all, I would like to contribute both the `strict` & `random` (non weighted) resolvers.
Where should we add the new `upstreamStrategy` config field? Should we start with adding it as a global enum which configures the strategy for all upstream groups?
If there are use cases for a group scoped enum we could still discuss adding it later on.
> Hi all, I would like to contribute both the `strict` & `random` (non weighted) resolvers.
That sounds good! :+1:
Currently, we have the "upstream" section and the related "UpstreamTimeout". The "upstream" section is not a nested struct, only a map (for historical reasons). It would be better to have all upstream-related configuration in a separate structure, but that would introduce breaking changes. So I think it would be better, for now, to introduce a new top-level config enum "upstreamStrategy" and refactor the "ParallelBestResolver" to extract the resolver selection logic, for example into a separate interface. That way we can implement more strategies later.
> Currently, we do have the "upstream" section and related "UpstreamTimeout". The "upstream" section is not a nested struct, but only a map (historical reasons). It would be better to have all upstream related configurations in a separate structure, but in this case we'll introduce breaking changes.
I've got local changes to allow having more config there, and be back-compat. Basically I also renamed it to `upstreams` instead of `upstream`, so we can use our standard option deprecation flow.
The main goal of those changes is to have parallel init for upstreams (#835). It's almost done so I could make a PR soon. But I think I can even split the config change so we can merge that quicker and @DerRockWolf can use that as a base.
EDIT: so if you, @DerRockWolf, have already started some work, don't worry too much about the config, just add something to the big `Config` struct, and moving your struct into the one I created should be easy :)
> refactor the "ParallelBestResolver" to extract the resolver choose logic for example in a separate interface. So we can implement more strategies later.
Related to #1001
Bad weather gave me a bit of extra time today, so I opened #1086 with just the config change.
@agneevX my PR (#1093) implementing the `strict` strategy doesn't tackle:
> `REFUSED` is returned. Google does this for some queries containing ECS data.
The "upstream resolver" contacting the upstream DNS server only returns err if it didn't get a reply. The responses are returned as received, regardless of the DNS message response codes.
This is also currently the case for the `parallel_best` resolver. If Google DNS replies REFUSED and wins the race, blocky will return the answer from Google.
We would need to implement custom handling based on the DNS response codes.
Currently blocky...
This works very well, but is not desirable when you want to use a known resolver as primary all the time and want to use a secondary resolver only as backup.
I propose adding an option to query the first resolver in the list, then falling back to secondary and so on... after any of the following:
(`upstreamTimeout` value maybe)
`REFUSED` is returned. Google does this for some queries containing ECS data. More on that issue here.