freifunk-berlin / firmware

DEPRECATED: Build system for Berlin firmware. Please use the pinned falter-repos instead.
https://berlin.freifunk.net
GNU General Public License v3.0

ffuplink (all) fails to make DNS queries when the uplink (fritz) is offline, even when the node has a smart gw #660

Closed: pmelange closed this issue 5 years ago.

pmelange commented 5 years ago

This situation is referred to in #659 but affects all the uplink types.

When a router's uplink stops working, the router should fail over to the smart gw. With openvpn and tunneldigger this happens relatively quickly. But users are still left without usable internet, because DNS resolution still depends on the DSL connection, which is down.

The issue is easiest to see when the fritz box is still powered on but the DSL connection is down: dnsmasq still sends all DNS queries via WAN (the fritz box), leaving users with an unusable internet connection.

pmelange commented 5 years ago

As a side note, if the WAN interface is brought down (via ifdown wan), the DNS queries go over the smart gw. But bringing down WAN prevents ffuplink from coming up again, so this is not an option.

SvenRoederer commented 5 years ago

This problem should also occur on a pre-Hedy-1.0.0 node, right?

pmelange commented 5 years ago

I would assume that this is also a problem with earlier versions of the firmware.

@SvenRoederer, do you know how we could make dnsmasq use the same route as the clients on the br-dhcp interface, and fall back to br-wan if there is no such route?
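One way to see the asymmetry described here is to ask the kernel which route it would pick for the router's own traffic (which is what dnsmasq uses) versus traffic forwarded from a client on br-dhcp. A minimal sketch using ip route get, assuming 8.8.8.8 stands in for the upstream resolver and CLIENT_IP is a hypothetical address from the br-dhcp subnet:

```shell
#!/bin/sh
# Sketch: compare the egress device for the router's own traffic with the
# one a br-dhcp client would get for the same destination.
# ASSUMPTION: DEST and CLIENT_IP are illustrative; set CLIENT_IP to a real
# address from the br-dhcp subnet before running.

DEST=${DEST:-8.8.8.8}
CLIENT_IP=${CLIENT_IP:-10.0.0.2}   # hypothetical client address

# Print the word following "dev" in `ip route get` output (first match only).
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

compare() {
  local_dev=$(ip route get "$DEST" | route_dev)
  client_dev=$(ip route get "$DEST" from "$CLIENT_IP" iif br-dhcp | route_dev)
  echo "router itself (dnsmasq): $local_dev"
  echo "client on br-dhcp:       $client_dev"
}

# Only query the routing table when explicitly asked to.
if [ "${1:-}" = run ]; then compare; fi
```

Run it with the argument run on the node. With DSL down and the smart gw working, one would expect the router's own lookup to still point at br-wan while the client lookup points at the tunnel device, which is the mismatch this issue is about.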

SvenRoederer commented 5 years ago

I've seen https://github.com/openwrt/packages/tree/master/net/pingcheck in the packages feed. This can run scripts when the uplink dies.

I'm unsure whether it's a good idea to route all DNS traffic via the VPN too.

pmelange commented 5 years ago

> I'm unsure whether it's a good idea to route all DNS traffic via the VPN too.

I agree.

Wouldn't it be possible to set up an IP rule that sends all locally generated traffic to a specific table? Normally that would be the default table, but in the situation described above, the olsr table?

pmelange commented 5 years ago

This is a better URL for pingcheck https://github.com/br101/pingcheck

pmelange commented 5 years ago

In combination with a ping check (possibly using the pingcheck package), I suggest the following action:

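if ! ping -c 1 -W 2 -I br-wan 8.8.8.8 >/dev/null 2>&1; then
  # internet not reachable via WAN: route locally generated traffic via olsr-tunnel
  ip rule show | grep -q "^3000:" || ip rule add prio 3000 iif lo lookup olsr-tunnel
else
  # WAN works: remove the rule again (ignore the error if it is absent)
  ip rule del prio 3000 iif lo lookup olsr-tunnel 2>/dev/null
fi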

With this ip rule, all locally generated traffic goes over the olsr-tunnel, but any VPN connection does not (since it is bound to WAN). Also, ping -c 1 8.8.8.8 -I br-wan still works or fails as expected.
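The same rule could also be driven by uplink events instead of polling, for example from a pingcheck online/offline hook. A minimal sketch, assuming a hypothetical calling convention in which the caller passes "online" or "offline" as the first argument (check the pingcheck README for how its hook scripts are actually invoked). Priority 3000 and the olsr-tunnel table are taken from the rule above; the rule is added only once and deleted only if present:

```shell
#!/bin/sh
# Sketch: toggle the "locally generated traffic -> olsr-tunnel" rule on
# uplink events. ASSUMPTION: the caller (e.g. a pingcheck hook) passes
# "online" or "offline" as $1; this convention is hypothetical.

PRIO=3000            # rule priority proposed in this thread
TABLE=olsr-tunnel    # routing table used in this thread

# True if a rule with our priority is already installed.
rule_present() {
  ip rule show | grep -q "^$PRIO:"
}

# Decide what to do. $1 = online|offline, $2 = 1 if the rule is present.
# Prints add, del or keep, so repeated events stay idempotent.
action_for() {
  case "$1" in
    offline) if [ "$2" -eq 1 ]; then echo keep; else echo add; fi ;;
    online)  if [ "$2" -eq 1 ]; then echo del; else echo keep; fi ;;
    *)       echo keep ;;
  esac
}

handle() {
  if rule_present; then present=1; else present=0; fi
  case "$(action_for "$1" "$present")" in
    add) ip rule add prio "$PRIO" iif lo lookup "$TABLE" ;;
    del) ip rule del prio "$PRIO" iif lo lookup "$TABLE" ;;
  esac
}

# Only touch the routing policy for a recognised event.
case "${1:-}" in
  online|offline) handle "$1" ;;
esac
```

Separating the decision (action_for) from the ip calls keeps the toggle logic testable without root and avoids stacking duplicate rules when the same event fires twice.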

Does anyone have any comments on using this ip rule when the internet is not reachable via WAN?

SvenRoederer commented 5 years ago

Not sure if this is the perfect rule, but something in this direction.

pmelange commented 5 years ago

Before I start implementing this, it would be great to have suggestions as to what would be better.

SvenRoederer commented 5 years ago

BTW, I just noticed that there is a package "freifunk-gwcheck" which might also do the job.

pmelange commented 5 years ago

I don't think freifunk-gwcheck will do what we want. I think pingcheck looks promising.

SvenRoederer commented 5 years ago

It probably makes sense to combine pingcheck with the logic of the freifunk-gwcheck scripts.

kls0e commented 5 years ago

It would be welcome if we tested the approach mentioned above. I maintain two small sites that regularly run into the situation described.

booo commented 5 years ago

I would highly appreciate it if we (first) created some documentation about the whole routing setup and the tables, rules, and scripts involved. It feels like we create an even more complex system with every commit that touches the routing.

pmelange commented 5 years ago

@booo, I also think it would be wise to have better documentation (see #565).

At the same time, I have already made a branch that solves both this issue and #659.

The branch is called pingcheck in both the firmware and firmware-packages repos. Please take a look at them and let me know what you think. For example, do the rules have an acceptable prio? Is there a better way to do the uci-defaults? Is the host that I'm pinging reliable? If not, what would be a better host?
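On the question of probe-host reliability, one option is to probe several hosts and treat WAN as down only when none of them answers, so that a single unreachable host does not trigger failover. A minimal sketch; the host list is illustrative and not the one used in the pingcheck branch, and probing via br-wan follows the ping -I br-wan check mentioned earlier:

```shell
#!/bin/sh
# Sketch: WAN counts as up if at least one probe host answers via br-wan.
# ASSUMPTION: PROBE_HOSTS is an illustrative list, not the branch's choice.

PROBE_HOSTS=${PROBE_HOSTS:-"8.8.8.8 9.9.9.9 1.1.1.1"}
WAN_DEV=${WAN_DEV:-br-wan}

# Probe one host via the WAN bridge only.
probe() {
  ping -c 1 -W 2 -I "$WAN_DEV" "$1" >/dev/null 2>&1
}

# Succeeds as soon as one host answers; fails only if all are silent.
wan_up() {
  for host in $PROBE_HOSTS; do
    if probe "$host"; then return 0; fi
  done
  return 1
}

# Only send probes when explicitly asked to.
if [ "${1:-}" = check ]; then
  if wan_up; then echo "WAN up"; else echo "WAN down"; fi
fi
```

Stopping at the first answering host keeps the check cheap when WAN is healthy; the cost is a longer check (one timeout per host) when WAN is really down.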

pmelange commented 5 years ago

Fixed with https://github.com/freifunk-berlin/firmware-packages/pull/182