NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Redundancy for BIB database? #113

Closed ruben-herold closed 7 years ago

ruben-herold commented 9 years ago

hi,

Is there any plan for redundancy, like IPVS + Keepalived? From my point of view, something must replicate the BIB database across servers, and Keepalived (VRRP) can handle the IP takeover ...

ydahhrk commented 9 years ago

Warning: This is my first time reading about these topics. You might want to correct me if I'm not making sense.

I'm assuming Keepalived is IPVS plus VRRP (I don't know what the "checkers" are, though). IPVS provides server redundancy while VRRP provides router redundancy.

With that in mind...


Er, this looks like two separate questions.

  1. IPVS: Well, IPVS happens to be implemented as a Netfilter module that hooks itself after filtering. Unless we discover a problem with my reasoning later in testing, it looks like Jool and IPVS should coexist in harmony once we can move Jool away from the beginning of Netfilter's input chain (which is how we're planning to fix issue #41 anyway). So yeah, I guess there are plans to support server-side redundancy. We hadn't considered the test case, however.
    And I ~guess~ you could always chain Jool and IPVS in separate machines.
  2. VRRP: This one is harder. Because only Jool knows its tables, database synchronization would have to be coded into Jool. Because sessions also contain information needed for normal NAT64 operation, the session tables might also have to be synced. Keepalived's VRRP implementation looks reusable though.

Summary: IPVS should work when we fix #41. VRRP sounds like a reasonable objective afterwards.

Connection synchronization hasn't proved to be too heavy?
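To make the point about BIB and session tables concrete, here is a minimal user-space sketch of the state a standby node would need to receive. The class and field names are assumptions for illustration, loosely following RFC 6146 terminology; they are not Jool's kernel structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical types for illustration only; Jool's internal structures differ.
Addr = Tuple[str, int]  # (IP address, port or ICMP identifier)


@dataclass(frozen=True)
class BibEntry:
    proto: str  # "tcp", "udp" or "icmp"
    src6: Addr  # transport address of the IPv6 node
    src4: Addr  # IPv4 transport address borrowed from pool4


@dataclass(frozen=True)
class SessionEntry:
    bib: BibEntry
    dst6: Addr         # IPv4 peer as the IPv6 node sees it (pool6 prefix + IPv4 address)
    dst4: Addr         # real IPv4 transport address of the peer
    expires_at: float  # absolute expiry time in seconds


class StateTables:
    """State a standby translator needs in order to take over existing flows."""

    def __init__(self) -> None:
        self.bib: Dict[Tuple[str, Addr], BibEntry] = {}
        self.sessions: Dict[Tuple[str, Addr, Addr], SessionEntry] = {}

    def apply(self, session: SessionEntry) -> None:
        """Install a session learned from a peer, together with its BIB entry."""
        b = session.bib
        self.bib[(b.proto, b.src6)] = b
        key = (b.proto, b.src6, session.dst6)
        old = self.sessions.get(key)
        # Keep whichever copy expires later, so a stale update cannot shorten a flow.
        if old is None or session.expires_at > old.expires_at:
            self.sessions[key] = session

    def lookup4(self, proto: str, src6: Addr) -> Optional[Addr]:
        """Return the IPv4 transport address mapped to an IPv6 source, if any."""
        entry = self.bib.get((proto, src6))
        return entry.src4 if entry else None
```

Without the BIB entry the standby would pick a different IPv4 address or port for the same IPv6 source, and without the session entry it would not know the flow's remaining lifetime, which is why both tables are candidates for synchronization.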

ruben-herold commented 9 years ago

Just to clarify what I mean:

First we need a mechanism to sync the database and sessions between the servers, like: http://www.linuxvirtualserver.org/docs/sync.html

Once that is done we have two servers with the same BIB and session data, so if someone installs Keepalived on both servers a failover is possible, because with VRRP you can transfer the "server IP addresses (IPv4 and IPv6)" between the servers.

I never wanted to run IPVS on the same systems! It was only an example of how it could work.

ydahhrk commented 9 years ago

OK. Tentatively adding to milestone 3.3.0.

toreanderson commented 9 years ago

@ruben-herold, correct me if I'm wrong, but I do not think the implementation of this issue really has anything to do with IPVS or VRRP per se. VRRP and IPVS are just one of many methods an administrator can use to fail over traffic from one Jool instance to another. (Another way would be e.g. to advertise the pool4/pool6 networks from the Jool servers using BGP, and then change the advertisements such that the previously inactive instance suddenly gets all the traffic.) If all the stateful stuff is kept in sync between two (or more) Jool nodes, existing sessions won't drop if traffic suddenly shifts from one instance to another.

I think it would be better to compare this feature to the Netfilter conntrackd, which can do exactly this for Netfilter's connection tracking tables.

Tore

ydahhrk commented 9 years ago

I think I'm the one who phrased it poorly in this comment.

I did not mean we wanted to make Keepalived the only possible way to administer redundant Jool instances; I meant we're using it to get acquainted with this kind of setup.

What's stopping us is a couple of loose ends in the database synchronization algorithm (here's a little rant). I understand there's no standard for this, so I guess we'll steal some ideas from conntrackd.

(BTW: It looks like anyone with a GitHub account can edit the wiki, in case you want to use it for something.)

toreanderson commented 9 years ago

You might want to consider active/active/.../active scenarios as well, for example if the operator has a router or switch load balance between multiple Jool instances using ECMP. Perhaps you could facilitate that by announcing any changes to the state table to a multicast group which all the cluster members can subscribe to. In order to limit the amount of state replication traffic, another idea could be to only synchronize long-lived sessions (as it's usually not a problem if short-lived HTTP requests and such get interrupted half-way through).

(Just thinking out loud here.)

Tore
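The multicast idea above can be sketched in user space as follows. This is only an illustration under stated assumptions: the group address, port, age threshold, JSON encoding and all function names are hypothetical, and it does not describe how Jool, a kernel module, implements synchronization.

```python
import json
import socket
import struct

# All values below are hypothetical, chosen only for illustration.
GROUP = "239.255.0.64"  # administratively scoped IPv4 multicast group
PORT = 6464
MIN_AGE = 30.0          # seconds a session must have lived before it is replicated


def should_replicate(created_at: float, now: float, min_age: float = MIN_AGE) -> bool:
    """Only long-lived sessions are worth the replication traffic."""
    return now - created_at >= min_age


def encode(session: dict) -> bytes:
    """Serialize one session update into a datagram payload."""
    return json.dumps(session, sort_keys=True).encode("utf-8")


def decode(datagram: bytes) -> dict:
    """Parse a datagram payload back into a session update."""
    return json.loads(datagram.decode("utf-8"))


def open_sender(ttl: int = 1) -> socket.socket:
    """UDP socket for announcing updates; TTL 1 keeps them on the local link."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """UDP socket subscribed to the cluster's multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group on the default interface (0.0.0.0).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def announce(sock, session: dict, now: float,
             group: str = GROUP, port: int = PORT) -> bool:
    """Send the session to the group if it is old enough; report whether it was sent."""
    if not should_replicate(session["created_at"], now):
        return False
    sock.sendto(encode(session), (group, port))
    return True
```

Because every member both sends to and listens on the same group, the same code would serve active/passive and active/active layouts; each receiver would simply call `decode()` on whatever `recvfrom()` returns and install the result in its own tables.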

ruben-herold commented 9 years ago

The multicast distribution looks to me like the way to go. It will not break any type of setup: active/active, active/passive and so on.

If I remember correctly, Tomcat uses multicast for session replication between cluster nodes.

ydahhrk commented 7 years ago

https://jool.mx/en/session-synchronization.html