TritonDataCenter / smartos-live

For more information, please see http://smartos.org/ For any questions that aren't answered there, please join the SmartOS discussion list: https://smartos.topicbox.com/groups/smartos-discuss

vrrp virtual IP not reachable from the global zone #181

Open alcir opened 11 years ago

alcir commented 11 years ago

I followed the wiki directions, and I followed the workaround described in issue #136, to configure two zones on separate hardware.

VRRP works (master/backup takeover), and the virtual IP is reachable from my notebook.

The problem is the following: the VIP does not answer pings from the GZ of the server hosting the backup zone.

If I shut down the backup zone, the ping starts working. There are no problems at all reaching the VIP from the GZ hosting the master instance or from hosts outside the two servers involved.

AlainODea commented 11 years ago

This affects me as well. Alessio and I bounced this back and forth at length on the smartos-discuss list. It seems to be a defect in L2 packet forwarding, possibly in Crossbow. The packets are being forwarded to the backup NIC even though it is down.

I will endeavor to give as much detail of my own experience as possible to aid in resolving this. I am not sure if this is a SmartOS issue or an Illumos issue, but I am a SmartOS user so this is the forum I choose.

Setup of VRRP Zones

I have two zones zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 and zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51. They are on independent physical hosts.

zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 was created from proxy-a.json:

{
  "alias": "proxy",
  "max_physical_memory": 1024,
  "dataset_uuid": "84cb7edc-3f22-11e2-8a2a-3f2a7b148699",
  "nics": [
    {
      "nic_tag": "external",
      "vlan_id": 7,
      "ip": "10.101.0.4",
      "netmask": "255.255.0.0",
      "gateway": "10.101.0.1",
      "vrrp_vrid": 1,
      "vrrp_primary_ip": "10.101.0.5"
    },
    {
      "nic_tag": "external",
      "vlan_id": 7,
      "ip": "10.101.0.5",
      "netmask": "255.255.0.0",
      "gateway": "10.101.0.1",
      "primary": true,
      "allow_ip_spoofing": true
    }
  ]
}

zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 was created from proxy-b.json:

{
  "alias": "proxy",
  "max_physical_memory": 1024,
  "dataset_uuid": "84cb7edc-3f22-11e2-8a2a-3f2a7b148699",
  "nics": [
    {
      "nic_tag": "external",
      "vlan_id": 7,
      "ip": "10.101.0.4",
      "netmask": "255.255.0.0",
      "gateway": "10.101.0.1",
      "vrrp_vrid": 1,
      "vrrp_primary_ip": "10.101.0.6"
    },
    {
      "nic_tag": "external",
      "vlan_id": 7,
      "ip": "10.101.0.6",
      "netmask": "255.255.0.0",
      "gateway": "10.101.0.1",
      "primary": true,
      "allow_ip_spoofing": true
    }
  ]
}
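
The two manifests above differ only in the zone's primary address (10.101.0.5 versus 10.101.0.6). As a minimal sketch, assuming Python is available on an admin workstation, both payloads could be generated from one template so that the shared VIP, VLAN and VRID cannot drift apart. The helper name `build_manifest` is hypothetical; the field names and values are copied from the manifests above.

```python
import json

VIP = "10.101.0.4"  # shared virtual IP, taken from the manifests above


def build_manifest(primary_ip, vrid=1):
    """Return a zone payload for one VRRP zone (illustrative helper, not a SmartOS API)."""
    # Settings common to both NICs in the original manifests.
    common = {
        "nic_tag": "external",
        "vlan_id": 7,
        "netmask": "255.255.0.0",
        "gateway": "10.101.0.1",
    }
    # First NIC carries the VIP and the VRRP settings.
    vrrp_nic = dict(common, ip=VIP, vrrp_vrid=vrid, vrrp_primary_ip=primary_ip)
    # Second NIC is the zone's own address; VRRP needs IP spoofing allowed on it.
    primary_nic = dict(common, ip=primary_ip, primary=True, allow_ip_spoofing=True)
    return {
        "alias": "proxy",
        "max_physical_memory": 1024,
        "dataset_uuid": "84cb7edc-3f22-11e2-8a2a-3f2a7b148699",
        "nics": [vrrp_nic, primary_nic],
    }


if __name__ == "__main__":
    for name, ip in (("proxy-a.json", "10.101.0.5"), ("proxy-b.json", "10.101.0.6")):
        print(name)
        print(json.dumps(build_manifest(ip), indent=2))
```

The key order inside each NIC differs from the hand-written files, which does not matter for JSON.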

I configured zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 to be the VRRP MASTER:

vrrpadm create-router -V 1 -l net0 -A inet router0

I configured zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 to be the VRRP BACKUP:

vrrpadm create-router -V 1 -l net0 -A inet router0
vrrpadm modify-router -p 127 router0

Verification of VRRP

I verified VRRP MASTER state on zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# ifconfig
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
net0: flags=50201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,VRRP,L3PROTECT> mtu 1500 index 2
        inet 10.101.0.4 netmask ffff0000 broadcast 10.101.255.255
        ether 0:0:5e:0:1:1
net1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 10.101.0.5 netmask ffff0000 broadcast 10.101.255.255
        ether a2:93:ad:c2:50:60
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# vrrpadm show-router router0
NAME    VRID LINK    AF   PRIO ADV_INTV MODE  STATE VNIC
router0 1    net1    IPv4 127  1000     e-pa- MASTER net0

I verified the VRRP BACKUP state on zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# ifconfig
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
net0: flags=50201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS,VRRP,L3PROTECT> mtu 1500 index 2
        inet 10.101.0.4 netmask ffff0000 broadcast 10.101.255.255
        ether 0:0:5e:0:1:1
net1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 10.101.0.6 netmask ffff0000 broadcast 10.101.255.255
        ether 22:33:14:3:13:b9
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# vrrpadm show-router router0
NAME    VRID LINK    AF   PRIO ADV_INTV MODE  STATE VNIC
router0 1    net1    IPv4 1    1000     e-pa- BACKUP net0
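
The manual check above (one router in MASTER, the other in BACKUP, same VRID) could be scripted. The following is only a sketch: it assumes the `vrrpadm show-router` output has already been collected from each zone as text, and that no column value contains spaces, which holds for the two outputs shown above. The function name `parse_show_router` is hypothetical.

```python
def parse_show_router(output):
    """Parse `vrrpadm show-router` text into a list of dicts keyed by column name."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: values never contain whitespace, so a plain split lines up with the header.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Outputs copied from the two zones above.
MASTER_OUTPUT = """\
NAME    VRID LINK    AF   PRIO ADV_INTV MODE  STATE VNIC
router0 1    net1    IPv4 127  1000     e-pa- MASTER net0
"""

BACKUP_OUTPUT = """\
NAME    VRID LINK    AF   PRIO ADV_INTV MODE  STATE VNIC
router0 1    net1    IPv4 1    1000     e-pa- BACKUP net0
"""

master = parse_show_router(MASTER_OUTPUT)[0]
backup = parse_show_router(BACKUP_OUTPUT)[0]

# Healthy pair: same VRID, exactly one MASTER and one BACKUP.
states = sorted([master["STATE"], backup["STATE"]])
healthy = states == ["BACKUP", "MASTER"] and master["VRID"] == backup["VRID"]
print("one MASTER and one BACKUP for VRID", master["VRID"], "->", healthy)
```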

I pinged the VRRP VIP from the VRRP MASTER zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# ping 10.101.0.4
10.101.0.4 is alive

In a separate SSH session I observed no ping on the VRRP VNIC of the VRRP MASTER zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# snoop icmp
Using device net0 (promiscuous mode)

In a separate SSH session I observed no ping on the primary VNIC of the VRRP MASTER zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# snoop -d net1 icmp
Using device net1 (promiscuous mode)

I pinged the VRRP VIP from the VRRP BACKUP zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# ping 10.101.0.4
no answer from 10.101.0.4

In a separate SSH session I observed the ICMP Echo request arriving on the VRRP VNIC of the VRRP BACKUP zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# snoop icmp
Using device net0 (promiscuous mode)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 0)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 1)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 2)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 3)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 4)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 5)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 6)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 7)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 8)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 9)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 10)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 11)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 12)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 13)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 14)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 15)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 16)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 17)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 18)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 19)

As far as I understand VRRP, the VRRP VNIC on the VRRP BACKUP should not be receiving anything except perhaps broadcast traffic. This seems to be the most observable symptom of this defect.
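
To make the symptom easier to spot in longer captures, the snoop text can be reduced to the list of Echo requests that never got a reply. This is a rough sketch, assuming the one-line `snoop icmp` format shown above; the function name `unanswered_requests` is hypothetical.

```python
import re

# Matches lines such as:
#   hostA -> hostB ICMP Echo request (ID: 31027 Sequence number: 0)
SNOOP_ICMP = re.compile(
    r"^(?P<src>\S+) -> (?P<dst>\S+)\s+ICMP Echo (?P<kind>request|reply) "
    r"\(ID: (?P<id>\d+) Sequence number: (?P<seq>\d+)\)"
)


def unanswered_requests(snoop_text):
    """Return sorted (ID, sequence) pairs that have a request but no reply in the capture."""
    seen = {"request": set(), "reply": set()}
    for line in snoop_text.splitlines():
        m = SNOOP_ICMP.match(line.strip())
        if m:  # non-ICMP lines such as "Using device ..." are skipped
            seen[m["kind"]].add((int(m["id"]), int(m["seq"])))
    return sorted(seen["request"] - seen["reply"])


if __name__ == "__main__":
    # First two lines of the BACKUP capture above.
    capture = """\
Using device net0 (promiscuous mode)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 0)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 31027 Sequence number: 1)
"""
    print(unanswered_requests(capture))
```

The result is only meaningful when the capture covers every interface a reply could leave from. A capture taken on a single VNIC can miss replies that go out through another one.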

Isolation of the Issue

Non-VRRP NICs are apparently not affected.

I pinged the VRRP MASTER's primary IP from the VRRP BACKUP zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# ping 10.101.0.5
10.101.0.5 is alive

In a separate SSH session I observed the ICMP Echo request arriving and Echo reply leaving the primary VNIC of the VRRP MASTER zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# snoop -d net1 icmp
Using device net1 (promiscuous mode)
lvhubproxy01-b.verafin.net -> lvhubproxy01-a.verafin.net ICMP Echo request (ID: 31028 Sequence number: 0)
lvhubproxy01-a.verafin.net -> lvhubproxy01-b.verafin.net ICMP Echo reply (ID: 31028 Sequence number: 0)

I pinged the VRRP BACKUP's primary IP from the VRRP MASTER zone:b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# ping 10.101.0.6
10.101.0.6 is alive

In a separate SSH session I observed the ICMP Echo request arriving and Echo reply leaving the primary VNIC of the VRRP BACKUP zone:0290c2d8-5cc9-4cd5-a5f7-f4baf48def51:

[root@0290c2d8-5cc9-4cd5-a5f7-f4baf48def51 ~]# snoop -d net1 icmp
Using device net1 (promiscuous mode)
lvhubproxy01-a.verafin.net -> 0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local ICMP Echo request (ID: 4502 Sequence number: 0)
0290c2d8-5cc9-4cd5-a5f7-f4baf48def51.local -> lvhubproxy01-a.verafin.net ICMP Echo reply (ID: 4502 Sequence number: 0)

I pinged the VRRP VIP from an independent host outside the SmartOS hosts running the VRRP BACKUP and MASTER and observed the ICMP Echo reply packets leaving the primary VNIC and Echo request packets arriving on the VRRP VNIC:

[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# snoop -d net1 icmp
Using device net1 (promiscuous mode)
lvhubproxy01.verafin.net -> lvjump01.verafin.net ICMP Echo reply (ID: 1 Sequence number: 9)
lvhubproxy01.verafin.net -> lvjump01.verafin.net ICMP Echo reply (ID: 1 Sequence number: 10)
lvhubproxy01.verafin.net -> lvjump01.verafin.net ICMP Echo reply (ID: 1 Sequence number: 11)
lvhubproxy01.verafin.net -> lvjump01.verafin.net ICMP Echo reply (ID: 1 Sequence number: 12)
[root@b59a6c5e-e0e9-4bd2-ad6d-ed0ffaeb9f04 ~]# snoop -d net0 icmp
Using device net0 (promiscuous mode)
lvjump01.verafin.net -> lvhubproxy01.verafin.net ICMP Echo request (ID: 1 Sequence number: 13)
lvjump01.verafin.net -> lvhubproxy01.verafin.net ICMP Echo request (ID: 1 Sequence number: 14)
lvjump01.verafin.net -> lvhubproxy01.verafin.net ICMP Echo request (ID: 1 Sequence number: 15)
lvjump01.verafin.net -> lvhubproxy01.verafin.net ICMP Echo request (ID: 1 Sequence number: 16)

Observations

The primary VNIC being used to forward replies rather than the VRRP VNIC seems unusual and possibly defective.

The VRRP communications appear to work inconsistently. On the backup they don't get forwarded correctly and on the master they seem to work without even appearing on the VNICs.

Limitations

Access to the VIP is prevented for all zones (including the GZ) that are on the same VLAN as the VRRP BACKUP on the same physical host. This makes VRRP impractical for production use.

jmealo commented 11 years ago

Are there any updates on this?

AlainODea commented 11 years ago

I have reproduced this issue on joyent_20130808T195337Z.

My guess, and it is a long shot, is that this is an issue in how Crossbow handles Layer 2 forwarding when the VRRP MAC is nominally on a local zone's interface but shouldn't be active. The ICMP Echo Request Ethernet frames are being built correctly (source and destination MACs are valid), but are not received by the target host. I wish I had a SPAN port set up so I could see whether the packets get forwarded to the attached physical switch. I believe that they don't, but I have no proof of that.
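
For context on why the VRRP MAC appears on both hosts: in the ifconfig output earlier in this thread, net0 shows `ether 0:0:5e:0:1:1` in both zones. That is the standard VRRP virtual router MAC, which RFC 5798 derives from the VRID alone (00-00-5E-00-01-{VRID} for IPv4). A small sketch of the derivation follows; the function names are hypothetical. This is consistent with the guess above but does not prove it.

```python
def vrrp_virtual_mac(vrid, ipv6=False):
    """Return the VRRP virtual router MAC for a VRID, per RFC 5798."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    # The fifth octet is 01 for IPv4 virtual routers and 02 for IPv6.
    prefix = "00:00:5e:00:02" if ipv6 else "00:00:5e:00:01"
    return f"{prefix}:{vrid:02x}"


def normalize_mac(mac):
    """Zero-pad each octet, e.g. ifconfig's '0:0:5e:0:1:1' -> '00:00:5e:00:01:01'."""
    return ":".join(f"{int(octet, 16):02x}" for octet in mac.split(":"))


if __name__ == "__main__":
    print(vrrp_virtual_mac(1))  # 00:00:5e:00:01:01
    # The MAC reported on net0 in both zones matches the MAC for VRID 1.
    print(normalize_mac("0:0:5e:0:1:1") == vrrp_virtual_mac(1))  # True
```

Because the MASTER and the BACKUP VNIC carry the same MAC, a switch that forwards purely by destination MAC would treat frames for the VIP as local on the host running the BACKUP.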

I have softened my original opinion on this somewhat. It is actually practical for production use, but comes with annoying caveats. If the GZ or an OS VM (zone) communicates with VRRP VIPs, you have two options:

1) Put them on the same VLAN as the VRRP VIPs. Requires: that they be on separate SmartOS hosts. Avoiding SPoFs in complex service-oriented architectures in this case involves ensuring that no single box will break the dependency graph. This requires at least three SmartOS hosts, since a single client on Host A would need a service on Host B to fail over to Host C. If the service failed over to Host A, it would no longer be accessible.

2) Put them on the same SmartOS host. Requires: that they be on separate VLANs from the VRRP VIP. Avoiding SPoFs in complex service-oriented architectures is largely a matter of having the dependencies be a DAG (directed acyclic graph) and making sure no service client is on the same VLAN as the service it calls. This can be done without a VLAN per service given some thought. This requires at least two SmartOS hosts.

joshado commented 10 years ago

Just found this thread via the smartos-discuss list.

We've hit this issue but have found a workaround that lets us use it in production. The effect is only observed by VNICs on the same underlying physical interface - so if you have a VRRP VNIC on one interface (in BACKUP state), then VNICs sharing that NIC indeed can't communicate with the VRRP master.

That said, if you have a VNIC on a separate physical interface on the same host, it can access the VRRP master. Consequently, since our servers have 4 gig-e links, we've dedicated a physical interface to these VRRP VNICs.
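
The placement rules reported so far in this thread can be written down as a small predicate, which may help when planning where clients can live. This is a model of the observed symptom only, not of Crossbow internals. Combining the "same VLAN" and "same physical interface" conditions is an assumption drawn from the comments above, and the names `Vnic` and `can_reach_vip`, along with the host and interface labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vnic:
    host: str  # SmartOS host the VNIC lives on (illustrative label)
    phys: str  # underlying physical interface (illustrative label)
    vlan: int  # VLAN ID of the VNIC


def can_reach_vip(client, backup_vnic):
    """Model the reported symptom: a client loses the VIP only when it shares
    host, physical interface and VLAN with a VRRP VNIC in BACKUP state."""
    shares_backup_path = (
        client.host == backup_vnic.host
        and client.phys == backup_vnic.phys
        and client.vlan == backup_vnic.vlan
    )
    return not shares_backup_path


if __name__ == "__main__":
    backup = Vnic(host="host-b", phys="nic0", vlan=7)
    print(can_reach_vip(Vnic("host-b", "nic0", 7), backup))  # False: the failing case
    print(can_reach_vip(Vnic("host-b", "nic1", 7), backup))  # True: separate physical NIC
    print(can_reach_vip(Vnic("host-b", "nic0", 8), backup))  # True: separate VLAN
    print(can_reach_vip(Vnic("host-a", "nic0", 7), backup))  # True: separate host
```

Since either VRRP zone can end up in BACKUP state after a failover, a client placement should pass this check against every VRRP VNIC in the group, not just the current backup.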

It's also worth pointing out that we only use VRRP for our soft-routers (the default gateway address). For HA application-level VIP failover we actually use the "wackamole" utility, and find it much easier to configure (as it doesn't require any serious zone-level configuration or set-up).

Shadok commented 10 years ago

Are you using wackamole in a SmartOS zone? I'm wondering how it works, considering that every zone network interface is managed by SmartOS.

joshado commented 10 years ago

Yep - wackamole uses alias interfaces, rather than dedicated interfaces for the VIP. The only thing you need to do is configure the VNIC to allow IP spoofing, or provide the list of acceptable IP addresses.

Shadok commented 10 years ago

Could you write a quick tutorial, please? Being new to SmartOS, I'm having a hard time making failover work with it. It would help anyone stuck with VRRP's bad behaviour.

Shadok commented 10 years ago

OK, I managed to make it work. I was thinking it would need an additional VNIC, and it doesn't. So, for those who'd like to do it: http://blog.adityapatawari.com/2011/09/building-highly-available-cluster-using.html Just create your zone with a single VNIC with allow_ip_spoofing set to true and follow the tutorial. Your network interface will be net0.