canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Load balancer on OVN network is not responsive from the same OVN network #14166

Open Fred78290 opened 1 month ago

Fred78290 commented 1 month ago

Required information

Issue description

A load balancer on an OVN network is not responsive from within the same OVN network.

Steps to reproduce

  1. Follow the LXD tutorial to create an OVN network (named ovntest)
  2. Create two Ubuntu instances (u1, u2) on the ovntest network and install nginx
  3. Follow the LXD tutorial to create a load balancer on the OVN network, targeting the u1 and u2 instances on port 80
  4. lxc exec u1 -- curl http://<ip address of LB> (see the command sketch below)
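
A rough sketch of the reproduction commands, assuming lxdbr0 as the uplink and the names used later in this thread (ovntest, u1, u2); the full script further down in the thread automates the same steps:

lxc network create ovntest --type=ovn network=lxdbr0
lxc launch ubuntu:noble u1 --network=ovntest
lxc launch ubuntu:noble u2 --network=ovntest
lxc exec u1 -- apt install nginx -y
lxc exec u2 -- apt install nginx -y

# allocate a VIP on the uplink and target both instances on port 80
NLB_VIP_ADDRESS=$(lxc network load-balancer create ovntest --allocate=ipv4 | cut -d ' ' -f 4)
lxc network load-balancer backend add ovntest ${NLB_VIP_ADDRESS} u1 <u1 ip> 80
lxc network load-balancer backend add ovntest ${NLB_VIP_ADDRESS} u2 <u2 ip> 80
lxc network load-balancer port add ovntest ${NLB_VIP_ADDRESS} tcp 80 u1,u2

# this call from inside the OVN network is the one that hangs
lxc exec u1 -- curl http://${NLB_VIP_ADDRESS}
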
tomponline commented 1 month ago

This is an effective duplicate of https://github.com/canonical/lxd/issues/10654 and is caused by the upstream OVN bug:

https://github.com/ovn-org/ovn/issues/144

Fred78290 commented 1 month ago

@tomponline

I don't agree, because on OpenStack the OVN load balancer is reachable from the target network.

I compared the LXD implementation with the OpenStack implementation, and there is a small workflow difference.

If you compare them, you can see that a port (aka ovn-lb-vip-53df3769-fcb7-4258-85f1-9f389f2c1a05) is declared on the switch, but LXD does not create an equivalent port.

I think LXD is missing the creation of a forwarding port.
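
For illustration only, a hedged sketch of what declaring such a port on the LXD external switch could look like (the port name lxd-net49-lb-vip-192.168.6.50 is hypothetical; the switch name and VIP come from the dumps below, and whether this alone is what makes the difference is exactly what is debated in the rest of this thread):

# hypothetical: add a VIP port to the external switch, the way Neutron/Octavia does
sudo ovn-nbctl lsp-add lxd-net49-ls-ext lxd-net49-lb-vip-192.168.6.50
sudo ovn-nbctl lsp-set-addresses lxd-net49-lb-vip-192.168.6.50 unknown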

Northbound DB for OpenStack

switch 9ede1f2c-9d8b-45d5-b519-20dd39fa2a24 (neutron-a34f8a5a-afed-495f-8715-ca50b19e0778) (aka public)
    port provnet-d8c28681-21be-4a5c-a6b2-f83f644b8519
        type: localnet
        addresses: ["unknown"]
    port 51acdcc4-ca42-4bd1-a6e8-3370822beaa8 (aka ovn-lb-vip-53df3769-fcb7-4258-85f1-9f389f2c1a05)
    port 49dba2ef-c241-4719-a6f6-fd5ef932ebb5
        type: router
        router-port: lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
    port 7a8ec63c-0c39-4bc0-96f4-d7bb7149901f
        type: localport
        addresses: ["fa:16:3e:7e:39:9c 10.0.0.127"]
switch 0757da50-7ff4-4e7e-9411-ede422f5e0d9 (neutron-67ff44c6-6438-4294-9f68-ca61037cc65f) (aka shared)
    port 9f0c751c-e400-45aa-b69f-b2785b3f9e73
        type: localport
        addresses: ["fa:16:3e:cd:d4:ee 192.168.233.2"]
switch 1c0959a5-783c-4fb7-b8c7-21b7d13647cd (neutron-4371e259-2101-4469-b8a0-754ea290ce26) (aka private)
    port bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
        type: router
        router-port: lrp-bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
    port c4bafa86-a981-4c49-90d7-57eb61f736b4
        addresses: ["fa:16:3e:a8:05:4c 192.168.32.5"]
    port 759b258b-6e23-4903-91b1-a72bdcca1e73 (aka ovn-lb-hm-fa63e289-ed90-4c80-8a21-d544cd1b2714)
        type: localport
        addresses: ["fa:16:3e:84:85:10 192.168.32.61"]
    port cdb7ba33-6c10-423e-81c8-829de62f9076
        type: localport
        addresses: ["fa:16:3e:10:75:60 192.168.32.2"]
    port 533bc73e-eab1-4872-95ca-e9b6e0055f67
        addresses: ["fa:16:3e:90:f9:e3 192.168.32.3"]
router 6e250c98-9bf9-44e5-94b1-0d20f7835388 (neutron-dcadaa16-7d8e-4150-84cc-ccd2df882a50) (aka router1)
    port lrp-bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
        mac: "fa:16:3e:a5:00:ed"
        networks: ["192.168.32.1/26"]
    port lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
        mac: "fa:16:3e:88:db:49"
        networks: ["10.0.0.135/24"]
        gateway chassis: [e51d0c21-6181-4bd9-9d9e-bb80ae968f7a]
    nat bc1b58bd-52ce-4ed1-b237-3c78ac92070c
        external ip: "10.0.0.135"
        logical ip: "192.168.32.0/26"
        type: "snat"

OVN logical flows applied for OpenStack

_uuid               : ad95e5d7-2461-4e04-b307-d53ac5ace8c8
actions             : "clone {outport = \"49dba2ef-c241-4719-a6f6-fd5ef932ebb5\"; output; }; outport = \"_MC_flood_l2\"; output;"
controller_meter    : []
external_ids        : {source="northd.c:8165", stage-hint=b9fcb9e6, stage-name=ls_in_l2_lkup}
logical_datapath    : 0a4ae9e5-ca0d-4b55-830e-094727c5de3e
logical_dp_group    : []
match               : "flags[1] == 0 && arp.op == 1 && arp.tpa == 10.0.0.156"
pipeline            : ingress
priority            : 80
table_id            : 27
tags                : {}
hash                : 0

_uuid               : 279af996-8bfa-4013-8b49-42cbeb727832
actions             : "ct_dnat;"
controller_meter    : []
external_ids        : {source="northd.c:11280", stage-hint="0766d34b", stage-name=lr_in_defrag}
logical_datapath    : 416dbd76-6bd7-4caa-8a2c-5238dd5e459a
logical_dp_group    : []
match               : "ip && ip4.dst == 10.0.0.156"
pipeline            : ingress
priority            : 100
table_id            : 5
tags                : {}
hash                : 0

_uuid               : 5d8dca1c-a013-4899-9c70-5609c0b7a979
actions             : "ct_lb_mark(backends=192.168.32.3:80,192.168.32.5:80; hash_fields=\"ip_dst,ip_src,tcp_dst,tcp_src\");"
controller_meter    : []
external_ids        : {source="northd.c:10970", stage-hint="0766d34b", stage-name=lr_in_dnat}
logical_datapath    : 416dbd76-6bd7-4caa-8a2c-5238dd5e459a
logical_dp_group    : []
match               : "ct.new && !ct.rel && ip4 && ip4.dst == 10.0.0.156 && tcp && tcp.dst == 80 && is_chassis_resident(\"cr-lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5\")"
pipeline            : ingress
priority            : 120
table_id            : 7
tags                : {}
hash                : 0

_uuid               : 5770884a-52dc-482d-be58-0e465afc02bf
actions             : "reg0[1] = 0; ct_lb_mark(backends=192.168.32.3:80,192.168.32.5:80; hash_fields=\"ip_dst,ip_src,tcp_dst,tcp_src\");"
controller_meter    : []
external_ids        : {source="northd.c:7689", stage-hint="0766d34b", stage-name=ls_in_lb}
logical_datapath    : c78de46c-5c00-4845-a044-0a77dd5488c2
logical_dp_group    : []
match               : "ct.new && ip4.dst == 10.0.0.156 && tcp.dst == 80"
pipeline            : ingress
priority            : 120
table_id            : 13
tags                : {}
hash                : 0

_uuid               : db97ebdc-590f-4d7d-9e6e-847c42f6aa2b
actions             : "reg1 = 10.0.0.156; reg2[0..15] = 80; ct_lb_mark;"
controller_meter    : []
external_ids        : {source="northd.c:7211", stage-hint="0766d34b", stage-name=ls_in_pre_stateful}
logical_datapath    : c78de46c-5c00-4845-a044-0a77dd5488c2
logical_dp_group    : []
match               : "reg0[2] == 1 && ip4.dst == 10.0.0.156 && tcp.dst == 80"
pipeline            : ingress
priority            : 120
table_id            : 6
tags                : {}
hash                : 0

Northbound DB for LXD

switch 09b5dd07-6ce4-4d9a-bc07-3a6d772e1c07 (lxd-net49-ls-ext)
    port lxd-net49-ls-ext-lsp-provider
        type: localnet
        addresses: ["unknown"]
    port lxd-net49-ls-ext-lsp-router
        type: router
        router-port: lxd-net49-lr-lrp-ext
switch 1ae77337-0408-4578-8e5a-5fa31957653b (lxd-net49-ls-int)
    port lxd-net49-ls-int-lsp-router
        type: router
        router-port: lxd-net49-lr-lrp-int
    port lxd-net49-instance-97da1458-3818-4f23-a5a0-289b3fffe5ca-eth0
        addresses: ["00:16:3e:79:36:9a dynamic"]
    port lxd-net49-instance-17490213-05ee-4a6e-a3b7-7878041048fe-eth0
        addresses: ["00:16:3e:18:d4:00 dynamic"] 
router 8b24476e-6aa1-4651-b630-27cf303d325a (lxd-net49-lr)
    port lxd-net49-lr-lrp-int
        mac: "00:16:3e:ac:aa:58"
        networks: ["10.183.144.1/24"]
    port lxd-net49-lr-lrp-ext
        mac: "00:16:3e:ac:aa:58"
        networks: ["192.168.6.1/22"]
    nat 54dc4d97-6b1f-4d66-8713-ca0c72666c34
        external ip: "192.168.6.1"
        logical ip: "10.183.144.0/24"
        type: "snat"

OVN logical flows applied for LXD

_uuid               : 9a644314-06d1-4b93-8255-b1e6a3be7ff9
actions             : "clone {outport = \"lxd-net49-ls-ext-lsp-router\"; output; }; outport = \"_MC_flood_l2\"; output;"
controller_meter    : []
external_ids        : {source="northd.c:8165", stage-hint=db3927a0, stage-name=ls_in_l2_lkup}
logical_datapath    : 288866c1-ae92-4206-8fb6-e878355dd4c2
logical_dp_group    : []
match               : "flags[1] == 0 && arp.op == 1 && arp.tpa == 192.168.6.50"
pipeline            : ingress
priority            : 80
table_id            : 27
tags                : {}
hash                : 0

_uuid               : 1ec51423-2d2a-4f95-a990-1af6b34f0dd2
actions             : "ct_dnat;"
controller_meter    : []
external_ids        : {source="northd.c:11280", stage-hint=b84e02dd, stage-name=lr_in_defrag}
logical_datapath    : 9c1653d9-6993-45da-89b2-719987507f09
logical_dp_group    : []
match               : "ip && ip4.dst == 192.168.6.50"
pipeline            : ingress
priority            : 100
table_id            : 5
tags                : {}
hash                : 0

_uuid               : ae35ec75-3888-47d2-a20a-e29a5d13a641
actions             : "ct_lb_mark(backends=10.183.144.2:443,10.183.144.3:443);"
controller_meter    : []
external_ids        : {source="northd.c:10970", stage-hint=b84e02dd, stage-name=lr_in_dnat}
logical_datapath    : 9c1653d9-6993-45da-89b2-719987507f09
logical_dp_group    : []
match               : "ct.new && !ct.rel && ip4 && ip4.dst == 192.168.6.50 && tcp && tcp.dst == 443 && is_chassis_resident(\"cr-lxd-net49-lr-lrp-ext\")"
pipeline            : ingress
priority            : 120
table_id            : 7
tags                : {}
hash                : 0

_uuid               : 1d783e7e-14da-4a56-9cd2-81189048567f
actions             : "ct_lb_mark(backends=10.183.144.2:80,10.183.144.3:80);"
controller_meter    : []
external_ids        : {source="northd.c:10970", stage-hint=b84e02dd, stage-name=lr_in_dnat}
logical_datapath    : 9c1653d9-6993-45da-89b2-719987507f09
logical_dp_group    : []
match               : "ct.new && !ct.rel && ip4 && ip4.dst == 192.168.6.50 && tcp && tcp.dst == 80 && is_chassis_resident(\"cr-lxd-net49-lr-lrp-ext\")"
pipeline            : ingress
priority            : 120
table_id            : 7
tags                : {}
hash                : 0

tomponline commented 1 month ago

LXD creates entries in the Load_Balancer OVN northbound table and correctly associates them with the logical router. They work from the uplink network, but they do not work from the internal network because hairpin SNAT does not take effect in OVN when using distributed routers (which LXD does).
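
For reference, the northbound entries and their router association can be inspected with plain ovn-nbctl read commands (router name taken from the dumps in this thread):

# list the Load_Balancer rows LXD created
sudo ovn-nbctl find load_balancer
# show which load balancers are attached to the LXD logical router
sudo ovn-nbctl lr-lb-list lxd-net49-lr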

This is the bug I was actually investigating today.

tomponline commented 1 month ago

Please confirm if they work ok from the uplink network.

Fred78290 commented 1 month ago

Please confirm if they work ok from the uplink network.

I don't really understand your question. The load balancer is reachable from every external network, and from the uplink as well.

In a very simple LXD config, the load balancer is reachable from the uplink. The ovntest network has lxdbr0 as its uplink.

The instance u3 can reach the load balancer, but u1 and u2 on ovntest cannot.

+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
|  NAME   |   TYPE   | MANAGED |      IPV4       |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| br-int  | bridge   | NO      |                 |                           |             | 0       |         |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| enp5s0  | physical | NO      |                 |                           |             | 0       |         |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| lxdbr0  | bridge   | YES     | 192.168.48.1/24 | fd42:432d:df96:51f1::1/64 |             | 3       | CREATED |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| lxdovn1 | bridge   | NO      |                 |                           |             | 0       |         |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| ovntest | ovn      | YES     | 10.68.223.1/24  | fd42:e949:26f2:ad67::1/64 |             | 2       | CREATED |
+---------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  |         IPV4          |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| u1   | RUNNING | 10.68.223.2 (eth0)    | fd42:e949:26f2:ad67:216:3eff:fe9f:7052 (eth0) | CONTAINER | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| u2   | RUNNING | 10.68.223.3 (eth0)    | fd42:e949:26f2:ad67:216:3eff:fe36:3718 (eth0) | CONTAINER | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| u3   | RUNNING | 192.168.48.146 (eth0) | fd42:432d:df96:51f1:216:3eff:fe50:ab15 (eth0) | CONTAINER | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+

I think LXD is missing the creation of a port on the switch for routing internal traffic to internal traffic through the external IP, as OpenStack does. It's not an SNAT/DNAT problem.

Fred78290 commented 1 month ago

Take a look at ovn-octavia-provider. It works fine.

tomponline commented 1 month ago

I don't really understand your question. The load balancer is reachable from every external network, and from the uplink as well.

That was my question, thanks.

I'm not following you, I'm afraid. Can you please show the differences in the OVN northbound DB, not the OVN flows (as LXD doesn't control those directly)?

I was chatting with the Canonical OVN team about this earlier today, and they agreed that the bug I referenced earlier is the issue (they have seen similar problems in the past with NAT rules on distributed routers).

tomponline commented 1 month ago

What do ovn-nbctl find load_balancer and ovn-nbctl find logical_router show in your OpenStack setup? AFAIK, OVN load balancers do not require adding a dedicated logical switch port to the network (like your OpenStack example above has), so I'm wondering if a different mechanism is being used there.

If you run ovn-nbctl find load_balancer on the LXD setup, you'll see the load balancer entries that are set up.

tomponline commented 1 month ago

Are you using distributed routers or per-chassis routers?

tomponline commented 1 month ago

See comment on upstream bug for more info https://github.com/ovn-org/ovn/issues/144#issuecomment-1238391080

Fred78290 commented 1 month ago

@tomponline take a look at the end of my comment; I have fixed the issue manually as suggested in https://github.com/ovn-org/ovn/issues/144#issuecomment-1277123745

LXD and OpenStack are running on the same host.

The LXD load balancer has IP 192.168.6.50. The OpenStack load balancer has IP 10.0.0.156.

I'm not following you, I'm afraid. Can you please show the differences in the OVN northbound DB, not the OVN flows (as LXD doesn't control those directly)?

ovn-nbctl show

switch 0757da50-7ff4-4e7e-9411-ede422f5e0d9 (neutron-67ff44c6-6438-4294-9f68-ca61037cc65f) (aka shared)
    port 9f0c751c-e400-45aa-b69f-b2785b3f9e73
        type: localport
        addresses: ["fa:16:3e:cd:d4:ee 192.168.233.2"]
switch 9ede1f2c-9d8b-45d5-b519-20dd39fa2a24 (neutron-a34f8a5a-afed-495f-8715-ca50b19e0778) (aka public)
    port provnet-d8c28681-21be-4a5c-a6b2-f83f644b8519
        type: localnet
        addresses: ["unknown"]
    port 51acdcc4-ca42-4bd1-a6e8-3370822beaa8 (aka ovn-lb-vip-53df3769-fcb7-4258-85f1-9f389f2c1a05)
    port 49dba2ef-c241-4719-a6f6-fd5ef932ebb5
        type: router
        router-port: lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
    port 7a8ec63c-0c39-4bc0-96f4-d7bb7149901f
        type: localport
        addresses: ["fa:16:3e:7e:39:9c 10.0.0.127"]
switch 1c0959a5-783c-4fb7-b8c7-21b7d13647cd (neutron-4371e259-2101-4469-b8a0-754ea290ce26) (aka private)
    port bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
        type: router
        router-port: lrp-bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
    port c4bafa86-a981-4c49-90d7-57eb61f736b4
        addresses: ["fa:16:3e:a8:05:4c 192.168.32.5"]
    port 759b258b-6e23-4903-91b1-a72bdcca1e73 (aka ovn-lb-hm-fa63e289-ed90-4c80-8a21-d544cd1b2714)
        type: localport
        addresses: ["fa:16:3e:84:85:10 192.168.32.61"]
    port cdb7ba33-6c10-423e-81c8-829de62f9076
        type: localport
        addresses: ["fa:16:3e:10:75:60 192.168.32.2"]
    port 533bc73e-eab1-4872-95ca-e9b6e0055f67
        addresses: ["fa:16:3e:90:f9:e3 192.168.32.3"]
switch 09b5dd07-6ce4-4d9a-bc07-3a6d772e1c07 (lxd-net49-ls-ext)
    port lxd-net49-ls-ext-lsp-router
        type: router
        router-port: lxd-net49-lr-lrp-ext
    port lxd-net49-ls-ext-lsp-provider
        type: localnet
        addresses: ["unknown"]
switch 1ae77337-0408-4578-8e5a-5fa31957653b (lxd-net49-ls-int)
    port lxd-net49-instance-582c2399-9b2a-4fe8-bffb-1385724fd6a7-eth0
        addresses: ["00:16:3e:46:12:57 10.183.144.11"]
    port lxd-net49-ls-int-lsp-router
        type: router
        router-port: lxd-net49-lr-lrp-int
    port lxd-net49-instance-97da1458-3818-4f23-a5a0-289b3fffe5ca-eth0
        addresses: ["00:16:3e:79:36:9a dynamic"]
    port lxd-net49-instance-a8b9cc48-fdfa-4d35-9b1a-e75b9cc1298d-eth0
        addresses: ["00:16:3e:be:a1:42 10.183.144.12"]
    port lxd-net49-instance-17490213-05ee-4a6e-a3b7-7878041048fe-eth0
        addresses: ["00:16:3e:18:d4:00 dynamic"]
    port lxd-net49-instance-57327725-8540-49c8-9e4b-fd4a165a944a-eth0
        addresses: ["00:16:3e:58:89:7d 10.183.144.15"]
router 6e250c98-9bf9-44e5-94b1-0d20f7835388 (neutron-dcadaa16-7d8e-4150-84cc-ccd2df882a50) (aka router1)
    port lrp-bfea0000-79c3-4a8a-a7b0-899ea6b0dae3
        mac: "fa:16:3e:a5:00:ed"
        networks: ["192.168.32.1/26"]
    port lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
        mac: "fa:16:3e:88:db:49"
        networks: ["10.0.0.135/24"]
        gateway chassis: [e51d0c21-6181-4bd9-9d9e-bb80ae968f7a]
    nat bc1b58bd-52ce-4ed1-b237-3c78ac92070c
        external ip: "10.0.0.135"
        logical ip: "192.168.32.0/26"
        type: "snat"
router 8b24476e-6aa1-4651-b630-27cf303d325a (lxd-net49-lr)
    port lxd-net49-lr-lrp-int
        mac: "00:16:3e:ac:aa:58"
        networks: ["10.183.144.1/24"]
    port lxd-net49-lr-lrp-ext
        mac: "00:16:3e:ac:aa:58"
        networks: ["192.168.6.1/22"]
    nat 54dc4d97-6b1f-4d66-8713-ca0c72666c34
        external ip: "192.168.6.1"
        logical ip: "10.183.144.0/24"
        type: "snat"

ovn-nbctl find load_balancer

_uuid               : b84e02dd-a211-4707-ad27-08ecd59227c7
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lxd-net49-lb-192.168.6.50-tcp
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"192.168.6.50:443"="10.183.144.2:443,10.183.144.3:443", "192.168.6.50:80"="10.183.144.2:80,10.183.144.3:80"}

_uuid               : 0766d34b-6503-4143-96b3-108b9c132d1a
external_ids        : {enabled=True, listener_af4effcf-9380-44f1-9301-0f3d7367256b="80:pool_37210363-e4a8-4dec-b2f3-5814dd24e253", lr_ref=neutron-dcadaa16-7d8e-4150-84cc-ccd2df882a50, ls_refs="{\"neutron-4371e259-2101-4469-b8a0-754ea290ce26\": 2}", "neutron:member_status"="{\"f79895ee-9702-4faa-a9c4-cd5a40d36a7f\": \"ERROR\", \"e760cd7e-31db-4523-b628-79907f1678e4\": \"ERROR\"}", "neutron:vip"="10.0.0.156", "neutron:vip_port_id"="51acdcc4-ca42-4bd1-a6e8-3370822beaa8", "octavia:healthmonitors"="[\"995518ea-bd6e-446a-9057-46d83c444830\"]", pool_37210363-e4a8-4dec-b2f3-5814dd24e253="member_f79895ee-9702-4faa-a9c4-cd5a40d36a7f_192.168.32.3:80_fa63e289-ed90-4c80-8a21-d544cd1b2714,member_e760cd7e-31db-4523-b628-79907f1678e4_192.168.32.5:80_fa63e289-ed90-4c80-8a21-d544cd1b2714"}
health_check        : [ab60b064-7a87-4884-b92c-0d9576fd411c]
ip_port_mappings    : {"192.168.32.3"="533bc73e-eab1-4872-95ca-e9b6e0055f67:192.168.32.61", "192.168.32.5"="c4bafa86-a981-4c49-90d7-57eb61f736b4:192.168.32.61"}
name                : "53df3769-fcb7-4258-85f1-9f389f2c1a05"
options             : {}
protocol            : tcp
selection_fields    : [ip_dst, ip_src, tp_dst, tp_src]
vips                : {"10.0.0.156:80"="192.168.32.3:80,192.168.32.5:80"}

ovn-nbctl find logical_router

_uuid               : 6e250c98-9bf9-44e5-94b1-0d20f7835388
copp                : []
enabled             : true
external_ids        : {"neutron:availability_zone_hints"="", "neutron:revision_number"="3", "neutron:router_name"=router1}
load_balancer       : [0766d34b-6503-4143-96b3-108b9c132d1a]
load_balancer_group : []
name                : neutron-dcadaa16-7d8e-4150-84cc-ccd2df882a50
nat                 : [bc1b58bd-52ce-4ed1-b237-3c78ac92070c]
options             : {always_learn_from_arp_request="false", dynamic_neigh_routers="true", mac_binding_age_threshold="0"}
policies            : []
ports               : [9b28e21a-8e20-4c6a-9345-bcb459b2c005, e4f0967f-d245-4154-9c7f-2fe3d3ba2155]
static_routes       : [ab5aad72-b9d5-42da-bd0b-66f9fb297ffd]

_uuid               : 8b24476e-6aa1-4651-b630-27cf303d325a
copp                : []
enabled             : []
external_ids        : {}
load_balancer       : [b84e02dd-a211-4707-ad27-08ecd59227c7]
load_balancer_group : []
name                : lxd-net49-lr
nat                 : [54dc4d97-6b1f-4d66-8713-ca0c72666c34]
options             : {mac_binding_age_threshold="0"}
policies            : [5c7104c3-3352-4316-93a3-3b00f1b1d19e, 6a0806b8-f9d3-4536-9341-53cab84480a3, a7c8e0ef-7fe3-4f13-b6de-343031c2aadf]
ports               : [2d3c940a-2306-4e75-bbfa-388753a2091d, 70339737-551c-4f2d-88b0-e8093701c1c1]
static_routes       : [0800a1fc-36ac-4284-b344-258dffad6ffa]

Are you using distributed routers or per-chassis routers?

My OpenStack installation (devstack) uses ENABLE_CHASSIS_AS_GW=True; devstack then sets ovn-cms-options with enable-chassis-as-gw.

ovn-sbctl show

Chassis "e51d0c21-6181-4bd9-9d9e-bb80ae968f7a"
    hostname: openstack
    Encap geneve
        ip: "192.168.2.21"
        options: {csum="true"}
    Port_Binding lxd-net49-instance-582c2399-9b2a-4fe8-bffb-1385724fd6a7-eth0
    Port_Binding lxd-net49-instance-97da1458-3818-4f23-a5a0-289b3fffe5ca-eth0
    Port_Binding lxd-net49-instance-57327725-8540-49c8-9e4b-fd4a165a944a-eth0
    Port_Binding lxd-net49-instance-17490213-05ee-4a6e-a3b7-7878041048fe-eth0
    Port_Binding cr-lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
    Port_Binding cr-lxd-net49-lr-lrp-ext
    Port_Binding lxd-net49-instance-a8b9cc48-fdfa-4d35-9b1a-e75b9cc1298d-eth0

See comment on upstream bug for more info https://github.com/ovn-org/ovn/issues/144#issuecomment-1238391080

Concerning this issue, it's true that on Ubuntu 22.04 the logical router was SNAT-only and could not be used as a gateway for external traffic to the internal network, but it now works on Ubuntu 24.04.

After reading the issue, I applied the suggestion from https://github.com/ovn-org/ovn/issues/144#issuecomment-1277123745, and now the OVN network can reach the load balancer.

sudo ovn-nbctl --wait=hv set logical_router lxd-net34-lr options:chassis=xxx

ovn-nbctl --wait=hv set logical_router lxd-net49-lr options:chassis=e51d0c21-6181-4bd9-9d9e-bb80ae968f7a

ovn-sbctl show

Chassis "e51d0c21-6181-4bd9-9d9e-bb80ae968f7a"
    hostname: openstack
    Encap geneve
        ip: "192.168.2.21"
        options: {csum="true"}
    Port_Binding lxd-net49-instance-582c2399-9b2a-4fe8-bffb-1385724fd6a7-eth0
    Port_Binding lxd-net49-lr-lrp-ext
    Port_Binding lxd-net49-instance-97da1458-3818-4f23-a5a0-289b3fffe5ca-eth0
    Port_Binding lxd-net49-ls-ext-lsp-router
    Port_Binding lxd-net49-instance-57327725-8540-49c8-9e4b-fd4a165a944a-eth0
    Port_Binding lxd-net49-instance-17490213-05ee-4a6e-a3b7-7878041048fe-eth0
    Port_Binding cr-lrp-49dba2ef-c241-4719-a6f6-fd5ef932ebb5
    Port_Binding lxd-net49-instance-a8b9cc48-fdfa-4d35-9b1a-e75b9cc1298d-eth0
    Port_Binding lxd-net49-lr-lrp-int
    Port_Binding lxd-net49-ls-int-lsp-router

I also confirm that this works on a fresh LXD installation inside a Multipass VM.

Fred78290 commented 1 month ago

@tomponline

Here is a hotfix to apply after creating the load balancer and adding the backends and ports. The sleep is needed to wait for propagation to the southbound DB.

#===========================================================================================================================================
# PATCH OVN LOAD BALANCER
#===========================================================================================================================================
# UUID of the (single) chassis, taken from the southbound DB
OVN_CHASSIS_UUID=$(sudo ovn-sbctl show | grep Chassis | cut -d ' ' -f 2 | tr -d '"')
# name of the LXD-created load balancer matching the VIP (e.g. lxd-net49-lb-<vip>-tcp)
OVN_NLB_NAME=$(sudo ovn-nbctl find load_balancer | grep "lb-${NLB_VIP_ADDRESS}-tcp" | awk '{print $3}')
# derive the LXD logical router name from the load balancer name
OVN_ROUTER_NAME="${OVN_NLB_NAME%-lb*}-lr"

# pin the router to the chassis (switches it from distributed to gateway mode)
sudo ovn-nbctl --wait=hv set logical_router ${OVN_ROUTER_NAME} options:chassis=${OVN_CHASSIS_UUID}
sleep 2
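
Optionally, a quick read-back (not part of the original hotfix) to confirm the option was applied before testing:

# should print the chassis UUID set above
sudo ovn-nbctl get logical_router ${OVN_ROUTER_NAME} options:chassis
# the router and switch ports should now show up as bound to the chassis
sudo ovn-sbctl show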

Here is a full script that creates a Multipass instance with LXD installed, an OVN network, and a load balancer with three instances:

#!/bin/bash
set -e

UBUNTU_DISTRIBUTION=noble
SSH_KEY=$(cat ~/.ssh/id_rsa.pub)
VMNAME="lxd-ovn-${UBUNTU_DISTRIBUTION}-${RANDOM}"
VMNETWORK=
VMCPU=4
VMMEMORY=8
VMDISK=40

OPTIONS=(
    "cpu:"
    "disk:"
    "memory:"
    "network:"
    "ssh-key:"
    "ubuntu:"
)

PARAMS=$(echo ${OPTIONS[@]} | tr ' ' ',')
TEMP=$(getopt -o u:d:c:m:n:k: --long "${PARAMS}"  -n "$0" -- "$@")

eval set -- "${TEMP}"

while true ; do
    #echo "1:$1"
    case "$1" in
        -u|--ubuntu)
            UBUNTU_DISTRIBUTION="$2"
            VMNAME="lxd-ovn-${UBUNTU_DISTRIBUTION}-${RANDOM}"
            shift 2
            ;;
        -d|--disk)
            VMDISK="$2"
            shift 2
            ;;
        -c|--cpu)
            VMCPU="$2"
            shift 2
            ;;
        -m|--memory)
            VMMEMORY="$2"
            shift 2
            ;;
        -n|--network)
            VMNETWORK="$2"
            shift 2
            ;;
        -k|--ssh-key)
            SSH_KEY="$2"
            shift 2
            ;;
        --)
            shift
            break
            ;;
        *)
            echo_red "$1 - Internal error!"
            exit 1
            ;;
    esac
done

for PREVIOUS in $(multipass ls | grep '\-ovn-' | cut -d ' ' -f 1)
do
    multipass delete ${PREVIOUS} -p
done

if [ -n "${VMNETWORK}" ]; then
    VMNETWORK="--network name=${VMNETWORK}"
fi

multipass launch -n ${VMNAME} -c ${VMCPU} -m ${VMMEMORY}G -d ${VMDISK}G ${VMNETWORK} ${UBUNTU_DISTRIBUTION}
multipass exec ${VMNAME} -- bash -c "echo '${SSH_KEY}' >> ~/.ssh/authorized_keys"
multipass exec ${VMNAME} -- sudo bash -c "apt update"
multipass exec ${VMNAME} -- sudo bash -c "DEBIAN_FRONTEND=noninteractive apt upgrade -y"
multipass restart ${VMNAME}

sleep 2

multipass shell ${VMNAME} << 'EOF'

cat > create-lxd.sh <<'SHELL'
#!/bin/bash
set -e

export INSTALL_BR_EX=NO
export DEBIAN_FRONTEND=noninteractive

LISTEN_INF=${LISTEN_INF:=$(ip route show default 0.0.0.0/0 | sed -n '2 p' |cut -d ' ' -f 5)}
LISTEN_CIDR=$(ip addr show ${LISTEN_INF} | grep "inet\s" | awk '{print $2}')
LISTEN_IP=$(echo ${LISTEN_CIDR} | cut -d '/' -f 1)

#===========================================================================================================================================
#
#===========================================================================================================================================
function echo_blue_bold() {
    # echo message in blue and bold
    echo -e "\x1B[90m= $(date '+%Y-%m-%d %T') \x1B[39m\x1B[1m\x1B[34m$@\x1B[0m\x1B[39m"
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function echo_blue_dot_title() {
    # echo message in blue and bold
    echo -n -e "\x1B[90m= $(date '+%Y-%m-%d %T') \x1B[39m\x1B[1m\x1B[34m$@\x1B[0m\x1B[39m"
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function echo_blue_dot() {
    echo -n -e "\x1B[90m\x1B[39m\x1B[1m\x1B[34m.\x1B[0m\x1B[39m"
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function start_console_green_italic() {
    echo -e "\x1B[3m\x1B[32m"
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function stop_console_green_italic() {
    echo -e "\x1B[23m\x1B[39m"
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function launch_container() {
    local NAME=$1
    local CONTAINER_IP=

    echo_blue_bold "Create instance ${NAME}"

    lxc launch ubuntu:noble ${NAME} --network=ovntest

    start_console_green_italic
    lxc exec ${NAME} -- apt update
    lxc exec ${NAME} -- bash -c 'DEBIAN_FRONTEND=noninteractive apt upgrade -y'
    lxc exec ${NAME} -- apt install nginx -y
    stop_console_green_italic

    echo_blue_dot_title "Wait ip instance ${NAME}"

    while [ -z "${CONTAINER_IP}" ]; do
        CONTAINER_IP=$(lxc list name=${NAME} --format=json | jq -r '.[0].state.network|.eth0.addresses[]|select(.family == "inet")|.address')
        sleep 1
        echo_blue_dot
    done
    echo
}

#===========================================================================================================================================
#
#===========================================================================================================================================
function container_ip() {
    local NAME=$1

    lxc list name=${NAME} --format=json | jq -r '.[0].state.network|.eth0.addresses[]|select(.family == "inet")|.address'
}

#===========================================================================================================================================
# INSTALL PACKAGES
#===========================================================================================================================================
sudo apt update
sudo apt upgrade -y
sudo apt install jq socat conntrack net-tools traceroute nfs-common unzip -y
sudo snap install yq

#===========================================================================================================================================
# CONFIGURE OVN
#===========================================================================================================================================
sudo apt install ovn-host ovn-central -y
sudo ovs-vsctl set open_vswitch . \
   external_ids:ovn-remote=unix:/var/run/ovn/ovnsb_db.sock \
   external_ids:ovn-encap-type=geneve \
   external_ids:ovn-encap-ip=127.0.0.1

cat > restore-bridge.sh <<-LXDINIT
#!/bin/bash
ip route add 10.68.223.0/24 via 192.168.48.192
LXDINIT

sudo cp restore-bridge.sh /usr/local/bin
sudo chmod +x /usr/local/bin/restore-bridge.sh

#===========================================================================================================================================
# INSTALL SERVICE RESTORE ROUTE
#===========================================================================================================================================
cat > restore-bridge.service <<-LXDINIT
[Install]
WantedBy = multi-user.target

[Unit]
After = ovn-northd.service snap.lxd.daemon.service
Description = Service for adding physical ip to ovn bridge

[Service]
Type = oneshot
TimeoutStopSec = 30
Restart = no
SyslogIdentifier = restore-devstack
ExecStart = /usr/local/bin/restore-bridge.sh
LXDINIT

sudo cp restore-bridge.service /etc/systemd/system 
sudo systemctl enable restore-bridge.service

#===========================================================================================================================================
# INSTALL LXD
#===========================================================================================================================================
if [ -z "$(snap list | grep lxd)" ]; then
    sudo snap install lxd --channel=6.1/stable
elif [[ "$(snap list | grep lxd)" != *6.1* ]]; then
    sudo snap refresh lxd --channel=6.1/stable
fi

lxd init --preseed <<< $(cat << LXDINIT
config:
  core.https_address: '[::]:8443'
networks:
- config:
    ipv4.address: 192.168.48.1/24
    ipv4.dhcp.ranges: 192.168.48.128-192.168.48.159
    ipv4.ovn.ranges: 192.168.48.192-192.168.48.253
    ipv4.routes: 192.168.50.0/24
    ipv4.nat: true
  description: ""
  name: lxdbr0
  type: ""
  project: default
storage_pools:
- config: {}
  description: ""
  name: default
  driver: dir
storage_volumes: []
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
  name: default
projects: []
cluster: null
LXDINIT
)

#===========================================================================================================================================
# CREATE OVN NETWORK
#===========================================================================================================================================
echo_blue_bold "Create ovntest"

lxc network create ovntest --type=ovn network=lxdbr0 ipv4.address=10.68.223.1/24 ipv4.nat=true volatile.network.ipv4.address=192.168.48.192

sudo ip route add 10.68.223.0/24 via 192.168.48.192

#===========================================================================================================================================
# CREATE CONTAINERS
#===========================================================================================================================================
launch_container u1
launch_container u2

U1_IP=$(container_ip u1)
U2_IP=$(container_ip u2)

echo_blue_bold "Create instance u3"
lxc launch ubuntu:noble u3 --network=lxdbr0

lxc ls

#===========================================================================================================================================
# CREATE OVN LOAD BALANCER
#===========================================================================================================================================

NLB_VIP_ADDRESS=$(lxc network load-balancer create ovntest --allocate=ipv4 | cut -d ' ' -f 4)

echo_blue_bold "NLB_VIP_ADDRESS=${NLB_VIP_ADDRESS}"

lxc network load-balancer backend add ovntest ${NLB_VIP_ADDRESS} u1 ${U1_IP} 80,443
lxc network load-balancer backend add ovntest ${NLB_VIP_ADDRESS} u2 ${U2_IP} 80,443
lxc network load-balancer port add ovntest ${NLB_VIP_ADDRESS} tcp 80,443 u1,u2

#===========================================================================================================================================
# PATCH OVN LOAD BALANCER
#===========================================================================================================================================
OVN_CHASSIS_UUID=$(sudo ovn-sbctl show | grep Chassis | cut -d ' ' -f 2 | tr -d '"')
OVN_NLB_NAME=$(sudo ovn-nbctl find load_balancer | grep "lb-${NLB_VIP_ADDRESS}-tcp" | awk '{print $3}')
OVN_ROUTER_NAME="${OVN_NLB_NAME%-lb*}-lr"

sudo ovn-nbctl --wait=hv set logical_router ${OVN_ROUTER_NAME} options:chassis=${OVN_CHASSIS_UUID}
sleep 2

#===========================================================================================================================================
# TEST OVN LOAD BALANCER
#===========================================================================================================================================
#echo_blue_bold "Check load balancer on host"
curl http://${NLB_VIP_ADDRESS}

echo_blue_bold "Check load balancer on u3"
lxc exec u3 -- curl http://${NLB_VIP_ADDRESS}

echo_blue_bold "Check load balancer on u1"
lxc exec u1 -- curl http://${NLB_VIP_ADDRESS}

SHELL

chmod +x create-lxd.sh
exit 0

EOF

multipass exec ${VMNAME} -- ./create-lxd.sh

echo "multipass shell ${VMNAME}"

tomponline commented 1 month ago

Cool, glad to hear that works for single nodes.

We'll need to wait for a fix for distributed routers as that is what LXD uses for clusters.

Fred78290 commented 1 month ago

We'll need to wait for a fix for distributed routers as that is what LXD uses for clusters.

Must this be done on each node? I will try.

mkalcok commented 1 month ago

Just adding my two cents here: the reason why setting options:chassis "fixes" this issue is that it changes the LR from "distributed" to "gateway/centralized" mode. "Hairpinning" on the LB address in a gateway router is a working and tested scenario, but in a distributed router it is not. I gave it a quick spin in a multi-node LXD cluster setup and, to my surprise, it did not negatively affect connectivity to external networks from VMs (regardless of the LXD node on which they were located). However, it's good to keep in mind that doing this removes the failover capabilities of the router, meaning that if the chassis hosting this router goes down, every VM that relied on it for external connectivity will lose it.
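
For anyone weighing that trade-off, a hedged sketch of how to check and revert the setting (the router name comes from this thread; removing the option returns the router to distributed mode and brings the hairpin problem back):

# show whether the router is currently pinned to a chassis
sudo ovn-nbctl get logical_router lxd-net49-lr options:chassis
# revert to a distributed router by dropping the chassis option
sudo ovn-nbctl remove logical_router lxd-net49-lr options chassis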

Now, regarding the ovn-octavia-provider: I skimmed through their code, and they appear to be associating the LB with the LS in addition to the LR. I gave it a go as well with

# While using Distributed Logical Router
ovn-nbctl ls-lb-add <ls_name> <lb_name>

and it fixed the hairpin LB issue as well! @tomponline perhaps this could be an alternative way forward for LXD that would not require switching to gateway routers, nor waiting for the LB hairpin fix in distributed routers. I'm not that well versed in OVN LBs, so the true implications of this approach would have to be examined in more detail, but to me it looks promising.

tomponline commented 1 month ago

and it fixed the hairpin LB issue as well! @tomponline perhaps this could be an alternative way forward for LXD that would not require switching to gateway routers, nor waiting for the LB hairpin fix in distributed routers. I'm not that well versed in OVN LBs, so the true implications of this approach would have to be examined in more detail, but to me it looks promising.

Interesting, this is indeed worth investigating. Thanks!

Fred78290 commented 1 month ago

@mkalcok

While using Distributed Logical Router

ovn-nbctl ls-lb-add

Must it be done on the external switch or on the internal one?

mkalcok commented 1 month ago

@Fred78290 On the internal switch
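
With the names from this thread, that amounts to something like the following (switch and load balancer names taken from the ovn-nbctl output above; this is a manual workaround, not something LXD configures itself):

# attach the LXD-created load balancer to the internal logical switch
sudo ovn-nbctl ls-lb-add lxd-net49-ls-int lxd-net49-lb-192.168.6.50-tcp
# verify the association
sudo ovn-nbctl ls-lb-list lxd-net49-ls-int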

Fred78290 commented 1 month ago

@tomponline @mkalcok

The workaround ovn-nbctl ls-lb-add is not stable over time. In fact, after some traffic between containers through the load balancer, the logical switch to load balancer association gets reset.

I spent too many hours on OVN and MicroOVN, with different versions of LXD, and my conclusion is that OVN is not ready.

I fell back on keepalived for internal load balancing.

Regards

Fred78290 commented 4 weeks ago

The workaround ovn-nbctl ls-lb-add is not stable over time. In fact, after some traffic between containers through the load balancer, the logical switch to load balancer association gets reset.

@tomponline @mkalcok

Forget my last comment. The reason is that when I add a new listener to the OVN load balancer with lxc network load-balancer edit <vip>, the load balancer is deleted by LXD and recreated. So I need to apply ovn-nbctl ls-lb-add again, and then it works again.
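
A hedged sketch of re-applying the workaround after such an edit, reusing the lookup from the patch snippet earlier in this thread (lxd-net49-ls-int is the internal switch name from this setup):

# the load balancer is recreated with the same naming pattern, so look it up again
OVN_NLB_NAME=$(sudo ovn-nbctl --bare --columns=name find load_balancer | grep "lb-${NLB_VIP_ADDRESS}-tcp")
# re-attach it to the internal logical switch
sudo ovn-nbctl ls-lb-add lxd-net49-ls-int ${OVN_NLB_NAME}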

Regards

mkalcok commented 3 weeks ago

Hi @Fred78290, thanks for the follow-up. Yeah, I think any workaround that involves manually touching OVN is at risk of eventually getting overwritten by LXD. Hopefully, though, a fix for this will make it into LXD natively.