F5Networks / f5-openstack-lbaasv1

OpenStack Neutron LBaaSv1 plugin and agent to control F5 BIG-IP devices
http://f5-openstack-lbaasv1.readthedocs.io

Bug - F5 Agent running prevents HAproxy LBAAS instances from being created #74

Closed gorriec closed 8 years ago

gorriec commented 8 years ago

Agent Version

Version 1.0.14

Operating System

Ubuntu 14.04

OpenStack Release

Kilo

Description

We have been testing OpenStack Kilo integration with a BIG-IP instance and have successfully been using the F5 LBaaS agent to create F5 load balancer instances on the BIG-IP via Neutron. The F5 agent runs standalone (on a host that also runs an instance of neutron-server) and connects to a single BIG-IP (VE) device. Our Neutron controller service is configured with three nodes in HA. Each Neutron controller has the F5 device driver configured in addition to the HAProxy LBaaS driver; the HAProxy driver is configured as the default.

Creation of F5 Load balancers and monitors works faultlessly.

In testing, however, we have noticed that it is not possible to create an HAProxy LBaaS instance while the F5 agent is running.

If we try to create an HAProxy LBaaS pool we receive an error (either in Horizon or from the neutron CLI). For example:

    # neutron lb-pool-create --name cg-web-lb --lb-method ROUND_ROBIN --protocol HTTP --subnet-id b2304a4f-10dc-49ad-834a-ba5269f3ba46
    Request Failed: internal server error while processing your request.

In the neutron-server logs we see:

    2016-03-01 09:23:23.114 4531 ERROR neutron.api.v2.resource [req-a55ed2aa-1a72-4fcb-a428-54649ac99305 ] create failed
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource Traceback (most recent call last):
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     result = method(request=request, **args)
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 461, in create
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     obj = obj_creator(request.context, **kwargs)
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/plugin.py", line 196, in create_pool
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     driver.create_pool(context, p)
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/drivers/common/agent_driver_base.py", line 376, in create_pool
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     self.device_driver)
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/agent_scheduler.py", line 114, in schedule
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     active_agents)
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/agent_scheduler.py", line 86, in get_lbaas_agent_candidates
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     if device_driver in agent_conf['device_drivers']:
    2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource KeyError: 'device_drivers'
    2016-03-01 09:23:25.012 4529 INFO neutron.api.v2.resource [req-6114e765-2226-476f-91cf-53be17c71f53 ] show failed (client error): Vip None could not be found

If we stop the F5 LBaaS agent:

    # initctl stop f5-oslbaasv1-agent
    f5-oslbaasv1-agent stop/waiting

...we can now create the HAProxy LBaaS instance:

    # neutron lb-pool-create --name cg-web-lb --lb-method ROUND_ROBIN --protocol HTTP --subnet-id b2304a4f-10dc-49ad-834a-ba5269f3ba46

Created a new pool:

    +------------------------+--------------------------------------+
    | Field                  | Value                                |
    +------------------------+--------------------------------------+
    | admin_state_up         | True                                 |
    | description            |                                      |
    | health_monitors        |                                      |
    | health_monitors_status |                                      |
    | id                     | 431048bd-a1b7-4db2-ae8a-35425bc4d878 |
    | lb_method              | ROUND_ROBIN                          |
    | members                |                                      |
    | name                   | cg-web-lb                            |
    | protocol               | HTTP                                 |
    | provider               | haproxy                              |
    | status                 | PENDING_CREATE                       |
    | status_description     |                                      |
    | subnet_id              | b2304a4f-10dc-49ad-834a-ba5269f3ba46 |
    | tenant_id              | 39b082d615944f5292da852b2a71867f     |
    | vip_id                 |                                      |
    +------------------------+--------------------------------------+

We can recreate this issue simply by starting or stopping the F5-agent.

The neutron.conf file for the three neutron controller nodes has the following configuration for the service providers:

    [service_providers]
    service_provider = LOADBALANCER:Haproxy:neutron_lbaas.services.loadbalancer.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default
    service_provider = LOADBALANCER:F5:f5.oslbaasv1driver.drivers.plugin_driver.F5PluginDriver

The agent configuration file and log are attached.

f5-oslbaasv1-agent.ini.txt

f5-oslbaasv1-agent.log.txt

Deployment

Three OpenStack controllers in HA, each with the F5-driver configured. One node is running the F5-lbaas-agent. The F5 agent is managing a single BIG-IP (VE) appliance.

swormke commented 8 years ago

Working on reproducing this in Kilo now. We have seen this work properly in Liberty so we are going to see if we can narrow down where the problem is.

swormke commented 8 years ago

@gorriec we are able to recreate the issue locally in Kilo and are debugging the issue now.

swormke commented 8 years ago

I was able to confirm that we do not see the issue in Liberty. This appears to be an OpenStack Neutron issue for LBaaSv1 in Kilo. I am going to see if there is a bug report for it on Launchpad for OpenStack Kilo. If I cannot find one, we may need to file one and try to fix it.

swormke commented 8 years ago

@zancas as we discussed today in our Sprint planning please do the following based on my initial triage. Make sure you reach out to @richbrowne or @mattgreene if you need help.

See if you can figure out what is in the dictionary returned by the call to agent_conf = self.get_configuration_dict(agent) in neutron-lbaas/neutron_lbaas/agent_scheduler.py below. It looks like when our agent is running and registered, this dictionary is missing the key device_drivers.

    def get_lbaas_agent_candidates(self, device_driver, active_agents):
        candidates = []
        for agent in active_agents:
            agent_conf = self.get_configuration_dict(agent)
            if device_driver in agent_conf['device_drivers']:
                candidates.append(agent)
        return candidates
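A minimal sketch of the failure mode described above. The two configuration dicts below are illustrative, modeled on the `neutron agent-show` output later in this thread; `get_candidates` mirrors the scheduler loop, and `get_candidates_defensive` shows a `dict.get()` variant that would skip agents missing the key instead of raising:

```python
# Hypothetical agent configuration dicts (names/values are illustrative):
# the HAProxy agent reports 'device_drivers', the F5 agent does not.
haproxy_agent_conf = {"device_drivers": ["haproxy_ns"], "instances": 0}
f5_agent_conf = {"tunnel_types": ["vxlan"], "global_routed_mode": False}


def get_candidates(device_driver, agent_confs):
    # Mirrors the scheduler loop above: a direct [] lookup raises
    # KeyError as soon as an agent without 'device_drivers' is seen.
    return [c for c in agent_confs if device_driver in c["device_drivers"]]


def get_candidates_defensive(device_driver, agent_confs):
    # Defensive variant: dict.get() with an empty default simply
    # excludes agents that do not report the key.
    return [c for c in agent_confs
            if device_driver in c.get("device_drivers", [])]


try:
    get_candidates("haproxy_ns", [haproxy_agent_conf, f5_agent_conf])
except KeyError as exc:
    print("scheduler fails with KeyError:", exc)

print(get_candidates_defensive("haproxy_ns",
                               [haproxy_agent_conf, f5_agent_conf]))
```

This only demonstrates why the scheduler trips over the F5 agent; whether to harden Neutron's scheduler or have the F5 agent report the key is a separate decision (the diff below takes the second approach).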

Also see if this problem exists if you run two drivers of any type other than ours.

richbrowne commented 8 years ago

When the HAProxy LBaaS service is invoked, the first thing that happens is finding an agent to perform the action. Neutron queries the database for all active agents of type AGENT_TYPE_LOADBALANCER; the result is all active load balancer agents, including the F5 one. In the code printed above, it then checks every active agent to see whether the device driver reported in the agent's configuration matches the one passed into the method. The F5 LBaaS driver does not define this key in the agent config, so the exception is raised when the F5 LBaaS agent is processed. Since the HAProxy device driver is the reference implementation, we probably need to make a change in the icontrol_driver initialization to do:

diff --git a/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py b/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
index 4dae18e..a5dc9aa 100644
--- a/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
+++ b/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
@@ -263,6 +263,7 @@ class iControlDriver(LBaaSBaseDriver):
         self.device_type = conf.f5_device_type
         self.plugin_rpc = None
         self.__last_connect_attempt = None
+       self.driver_name = 'f5-lbaas-icontrol'

         # BIG-IP containers
         self.__bigips = {}
@@ -288,7 +289,7 @@ class iControlDriver(LBaaSBaseDriver):

             self.agent_configurations['common_networks'] = \
                 self.conf.common_network_ids
-
+           self.agent_configurations['device_drivers'] = [ self.driver_name ]
             if self.conf.environment_prefix:
                 LOG.debug(_('BIG-IP name prefix for this environment: %s' %
                             self.conf.environment_prefix))
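With a change along these lines, the F5 agent's reported configuration would carry the key the reference scheduler expects, roughly as follows (values are illustrative, alongside keys the agent already reports):

```python
# Illustrative: what agent_configurations would contain after the patch.
agent_configurations = {
    "common_networks": {},
    "tunnel_types": ["vxlan"],
    "device_drivers": ["f5-lbaas-icontrol"],
}

# The reference scheduler's membership test now succeeds, no KeyError:
print("f5-lbaas-icontrol" in agent_configurations["device_drivers"])
```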
richbrowne commented 8 years ago

From the LBaaSv2 agent. This would be very similar to the LBaaSv1 config. Notice there is no device_drivers entry in the configurations dictionary.

[testlab@host-26 ~(keystone_admin)]$ neutron agent-show 2455220c-ca7d-40c6-bbe9-72a81e023096
+---------------------+----------------------------------------------------------------------+
| Field               | Value                                                                |
+---------------------+----------------------------------------------------------------------+
| admin_state_up      | True                                                                 |
| agent_type          | Loadbalancerv2 agent                                                 |
| alive               | True                                                                 |
| binary              | f5-oslbaasv2-agent                                                   |
| configurations      | {                                                                    |
|                     |      "icontrol_endpoints": {                                         |
|                     |           "10.190.7.116": {                                          |
|                     |                "device_name": "bigip1",                              |
|                     |                "platform": "",                                       |
|                     |                "version": "11.6.0",                                  |
|                     |                "serial_number": "cedac472-0ee4-49ce-2f76642d1087"    |
|                     |           }                                                          |
|                     |      },                                                              |
|                     |      "request_queue_depth": 0,                                       |
|                     |      "environment_prefix": "Project",                                |
|                     |      "tunneling_ips": [                                              |
|                     |           "201.0.159.10"                                             |
|                     |      ],                                                              |
|                     |      "common_networks": {},                                          |
|                     |      "services": 1,                                                  |
|                     |      "f5_common_external_networks": true,                            |
|                     |      "environment_capacity_score": 0,                                |
|                     |      "tunnel_types": [                                               |
|                     |           "vxlan"                                                    |
|                     |      ],                                                              |
|                     |      "environment_group_number": 1,                                  |
|                     |      "bridge_mappings": {                                            |
|                     |           "default": "1.1"                                           |
|                     |      },                                                              |
|                     |      "global_routed_mode": false                                     |
|                     | }                                                                    |
| created_at          | 2016-04-18 18:38:55                                                  |
| description         |                                                                      |
| heartbeat_timestamp | 2016-04-19 20:05:25                                                  |
| host                | host-26.int.lineratesystems.com:f766515f-cbc3-5c4a-bb6e-d950e1cc0e34 |
| id                  | 2455220c-ca7d-40c6-bbe9-72a81e023096                                 |
| started_at          | 2016-04-18 18:38:55                                                  |
| topic               | f5-lbaasv2-process-on-agent                                          |
+---------------------+----------------------------------------------------------------------+
richbrowne commented 8 years ago

From the haproxy agent:

[testlab@host-5 ~(keystone_admin)]$ neutron agent-show f2363e5a-a7ab-4517-be6f-12158e22ee37
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| agent_type          | Loadbalancer agent                   |
| alive               | True                                 |
| binary              | neutron-lbaas-agent                  |
| configurations      | {                                    |
|                     |      "device_drivers": [             |
|                     |           "haproxy_ns"               |
|                     |      ],                              |
|                     |      "instances": 0                  |
|                     | }                                    |
| created_at          | 2016-04-19 18:52:52                  |
| description         |                                      |
| heartbeat_timestamp | 2016-04-19 20:17:52                  |
| host                | host-5.int.lineratesystems.com       |
| id                  | f2363e5a-a7ab-4517-be6f-12158e22ee37 |
| started_at          | 2016-04-19 18:52:52                  |
| topic               | n-lbaas_agent                        |
+---------------------+--------------------------------------+
richbrowne commented 8 years ago

The reason why we don't see this problem in the case where only the f5 lbaas v1 driver is running is because we override the schedule() method, in agent_scheduler.py
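A sketch of why the override masks the problem (the selection logic below is illustrative, not the actual F5 scheduler code): if the overridden schedule() picks F5 agents by some attribute other than the device_drivers key, such as the agent binary, then the failing lookup is never executed on the F5-only code path.

```python
# Illustrative agent records, shaped like the neutron agent-show output
# earlier in this thread (values are examples).
agents = [
    {"binary": "f5-oslbaasv1-agent",
     "configurations": {"tunnel_types": ["vxlan"]}},
    {"binary": "neutron-lbaas-agent",
     "configurations": {"device_drivers": ["haproxy_ns"]}},
]


def f5_schedule(active_agents):
    # Selecting by binary name never reads configurations['device_drivers'],
    # so the missing key is harmless when only the F5 driver is in play.
    return [a for a in active_agents if a["binary"] == "f5-oslbaasv1-agent"]


print([a["binary"] for a in f5_schedule(agents)])
```

The reference HAProxy driver, by contrast, goes through get_lbaas_agent_candidates() and therefore inspects every active load balancer agent, F5 included.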

zancas commented 8 years ago

Should we also add "instances" to the "configurations" dict of the v2 driver... since the haproxy agent has it?

zancas commented 8 years ago

@richbrowne it looks like you have a commit that fixes the bug? If so, and you want me to test it, you could open a PR with it; or, if you're already confident that your fix works, should I just reassign this issue to you?

mattgreene commented 8 years ago

Tested Rich's proposed solution. The second part of the diff was located within an if/else block and needs to move above that block so that device_drivers shows up in both global routed and L2 adjacent mode. With that change, I created a pool for HAProxy with the F5 agent active without a problem. It failed previously for the same reason reported in this issue.
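The placement Matt describes can be sketched as follows. This is an illustrative skeleton only, not the real iControlDriver: it just shows the device_drivers assignment hoisted above the mode-specific branch so that both modes report the key.

```python
class IControlDriverSketch:
    # Hypothetical skeleton of the driver init, based on the diff above.
    def __init__(self, global_routed_mode):
        self.driver_name = "f5-lbaas-icontrol"
        # Hoisted out of the if/else: reported regardless of mode.
        self.agent_configurations = {"device_drivers": [self.driver_name]}
        if global_routed_mode:
            self.agent_configurations["global_routed_mode"] = True
        else:
            # L2 adjacent mode carries extra networking config (examples).
            self.agent_configurations["global_routed_mode"] = False
            self.agent_configurations["tunnel_types"] = ["vxlan"]


for mode in (True, False):
    print(IControlDriverSketch(mode).agent_configurations["device_drivers"])
```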

zancas commented 8 years ago

Alright, since I've done none of the work to close this issue, and it seems resolved to me, I'm reassigning to @mattgreene.

swormke commented 8 years ago

Let's make sure that we look at LBaaSv2 as well and make the same fix there if it applies. We don't want customers to hit this if they upgrade.

zancas commented 8 years ago

OK, but we've not previously seen this bug in lbaasv2, correct?

swormke commented 8 years ago

You tell me. Have you tested it?

zancas commented 8 years ago

Working on it.

mattgreene commented 8 years ago

We hadn't seen it previously on LBaaSv1. So we need QE to take custody of the Escape Analysis and make sure this doesn't happen again... anywhere.

zancas commented 8 years ago

ACK