Closed gorriec closed 8 years ago
Working on reproducing this in Kilo now. We have seen this work properly in Liberty so we are going to see if we can narrow down where the problem is.
@gorriec we are able to recreate the issue locally in Kilo and are debugging the issue now.
I was able to confirm that we do not see the issue in Liberty. This appears to be an OpenStack Neutron issue for LBaaSv1 in Kilo. I am going to see if there is a bug report for it on Launchpad for OpenStack Kilo. We may need to file one and try to fix it if I cannot find one.
@zancas as we discussed today in our Sprint planning please do the following based on my initial triage. Make sure you reach out to @richbrowne or @mattgreene if you need help.
See if you can figure out what is in the dictionary returned by the call to agent_conf = self.get_configuration_dict(agent) in neutron-lbaas/neutron_lbaas/agent_scheduler.py (shown below). It looks like when our agent is running and registered, this dictionary is missing the key device_drivers.
def get_lbaas_agent_candidates(self, device_driver, active_agents):
    candidates = []
    for agent in active_agents:
        agent_conf = self.get_configuration_dict(agent)
        if device_driver in agent_conf['device_drivers']:
            candidates.append(agent)
    return candidates
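The failure can be reproduced in isolation. Here is a minimal sketch (the dict contents and function names below are illustrative, not copied from Neutron) showing why an agent whose configuration dict lacks device_drivers raises KeyError, and a defensive variant using dict.get() that simply skips such agents:

```python
# Minimal reproduction (illustrative sketch, not the actual Neutron code):
# the haproxy agent reports a 'device_drivers' key, the F5 v1 agent does not.
haproxy_conf = {"device_drivers": ["haproxy_ns"], "instances": 0}
f5_conf = {"services": 1, "global_routed_mode": False}  # no 'device_drivers'

def get_candidates(device_driver, agent_confs):
    """Mirrors the get_lbaas_agent_candidates() check: raises KeyError
    as soon as an agent config without 'device_drivers' is examined."""
    return [c for c in agent_confs if device_driver in c["device_drivers"]]

def get_candidates_defensive(device_driver, agent_confs):
    """Defensive variant: agents that do not report the key are skipped."""
    return [c for c in agent_confs
            if device_driver in c.get("device_drivers", [])]

try:
    get_candidates("haproxy_ns", [haproxy_conf, f5_conf])
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'device_drivers'

print(len(get_candidates_defensive("haproxy_ns", [haproxy_conf, f5_conf])))  # 1
```

A .get() guard in the scheduler would also avoid the crash, but since the reference implementation is upstream, fixing our agent to report the key (as proposed below in this thread) is the less invasive change.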
Also see if this problem exists if you run two drivers of any type other than ours.
When the haproxy LBaaS service is invoked, the first thing that happens is finding an agent to perform the action. It queries the database for all active agents of type AGENT_TYPE_LOADBALANCER. The result is all active load balancer agents, including the F5 agent. In the code printed above, it then checks every active agent to see whether the device driver reported in the agent's configuration matches the one passed in to the method. The F5 LBaaS driver does not define this key in the agent config, so the exception results when the F5 LBaaS agent is processed. Since the haproxy device driver is the reference implementation, we probably need to make a change in the icontrol_driver initialization to do:
diff --git a/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py b/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
index 4dae18e..a5dc9aa 100644
--- a/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
+++ b/agent/f5/oslbaasv1agent/drivers/bigip/icontrol_driver.py
@@ -263,6 +263,7 @@ class iControlDriver(LBaaSBaseDriver):
         self.device_type = conf.f5_device_type
         self.plugin_rpc = None
         self.__last_connect_attempt = None
+        self.driver_name = 'f5-lbaas-icontrol'

         # BIG-IP containers
         self.__bigips = {}
@@ -288,7 +289,7 @@ class iControlDriver(LBaaSBaseDriver):
             self.agent_configurations['common_networks'] = \
                 self.conf.common_network_ids
-
+        self.agent_configurations['device_drivers'] = [self.driver_name]
         if self.conf.environment_prefix:
             LOG.debug(_('BIG-IP name prefix for this environment: %s' %
                         self.conf.environment_prefix))
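To sketch the expected effect of that patch (abbreviated dicts, illustrative only): once the F5 agent reports a device_drivers list, the membership check in the reference scheduler evaluates cleanly and the F5 agent is excluded from haproxy scheduling instead of crashing the request.

```python
# Effect of the proposed patch (illustrative sketch): pre-patch the key is
# missing and the scheduler's membership test raises; post-patch the F5
# agent is simply filtered out for haproxy pools.
f5_conf_before = {"services": 1}                           # pre-patch
f5_conf_after = {"services": 1,
                 "device_drivers": ["f5-lbaas-icontrol"]}  # post-patch

def matches(device_driver, agent_conf):
    # Same expression the scheduler uses on each candidate agent.
    return device_driver in agent_conf["device_drivers"]

try:
    matches("haproxy_ns", f5_conf_before)
except KeyError:
    print("pre-patch: KeyError, pool create fails")

print(matches("haproxy_ns", f5_conf_after))         # False: F5 agent skipped
print(matches("f5-lbaas-icontrol", f5_conf_after))  # True
```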
From the LBaaSv2 agent (this would be very similar to the LBaaSv1 config). Notice that there is no device_drivers key in the configurations dictionary.
[testlab@host-26 ~(keystone_admin)]$ neutron agent-show 2455220c-ca7d-40c6-bbe9-72a81e023096
+---------------------+----------------------------------------------------------------------+
| Field | Value |
+---------------------+----------------------------------------------------------------------+
| admin_state_up | True |
| agent_type | Loadbalancerv2 agent |
| alive | True |
| binary | f5-oslbaasv2-agent |
| configurations | { |
| | "icontrol_endpoints": { |
| | "10.190.7.116": { |
| | "device_name": "bigip1", |
| | "platform": "", |
| | "version": "11.6.0", |
| | "serial_number": "cedac472-0ee4-49ce-2f76642d1087" |
| | } |
| | }, |
| | "request_queue_depth": 0, |
| | "environment_prefix": "Project", |
| | "tunneling_ips": [ |
| | "201.0.159.10" |
| | ], |
| | "common_networks": {}, |
| | "services": 1, |
| | "f5_common_external_networks": true, |
| | "environment_capacity_score": 0, |
| | "tunnel_types": [ |
| | "vxlan" |
| | ], |
| | "environment_group_number": 1, |
| | "bridge_mappings": { |
| | "default": "1.1" |
| | }, |
| | "global_routed_mode": false |
| | } |
| created_at | 2016-04-18 18:38:55 |
| description | |
| heartbeat_timestamp | 2016-04-19 20:05:25 |
| host | host-26.int.lineratesystems.com:f766515f-cbc3-5c4a-bb6e-d950e1cc0e34 |
| id | 2455220c-ca7d-40c6-bbe9-72a81e023096 |
| started_at | 2016-04-18 18:38:55 |
| topic | f5-lbaasv2-process-on-agent |
+---------------------+----------------------------------------------------------------------+
From the haproxy agent:
[testlab@host-5 ~(keystone_admin)]$ neutron agent-show f2363e5a-a7ab-4517-be6f-12158e22ee37
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| admin_state_up | True |
| agent_type | Loadbalancer agent |
| alive | True |
| binary | neutron-lbaas-agent |
| configurations | { |
| | "device_drivers": [ |
| | "haproxy_ns" |
| | ], |
| | "instances": 0 |
| | } |
| created_at | 2016-04-19 18:52:52 |
| description | |
| heartbeat_timestamp | 2016-04-19 20:17:52 |
| host | host-5.int.lineratesystems.com |
| id | f2363e5a-a7ab-4517-be6f-12158e22ee37 |
| started_at | 2016-04-19 18:52:52 |
| topic | n-lbaas_agent |
+---------------------+--------------------------------------+
The reason we don't see this problem when only the F5 LBaaS v1 driver is running is that we override the schedule() method in agent_scheduler.py.
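For context, here is a hypothetical sketch of why an overridden schedule() masks the bug: if the override selects agents by some other attribute (the binary name, in this made-up example) instead of consulting the configurations dict, the missing device_drivers key is never read and so can never raise. This is illustrative only, not the actual F5 scheduler code.

```python
# Hypothetical illustration: an override that picks agents without ever
# reading agent["configurations"]["device_drivers"] cannot hit the KeyError,
# which is why the bug only surfaces when the reference scheduler runs.
agents = [
    {"binary": "f5-oslbaasv1-agent", "configurations": {"services": 1}},
    {"binary": "neutron-lbaas-agent",
     "configurations": {"device_drivers": ["haproxy_ns"]}},
]

def schedule_by_binary(agents, binary):
    """Select candidate agents without touching the configurations dict."""
    return [a for a in agents if a["binary"] == binary]

picked = schedule_by_binary(agents, "f5-oslbaasv1-agent")
print(len(picked))  # 1
```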
Should we also add "instances" to the "configurations" dict of the v2 driver... since the haproxy agent has it?
@richbrowne it looks like you have a commit that fixes the bug? If you do and you want me to test it, you could open a PR with it; or, if you're already confident that your fix works, should I just reassign this issue to you?
Tested Rich's proposed solution. The second part of the diff was located within an if/else and needs to move above that code block so that device_drivers shows up for both global routed and l2 adjacent mode. With that change, I created a pool for HAProxy with the F5 agent active w/out problem. It failed previously for the same reason reported in this issue.
Alright, since I've done none of the work to close this issue, and it seems resolved to me, I'm reassigning to @mattgreene.
Let's make sure that we look at LBaaSv2 as well and make the same fix if it applies. We don't want customers to hit this if they upgrade.
OK, but we've not previously seen this bug in lbaasv2, correct?
You tell me. Have you tested it?
Working on it.
We hadn't seen it previously on LBaaSv1. So we need QE to take custody of the Escape Analysis and make sure this doesn't happen again... anywhere.
ACK
Agent Version
Version 1.0.14
Operating System
Ubuntu 14.04
OpenStack Release
Kilo
Description
We have been testing OpenStack Kilo integration with a BIG-IP instance and have been successfully using the F5 LBaaS agent to allow F5 load balancer instances to be created on the BIG-IP via Neutron. The F5 agent is running standalone (on a host also running an instance of neutron-server) and connects to a single BIG-IP (VE) device. Our Neutron controller service is configured with 3 nodes in HA. Each Neutron controller has the F5 device driver configured in addition to the HAProxy LBaaS driver. The HAProxy LBaaS driver is configured as the default driver.
Creation of F5 Load balancers and monitors works faultlessly.
In testing, however, we have noticed that it is not possible to create an HAProxy LBaaS instance while the F5 agent is running.
If we try to create an HAproxy LBaaS we receive an error (either in Horizon or from the neutron CLI). For example:
# neutron lb-pool-create --name cg-web-lb --lb-method ROUND_ROBIN --protocol HTTP --subnet-id b2304a4f-10dc-49ad-834a-ba5269f3ba46
Request Failed: internal server error while processing your request.
In the neutron-server logs we see:
2016-03-01 09:23:23.114 4531 ERROR neutron.api.v2.resource [req-a55ed2aa-1a72-4fcb-a428-54649ac99305 ] create failed
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource Traceback (most recent call last):
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 461, in create
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     obj = obj_creator(request.context, **kwargs)
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/plugin.py", line 196, in create_pool
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     driver.create_pool(context, p)
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/drivers/common/agent_driver_base.py", line 376, in create_pool
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     self.device_driver)
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/agent_scheduler.py", line 114, in schedule
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     active_agents)
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron_lbaas/services/loadbalancer/agent_scheduler.py", line 86, in get_lbaas_agent_candidates
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource     if device_driver in agent_conf['device_drivers']:
2016-03-01 09:23:23.114 4531 TRACE neutron.api.v2.resource KeyError: 'device_drivers'
2016-03-01 09:23:25.012 4529 INFO neutron.api.v2.resource [req-6114e765-2226-476f-91cf-53be17c71f53 ] show failed
(client error): Vip None could not be found
If we stop the F5 LBaaS agent:
initctl stop f5-oslbaasv1-agent
f5-oslbaasv1-agent stop/waiting
...we can now create the HAProxy LBaaS instance:
neutron lb-pool-create --name cg-web-lb --lb-method ROUND_ROBIN --protocol HTTP --subnet-id b2304a4f-10dc-49ad-834a-ba5269f3ba46
Created a new pool:
+------------------------+--------------------------------------+
| Field                  | Value                                |
+------------------------+--------------------------------------+
| admin_state_up         | True                                 |
| description            |                                      |
| health_monitors        |                                      |
| health_monitors_status |                                      |
| id                     | 431048bd-a1b7-4db2-ae8a-35425bc4d878 |
| lb_method              | ROUND_ROBIN                          |
| members                |                                      |
| name                   | cg-web-lb                            |
| protocol               | HTTP                                 |
| provider               | haproxy                              |
| status                 | PENDING_CREATE                       |
| status_description     |                                      |
| subnet_id              | b2304a4f-10dc-49ad-834a-ba5269f3ba46 |
| tenant_id              | 39b082d615944f5292da852b2a71867f     |
| vip_id                 |                                      |
+------------------------+--------------------------------------+
We can recreate this issue simply by starting or stopping the F5-agent.
The neutron.conf file for the three neutron controller nodes has the following configuration for the service providers:
[service_providers]
service_provider = LOADBALANCER:Haproxy:neutron_lbaas.services.loadbalancer.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default
service_provider = LOADBALANCER:F5:f5.oslbaasv1driver.drivers.plugin_driver.F5PluginDriver
The agent configuration file and log are attached.
f5-oslbaasv1-agent.ini.txt
f5-oslbaasv1-agent.log.txt
Deployment
Three OpenStack controllers in HA, each with the F5-driver configured. One node is running the F5-lbaas-agent. The F5 agent is managing a single BIG-IP (VE) appliance.