Ensure errors received from device prevent creation of LB components

ghost commented 9 years ago

Currently, if you run something like lbaas-loadbalancer-create against a device that isn't ready yet (for example, you just rebuilt the appliance in stack), the driver throws an error but the elements are still created in OpenStack. You can't delete them because they don't exist on the device, so you're forced to manually delete the records from the DB (--good). We need to ensure that the error condition is propagated to the caller so it knows not to create DB records etc. for objects that are broken.
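To make the ask concrete, here is a minimal sketch of the ordering I mean. It is not the driver's actual code: `DeviceError`, `NotReadyDevice`, `create_loadbalancer`, and the dict standing in for the DB are all hypothetical names used only for illustration.

```python
class DeviceError(Exception):
    """Hypothetical stand-in for an error returned by the device."""


class NotReadyDevice:
    """Hypothetical device client that always fails, as if the appliance were still booting."""

    def create(self, name):
        raise DeviceError("device not ready")


def create_loadbalancer(db, device, name):
    """Create on the device first; record the object only once the device accepted it."""
    device.create(name)  # any DeviceError propagates to the caller
    db[name] = "ACTIVE"  # reached only when the device call succeeded


db = {}  # stands in for the DB records
error = None
try:
    create_loadbalancer(db, NotReadyDevice(), "lb1")
except DeviceError as exc:
    error = str(exc)  # the caller sees the failure and creates nothing
```

Because the exception is not swallowed, the caller ends up with no record for an object that never existed on the device.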

Cedev commented 9 years ago

If the device isn't ready yet, the command should probably be retried later. Is there a way to signal to neutron that the command should be re-queued? Once a record is created (for whatever reason) and there isn't a corresponding object on the device, deleting the object in neutron should succeed.
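A rough sketch of both behaviours, under stated assumptions: the retry here is a plain in-process loop, not a neutron re-queue (I don't know whether neutron offers one), and `DeviceNotReady`, `NotFoundOnDevice`, `retry`, and `delete_object` are hypothetical names, not part of the driver.

```python
import time


class DeviceNotReady(Exception):
    """Hypothetical: the device cannot accept commands yet."""


class NotFoundOnDevice(Exception):
    """Hypothetical: the object does not exist on the device."""


def retry(operation, attempts=3, delay=0.0):
    """Call operation until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeviceNotReady:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay)


def delete_object(db, device_delete, name):
    """Delete from the device, treating 'not found' as already deleted."""
    try:
        device_delete(name)
    except NotFoundOnDevice:
        pass  # nothing to remove on the device; still clean up the record
    db.pop(name, None)


# Usage: a create that succeeds on the third try.
calls = []

def flaky_create():
    calls.append(1)
    if len(calls) < 3:
        raise DeviceNotReady("booting")
    return "created"

result = retry(flaky_create)

# Usage: deleting a record that has no object on the device.
def missing_delete(name):
    raise NotFoundOnDevice(name)

db = {"pool1": "ERROR"}
delete_object(db, missing_delete, "pool1")
```

With delete tolerant of a missing device object, a stale record no longer needs manual DB cleanup.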

Cedev commented 9 years ago

We should at least set the status_description when commands fail, so the reason is visible in the neutron commands as well as in the logs.

The user experience when encountering errors is pretty bad. I created a health monitor and a pool and associated them.

    neutron lb-healthmonitor-create --delay 200 --max-retries 2 --timeout 100 --type TCP
    neutron lb-pool-create --name pool1 --lb-method ROUND_ROBIN --subnet-id public-subnet --protocol TCP

    neutron lb-healthmonitor-associate ... pool1

The only response I got from the lb-healthmonitor-associate command is:

    Request Failed: internal server error while processing your request.

The logs have a much more helpful message:

    ACOSException: 1162 Invalid integer. Parameter interval in method slb.hm.create. The valid scope is 1 - 180.

It would be nice if that message found its way into neutron lb-healthmonitor-show ...

    +----------------+----------------------------------------------------------------------------------------------------+
    | Field          | Value                                                                                              |
    +----------------+----------------------------------------------------------------------------------------------------+
    | admin_state_up | True                                                                                               |
    | delay          | 200                                                                                                |
    | id             | 232e59c7-5bb7-4050-b3cc-a7dd3c23face                                                               |
    | max_retries    | 2                                                                                                  |
    | pools          | {"status": "ERROR", "status_description": null, "pool_id": "42161812-21dc-4013-80cf-ef80c0899809"} |
    | tenant_id      | 2d205c4f252249acaa8c1bc53b40f1dd                                                                   |
    | timeout        | 100                                                                                                |
    | type           | TCP                                                                                                |
    +----------------+----------------------------------------------------------------------------------------------------+
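Something along these lines would get the device message into status_description instead of leaving it null. This is only a sketch: `DeviceError` and `run_and_record` are hypothetical names, and the dict stands in for the pool association record shown above.

```python
class DeviceError(Exception):
    """Hypothetical stand-in for the exception raised by the device client."""


def run_and_record(record, operation):
    """Run a device operation and store the outcome on the record."""
    try:
        operation()
    except DeviceError as exc:
        record["status"] = "ERROR"
        record["status_description"] = str(exc)  # keep the reason visible to the user
    else:
        record["status"] = "ACTIVE"
        record["status_description"] = None
    return record


def create_monitor():
    # Message text copied from the log line quoted above.
    raise DeviceError(
        "1162 Invalid integer. Parameter interval in method slb.hm.create. "
        "The valid scope is 1 - 180."
    )


record = run_and_record(
    {"pool_id": "42161812-21dc-4013-80cf-ef80c0899809"}, create_monitor
)
```

The record would then carry the same text that currently only shows up in the logs.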

Cedev commented 8 years ago

Returning and recording nice error messages is tracked in #179.

a10networks / a10-neutron-lbaas

Ensure errors received from device prevent creation of LB components #116