deployment lifecycle: deleting

alonsocamaro commented 7 years ago

Currently the delete of an F5 VNF needs to be done as follows:

[deployment delete start]
delete applications -> sync -> delete sync -> delete cluster -> delete bigips 
[deployment delete completed]

The reason F5::Cm::Sync objects need to be created and deleted in this process is that when "deployment delete start" is triggered and the applications are deleted, the bigips get out of sync. If this sync -> delete sync step is not performed, deleting the deployment fails, because F5::Cm::Cluster currently requires the pair to be in sync before the F5::Cm::Cluster resource can be removed.

Please note that an orchestrator does not expect new resources to be created in order to delete a VNF. In the case above, this is the F5::Cm::Sync resource. Deleting a VNF should just mean deleting the resources that were created, following the dependency graph in reverse. In other words, when something fails, orchestrators just try to roll back the steps performed. In our case, the way F5::Cm::Sync and F5::Cm::Cluster operate does not allow this. This problem also happens in the creation of a VNF, that is:

[deployment start]
create bigips  -> setup cluster -> deploy applications -> sync -> delete sync 
[deployment completed]

In this case the problem is that, once the cluster is created, if something goes wrong while deploying applications then the F5 VNF cannot be deleted, because F5::Cm::Cluster needs the units to be in sync but the F5::Cm::Sync resource has not been created.

Also note that, again, creating the deployment requires deleting resources (F5::Cm::Sync), which is not expected and does not match the VNF lifecycle from an orchestrator's point of view.
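
To illustrate the rollback behavior orchestrators expect (delete only what was created, in reverse dependency order, without creating anything new), here is a minimal Python sketch. The graph and the resource names are hypothetical and are not actual Heat resource types; it does not reflect how Heat or the F5 plugins are implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each key depends on the resources in its set.
# The names are illustrative only, not F5 Heat resource types.
DEPENDS_ON = {
    "bigips": set(),
    "cluster": {"bigips"},
    "applications": {"cluster"},
}


def create_order(depends_on):
    """Return resources ordered so that dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


def delete_order(depends_on, created):
    """Return only the resources that were created, in reverse creation order."""
    return [r for r in reversed(create_order(depends_on)) if r in created]


# Deploying applications failed, so only bigips and cluster exist.
print(delete_order(DEPENDS_ON, {"bigips", "cluster"}))
```

In this model a failed deployment is rolled back by deleting the cluster and then the bigips; no extra sync resource appears in either direction.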

pjbreaux commented 7 years ago

This is tightly bound to the aggressive teardown issue in the sdk. This would have to be 'best effort', since we cannot guarantee the final state of the devices if the starting state of the cluster is not known (meaning a user went into the cluster and manually manipulated the members). This work would take around a week as well. It is mostly work in the sdk.

pjbreaux commented 7 years ago

The thing we should be able to ensure is that the cluster stack will be removed from the database.

alonsocamaro commented 7 years ago

Note: I have updated the issue description to hopefully better describe it

alonsocamaro commented 7 years ago

I find that changing only the F5::Cm::Cluster behavior would not suffice to match the orchestrator's expected behavior of a VNF, because resources would still need to be deleted when deploying and new resources created when deleting.

Following the NFV model, the lifecycle should be:

The VNFD would be a Heat template that creates a shared VNF used by several NSDs (Network Service Descriptors).

[deployment start]
create bigips  -> setup cluster
[deployment completed]

The deletion of the VNF would be just undoing the created resources.

An NSD would be just:

[deploy service]
deploy configuration in cluster
[deploy service completed]

Again, the deletion of the NS would just be undoing the created resources.

I wonder if the following could be a good approach:

F5::Cm::Cluster would have an attribute that enables auto-sync between the F5s in the cluster.
Deploying services would be done to an F5::Cm::Cluster target instead of an F5::BigIP::Device, which is the only target currently allowed.

This would allow:

Keeping the cluster automatically in sync would avoid F5::Cm::Cluster failures when deleting this resource, unless something is really wrong.
Allowing config resource types to be deployed to a cluster removes the dependency on whether a given BIG-IP unit is available: the config would be copied to one of the available units and eventually synced automatically.

Also note that these two together would allow the lifecycle that orchestrators expect.
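
As a rough sketch of what "deploy to the cluster, not to a device" could mean, the Python snippet below picks any available unit as the deployment target and leaves propagation to auto-sync. The `Unit` class and `pick_target` function are hypothetical names introduced only for illustration; they are not part of the f5-sdk or of the Heat plugins.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for one BIG-IP in the cluster."""
    name: str
    available: bool


def pick_target(units):
    """Return the first available unit; auto-sync is assumed to copy the
    config to the remaining units afterwards."""
    for unit in units:
        if unit.available:
            return unit
    raise RuntimeError("no BIG-IP unit in the cluster is available")


cluster = [Unit("bigip1", available=False), Unit("bigip2", available=True)]
print(pick_target(cluster).name)
```

The point of the design is that a config resource no longer fails just because one specific unit is down.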

The caveat I find is that the LBaaS agent currently does not work with config sync; instead it replicates the configuration to all the F5 nodes. I wonder if this latter behavior will eventually be changed, since it is not typical behavior for BIG-IPs.

F5Networks / f5-openstack-heat

deployment lifecycle: deleting #127