CCI-MOC / hil

Hardware Isolation Layer, formerly Hardware as a Service
Apache License 2.0
24 stars 54 forks source link

Unable to delete a nic after a networking action on it #1039

Closed naved001 closed 5 years ago

naved001 commented 5 years ago

I can't seem to delete a nic on a node.

[naved@kzn-vbmi01res ~]$ hil node nic delete neu-3-9 nic1
[naved@kzn-vbmi01res ~]$ hil node nic delete neu-3-9 nic2
^[[A^[[A[naved@kzn-vbmi01res ~]$ hil node show neu-3-9
project   :  None
nics      :  [{u'port': None, u'switch': None, u'macaddr': u'00:0f:53:0d:f7:29', u'networks': {}, u'label': u'nic2'}, {u'port': None, u'switch': None, u'macaddr': u'00:0f:53:0d:f7:28', u'networks': {}, u'label': u'nic1'}]
name      :  neu-3-9
metadata      :  {}

Curl returns a 500 error. Before running this.

I then repeated this in my development environment. All you have to do is, register a nic, run a networking operation on it (connect to a network, or run revert port). And then try to delete it.

IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: networking_action.nic_id [SQL: u'UPDATE networking_action SET nic_id=? WHERE networking_action.id = ?'] [parameters: (None, 2)] (Background on this error at: http://sqlalche.me/e/gkpj)
INFO:werkzeug:127.0.0.1 - - [19/Sep/2018 17:19:30] "DELETE /v0/node/dell1/nic/nic1 HTTP/1.1" 500 -

Probably because we create an entry in the neworking_action table for querying later so we can't just delete the nic because the networking_action table refers to that nic_id.

It was easily fixed by clearing my networking_action table.

naved001 commented 5 years ago

Solution 1. https://github.com/CCI-MOC/hil/blob/79d724e1efb6a3e31666f73153ab20c10838741a/hil/model.py#L557

We change that to this:

backref=db.backref('current_action', uselist=False, cascade="all,delete"))

http://docs.sqlalchemy.org/en/rel_0_9/orm/cascades.html

Comes with a migration script.

Solution 2.

In the nic_delete call, manually check and delete for any entries in the networking_action table and then delete the nic. For instance we require the nic to be deleted (by a user) before a node can be deleted, rather than doing a cascade delete on node.

@zenhack thoughts?

zenhack commented 5 years ago

We should be refusing to delete a nic that has pending operations on it. But it shouldn't be throwing 500s, we should be detecting the issue and reporting it to the user cleanly.

naved001 commented 5 years ago

It's not just pending operations anymore. We keep the networking actions in the table and mark them DONE (or 'ERROR`) so that users can query the status later on.

naved001 commented 5 years ago

And we keep a networking action for that nic until a new action on that nic replaces it.

zenhack commented 5 years ago

Aha, then yeah, we should automatically delete the 'DONE' actions when the nic is removed. I'd prefer to do this via an explicit query, rather than a cascade, since that way we can filter on the status and it will still barf at us if there are pending actions and we somehow screw up and try to delete the nic.