TritonDataCenter / sdc-napi

Triton Network API: manages networking-related data
Mozilla Public License 2.0

ADMINUI Corrupts NAPI tables after modifying network parameters. #8

Closed sodre closed 9 years ago

sodre commented 9 years ago

Steps to reproduce:

  1. Create a network using the adminui and save it.
  2. Modify the network, e.g. add an additional DNS server.
  3. Hit save.
    • At this moment it seems NAPI stores a corrupted record to moray/manatee.
  4. AdminUI and sdc-napi lose the ability to list /networks.

To restore the system to a functional state, I had to delete the offending network from manatee itself and add it back again through adminui.

timclassic commented 9 years ago

I observed the same issue, and the same fix was required. This was a new install of SDC7 release-20150416-20150420T172304Z-gdb8a7ee, updated to the latest platform and service images yesterday.

I looked into the napi VM and r was null while calling r.toString() on the following line:

https://github.com/joyent/sdc-napi/blob/master/lib/models/network.js#L1001

I thought I had saved the stacktrace but did not. I have since reimaged the headnode and compute nodes using release-20150430-20150430T143810Z-gb30be5a, so I cannot investigate further.

timclassic commented 9 years ago

Here is the stacktrace from napi's logs:

[2015-05-04T03:09:27.275Z] ERROR: napi/17559 on 0af48b42-6de3-4480-949b-5e4862c65062: Uncaught exception (req_id=f81fdcf0-f20a-11e4-b567-b5643afe5208)
    TypeError: Cannot call method 'toString' of null
        at /opt/smartdc/napi/lib/models/network.js:1001:22
        at Array.map (native)
        at Network.networkSerialize [as serialize] (/opt/smartdc/napi/lib/models/network.js:1000:47)
        at /opt/smartdc/napi/lib/models/network.js:1351:50
        at Array.forEach (native)
        at /opt/smartdc/napi/lib/models/network.js:1350:26
        at EventEmitter._endList (/opt/smartdc/napi/lib/apis/moray.js:285:16)
        at EventEmitter.emit (events.js:92:17)
        at EventEmitter.<anonymous> (/opt/smartdc/napi/node_modules/moray/lib/objects.js:197:13)
        at EventEmitter.g (events.js:180:16)

I just hit this issue again on a fresh deploy after changing the nameservers for my external network.
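
The trace above points to a `map` callback calling `toString()` on a null element during serialization. A minimal sketch of that failure mode and one defensive variant follows. It is not the NAPI source or the actual fix: the record shape, the `resolvers` field name, and both function names are assumptions made up for illustration.

```javascript
'use strict';

// Hypothetical stand-in for a stored network record. The field name
// `resolvers` and the null entry are assumptions, not taken from NAPI.
var record = { name: 'external', resolvers: ['8.8.8.8', null] };

// Fails the same way as the trace: toString() is called on a null element.
function serializeUnsafe(rec) {
    return rec.resolvers.map(function (r) {
        return r.toString();
    });
}

// Defensive variant: skip null/undefined entries before converting.
function serializeSafe(rec) {
    return (rec.resolvers || []).filter(function (r) {
        return r !== null && r !== undefined;
    }).map(function (r) {
        return r.toString();
    });
}
```

Calling `serializeUnsafe(record)` throws a `TypeError`, while `serializeSafe(record)` returns only the non-null entries. Filtering at serialization time hides the bad record rather than repairing it, so validating on write is likely the better long-term guard.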

rgulewich commented 9 years ago

Fixed in be9d423 and 08e571b

dwlf commented 9 years ago

@sodre, @timclassic unfortunately, it will be about two weeks until the next release. If you are on the 'dev' channel, you can get the fix now: https://github.com/joyent/sdc/blob/master/docs/operator-guide/update.md#channels

timclassic commented 9 years ago

Thank you for the quick turnaround! I can certainly wait a couple of weeks. Fixing it manually wasn't terribly difficult, once I found @sodre's bug report.

pannon commented 9 years ago

@sodre @timclassic I am facing the same issue. What happens to instances attached to the affected network once you delete and re-add the network?

I guess the instances will need some extra attention?

sodre commented 9 years ago

Hi Peter,

I don’t remember anything bad happening. After I edited the database by hand and restarted NAPI, things went back to normal.

Best, Patrick

pannon commented 9 years ago

Thanks Patrick (@sodre), all went OK but instances attached to the re-created network don't show their interfaces under adminui. The IP is correctly reassigned tho even after a reboot.

Will probably need to hack the interfaces via vmadm for the affected machines and update the network UUID...

timclassic commented 9 years ago

Peter,

In the cases where I deleted and re-added the network via the UI, I took note of the corresponding UUIDs and restored them on the new DB rows. This kept the VMs themselves in sync with the network information, but I do not know if it had other ill effects. I would need to check the schema to remember which columns I restored (probably _key and I think there was another).

I ended up fixing the issue the second time by simply correcting the invalid JSON in the existing rows, which felt much safer.

-TimS

Tim Stewart tim@stoo.org
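
Before hand-editing rows as described above, it may help to check which stored values are actually broken. The sketch below is only an illustration under assumptions: it takes the raw JSON string of one row (however you export it), and the function name and return shape are hypothetical. It does not talk to moray or manatee.

```javascript
'use strict';

// Parse one stored value and report array fields that contain null entries.
// Returns { ok, error, nullFields }; this shape is made up for the example.
function inspectStoredValue(raw) {
    var parsed;
    try {
        parsed = JSON.parse(raw);
    } catch (e) {
        // The string is not valid JSON at all.
        return { ok: false, error: e.message, nullFields: [] };
    }
    if (parsed === null || typeof parsed !== 'object') {
        return { ok: false, error: 'not an object', nullFields: [] };
    }
    // Collect the names of array-valued fields holding at least one null.
    var nullFields = Object.keys(parsed).filter(function (k) {
        var v = parsed[k];
        return Array.isArray(v) && v.some(function (x) {
            return x === null;
        });
    });
    return { ok: nullFields.length === 0, error: null, nullFields: nullFields };
}
```

A value such as `{"resolvers":["8.8.8.8",null]}` would be reported with `nullFields` containing `resolvers` (a field name assumed here), which narrows down what needs correcting in the row.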

pannon commented 9 years ago

Thanks @timclassic, I followed the same procedure (some details in #11) and all is back to normal now.