Closed by jmbredal 7 years ago
(by mbrekkevold) Hi, it would be very helpful if you could provide a full traceback from the logs, so we are able to locate the problematic code.
(by mbrekkevold) I have managed to reproduce the problem without a traceback. Your upgrade/reboot has likely resulted in new indexes being assigned to all the physical entities listed in the ENTITY-MIB::entPhysicalTable (and possibly a changed set of entities as well). The ipdevpoll code that tries to predict and resolve db integrity errors in advance can, however, fail in circumstances it did not anticipate.
Likely the whole resolve code would not be necessary if the entire NetboxEntity database update ran inside a single transaction. I have confirmed this on my side, but since I cannot be sure I have replicated your issue 100%, we won't know until you upgrade.
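To illustrate the point about the unique constraint and a single transaction, here is a minimal sketch. It is not the code from the actual fix: it uses Python's built-in sqlite3 instead of PostgreSQL, the table is a simplified, hypothetical stand-in for NetboxEntity (only the `(netboxid, source, index)` uniqueness is taken from the log message), and the function names are made up for the example. It shows that re-indexing rows one at a time can collide with an index still held by another row, while deleting and re-inserting the device's entities inside one transaction either succeeds as a whole or leaves the old rows untouched.

```python
import sqlite3


def make_db():
    """Create a hypothetical, simplified entity table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE netboxentity ("
        " netboxid INTEGER NOT NULL,"
        " source TEXT NOT NULL,"
        ' "index" INTEGER NOT NULL,'
        " name TEXT NOT NULL,"
        ' UNIQUE (netboxid, source, "index"))'
    )
    conn.executemany(
        "INSERT INTO netboxentity VALUES (?, ?, ?, ?)",
        [(111, "ENTITY-MIB", 1, "chassis"), (111, "ENTITY-MIB", 2, "supervisor")],
    )
    conn.commit()
    return conn


def rows(conn):
    """Return (index, name) pairs ordered by index."""
    return conn.execute(
        'SELECT "index", name FROM netboxentity ORDER BY "index"'
    ).fetchall()


def naive_update(conn, netboxid, source, polled):
    """Re-index rows one by one; fails if a new index is still in use."""
    with conn:  # commits on success, rolls back on any exception
        for name, index in polled:
            conn.execute(
                'UPDATE netboxentity SET "index" = ?'
                " WHERE netboxid = ? AND source = ? AND name = ?",
                (index, netboxid, source, name),
            )


def replace_in_transaction(conn, netboxid, source, polled):
    """Delete the old entity set and insert the polled one atomically."""
    with conn:  # both statements belong to the same transaction
        conn.execute(
            "DELETE FROM netboxentity WHERE netboxid = ? AND source = ?",
            (netboxid, source),
        )
        conn.executemany(
            "INSERT INTO netboxentity VALUES (?, ?, ?, ?)",
            [(netboxid, source, index, name) for name, index in polled],
        )
```

With the two entities swapping indexes after a reboot, `naive_update(conn, 111, "ENTITY-MIB", [("chassis", 2), ("supervisor", 1)])` raises `sqlite3.IntegrityError` on the first row, whereas `replace_in_transaction` with the same arguments goes through. PostgreSQL may behave differently in details (for example with deferrable constraints), so treat this only as an illustration of the idea.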
(by mbrekkevold) fix here: https://nav.uninett.no/hg/stable/rev/1a1c2c3f8899
(by einar-haraldseid) I have now upgraded to NAV 4.3.1 and can confirm that this has been fixed. There are no more errors in the log and the watchdog is back to green.
Translated changeset references: https://nav.uninett.no/hg/stable/rev/1a1c2c3f8899: a363c741199d10f507719798eed572d7a4eda079
We did a routine IOS upgrade from IOS Version 15.1(2)SY2 to 15.1(2)SY5 on one of our switches (gsw), and after that the jobs "inventory" and "statuscheck" have stopped working.
The switch is a Cisco C6807-XL, supervisor VS-SUP2T-10G
The ipdevpoll.log reports a lot of the following:
grep inventory:
2015-08-19 09:58:09,113 [WARNING plugins.uptime.uptime] [inventory gsw-hostname] Detected possible coldboot at 2015-08-17 19:03:37
2015-08-19 09:58:29,362 [ERROR jobs.jobhandler] [inventory gsw-hostname] Save stage failed with unhandled error
2015-08-19 09:58:29,362 [ERROR jobs.jobhandler] [inventory gsw-hostname] Job 'inventory' for gsw-hostname aborted: Job aborted due to save failure (cause=IntegrityError('duplicate key value violates unique constraint "netboxentity_netboxid_source_index_unique"\nDETAIL: Key (netboxid, source, index)=(111, ENTITY-MIB, -5000) already exists.\n',))
grep statuscheck:
2015-08-19 10:04:36,720 [INFO schedule.netboxjobscheduler] [statuscheck gsw-hostname] statuscheck for gsw-hostname failed in 0:00:02.904420. next run in 0:04:59.999959.
2015-08-19 10:09:41,234 [ERROR jobs.jobhandler] [statuscheck gsw-hostname] Save stage failed with unhandled error
2015-08-19 10:09:41,235 [ERROR jobs.jobhandler] [statuscheck gsw-hostname] Job 'statuscheck' for gsw-hostname aborted: Job aborted due to save failure (cause=IntegrityError('duplicate key value violates unique constraint "netboxentity_netboxid_source_index_unique"\nDETAIL: Key (netboxid, source, index)=(111, ENTITY-MIB, -5000) already exists.\n',))
Imported from Launchpad using lp2gh.