Uninett / nav

Network Administration Visualized
GNU General Public License v3.0
192 stars 39 forks source link

[BUG] statuscheck job fails for Netonix PoE switches - ipdevpoll.log shows error: psycopg2.errors.NotNullViolation: null value in column "power" violates not-null constraint #2017

Closed xorr0 closed 4 years ago

xorr0 commented 5 years ago

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/jobs.py", line 442, in _perform_save manager.save() File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/storage.py", line 87, in save obj.save(self.containers) File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/storage.py", line 482, in save obj.save() File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 808, in save force_update=force_update, update_fields=update_fields) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 838, in save_base updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 924, in _save_table result = self._do_insert(cls._base_manager, using, fields, update_pk, raw) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 963, in _do_insert using=using, raw=raw) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/manager.py", line 85, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/query.py", line 1079, in _insert return query.get_compiler(using=using).execute_sql(return_id) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1112, in execute_sql cursor.execute(sql, params) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/backends/utils.py", line 64, in execute return self.cursor.execute(sql, params) django.db.utils.IntegrityError: null value in column "power" violates not-null constraint DETAIL: Failing row contains (23, 492, null, 1, 1, null).

2019-11-07 16:29:56,222 [ERROR jobs.jobhandler] [statuscheck 10.0.51.2] Save stage failed with unhandled error Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, self._kwargs) File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 46, in work task() File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/_threads/_team.py", line 190, in doWork task() --- --- File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/threadpool.py", line 250, in inContext result = inContext.theWork() File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/threadpool.py", line 266, in inContext.theWork = lambda: context.call(ctx, func, *args, *kw) File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext return self.currentContext().callWithContext(ctx, func, args, kw) File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext return func(*args,kw) File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/db.py", line 100, in _reset return func(*args, *kwargs) File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/db.py", line 78, in _cleanup return func(args, kwargs) File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/jobs.py", line 397, in complete_save_cycle result = self._perform_save() File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/jobs.py", line 442, in _perform_save manager.save() File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/storage.py", line 87, in save obj.save(self.containers) File "/opt/venvs/nav/lib/python3.7/site-packages/nav/ipdevpoll/storage.py", line 482, in save obj.save() File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 808, in save force_update=force_update, update_fields=update_fields) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 838, in save_base updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 924, in _save_table result = self._do_insert(cls._base_manager, using, fields, update_pk, raw) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/base.py", line 963, in _do_insert using=using, raw=raw) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/manager.py", line 85, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/query.py", line 1079, in _insert return query.get_compiler(using=using).execute_sql(return_id) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1112, in execute_sql cursor.execute(sql, params) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/backends/utils.py", line 64, in execute return self.cursor.execute(sql, params) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/utils.py", line 94, in exit six.reraise(dj_exc_type, dj_exc_value, traceback) File "/opt/venvs/nav/lib/python3.7/site-packages/django/utils/six.py", line 685, in reraise raise value.with_traceback(tb) File "/opt/venvs/nav/lib/python3.7/site-packages/django/db/backends/utils.py", line 64, in execute return self.cursor.execute(sql, params) django.db.utils.IntegrityError: null value in column "power" violates not-null constraint DETAIL: Failing row contains (23, 492, null, 1, 1, null).

2019-11-07 16:29:56,222 [ERROR jobs.jobhandler] [statuscheck 10.0.51.2] Job 'statuscheck' for 10.0.51.2 aborted: Job aborted due to save failure (cause=IntegrityError('null value in column "power" violates not-null constraint\nDETAIL: Failing row contains (23, 492, null, 1, 1, null).\n')) 2019-11-07 16:29:56,224 [INFO schedule.netboxjobscheduler] [statuscheck 10.0.51.2] statuscheck for 10.0.51.2 failed in 0:02:03.310821. next run in 0:04:59.999983.

- I only have 27 of these errors, with fresh VM boot from the new 5.0 release on 11/7/2019, uptime of my system is 6hours 45minutes.

root@navappliance:/var/log/nav# grep -i "psycopg2.errors.NotNullViolation: null value in column \"power\" violates not-null constraint" ipdevpoll.log | wc -l 27


**To Reproduce**
Steps to reproduce the behavior:
1. Exacly as I mentioned, import the device, wait for NAV to do it's thing, and it pops this error.
2. When I goto view the IP device at http://${NAV_IP_ADDRESS}/ipdevinfo/10.0.51.2/ I see that the *Statuscheck* job is listed as *Status: Overdue*, it never completes.

**Expected behavior**
- I hope to get the Statuscheck job to properly run with Netonix PoE switches, and clear this error.

**Environment (please complete the following information):**
- NAV version installed: 5.0
- Method of installation: pre-packaged Virtual Appliance, running on Virtual Box

I can provide any additional information required, let me know. Thank you for any assistance.
lunkwill42 commented 5 years ago

Well, it seems NAV did not get a pethMainPsePower value from the device. This value represents the nominal power of the PSE (Power Sourcing Equipment) in Watts.

It appears the code does not anticipate that this value would ever be missing once it actually gets information about a POE group from a device, but NAV's data model requires the value to be present. I'm not sure what the point of registering POE groups would be if there was no power data attached to it, which is why I guess null values aren't allowed in the model.

There are at leaste a couple of options for a solution, as I see it, but I'm not sure which one would be best - so maybe you have a suggestion:

If no power value could be retrieved from the device:

  1. Log a warning and just store the value as 0 in the database. or
  2. Log a warning and ignore the entire power group, as though it didn't exist.
xorr0 commented 5 years ago

hmmm, for this Netonix PoE switch it does log voltage:

[C:\~]$ snmpwalk -v:1 -c:<COMMUNITY> -r:10.0.51.2 -os:1.3.6.1.4.1.46241
SnmpWalk v1.01 - Copyright (C) 2009 SnmpSoft Company
[ More useful network tools on http://www.snmpsoft.com ]

OID=.1.3.6.1.4.1.46242.1.0, Type=OctetString, Value=1.3.8
OID=.1.3.6.1.4.1.46242.2.1.1.1, Type=Integer, Value=1
OID=.1.3.6.1.4.1.46242.2.1.1.2, Type=Integer, Value=2
OID=.1.3.6.1.4.1.46242.2.1.1.3, Type=Integer, Value=3
OID=.1.3.6.1.4.1.46242.2.1.2.1, Type=Integer, Value=5820
OID=.1.3.6.1.4.1.46242.2.1.2.2, Type=Integer, Value=1770
OID=.1.3.6.1.4.1.46242.2.1.2.3, Type=Integer, Value=5670
OID=.1.3.6.1.4.1.46242.3.1.1.1, Type=Integer, Value=1
OID=.1.3.6.1.4.1.46242.3.1.1.2, Type=Integer, Value=2
OID=.1.3.6.1.4.1.46242.3.1.1.3, Type=Integer, Value=3
OID=.1.3.6.1.4.1.46242.3.1.1.4, Type=Integer, Value=4
OID=.1.3.6.1.4.1.46242.3.1.2.1, Type=OctetString, Value=Board Temp
OID=.1.3.6.1.4.1.46242.3.1.2.2, Type=OctetString, Value=CPU Temp
OID=.1.3.6.1.4.1.46242.3.1.2.3, Type=OctetString, Value=PHY Temp
OID=.1.3.6.1.4.1.46242.3.1.2.4, Type=OctetString, Value=PHY2 Temp
OID=.1.3.6.1.4.1.46242.3.1.3.1, Type=Integer, Value=45
OID=.1.3.6.1.4.1.46242.3.1.3.2, Type=Integer, Value=63
OID=.1.3.6.1.4.1.46242.3.1.3.3, Type=Integer, Value=63
OID=.1.3.6.1.4.1.46242.3.1.3.4, Type=Integer, Value=63
OID=.1.3.6.1.4.1.46242.4.1.1.1, Type=Integer, Value=1
OID=.1.3.6.1.4.1.46242.4.1.1.2, Type=Integer, Value=2
OID=.1.3.6.1.4.1.46242.4.1.1.3, Type=Integer, Value=3
OID=.1.3.6.1.4.1.46242.4.1.1.4, Type=Integer, Value=4
OID=.1.3.6.1.4.1.46242.4.1.2.1, Type=OctetString, Value=Board 48V
OID=.1.3.6.1.4.1.46242.4.1.2.2, Type=OctetString, Value=Board 24V Ports 1-12
OID=.1.3.6.1.4.1.46242.4.1.2.3, Type=OctetString, Value=Board 24V Ports 13-24
OID=.1.3.6.1.4.1.46242.4.1.2.4, Type=OctetString, Value=Board 3V
OID=.1.3.6.1.4.1.46242.4.1.3.1, Type=Integer, Value=4960
OID=.1.3.6.1.4.1.46242.4.1.3.2, Type=Integer, Value=2420
OID=.1.3.6.1.4.1.46242.4.1.3.3, Type=Integer, Value=2420
OID=.1.3.6.1.4.1.46242.4.1.3.4, Type=Integer, Value=330
%Failed to get value of SNMP variable. Variable does not exist (noSuchName(2))
Total: 31

Thank you @lunkwill42 for your assistance.

lunkwill42 commented 5 years ago

hmmm, for this Netonix PoE switch it does log voltage:

[C:\~]$ snmpwalk -v:1 -c:<COMMUNITY> -r:10.0.51.2 -os:1.3.6.1.4.1.46241

Well, to be precise, NAV collects this data from POWER-ETHERNET-MIB::pethMainPseTable (.1.3.6.1.2.1.105.1.3.1), not from a Netonix proprietary MIB.

  • I am a bit newer to NAV, and want to learn is there a way I can easily just log the warning and store as 0? Or your other suggestion, ignore the entire power group for the Netonix PoE switches? I am versed in linux and python, just newer to NAV. Unfortunately, I am not sure the proper way to implement either of your suggestions, I would happy try either/or.

Being very specific, this appears to be the line that sets the power value from the collected MIB value:

https://github.com/Uninett/nav/blob/56e9d9a98705fdb8fcb7475b257860249ebc54fe/python/nav/ipdevpoll/plugins/poe.py#L73

I presume row['pethMainPsePower'] is None in this case. This could easily be translated to 0 and logged at this spot, I guess. Logging examples are found in the same code file.

However, it might also be that the missing value from the Netonix device could be correlated with the value of pethMainPseOperStatus, set on the code line above the one I quoted. Maybe the Netonix device doesn't return a power value if the power group is operationally off....

xorr0 commented 4 years ago

Thank you for the input, this helps out a good deal.