amyasnikov / validity

NetBox plugin to validate network devices
MIT License

COMPLIANCE TESTS PROBLEM: sorry, too many clients already #89

Closed ArsenyPie closed 5 months ago

ArsenyPie commented 5 months ago

Hi. I'm having problems with the compliance tests. I use a poller to poll devices, save the configurations in NetBox data sources, and then run the tests. I have quite a lot of network devices, about 250. After the compliance check completes, the configuration is missing for some devices (for example, London_r881.txt in the "Data Sources" section). Instead of the expected "show run" configuration of my router, I see an exception:

PULLING ERROR OperationalError: connection failed: FATAL: sorry, too many clients already

I think the problem may be that Validity by default uses too many threads to poll devices (500), and NetBox simply cannot process that many responses in time. Is there any way to solve this problem without redefining the Threadpool class? I tried changing the timers in the Netmiko connector in the Poller section, but this did not solve the problem.

Thank you in advance for your help <3

amyasnikov commented 5 months ago

Hi @ArsenyPie, please specify the following:

  1. Python version
  2. NetBox version
  3. Validity version
  4. Full traceback of the error (from the terminal)

ArsenyPie commented 5 months ago

Python 3.11.4, NetBox 3.7.0, Validity 2.1.1

I could not figure out how to get a traceback for this problem with NetBox Docker. I looked at the logs of the main NetBox container, but found nothing there(

amyasnikov commented 5 months ago

I'm afraid I can't help you without a traceback. If the error happens during polling, you should check the logs of netbox-worker:

docker compose logs netbox-worker

ArsenyPie commented 5 months ago

I found the traceback! I get something like the following when I try to test 250 network devices:

new-netbox-docker-netbox-worker-1 | 05:20:57 default: extras.scripts.run_script(commit=True, data={'sync_datasources': True, 'make_report': True, 'selectors': <RestrictedQue..., job=, request=<utilities.utils.NetBoxFakeRequest object at 0x7f10bd25bbd0>) (f204fcbc-7963-48c7-936f-57bc142119d1)
new-netbox-docker-netbox-worker-1 | Skipping config initialization (database unavailable)
[previous line repeated]
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
[previous line repeated]
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device Pribaykalye_ISR4331, ValueError: Cannot iterate over null (null)
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device VES-ISR4331, ValueError: Cannot iterate over null (null)
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
[previous line repeated]
new-netbox-docker-netbox-worker-1 | Failed to execute test acl_comparison for device aiiskue_ps_klyuchi_r881 (6429), KeyError: 'access_lists'
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device aoptu_r2911, ValueError: Cannot iterate over null (null)
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
[previous line repeated]
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device be-tts_r921, ValueError: Cannot iterate over null (null)
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
[previous line repeated]
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device burz_r2921, ValueError: Cannot iterate over null (null)
new-netbox-docker-netbox-worker-1 | ttp.lazy_import_functions: failed to save cache at '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle', error '[Errno 13] Permission denied: '/opt/netbox/venv/lib/python3.11/site-packages/ttp/ttp_dict_cache.pickle''
[previous line repeated]
new-netbox-docker-netbox-worker-1 | Failed to execute test qos_pre_classify_comparison for device ces_r921, ValueError: Cannot iterate over null (null)

I showed only a part of the logs, as there are a lot of them(

amyasnikov commented 5 months ago

I need the traceback of the error, not just logs. Please try to find something like:

 OperationalError: connection failed: FATAL: sorry, too many clients already
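
Since the worker log is dominated by repeated noise, a small filter can help locate the lines that matter. The following is only a sketch: the noise strings are copied from the log excerpt quoted in this thread, while the function name `find_error_lines` and the marker patterns are assumptions made for illustration.

```python
import re
from typing import Iterable, List

# Noise prefixes copied from the log excerpt in this thread.
NOISE = (
    "Skipping config initialization",
    "ttp.lazy_import_functions: failed to save cache",
)

# Markers that usually accompany a Python traceback or the database error.
# The exact set is an assumption; extend it as needed.
INTERESTING = re.compile(
    r"Traceback \(most recent call last\)|OperationalError|too many clients"
)


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like errors, skipping known noise."""
    matches = []
    for line in lines:
        if any(noise in line for noise in NOISE):
            continue  # drop the repeated, irrelevant messages
        if INTERESTING.search(line):
            matches.append(line)
    return matches
```

One way to use it would be to save the output of `docker compose logs netbox-worker` to a file and pass the opened file object to `find_error_lines`.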

ArsenyPie commented 5 months ago

Unfortunately, I did not find anything like that(( Is there any way I can change the number of threads? And why did you choose 500?

amyasnikov commented 5 months ago

  1. Your error says that there are too many connections to the PostgreSQL DB.
  2. The number of threads cannot be larger than the number of polled devices. 500 is the maximum possible number.
  3. At a brief glance, I don't see these threads performing any DB operations. But MAYBE they still create these connections somehow. This needs to be investigated.
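
The three points above can be put into numbers with a small simulation. This is a sketch under stated assumptions, not Validity's actual code: `pool_size`, `FakeDatabase` and `simulate_polling` are hypothetical names, the limit of 100 is the value PostgreSQL typically ships as the default for `max_connections`, and the model assumes every polling thread ends up holding one database connection of its own.

```python
import threading

MAX_THREADS = 500         # upper bound quoted in this thread
DB_MAX_CONNECTIONS = 100  # assumption: typical PostgreSQL default for max_connections


def pool_size(device_count: int, max_threads: int = MAX_THREADS) -> int:
    """Threads never outnumber the polled devices and never exceed the cap."""
    return min(device_count, max_threads)


class FakeDatabase:
    """Hypothetical stand-in that refuses connections past a fixed limit."""

    def __init__(self, max_connections: int = DB_MAX_CONNECTIONS) -> None:
        self.max_connections = max_connections
        self.open_connections = 0
        self.rejected = 0
        self._lock = threading.Lock()

    def connect(self) -> bool:
        """Open a connection that stays open; return False when the limit is hit."""
        with self._lock:
            if self.open_connections >= self.max_connections:
                self.rejected += 1  # corresponds to "too many clients already"
                return False
            self.open_connections += 1
            return True


def simulate_polling(device_count: int, db: FakeDatabase) -> int:
    """Start one thread per pool slot, each opening its own connection.

    Returns the number of threads whose connection attempt was rejected.
    """
    threads = [
        threading.Thread(target=db.connect) for _ in range(pool_size(device_count))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return db.rejected
```

Under these assumptions, `simulate_polling(250, FakeDatabase())` reports 150 rejected attempts: 250 devices give 250 threads, and only 100 of them fit within the connection limit.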

amyasnikov commented 5 months ago

Hey @ArsenyPie. I've found the piece of code that caused DB connections inside poller threads and fixed it. Polling no longer creates a DB connection. In addition, I have introduced the polling_threads plugin setting for extra flexibility. You can try it in version 2.2.1.
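
For reference, NetBox plugin settings are normally supplied through `PLUGINS_CONFIG` in the NetBox configuration file, so the new setting would presumably be set as sketched below. The setting name `polling_threads` comes from the comment above; the value 50 is an arbitrary example, and the file location (with netbox-docker, usually `configuration/plugins.py`) is an assumption about the deployment.

```python
# NetBox configuration file (location depends on the deployment; assumption).
PLUGINS = ["validity"]

PLUGINS_CONFIG = {
    "validity": {
        # Setting introduced in Validity 2.2.1 according to this thread.
        # 50 is an illustrative value, not a recommendation from the author.
        "polling_threads": 50,
    },
}
```

The worker container would need a restart for the changed configuration to take effect.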

ArsenyPie commented 5 months ago

Thanks! Although I have already fixed this problem by simply increasing the number of database connections in postgresql.conf)