2022-01-13 20:47:37 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:68 in Error) | check:ns1 | Error running check: [{"message": "429 Client Error: Too Many Requests for url: https://api.nsone.net/v1/zones/redacted1.com", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1017, in run\n self.check(instance)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 48, in check\n checkUrl = self.create_url(self.metrics, self.query_params, self.networks)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 92, in create_url\n checkUrl.update(self.ns1.get_stats_url_usage(key, val, networknames))\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/ns1_url_utils.py\", line 61, in get_stats_url_usage\n records =self.check.get_zone_records(domain)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 140, in get_zone_records\n res = self.get_stats(url)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 395, in get_stats\n response.raise_for_status()\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/models.py\", line 943, in raise_for_status\n raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.nsone.net/v1/zones/redacted1.com\n"}]
2022-01-13 20:47:37 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:56 in CheckFinished) | check:ns1 | Done running check
2022-01-13 20:47:53 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:37 in CheckStarted) | check:ns1 | Running check...
2022-01-13 20:47:53 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:126 in LogMessage) | ns1:c04117e28d82671f | (check.py:42) | Startup
2022-01-13 20:48:20 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:68 in Error) | check:ns1 | Error running check: [{"message": "429 Client Error: Too Many Requests for url: https://api.nsone.net/v1/zones/redacted2.com", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1017, in run\n self.check(instance)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 48, in check\n checkUrl = self.create_url(self.metrics, self.query_params, self.networks)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 87, in create_url\n checkUrl.update(self.ns1.get_stats_url_qps(key,val))\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/ns1_url_utils.py\", line 136, in get_stats_url_qps\n records = self.check.get_zone_records(domain)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 140, in get_zone_records\n res = self.get_stats(url)\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/ns1/check.py\", line 395, in get_stats\n response.raise_for_status()\n File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/models.py\", line 943, in raise_for_status\n raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.nsone.net/v1/zones/redacted2.com\n"}]
Steps to reproduce the issue:
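Configure the check against an NS1 account with a large number of zones and records.
Let the Agent run until it exhausts the NS1 API query limit.
The Agent fails the lookup with a 429 response and keeps failing on later queries because the API query limit is still exceeded.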
Describe the results you received:
When testing the agent against our org's zone list, we found that the module does not have a way to:
1) Control the rate of API queries so that the API query limit is not reached
2) Retry failed API queries
3) Exponentially back off on failed queries
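To make the three points above concrete, here is a minimal sketch of what we have in mind. It is not the check's actual code: `Throttle`, `get_with_backoff`, `RateLimitedError`, and all default values are hypothetical names and numbers chosen for illustration, and `fetch` stands in for whatever function performs the HTTP request.

```python
import random
import time


class RateLimitedError(Exception):
    """Raised when the API still answers 429 after all retries."""


class Throttle:
    """Spaces calls at least min_interval seconds apart (point 1)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def get_with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0,
                     sleep=time.sleep):
    """Retry fetch() while it returns HTTP 429 (points 2 and 3).

    fetch is any zero-argument callable returning an object with a
    status_code attribute, for example a requests.Response.
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay,
        # plus up to 10% jitter so parallel checks do not retry in lockstep.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay / 10))
    raise RateLimitedError("still rate limited after %d retries" % max_retries)
```

In the check, something like `throttle.wait()` followed by `get_with_backoff(lambda: session.get(url))` could wrap each API call; `throttle`, `session`, and `url` are again placeholders.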
Additional information you deem important (e.g. issue happens only occasionally):
Additionally, we think the application might prematurely abort querying a list of zones and records for stats when it encounters a misconfiguration, e.g. a record that doesn't exist. Ideally, the entire configuration list would not be dropped at the first record that doesn't exist, since DNS records are often added, changed, or removed.
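A rough sketch of the behavior we would prefer, assuming each zone lookup can fail independently. `collect_zone_records` is a hypothetical helper, and the `get_zone_records` parameter is only a stand-in for the check's own lookup, whose real signature and error handling we have not verified.

```python
def collect_zone_records(zones, get_zone_records, log=print):
    """Query every zone and skip the ones that fail instead of aborting.

    Returns (results, failed): a dict of zone -> records for the lookups
    that succeeded, and a list of the zones that raised an error.
    """
    results = {}
    failed = []
    for zone in zones:
        try:
            results[zone] = get_zone_records(zone)
        except Exception as exc:  # deliberately broad in this sketch
            log("skipping %s: %s" % (zone, exc))
            failed.append(zone)
    return results, failed
```

With this shape, one missing record costs a log line and an entry in `failed`, while stats for the remaining zones are still collected.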
Describe the results you expected: We expected the application to be aware of the NS1 API query limits ( https://help.ns1.com/hc/en-us/articles/360020250573-About-API-rate-limiting ) and to work with a large zone/record list.