chexma / checkmk_plugins

After updating the NetApp E-Series, the check fails #10

Open · RobSwoss opened this issue 1 week ago

RobSwoss commented 1 week ago

After updating our NetApp systems to SANtricity OS 11.80.1R2, the check fails with the following error:

[special_netappeseries] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Traceback (most recent call last):
  File "/omd/sites/mon/lib/python3.12/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/mon/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/mon/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/mon/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/mon/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 149, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/mon/local/lib/python3/cmk/plugins/netapp_eseries/special_agents/agent_netappeseries.py", line 514, in agent_netapp_eseries_main
    fetch_storage_data(session, sections, args, base_url, controller_ids)
  File "/omd/sites/mon/local/lib/python3/cmk/plugins/netapp_eseries/special_agents/agent_netappeseries.py", line 160, in fetch_storage_data
    ).json()
      ^^^^^^
  File "/omd/sites/mon/lib/python3.12/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
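The message "Expecting value: line 1 column 1 (char 0)" is what Python's json module raises when the text it is given does not start with a JSON value, for example an empty body or an HTML error page. That suggests one of the API requests in fetch_storage_data returned something other than JSON. A minimal sketch of a more defensive decode step, assuming the status code, content type and body text are passed in as plain values (the names decode_api_response and UnexpectedApiResponse are hypothetical, not part of the plugin):

```python
import json


class UnexpectedApiResponse(Exception):
    """Raised when an API response cannot be decoded as JSON (hypothetical)."""


def decode_api_response(url, status_code, content_type, text):
    """Decode a REST response body, failing with a readable message instead of
    a bare JSONDecodeError. With requests, the inputs would be resp.url,
    resp.status_code, resp.headers.get("Content-Type", "") and resp.text."""
    if status_code >= 400:
        raise UnexpectedApiResponse(f"{url}: HTTP {status_code}")
    if not text.strip():
        raise UnexpectedApiResponse(f"{url}: empty response body")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the content type and the start of the body to aid debugging.
        snippet = text[:80].replace("\n", " ")
        raise UnexpectedApiResponse(
            f"{url}: not JSON ({content_type or 'no content type'}): {snippet!r}"
        ) from exc


# The original error message can be reproduced with the standard library alone.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```

With a check like this, the agent output would name the failing URL and HTTP status instead of only the decoder position.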

Thanks for your help. Robin

chexma commented 1 week ago

Hi Rob,

Based on the given output, I can't see what the cause is. As I don't have any systems on that version yet, I will have to download the simulator and try to reproduce the error. That will take some time.

chexma commented 1 week ago

Hi Rob,

Unfortunately, the E-Series simulator is not available for the new version, so I can't test it that way. You can send me the output of the special agent run with the --debug flag added, but I can't promise that I will find the error that way.

exhaustivesolving commented 5 days ago

Hello chexma,

We have about four E-Series systems in production, and this started popping up right after we upgraded to 11.80.1R2. I fired up a lab site, installed the extension, and pointed it at the monitoring user on one of the E-Series systems to get you some info. Here is the debug output; please let me know if there is any additional info I could provide that might help.

OMD[LAB220]:~$ cmk -nvvp --debug eseries_test
Checkmk version 2.2.0p35
+ FETCHING DATA
  Source: SourceInfo(hostname='eseries_test', ipaddress='192.168.1.10', ident='special_netappeseries', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fb84a6133d0]
Read from cache: AgentFileCache(eseries_test, path_template=/omd/sites/LAB220/tmp/check_mk/data_source_cache/special_netappeseries/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (does not exist)
[ProgramFetcher] Execute data source
Calling: /omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries -u monitor -s '<REDACTED>' --sections batteries,controllers,drawers,drives,esms,fans,interfaces,pools,powerSupplies,system,thermalSensors,trays,volumes 192.168.1.10
[cpu_tracking] Stop [7fb84a6133d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.28, children_system=0.02, elapsed=0.41999999806284904))]
  Source: SourceInfo(hostname='eseries_test', ipaddress='192.168.1.10', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fb84a2fb490]
Read from cache: NoCache(eseries_test, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
[PiggybackFetcher] Execute data source
No piggyback files for 'eseries_test'. Skip processing.
No piggyback files for '192.168.1.10'. Skip processing.
[cpu_tracking] Stop [7fb84a2fb490 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
  HostKey(hostname='eseries_test', source_type=<SourceType.HOST: 1>)  -> Not adding sections: Agent exited with code 1: Traceback (most recent call last):
  File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 501, in <module>
    main()
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 497, in main
    fetch_storage_data(session, sections, args, base_url, controller_ids)
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 154, in fetch_storage_data
    verify=args.verify_ssl).json()
                            ^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  HostKey(hostname='eseries_test', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fb849b242d0]
value store: synchronizing
Trying to acquire lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
Got lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
value store: loading from disk
Releasing lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
Released lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
No piggyback files for 'eseries_test'. Skip processing.
No piggyback files for '192.168.1.10'. Skip processing.
[cpu_tracking] Stop [7fb849b242d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.010000001639127731))]
[special_netappeseries] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!), [piggyback] Success (but no data found for this host), execution time 0.4 sec | execution_time=0.430 user_time=0.000 system_time=0.000 children_user_time=0.280 children_system_time=0.020 cmk_time_ds=0.120 cmk_time_agent=0.000
Agent exited with code 1: Traceback (most recent call last):
  File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 501, in <module>
    main()
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 497, in main
    fetch_storage_data(session, sections, args, base_url, controller_ids)
  File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 154, in fetch_storage_data
    verify=args.verify_ssl).json()
                            ^^^^^^
  File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!)

exhaustivesolving commented 4 days ago

Update:

We've isolated the problem to the "controllers" section. As a workaround, deselecting the "controllers" section allows the extension to function and monitor as normal in Checkmk.
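In terms of the agent call in the debug output above, the workaround amounts to dropping controllers from the --sections argument. A small sketch, assuming the section names from that call line are the full set the rule offers (sections_without is a hypothetical helper; in Checkmk itself the change is made by deselecting the section in the special agent rule):

```python
# Section names taken from the --sections argument in the agent call above.
ALL_SECTIONS = [
    "batteries", "controllers", "drawers", "drives", "esms", "fans",
    "interfaces", "pools", "powerSupplies", "system", "thermalSensors",
    "trays", "volumes",
]


def sections_without(excluded, sections=ALL_SECTIONS):
    """Return the comma-separated --sections value with the excluded names
    removed, keeping the original order."""
    excluded = set(excluded)
    return ",".join(name for name in sections if name not in excluded)


print(sections_without({"controllers"}))
# batteries,drawers,drives,esms,fans,interfaces,pools,powerSupplies,system,thermalSensors,trays,volumes
```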

Comparing:

            name="controllers",
            uri="/controllers",
            perfdata_uri="/analysed-controller-statistics",
            perfdata_identifier="controllerId",

with the web API on the device, I can't find these URIs.

The API methods

    /storage-systems/{system-id}/analyzed/controller-statistics
    /storage-systems/{system-id}/controller-statistics/{idlist} (deprecated)

seem to be related, but perhaps the API endpoints changed on the NetApp side. I am opening a ticket with NetApp to get details on the changes, since 11.80.1R2 either doesn't have release info or I'm failing to find it.
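One way to confirm which statistics path the upgraded firmware still serves is to request each candidate and compare the responses. The sketch below assumes the paths sit below /storage-systems/{system-id} on the same base URL the agent uses, and injects the HTTP call as a fetch function so it can be wired to a requests session with the monitoring user. The names probe_paths, fetch and CANDIDATES are hypothetical; the candidate list just combines the plugin's current path with the one listed in the device's API docs.

```python
from typing import Callable, Dict, Iterable, Tuple

# fetch(url) -> (status_code, content_type, body_text); hypothetical signature.
Fetch = Callable[[str], Tuple[int, str, str]]

CANDIDATES = [
    "/analysed-controller-statistics",  # path used by the plugin
    "/analyzed/controller-statistics",  # path listed in the device's API docs
]


def probe_paths(fetch: Fetch, base_url: str, system_id: str,
                paths: Iterable[str]) -> Dict[str, str]:
    """Request each candidate path below /storage-systems/{system_id} and
    classify the result as 'json', 'not-json' or 'http <status>'."""
    results = {}
    for path in paths:
        url = f"{base_url}/storage-systems/{system_id}{path}"
        status, content_type, body = fetch(url)
        if status >= 400:
            results[path] = f"http {status}"
        elif "json" in content_type.lower() and body.strip():
            results[path] = "json"
        else:
            results[path] = "not-json"
    return results


# Example wiring (not executed here), assuming requests and the monitoring user:
#   session = requests.Session()
#   session.auth = ("monitor", "<password>")
#   def fetch(url):
#       r = session.get(url, verify=False)
#       return r.status_code, r.headers.get("Content-Type", ""), r.text
#   print(probe_paths(fetch, "<base_url>", "<system-id>", CANDIDATES))
```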

chexma commented 3 days ago

@exhaustivesolving Wow, thanks for the analysis! Yeah, the output seems to have changed in the NetApp API, which should not happen with a versioned API in a minor upgrade.

chexma commented 3 days ago

As a side note:

https://kb.netapp.com/Support_Bulletins/Customer_Bulletins/SU570

Affected models:
• E-Series Systems: E2800, E5700, EF280, EF570, EF300, EF600
• StorageGRID Appliances: SGF6024, SG6060 and SG6160

Workaround for systems running any of the affected releases:
• Do not upgrade drive firmware until a fix is available in SANtricity OS.
OR
• Perform an offline drive firmware upgrade.
  o For StorageGRID appliances, please visit this page for detailed instructions: <deleted>

chexma commented 3 days ago

        name="controllers",
        uri="/controllers",
        perfdata_uri="/analysed-controller-statistics",
        perfdata_identifier="controllerId",

Did they rename analysed-controller-statistics to analyzed-controller-statistics? Maybe you can try changing the perfdata_uri to the new path.
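If the path did change, a version-tolerant option would be to let a section list several perfdata paths and use the first one that returns JSON, so older and newer firmware both work. A minimal sketch, assuming a section definition shaped like the fields quoted above; SectionSpec, perfdata_uris and first_working_uri are hypothetical and not the plugin's actual structures, and the second path is the one listed in the device's API docs rather than a confirmed replacement:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SectionSpec:
    """Hypothetical stand-in for the plugin's section definition."""
    name: str
    uri: str
    perfdata_uris: Tuple[str, ...]  # candidate paths, tried in order
    perfdata_identifier: str


CONTROLLERS = SectionSpec(
    name="controllers",
    uri="/controllers",
    perfdata_uris=(
        "/analysed-controller-statistics",  # path used by the plugin today
        "/analyzed/controller-statistics",  # path listed in the device's API docs
    ),
    perfdata_identifier="controllerId",
)


def first_working_uri(spec: SectionSpec,
                      returns_json: Callable[[str], bool]) -> Optional[str]:
    """Return the first perfdata path for which returns_json(path) is True,
    or None so the caller can skip performance data instead of crashing."""
    for path in spec.perfdata_uris:
        if returns_json(path):
            return path
    return None
```

Returning None rather than raising lets the controllers section still report its status data when neither statistics path answers.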