RobSwoss opened 1 month ago
Hi Rob,
Based on the given output, I can't see what the cause is. As I have no systems on that version yet, I will have to download the simulator and try to reproduce the error. That will take some time.
Hi Rob,
Unfortunately, the E-Series simulator is not available for the new version, so I can't test it that way. You can send me the output of the special agent with the --debug flag added, but I can't promise that I will find the error that way.
Hello chexma,
We have about four E-Series systems in production, and this started popping up right after we upgraded to 11.80.1R2. I set up a lab site, installed the extension, and pointed it at the monitoring user on one of the E-Series systems to gather info for you. Here is the debug output. Please let me know if there is additional info I could provide that might help.
OMD[LAB220]:~$ cmk -nvvp --debug eseries_test
Checkmk version 2.2.0p35
+ FETCHING DATA
Source: SourceInfo(hostname='eseries_test', ipaddress='192.168.1.10', ident='special_netappeseries', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fb84a6133d0]
Read from cache: AgentFileCache(eseries_test, path_template=/omd/sites/LAB220/tmp/check_mk/data_source_cache/special_netappeseries/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (does not exist)
[ProgramFetcher] Execute data source
Calling: /omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries -u monitor -s '<REDACTED>' --sections batteries,controllers,drawers,drives,esms,fans,interfaces,pools,powerSupplies,system,thermalSensors,trays,volumes 192.168.1.10
[cpu_tracking] Stop [7fb84a6133d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.28, children_system=0.02, elapsed=0.41999999806284904))]
Source: SourceInfo(hostname='eseries_test', ipaddress='192.168.1.10', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fb84a2fb490]
Read from cache: NoCache(eseries_test, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
[PiggybackFetcher] Execute data source
No piggyback files for 'eseries_test'. Skip processing.
No piggyback files for '192.168.1.10'. Skip processing.
[cpu_tracking] Stop [7fb84a2fb490 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
HostKey(hostname='eseries_test', source_type=<SourceType.HOST: 1>) -> Not adding sections: Agent exited with code 1: Traceback (most recent call last):
File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 501, in <module>
main()
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 497, in main
fetch_storage_data(session, sections, args, base_url, controller_ids)
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 154, in fetch_storage_data
verify=args.verify_ssl).json()
^^^^^^
File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
HostKey(hostname='eseries_test', source_type=<SourceType.HOST: 1>) -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fb849b242d0]
value store: synchronizing
Trying to acquire lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
Got lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
value store: loading from disk
Releasing lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
Released lock on /omd/sites/LAB220/tmp/check_mk/counters/eseries_test
No piggyback files for 'eseries_test'. Skip processing.
No piggyback files for '192.168.1.10'. Skip processing.
[cpu_tracking] Stop [7fb849b242d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.010000001639127731))]
[special_netappeseries] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!), [piggyback] Success (but no data found for this host), execution time 0.4 sec | execution_time=0.430 user_time=0.000 system_time=0.000 children_user_time=0.280 children_system_time=0.020 cmk_time_ds=0.120 cmk_time_agent=0.000
Agent exited with code 1: Traceback (most recent call last):
File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/LAB220/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 501, in <module>
main()
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 497, in main
fetch_storage_data(session, sections, args, base_url, controller_ids)
File "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries", line 154, in fetch_storage_data
verify=args.verify_ssl).json()
^^^^^^
File "/omd/sites/LAB220/lib/python3.11/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!)
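The last line of the traceback, "Expecting value: line 1 column 1 (char 0)", is what Python's json module reports when the response body is empty or does not start with JSON (for example an HTML error page). That would fit an endpoint that no longer exists on the array. A minimal sketch of a defensive decode step, assuming the agent could look at the HTTP status before decoding; the helper name decode_json_body is hypothetical and not part of the extension:

```python
import json
from typing import Any


def decode_json_body(status: int, body: str, url: str) -> Any:
    """Decode a REST response body, failing with a readable message.

    Hypothetical helper: like requests' Response.json() it runs json.loads
    on the text, but it reports the HTTP status and URL first.
    """
    if status >= 400:
        raise RuntimeError(f"{url} returned HTTP {status}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Show the start of the body so an HTML error page or an empty reply is obvious.
        snippet = body[:80] or "<empty body>"
        raise RuntimeError(
            f"{url} did not return JSON ({exc}); body starts with: {snippet!r}"
        ) from exc


# An empty body produces exactly the message seen in the agent output.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)
```

With a wrapper like this, the failing section would name the URL and status code instead of ending in a bare JSONDecodeError.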
Update:
We've isolated the problem to the "controllers" section. As a workaround, deselecting the "controllers" section allows the extension to function and monitor as normal in Checkmk.
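To test the workaround outside the GUI, one could rerun the agent call from the "Calling:" line of the debug output with controllers dropped from --sections. A minimal sketch that only rebuilds that command line; the agent path, user and flags are copied from the debug output above, while the PASSWORD placeholder and the function name agent_command are assumptions for illustration:

```python
import shlex

# Section list copied from the "Calling:" line of the debug output.
ALL_SECTIONS = [
    "batteries", "controllers", "drawers", "drives", "esms", "fans",
    "interfaces", "pools", "powerSupplies", "system", "thermalSensors",
    "trays", "volumes",
]

AGENT = "/omd/sites/LAB220/local/share/check_mk/agents/special/agent_netappeseries"


def agent_command(host: str, user: str, secret: str, skip: set[str]) -> str:
    """Return a shell-quoted agent call with the sections in `skip` removed."""
    sections = ",".join(s for s in ALL_SECTIONS if s not in skip)
    argv = [AGENT, "-u", user, "-s", secret, "--sections", sections, host]
    return shlex.join(argv)


print(agent_command("192.168.1.10", "monitor", "PASSWORD", skip={"controllers"}))
```

Running the printed command as the site user should show whether the remaining sections produce output once the controllers request is left out.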
Comparing the following section definition with the web API on the device, I can't find these URIs:
name="controllers",
uri="/controllers",
perfdata_uri="/analysed-controller-statistics",
perfdata_identifier="controllerId",
The API methods
/storage-systems/{system-id}/analyzed/controller-statistics
/storage-systems/{system-id}/controller-statistics/{idlist} (deprecated)
seem to be related, but perhaps the API endpoints changed on the NetApp side. I am opening a ticket with NetApp to get details on the changes, since 11.80.1R2 either has no release notes or I'm failing to find them.
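The mismatch is easier to see when both paths are expanded against the same storage system. A minimal sketch, assuming the agent's base_url has the form https://HOST/devmgr/v2/storage-systems/{system-id} (as in the curl call later in this thread) and that perfdata_uri is appended to it unchanged; the function name stats_urls is hypothetical:

```python
def stats_urls(host: str, system_id: str) -> dict[str, str]:
    """Expand the extension's path and the documented path for one system."""
    # Assumed base URL layout, taken from the documented REST call.
    base_url = f"https://{host}/devmgr/v2/storage-systems/{system_id}"
    return {
        # perfdata_uri of the extension's "controllers" section (British spelling, hyphen).
        "extension": base_url + "/analysed-controller-statistics",
        # Path listed in the device's API documentation (US spelling, slash).
        "documented": base_url + "/analyzed/controller-statistics",
    }


for name, url in stats_urls("HOSTNAME", "1").items():
    print(f"{name:>10}: {url}")
```

The two paths differ in both spelling and separator, so a simple spelling fix in perfdata_uri alone would not be expected to match.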
@exhaustivesolving Wow, thanks for the analysis! Yes, the output seems to have changed in the NetApp API, which should not happen with a versioned API in a minor upgrade.
As a side note:
https://kb.netapp.com/Support_Bulletins/Customer_Bulletins/SU570
Affected models:
• E-Series Systems: E2800, E5700, EF280, EF570, EF300, EF600
• StorageGRID Appliances: SGF6024, SG6060 and SG6160
Workaround for systems running any of the affected releases:
• Do not upgrade drive firmware until a fix is available in SANtricity OS, OR
• Perform an offline drive firmware upgrade.
  o For StorageGRID appliances, please visit this page for detailed instructions: <deleted>
name="controllers",
uri="/controllers",
perfdata_uri="/analysed-controller-statistics",
perfdata_identifier="controllerId",
Did they rename analysed-controller-statistics to analyzed-controller-statistics? Maybe you can try to rewrite the perfdata_uri to the new path.
We are affected by this issue, too. When I disable the controller check, the error goes away. I'd like to test rewriting the URL, but I don't get it... rewrite it to what?
It looks like this:
name="controllers", uri="/controllers", perfdata_uri="/analysed-controller-statistics", perfdata_identifier="controllerId",
Hi,
Unfortunately, I have no system running that firmware yet. You can try changing perfdata_uri to /analyzed/controller-statistics.
If someone has the chance to fetch the analyzed controller statistics data from the API with, e.g., Postman, I could try to fix the problem without direct access.
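For anyone without Postman, a minimal standard-library sketch that performs the same GET and keeps the raw reply, so it can be posted here as-is. Assumptions: the array accepts HTTP basic authentication for the monitoring user, the /devmgr/v2 base path and the statisticsFetchTime parameter match the curl call posted below, and TLS verification is only disabled for a self-signed lab certificate. The function names are hypothetical:

```python
import base64
import ssl
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_stats_request(host: str, user: str, password: str,
                        system_id: str = "1", fetch_time: int = 60) -> urllib.request.Request:
    """Build a GET request for the analyzed controller statistics endpoint."""
    query = urlencode({"statisticsFetchTime": fetch_time})
    url = (f"https://{host}/devmgr/v2/storage-systems/{system_id}"
           f"/analyzed/controller-statistics?{query}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Accept": "application/json",
        # Assumption: basic auth works for the monitoring user on this array.
        "Authorization": f"Basic {token}",
    })


def fetch_raw(req: urllib.request.Request, verify_tls: bool = True) -> tuple[int, str]:
    """Send the request and return (status, body) without decoding JSON."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Only for lab systems with self-signed certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error replies (e.g. 404 for an unknown path) are returned, not raised.
        return err.code, err.read().decode("utf-8", "replace")


# Example (needs network access to the array):
# status, body = fetch_raw(build_stats_request("HOSTNAME", "monitor", "PASSWORD"), verify_tls=False)
# print(status)
# print(body[:500])
```

Returning the status alongside the body makes it visible whether a path is missing (HTTP error) or merely returns a different JSON shape.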
Hi,
I changed the URL, but it did not work. However, I found the API documentation :) Executing this:
curl -X GET "https://HOSTNAME/devmgr/v2/storage-systems/1/analyzed/controller-statistics?statisticsFetchTime=60" -H "accept: application/json"
returns this:
{
"statistics": [
{
"observedTime": "2024-10-31T07:56:04.000+00:00",
"observedTimeInMS": "1730361364000",
"sourceController": "CONTROLLERID",
"readIOps": 37.24333333333333,
"writeIOps": 20.243333333333332,
"otherIOps": 0,
"combinedIOps": 57.486666666666665,
"readThroughput": 3.9289347330729165,
"writeThroughput": 0.12590726216634116,
"combinedThroughput": 4.054841995239258,
"readResponseTime": 13.0910230516201,
"readResponseTimeStdDev": 165.25260443364118,
"writeResponseTime": 0.06098923608202972,
"writeResponseTimeStdDev": 0.6451068423419701,
"combinedResponseTime": 6.680828394941147,
"combinedResponseTimeStdDev": 156.00596177799304,
"averageReadOpSize": 110618.09719860378,
"averageWriteOpSize": 6521.81788243043,
"readOps": 11173,
"writeOps": 6073,
"readPhysicalIOps": 37.769999999999996,
"writePhysicalIOps": 19.930000000000007,
"controllerId": "CONTROLLERID",
"cacheHitBytesPercent": 1.5127675037219444,
"randomIosPercent": 35.5233400985793,
"mirrorBytesPercent": 0,
"fullStripeWritesBytesPercent": 0,
"maxCpuUtilization": 38,
"maxCpuUtilizationPerCore": [
38
],
"cpuAvgUtilization": 37.18333333333333,
"cpuAvgUtilizationPerCore": [
37.18333333333333
],
"cpuAvgUtilizationPerCoreStdDev": [
0.3869395588750369
],
"raid0BytesPercent": 0,
"raid1BytesPercent": 0,
"raid5BytesPercent": 0,
"raid6BytesPercent": 0,
"ddpBytesPercent": 3.1051089614383836,
"readHitResponseTime": 0.0025785714285714283,
"readHitResponseTimeStdDev": 0.002450188273240292,
"writeHitResponseTime": 0.06098923608202972,
"writeHitResponseTimeStdDev": 0.06098923608202972,
"combinedHitResponseTime": 0.06048610153885449,
"combinedHitResponseTimeStdDev": 0.0604858827594107,
"maxPossibleBpsUnderCurrentLoad": 4847683500,
"maxPossibleIopsUnderCurrentLoad": 216905
}
],
"tokenId": null
}
Maybe the structure of the response or the parameters changed? statisticsFetchTime is a required field: "The number of seconds of historical statistics data to retrieve. After the initial query has started (a token has been provided), this value is ignored."
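Judging by the sample reply above, the documented endpoint wraps the per-controller records in a "statistics" list, and each record carries "controllerId" (the extension's perfdata_identifier). A minimal sketch of indexing that reply by controller, assuming the extension wants one record per controller ID; the field names are copied from the reply, while the function name and the printed subset of metrics are illustrative choices:

```python
import json
from typing import Any


def index_by_controller(body: str, identifier: str = "controllerId") -> dict[str, dict[str, Any]]:
    """Map controller ID to its statistics record from an analyzed/controller-statistics reply."""
    reply = json.loads(body)
    # The reply is an object, not a bare list: the records live under "statistics".
    records = reply.get("statistics", []) if isinstance(reply, dict) else reply
    return {rec[identifier]: rec for rec in records}


sample = """{"statistics": [{"controllerId": "CONTROLLERID", "readIOps": 37.24,
 "writeIOps": 20.24, "cpuAvgUtilization": 37.18}], "tokenId": null}"""

stats = index_by_controller(sample)
for ctrl, rec in stats.items():
    print(ctrl, rec["readIOps"], rec["writeIOps"], rec["cpuAvgUtilization"])
```

The isinstance check keeps the function usable if a bare list is returned instead; whether the extension's existing parsing expects a list or an object is not confirmed here.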
After updating our NetApps to SANtricity OS 11.80.1R2, the check failed with the following error: [special_netappeseries] requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
File "/omd/sites/mon/lib/python3.12/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/mon/lib/python3.12/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/mon/lib/python3.12/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/omd/sites/mon/lib/python3.12/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/omd/sites/mon/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 149, in _special_agent_main_core
return main_fn(args)
^^^^^^^^^^^^^
File "/omd/sites/mon/local/lib/python3/cmk/plugins/netapp_eseries/special_agents/agent_netappeseries.py", line 514, in agent_netapp_eseries_main
fetch_storage_data(session, sections, args, base_url, controller_ids)
File "/omd/sites/mon/local/lib/python3/cmk/plugins/netapp_eseries/special_agents/agent_netappeseries.py", line 160, in fetch_storage_data
).json()
^^^^^^
File "/omd/sites/mon/lib/python3.12/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Thanks for your help. Robin