Closed jmg011 closed 2 years ago
@jmg011 There is a similar issue about negative counters for SVM NFS v3 reported in #762. Could you confirm whether the numbers reported in System Manager are in millions or billions?
We have handled negative counters in #1205 by changing them to 0 for our upcoming release.
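For readers following along, here is a minimal sketch of what "changing negative counters to 0" means in practice. It is illustrative only and is not the code from #1205; the function name `clamp_negative` is hypothetical.

```python
def clamp_negative(values):
    """Return a copy of `values` with every negative entry replaced by 0.

    Hypothetical helper for illustration; not taken from Harvest.
    """
    return [max(v, 0) for v in values]
```

Clamping keeps the time series continuous, but it reports a zero where the real value is unknown.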
hi @jmg011 can you also share the ONTAP version and whether these are NFS v3, v4, or v4.1 shares?
@rahulguptajss When you say System Manager, do you mean OCUM? OCUM reports in millions. Prometheus scraped billions from the Harvest exporter.
@cgrinds Version: NetApp Release 9.8P9 & NFS v3
Thanks for the ONTAP and NFS version information. Our suspicion is this is an ONTAP counter bug since we have a few customer reports of negative counters with NFS. The Harvest logic for handling NFS counters is the same as the other performance counters.
System Manager is the UI that can be used to manage a cluster.
E.g.
hi @jmg011, have you made any changes to the conf/zapiperf/cdot/9.8.0/nfsv3.yaml template that captures this metric? Could you also run the following:
bin/poller --promPort 19002 --poller $poller-name --collectors ZapiPerf --objects NFSv3 --loglevel 0 2>&1 | tee nfs.txt
Let that run for 30 minutes or so and then email the nfs.txt file to ng-harvest-files@netapp.com
@cgrinds No changes to the template. I will start the poller in my dev environment to capture logs and will send them to ng-harvest-files@netapp.com today.
Thanks again for the log files @jmg011; they were very helpful. We're working on some improvements in this area and will ping you when they have made it through CI and integration tests.
This issue is now fixed in the main branch. The solution is to skip any negative counters, and any spikes generated by this kind of data.
hi @jmg011 when you get a chance, could you grab the nightly build and see if our latest fix addresses your billions problem? Thanks!
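As a rough illustration of the "skip" approach, the sketch below turns cumulative counter samples into per-second rates and drops any interval where the counter went backwards, instead of emitting a negative or wildly large value. It is written under stated assumptions and is not Harvest's actual implementation: the function names and the `(timestamp, value)` sample layout are hypothetical.

```python
from typing import List, Optional, Tuple


def rate_per_second(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> Optional[float]:
    """Per-second rate between two cumulative samples, or None to skip.

    Hypothetical helper: an interval is skipped when time did not advance
    or when the counter decreased (a negative delta).
    """
    elapsed = curr_ts - prev_ts
    delta = curr_value - prev_value
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


def rates(samples: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Map (timestamp_seconds, cumulative_ops) samples to (timestamp, rate).

    Intervals rejected by rate_per_second are omitted from the result.
    """
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        r = rate_per_second(v0, t0, v1, t1)
        if r is not None:
            out.append((t1, r))
    return out
```

With samples `[(0, 1000), (60, 7000), (120, 4000), (180, 10000)]`, the interval ending at 120 is dropped because the counter decreased. The trade-off against clamping to zero is a gap in the series rather than a misleading data point.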
verified negative counter logic in 22.11
Noticed a billion IOPs reported by the svm_nfs_ops metric for an SVM with 10 nodes. Sometimes it also shows half a billion negative IOPs.
Running the latest major release of Harvest:
bin/harvest version
harvest version 22.05.0-1 (commit 2bc2942) (build date 2022-05-11T07:57:16-0400) linux/amd64
1 day timeseries for svm_nfs_ops metric on a single SVM with 10 nodes. The spikes are billion IOPs.
Can you help check whether this is a Harvest bug? OCUM shows 1 million IOPs for the SVM over the same period, while Prometheus shows 1 billion IOPs.