Open waddles opened 9 years ago
The MIB was 'thrown together' without much regard (but 'some') for what the values actually were, so I'm not overly surprised by this. I haven't been running it myself in a while, because I have stability issues caused by 'load sensitivity' on my primary (bad SAS/SATA card/driver).
The Integer32 value on some/all of these needs to be updated to match the actual size of the value. This requires going through the code in ZFS/ZoL.
I'll see what I can do, but if you have concrete changes, feel free to open a pull request.
Ok so I changed the MIB to use Integer64 for the values in zfsPoolStatusTable but Net-SNMP still does not return them properly. Then I found this patch https://sourceforge.net/p/net-snmp/patches/737/ but it does not appear to have been applied. I am running Ubuntu Vivid (15.04) with Net-SNMP 5.7.2 but even latest upstream doesn't look like it handles it properly.
Then I don't know off-hand what to do :(
On a side note, I love the clean code in https://github.com/calmh/solaris-extra-snmp/blob/master/zfs-snmp although it depends on kstat and doesn't appear to keep persistency, but that could be fixed fairly easily.
I think a better way of getting the zpool usage (instead of using zpool iostat and then converting it to a somewhat rough estimate by multiplying by powers of 1024) is to use zfs list -p <pool>:
# zpool iostat
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
data 3.74T 25.3T 41 129 4.46M 3.79M
# zfs list -p
NAME USED AVAIL REFER MOUNTPOINT
data 3577557345060 23702699187420 30260 /data
data/atlassian 2227441270 23702699187420 2227441270 /data/atlassian
data/backup 3570098684040 23702699187420 3570098684040 /data/backup
data/bamboo 234328990 23702699187420 234328990 /data/bamboo
data/confluence 1980329210 23702699187420 1980329210 /data/confluence
data/crowd 47012470 23702699187420 47012470 /data/crowd
data/jira 2189892170 23702699187420 2189892170 /data/jira
data/postgresql 434886040 23702699187420 434886040 /data/postgresql
data/stash 268084910 23702699187420 268084910 /data/stash
# zfs list -p data
NAME USED AVAIL REFER MOUNTPOINT
data 3577557345060 23702699187420 30260 /data
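A minimal sketch of the difference between the two approaches, using values copied from the output above. The helper names (`parse_exact`, `human_to_bytes`) are hypothetical and not taken from the module; the sketch only parses sample text and does not call `zfs` itself.

```python
# Line copied from the `zfs list -p data` output above.
SAMPLE = "data 3577557345060 23702699187420 30260 /data"

# Exponent of 1024 for each suffix used in human-readable sizes.
UNITS = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_exact(line):
    """Parse one `zfs list -p` row into exact byte counts."""
    # Split at most 4 times so a mountpoint containing spaces stays intact.
    name, used, avail, refer, _mountpoint = line.split(None, 4)
    return {"name": name, "used": int(used), "avail": int(avail), "refer": int(refer)}


def human_to_bytes(text):
    """Convert e.g. '3.74T' to bytes; only as precise as the rounded input."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * 1024 ** UNITS[suffix])
    return int(float(text))


exact = parse_exact(SAMPLE)
print(exact["used"])  # exact byte count, no rounding

# One step in the last displayed digit of a 'T' value is about 11 GB,
# so anything derived from '3.74T' can be off by that order of magnitude.
step = human_to_bytes("3.75T") - human_to_bytes("3.74T")
print(step)
```

The point is only that the `-p` output can be passed through as-is, while the human-readable form has already lost the low-order digits before any conversion happens.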
Total capacity is obviously the sum of all 3 values
That still doesn't help, unfortunately. 3577557345060 + 23702699187420 + 30260 = 27280256562740, which is still much, much higher than the maximum value of an unsigned 32-bit int (which is 4,294,967,295). The signed int is half that...
The maximum value of an unsigned 64-bit int is 18,446,744,073,709,551,615 (which would allow for roughly 18,446 petabytes :), which is plenty high. A signed 64-bit int is half that. I don't know whether Integer64 would be signed or unsigned, but either way, that would do it. But if snmpd doesn't support it, there's not much I can do :(
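The arithmetic above can be checked with a few lines of Python. This is just a sanity check of the numbers quoted in this thread; the `fits` helper is a hypothetical name used for illustration.

```python
# Integer limits discussed above.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

# Sum of the three values from `zfs list -p data`.
total = 3577557345060 + 23702699187420 + 30260


def fits(value, limit):
    """True if a non-negative value can be stored without exceeding limit."""
    return 0 <= value <= limit


print(total)
print("fits unsigned 32-bit:", fits(total, UINT32_MAX))
print("fits signed 64-bit:  ", fits(total, INT64_MAX))
print("fits unsigned 64-bit:", fits(total, UINT64_MAX))

# Unsigned 64-bit maximum expressed in (decimal) petabytes.
print(UINT64_MAX // 10**15)
```

Either 64-bit type has plenty of headroom for this pool; only the 32-bit types are too small.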
Discussion actually jogs some distant memories though. It feels like I've had this discussion with myself but couldn't solve it…
https://en.wikipedia.org/wiki/Integer_(computer_science)#Common_integral_data_types
I've been trying to do something about this in https://github.com/FransUrbo/snmp-modules/tree/int64_size-free, but it didn't work as I expected.
With this patch on the 5.7.3+dfsg-1 version, I got it to work. I'm currently trying to figure out how to implement this in the MIB.
I took your recommendation to use zfs get to get the exact sizes, instead of the "human readable" values one gets from
I'm still trying to figure out how to fix the MIB. BUT, the code in the int64_size-free branch will now correctly return an integer64 instead of an integer32:
$ snmpget localhost zfsPoolSize zfsPoolSize
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: Int64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: Int64: 8256506880
The fact that it returns an Opaque: Int64 and not an Integer64 is the current problem. I'm not quite sure how to fix that just yet. I have some test MIB entries in that branch, but they don't seem to be working. I think I'm roughly on the right track here. There's something about https://tools.ietf.org/html/draft-perkins-bigint-00 that I need to figure out.
https://tools.ietf.org/html/draft-perkins-opaque-01 might help you understand more.
Looking at that patch and the file it applies to, that section of code is all about unsigned longs, which means it should be returning a type of ASN_OPAQUE_U64 and have a definition of 'Unsigned64'. That then leaves no 'integer64' (signed), in which case the #ifdef probably also needs another clause added to handle signed 64-bit integers. The implementation would be the same for all 3 if I'm not wrong.
The difference between Counter64, Integer64 and Unsigned64 is that Counters don't decrease and of course the interpretation of +/-. For our purposes we really want Unsigned64.
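To illustrate the "interpretation of +/-" point: the same 8 bytes can be read as either a signed or an unsigned 64-bit integer. The sketch below uses Python's `struct` module with a fixed 8-byte big-endian layout purely for illustration; it is not Net-SNMP's actual wire encoding, and `as_signed64` is a hypothetical helper name.

```python
import struct


def as_signed64(unsigned_value):
    """Reinterpret the 8 bytes of an unsigned 64-bit value as signed."""
    raw = struct.pack(">Q", unsigned_value)  # 8 bytes, unsigned
    return struct.unpack(">q", raw)[0]       # same bytes, read as signed


# A pool size well below 2**63 reads the same either way.
print(as_signed64(8256506880))

# The large test value from this thread only makes sense as unsigned;
# read as signed it becomes negative.
print(as_signed64(18446744073709551614))
```

For pool sizes the signed reading only goes wrong above 2**63 bytes, which is why Unsigned64 is the natural choice but not an urgent one.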
See also https://sourceforge.net/p/net-snmp/code/ci/1b4ca14972d39d61a93bb0e3e4eea76795bedb89/tree/include/net-snmp/library/asn1.h line 80 and onwards.
Triple-checking and actually LOOKING at the code more closely this time, you're probably right: the code should use an unsigned type instead of a signed one, because we don't need negative values.
In practice though, it shouldn't really matter right now. We can return a 9EB value (instead of an 18EB value with unsigned). That still isn't enough to account for the maximum size of a ZFS pool :). But it should be enough for almost everyone. For now. To be able to return the maximum size of a ZFS pool (2^128 bytes, usually quoted as 256 quadrillion ZiB), we need a 128-bit value!
However, although you're right about that, the problem is currently how to incorporate it into the MIB. I have added both an I64 and a U64, but neither works as expected.
But I'm starting to wonder if it matters whether it returns an Integer64 instead of Opaque: Int64. The value is what we need, not the type…
Could you try the int64_size-free branch and see if it works for you?
I've taken your suggestions for net-snmp and walked (not ran :) with it - http://sourceforge.net/p/net-snmp/mailman/message/34291537/.
However, my two patches aren't included in the web archive for some reason.
https://gist.github.com/FransUrbo/a2bfee606ffda0b7b81e https://gist.github.com/FransUrbo/b891f94b1100f2a3b251
This gives me:
# for i in {1..6}; do snmpget localhost .1.3.6.1.4.1.22222.42.$i.0; done
SNMPv2-SMI::enterprises.22222.42.1.0 = INTEGER: 123456
SNMPv2-SMI::enterprises.22222.42.2.0 = Opaque: Int64: 9223372036854775806
SNMPv2-SMI::enterprises.22222.42.3.0 = Counter32: 123456
SNMPv2-SMI::enterprises.22222.42.4.0 = Counter64: 9223372036854775806
SNMPv2-SMI::enterprises.22222.42.5.0 = Gauge32: 4294967294
SNMPv2-SMI::enterprises.22222.42.6.0 = Opaque: UInt64: 18446744073709551614
which seems to work just fine (except that instead of a UInt32 (or whatever it should have been), I get a Gauge32). No biggie, but it looks strange...
I don't seem to need any special stuff in the MIB. I just made zfsPoolSize and zfsPoolSize an Integer64 (although smilint complains about this) and return an unsigned64 value from the agent, and this all seems to be working just fine!
# snmpget localhost zfsPoolSize zfsPoolSize
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
# snmpwalk localhost zfsPoolStatusTable
BAYOUR-COM-MIB::zfsPoolStatusIndex.1 = INTEGER: 1
BAYOUR-COM-MIB::zfsPoolStatusIndex.2 = INTEGER: 2
BAYOUR-COM-MIB::zfsPoolName.1 = STRING: rpool
BAYOUR-COM-MIB::zfsPoolName.2 = STRING: rpool 2
BAYOUR-COM-MIB::zfsPoolGUID.1 = STRING: 4977845871582736322
BAYOUR-COM-MIB::zfsPoolGUID.2 = STRING: 3787144349319647945
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.2 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolAlloc.1 = INTEGER: 132096
BAYOUR-COM-MIB::zfsPoolAlloc.2 = INTEGER: 111616
BAYOUR-COM-MIB::zfsPoolFree.1 = Opaque: UInt64: 8256374784
BAYOUR-COM-MIB::zfsPoolFree.2 = Opaque: UInt64: 8256395264
BAYOUR-COM-MIB::zfsPoolCap.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolCap.2 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolDedup.1 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolDedup.2 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolHealth.1 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolHealth.2 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolAltRoot.1 = STRING: -
BAYOUR-COM-MIB::zfsPoolAltRoot.2 = STRING: -
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.2 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsed.1 = INTEGER: 282624
BAYOUR-COM-MIB::zfsPoolUsed.2 = INTEGER: 111616
# snmptable -CB localhost zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable
zfsPoolName zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
rpool 4977845871582736322 8256506880 132096 8256374784 0 1.00 online - 0 282624
rpool 2 3787144349319647945 8256506880 111616 8256395264 0 1.00 online - 0 111616
#
zfsPoolAlloc also needs to be a UInt64, just in case...
Same code on a host that doesn't have a patched Net-SNMP:
# snmpget localhost zfsPoolSize zfsPoolSize zfsPoolAlloc
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolAlloc.1 = Gauge32: 51384320
# snmpwalk localhost zfsPoolStatusTable
BAYOUR-COM-MIB::zfsPoolStatusIndex.1 = INTEGER: 1
BAYOUR-COM-MIB::zfsPoolName.1 = STRING: rpool
BAYOUR-COM-MIB::zfsPoolGUID.1 = STRING: 11847949639043149139
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolAlloc.1 = Gauge32: 51384320
BAYOUR-COM-MIB::zfsPoolFree.1 = Gauge32: 3910161408
BAYOUR-COM-MIB::zfsPoolCap.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolDedup.1 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolHealth.1 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolAltRoot.1 = STRING: -
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsed.1 = INTEGER: 153585254
# snmptable -CB localhost zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable
zfsPoolName zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
rpool 11847949639043149139 3961545728 51384320 3910161408 0 1.00 online - 0 153585254
# zfs get -H -oproperty,value -p used,available,referenced rpool
used 51384320
available 8205103104
referenced 25600
# expr 51384320 + 8205103104 + 25600 ; echo 3961545728
8256513024
3961545728
# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 7.94G 49.1M 7.89G - - 0% 1.00x ONLINE -
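The numbers above are consistent with the 64-bit size being cut down to 32 bits somewhere on the unpatched host. That is an inference from the figures, not something confirmed in this thread. A minimal sketch, using the `zfs get` output copied from above and hypothetical helper names:

```python
# Output copied from `zfs get -H -oproperty,value -p used,available,referenced rpool`.
ZFS_GET_OUTPUT = """\
used\t51384320
available\t8205103104
referenced\t25600
"""


def parse_zfs_get(text):
    """Parse 'property<TAB>value' lines into a dict of exact byte counts."""
    props = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        props[name] = int(value)
    return props


def truncate_to_uint32(value):
    """Keep only the low 32 bits, as a 32-bit unsigned field would."""
    return value & 0xFFFFFFFF


props = parse_zfs_get(ZFS_GET_OUTPUT)
size = props["used"] + props["available"] + props["referenced"]

print(size)                      # the value `expr` printed above
print(truncate_to_uint32(size))  # the Gauge32 value reported for zfsPoolSize
```

If the truncated result matches what snmpget reports, the agent is computing the right number and only the 32-bit type is losing the high bits.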
Querying an unpatched server from an OS X Lion machine:
$ snmptable -CB unpatched-server zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable
zfsPoolName zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
rpool 11847949639043149139 3961545728 51362816 3910182912 0 1.00 online - 0 153585254
And to the patched server:
$ snmptable -CB patched-server zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable
zfsPoolName zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
rpool 4977845871582736322 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 1.00 online - 0 613376
rpool 2 3787144349319647945 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 1.00 online - 0 437248
So I guess the patch still needs some work. Or possibly the MIB.
Great work on developing these modules but I seem to be overflowing the 32bit counters for my zpool info:
Seems ok coming out of the perl script:
Any suggestions?