LarkIT / newrelic-nfsiostat

NFS IOSTAT Module for NewRelic

NFS data not displaying #6

Closed ghost closed 10 years ago

ghost commented 10 years ago

While trying to set this up on a CentOS 6 server (for testing before moving to production), I discovered that no data was being sent to New Relic.

I was working on CentOS release 6.4 (Final):

Linux default-centos-64.vagrantup.com 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Following the discussion in https://github.com/LarkIT/newrelic-nfsiostat/issues/3, I took the advice and enabled the debug option in the Python code.

The original error was:

errors  line 187, in _get_nfs_stat_for
    op_prefix + '/Operations[ops/second]': op_stat[0],
TypeError: 'NoneType' object is unsubscriptable

This led me to believe that None was being passed as op_stat, which caused the error.

After adding some print statements to the code, I got this:

value of op is:   Access 
value of op_stat is:   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  
value of op is:   Lookup 
value of op_stat is:   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  
value of op is:   ReadDir  
value of op_stat is:   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 
value of op is:   ReadDirPlus  
value of op_stat is:   None 

Note that the ReadDirPlus op has an op_stat of None.
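A minimal sketch of how this failure can arise, assuming the plugin looks each op up by name in stats parsed from mountstats. The names `parsed_stats`, `lookup_op_stat`, and `first_value_or_error` are hypothetical and are not taken from the plugin's code.

```python
# Hypothetical stand-in for the per-op stats parsed from an NFSv4 mount,
# which has no READDIRPLUS entry.
parsed_stats = {
    "ACCESS": [0.0] * 7,
    "LOOKUP": [0.0] * 7,
    "READDIR": [0.0] * 7,
}


def lookup_op_stat(op):
    """Return the stats list for an op, or None if the mount lacks it."""
    return parsed_stats.get(op.upper())


def first_value_or_error(op):
    """Index the stats the way the failing line does and report the outcome."""
    op_stat = lookup_op_stat(op)
    try:
        return op_stat[0]
    except TypeError as exc:
        # Indexing None raises TypeError, as in the traceback above.
        return "TypeError: %s" % exc


for op in ("Access", "Lookup", "ReadDir", "ReadDirPlus"):
    print(op, "->", first_value_or_error(op))
```

Only the last op ends in a TypeError, because it is the only one whose lookup returns None.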

To confirm I was on the right track, I removed the ReadDirPlus value from self.nfs_ops, like this: https://github.com/nukepuppy/newrelic-nfsiostat/commit/2d99b6b0efc13d6251c7e4b78554ae52a945b136

After that change, the plugin started to report data.

Here is the output of /proc/self/mountstats:

device 127.0.0.1:/var/www mounted on /mnt/website with fstype nfs4 statvers=1.1
        opts:   rw,vers=4,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,minorversion=0,local_lock=none
        age:    30
        caps:   caps=0xffff,wtmult=512,dtsize=32768,bsize=0,namlen=255
        nfsv4:  bm0=0xfdffbfff,bm1=0xf9be3e,acl=0x3
        sec:    flavor=1,pseudoflavor=1
        events: 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        bytes:  0 0 0 0 0 0 0 0
        RPC iostats version: 1.0  p/v: 100003/4 (nfs)
        xprt:   tcp 853 0 1 0 24 21 21 0 20 0 2 0 0
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
                READ: 0 0 0 0 0 0 0 0
               WRITE: 0 0 0 0 0 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0
                OPEN: 0 0 0 0 0 0 0 0
        OPEN_CONFIRM: 0 0 0 0 0 0 0 0
         OPEN_NOATTR: 0 0 0 0 0 0 0 0
        OPEN_DOWNGRADE: 0 0 0 0 0 0 0 0
               CLOSE: 0 0 0 0 0 0 0 0
             SETATTR: 0 0 0 0 0 0 0 0
              FSINFO: 1 1 0 164 108 0 0 0
               RENEW: 0 0 0 0 0 0 0 0
         SETCLIENTID: 0 0 0 0 0 0 0 0
        SETCLIENTID_CONFIRM: 0 0 0 0 0 0 0 0
                LOCK: 0 0 0 0 0 0 0 0
               LOCKT: 0 0 0 0 0 0 0 0
               LOCKU: 0 0 0 0 0 0 0 0
              ACCESS: 1 1 0 172 236 0 0 0
             GETATTR: 1 1 0 164 220 0 0 0
              LOOKUP: 1 1 0 180 268 0 0 0
         LOOKUP_ROOT: 0 0 0 0 0 0 0 0
              REMOVE: 0 0 0 0 0 0 0 0
              RENAME: 0 0 0 0 0 0 0 0
                LINK: 0 0 0 0 0 0 0 0
             SYMLINK: 0 0 0 0 0 0 0 0
              CREATE: 0 0 0 0 0 0 0 0
            PATHCONF: 1 1 0 160 72 0 0 0
              STATFS: 2 2 0 328 232 0 0 0
            READLINK: 0 0 0 0 0 0 0 0
             READDIR: 0 0 0 0 0 0 0 0
         SERVER_CAPS: 2 2 0 320 176 0 0 0
         DELEGRETURN: 0 0 0 0 0 0 0 0
              GETACL: 0 0 0 0 0 0 0 0
              SETACL: 0 0 0 0 0 0 0 0
        FS_LOCATIONS: 0 0 0 0 0 0 0 0
        RELEASE_LOCKOWNER: 0 0 0 0 0 0 0 0
             SECINFO: 0 0 0 0 0 0 0 0
         EXCHANGE_ID: 0 0 0 0 0 0 0 0
        CREATE_SESSION: 0 0 0 0 0 0 0 0
        DESTROY_SESSION: 0 0 0 0 0 0 0 0
            SEQUENCE: 0 0 0 0 0 0 0 0
        GET_LEASE_TIME: 0 0 0 0 0 0 0 0
        RECLAIM_COMPLETE: 0 0 0 0 0 0 0 0
           LAYOUTGET: 0 0 0 0 0 0 0 0
        GETDEVICEINFO: 0 0 0 0 0 0 0 0
        LAYOUTCOMMIT: 0 0 0 0 0 0 0 0
        LAYOUTRETURN: 0 0 0 0 0 0 0 0

ghost commented 10 years ago

So that got me thinking more today.

The reason this doesn't work on this particular server is that it is using NFSv4, which has no READDIRPLUS operation.

127.0.0.1:/var/www/ on /mnt/website type nfs (rw,vers=4,addr=127.0.0.1,clientaddr=127.0.0.1)

As a test, I edited /etc/nfsmount.conf and added Nfsvers=3, which makes v3 the default.

My test mount was now forced to v3:

127.0.0.1:/var/www/ on /mnt/website type nfs (rw,nfsvers=3,addr=127.0.0.1)

And now I have READDIRPLUS:

device 127.0.0.1:/var/www/ mounted on /mnt/website with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=60709,mountproto=udp,local_lock=none
        age:    10
        caps:   caps=0x3fcf,wtmult=4096,dtsize=4096,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        bytes:  0 0 0 0 0 0 0 0
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 742 1 1 0 8 8 8 0 7 0 2 0 0
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 2 2 0 264 224 0 0 0
             SETATTR: 0 0 0 0 0 0 0 0
              LOOKUP: 0 0 0 0 0 0 0 0
              ACCESS: 0 0 0 0 0 0 0 0
            READLINK: 0 0 0 0 0 0 0 0
                READ: 0 0 0 0 0 0 0 0
               WRITE: 0 0 0 0 0 0 0 0
              CREATE: 0 0 0 0 0 0 0 0
               MKDIR: 0 0 0 0 0 0 0 0
             SYMLINK: 0 0 0 0 0 0 0 0
               MKNOD: 0 0 0 0 0 0 0 0
              REMOVE: 0 0 0 0 0 0 0 0
               RMDIR: 0 0 0 0 0 0 0 0
              RENAME: 0 0 0 0 0 0 0 0
                LINK: 0 0 0 0 0 0 0 0
             READDIR: 0 0 0 0 0 0 0 0
         READDIRPLUS: 0 0 0 0 0 0 0 0
              FSSTAT: 1 1 0 132 84 0 0 0
              FSINFO: 2 2 0 264 160 0 0 0
            PATHCONF: 1 1 0 132 56 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

ghost commented 10 years ago

My suggestion would be to add some kind of detection of NFSv3 vs. NFSv4 mounts so that the plugin doesn't fail 'hard' on NFSv4, or to make self.nfs_ops more dynamic.
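One way the version detection could look, as a rough sketch only: it reads the `vers=` (or `nfsvers=`) option from the `opts:` line that follows each NFS `device` line in mountstats. The function name `nfs_versions` is hypothetical, and the parsing assumes the line layout shown in the dumps above.

```python
import re

# Matches "vers=4" or "nfsvers=3" as a whole option, but not "mountvers=3".
VERS_RE = re.compile(r"(?:^|,)(?:nfs)?vers=([0-9.]+)")


def nfs_versions(mountstats_text):
    """Map each NFS mount point to its version string, e.g. '3' or '4'."""
    versions = {}
    mount_point = None
    for line in mountstats_text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "device":
            # Layout: device <dev> mounted on <mount point> with fstype <type> ...
            is_nfs = len(words) >= 8 and words[7].startswith("nfs")
            mount_point = words[4] if is_nfs else None
        elif words[0] == "opts:" and mount_point is not None and len(words) > 1:
            match = VERS_RE.search(words[1])
            if match:
                versions[mount_point] = match.group(1)
    return versions
```

On a live system the input would come from reading /proc/self/mountstats; for the two test mounts above this should give '4' and '3' respectively for /mnt/website.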

TJM commented 10 years ago

So, we need to try to iterate over the set of available stats instead of trying to force a list (as I have done). It may require some "browsing" of the nfsiostat code to see "how they handle it" ... if you want to take it on, feel free. We are working through some "other" issues at the moment, but I welcome improvement. We will be moving to NFSv4 soon, once the new filer is in place, as long as performance is reasonable. :) ... then this will become priority for me too.

Alternatively maybe we can use a "with" or "if op_stat" to get around the "None" issue as a temporary fix?

~tommy

TJM commented 10 years ago

Pulled in your changes temporarily to remove ReadDirPlus. As noted above, I would like to "detect" its presence somehow, but for now this makes it better. https://github.com/LarkIT/newrelic-nfsiostat/pull/9

TJM commented 10 years ago

Fixed in https://github.com/LarkIT/newrelic-nfsiostat/releases/tag/newrelic-nfsiostat-v0.2.3

jsdizon commented 9 years ago

Hi! Sorry for my comment. I'm new to this plugin. I know this is closed, but I have installed version 2.4 and also tried version 2.5 on CentOS 6.6, and still no data appears on the graph in the New Relic plugins tab. Can you please advise? Many thanks!

jsdizon commented 9 years ago

I also got this error.

ERROR newrelicnfs.plugin:_get_nfs_stat_for: 'NoneType' object is unsubscriptable
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/newrelicnfs/plugin.py", line 191, in _get_nfs_stat_for
    op_prefix + '/Operations[ops/second]': op_stat[0],
TypeError: 'NoneType' object is unsubscriptable

Thanks in advance!

ghost commented 9 years ago

jsdizon, what version of NFS do you have? Try forcing v3 and then v4, and see which one gives you the error.

jsdizon commented 9 years ago

Thanks for responding, nukepuppy! I'm using v4. I'll try to force it to v3. Thanks!

jsdizon commented 9 years ago

Hi nukepuppy! Thanks! It works! I'm just curious about v4: why does it not show any data? Many thanks!!! :+1:

ghost commented 9 years ago

@jsdizon if you compare the output of cat /proc/self/mountstats for an NFSv4 mount against an NFSv3 mount, you will see that there are values NFSv4 has that NFSv3 doesn't, and vice versa. Since the code right now statically pushes a fixed set of stats, if one of them is missing from the mount you get the NoneType error, which basically says that a None value can't be subscripted (indexed).

This is where it needs to be improved in future: 'scan' the values and then populate the charts based on what it found (instead of using a static set of values, which are NFSv3-specific).

hope it helps
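A rough sketch of what such a 'scan' might look like, assuming the mountstats layout shown earlier in this thread. The function name `parse_per_op_stats` is hypothetical; this is not how the plugin or nfsiostat actually does it.

```python
def parse_per_op_stats(mountstats_text):
    """Return {mount_point: {op_name: [int counters]}} for every NFS mount."""
    result = {}
    current = None      # per-op dict of the NFS mount being read, if any
    in_per_op = False   # True once the "per-op statistics" header was seen
    for line in mountstats_text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "device":
            # Layout: device <dev> mounted on <mount point> with fstype <type> ...
            in_per_op = False
            is_nfs = len(words) >= 8 and words[7].startswith("nfs")
            current = result.setdefault(words[4], {}) if is_nfs else None
        elif current is not None and line.strip() == "per-op statistics":
            in_per_op = True
        elif in_per_op and words[0].endswith(":"):
            # e.g. "FSINFO: 1 1 0 164 108 0 0 0"
            current[words[0][:-1]] = [int(w) for w in words[1:]]
    return result
```

Iterating over the keys of the returned dictionary yields only the ops a given mount really reports, so an NFSv4 mount would simply have no READDIRPLUS entry instead of a None value.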

TJM commented 9 years ago

For what it's worth, we are forced to NFSv3 (for historical reasons), but we have had a patch (https://github.com/LarkIT/newrelic-nfsiostat/blob/master/src/plugin.py#L184-L187) that makes NFSv4 work. I am not sure why it would be missing operations/sec, unless the mount was in the process of unmounting or something? (And I am pretty sure that case is accounted for.) Note that the "bulk" of the code comes directly from "nfsiostat" (with very minor modifications). What does your nfsiostat report? (nfsiostat is in the nfs-utils package.)

TJM commented 9 years ago

No worries... this was originally forked from https://github.com/jduncan-rva/newRHELic ... and modified to suit our needs. It probably needs to be overhauled. I am happy to take pull requests if you guys have any ideas. I think it still fails to start at boot time too, so beware. :)

Since your issue is separate from this one (this one was fixed by @nukepuppy), I would recommend creating a new issue and pasting in the output of cat /proc/self/mountstats and the output of nfsiostat.

Tommy

TJM commented 9 years ago

I can't bend my brain in python ways right now, but it "feels like" we need something like

if op_stat is None:
    continue

or nil or null or if ! op_stat or something? :)
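In Python that guard could look roughly like the sketch below. The loop shape, the `build_metrics` name, and the way `op_prefix` is built are assumptions made for illustration; only the `'/Operations[ops/second]'` key and the `op_stat[0]` lookup come from the traceback in this thread.

```python
def build_metrics(prefix, nfs_ops, get_op_stat):
    """Collect ops/second metrics, skipping ops the mount does not report."""
    metrics = {}
    for op in nfs_ops:
        op_stat = get_op_stat(op)
        if op_stat is None:
            # This mount has no such op (e.g. ReadDirPlus on NFSv4): skip it.
            continue
        op_prefix = prefix + "/" + op  # hypothetical naming scheme
        metrics[op_prefix + "/Operations[ops/second]"] = op_stat[0]
    return metrics
```

Testing `is None` rather than plain truthiness keeps the guard narrow: an op that exists but has an empty stats list would still surface as an error instead of being silently dropped.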