LLNL / lmt

Lustre Monitoring Tools
GNU General Public License v2.0

osc metric fails with message "lmtmetric: osc metric: No such file or directory" #30

Closed dagoodma closed 4 years ago

dagoodma commented 8 years ago

I'm having trouble getting lmtmetric to work with Lustre 2.8.

# lmtmetric -m osc
lmtmetric: osc metric: No such file or directory

I see my OST names listed under /proc/fs/lustre/obdfilter, so I'm not sure what's wrong.

Note that lmtmetric -m ost and lmtmetric -m mdt seem to work. I haven't had a chance to try any of this on a Lustre 2.7 system yet.

PS. I built lmt 3.2.2 from source on CentOS 6.8 (2.6.32-642.1.1.el6.x86_64).

ofaaland commented 7 years ago

Hi, can you run strace -e open lmtmetric -m osc and post the last 20 lines? Thanks.

dagoodma commented 7 years ago

I ran strace with lmtmetric -m osc, and it seems like I'm missing some libraries (like cerebro, though yum shows I have cerebro-1.18-1.x86_64 installed). Note that I do have a /proc/fs/lustre/ directory, and the osc/ directory within it contains links that look correct.

18:52:37 # strace -e open lmtmetric -m osc
open("/usr/lib64/mpich/lib/tls/x86_64/libcerebro.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/mpich/lib/tls/libcerebro.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/mpich/lib/x86_64/libcerebro.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/mpich/lib/libcerebro.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/usr/lib64/libcerebro.so.1", O_RDONLY) = 3
open("/usr/lib64/mpich/lib/libcerebro_error.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libcerebro_error.so.0", O_RDONLY) = 3
open("/usr/lib64/mpich/lib/liblua-5.1.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/liblua-5.1.so", O_RDONLY) = 3
open("/usr/lib64/mpich/lib/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libm.so.6", O_RDONLY) = 3
open("/usr/lib64/mpich/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY) = 3
open("/usr/lib64/mpich/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/etc/lmt/lmt.conf", O_RDONLY) = 3
open("/etc/lmt/rwpasswd", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/fs/lustre/osc", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
lmtmetric: osc metric: No such file or directory
+++ exited with 0 +++

Looking for cerebro libraries... they do exist in /usr/lib64, but not in /usr/lib64/mpich/lib/tls. Same with lua-devel libraries.

ShijunDeng commented 7 years ago

@dagoodma Hello, I have the same problem. Did you solve it?

dagoodma commented 7 years ago

@ShijunDeng No. I haven't dug into it much, but I will update this issue thread if I make any progress.

mkgilbert commented 7 years ago

I'm having the same issue, but I'm not getting any ENOENT errors. In fact the output of the strace command is fairly minimal:

[root@mds1 ~]# strace -e open lmtmetric -m osc
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/usr/lib64/libcerebro.so.1", O_RDONLY) = 3
open("/usr/lib64/libcerebro_error.so.0", O_RDONLY) = 3
open("/usr/lib64/liblua-5.1.so", O_RDONLY) = 3
open("/lib64/libm.so.6", O_RDONLY)      = 3
open("/lib64/libdl.so.2", O_RDONLY)     = 3
open("/lib64/libc.so.6", O_RDONLY)      = 3
open("/proc/fs/lustre/osc", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
lmtmetric: osc metric: No such file or directory
+++ exited with 0 +++

Along with this, the node that is running the lmt-server package is getting a bunch of errors when attempting to connect to the database:

[root@tillit ~]# tail -f /var/log/messages
Dec 15 13:30:51 tillit /usr/sbin/cerebrod[34864]: strstr: boottime can't be found
Dec 15 13:30:52 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: connected to database
Dec 15 13:30:52 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0004: no database
Dec 15 13:30:52 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0001: no database
Dec 15 13:30:53 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0006: no database
Dec 15 13:30:54 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0002: no database
Dec 15 13:30:55 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0005: no database
Dec 15 13:30:55 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0003: no database
Dec 15 13:30:56 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0000: no database
Dec 15 13:30:56 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-MDT0000: no database
Dec 15 13:30:57 tillit /usr/sbin/cerebrod[34864]: lmt_mysql: blizzard-OST0004: no database

Does anyone have any idea what might be going on? By the way, we're running CentOS 6.8 and lmt 3.1.8 on Lustre 2.8.

6speedlt1 commented 7 years ago

It appears that the links in /proc/fs/lustre/osc/ are discarded at proc.c line 226. Changing

if ((flag & PROC_READDIR_NOFILE) && d->d_type != DT_DIR)

to

if ((flag & PROC_READDIR_NOFILE) && d->d_type != DT_DIR && d->d_type != DT_LNK)

makes lmtmetric return values, but I don't know enough about lmt to say whether this is a safe fix.

MarkW
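The effect of that one-line change can be reproduced outside lmt. The following is a minimal, standalone C sketch, not lmt's actual proc.c. It assumes, as the earlier comments describe, that the entries under /proc/fs/lustre/osc are symlinks rather than directories. The helper names count_kept() and entry_type(), the accept_links switch, and the numeric value given to PROC_READDIR_NOFILE are all hypothetical. The lstat() fallback for DT_UNKNOWN is an extra safeguard for filesystems that do not fill in d_type; it is not part of the suggested patch.

```c
#define _DEFAULT_SOURCE /* exposes DT_* constants and lstat() in glibc */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical value: the name mirrors the flag quoted from proc.c,
 * but lmt's real definition may differ. */
#define PROC_READDIR_NOFILE 0x1

/* Return the entry's type. If the filesystem reports DT_UNKNOWN, fall back
 * to lstat(), which does not follow symlinks. Anything that is neither a
 * directory nor a symlink is reported as DT_REG. */
static unsigned char entry_type(const char *dir, const struct dirent *d)
{
    char path[PATH_MAX];
    struct stat sb;

    if (d->d_type != DT_UNKNOWN)
        return d->d_type;
    snprintf(path, sizeof(path), "%s/%s", dir, d->d_name);
    if (lstat(path, &sb) < 0)
        return DT_UNKNOWN;
    if (S_ISDIR(sb.st_mode))
        return DT_DIR;
    if (S_ISLNK(sb.st_mode))
        return DT_LNK;
    return DT_REG;
}

/* Count the entries of `dir` (excluding "." and "..") that survive the
 * NOFILE filter. accept_links == 0 reproduces the original test
 * (directories only); accept_links != 0 applies the suggested change
 * (directories or symlinks). Returns -1 if `dir` cannot be opened. */
static int count_kept(const char *dir, int flag, int accept_links)
{
    DIR *dp = opendir(dir);
    struct dirent *d;
    int n = 0;

    if (!dp)
        return -1;
    while ((d = readdir(dp)) != NULL) {
        unsigned char t;

        if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
            continue;
        t = entry_type(dir, d);
        /* With accept_links == 0, symlinks are skipped here. */
        if ((flag & PROC_READDIR_NOFILE) && t != DT_DIR
                && !(accept_links && t == DT_LNK))
            continue;
        n++;
    }
    closedir(dp);
    return n;
}
```

On a node where every entry under /proc/fs/lustre/osc is a symlink, count_kept("/proc/fs/lustre/osc", PROC_READDIR_NOFILE, 0) would return 0, which matches lmtmetric finding nothing after successfully opening the directory; passing 1 for accept_links counts the links instead. This has not been run against a live Lustre system. Note that accepting DT_LNK does not check the link target, so a dangling link or a link to a regular file would also pass; following the link with stat() would be a stricter alternative.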

ofaaland commented 4 years ago

@6speedlt1 thanks for catching that.

6speedlt1 commented 4 years ago

Happy to help.
