LLNL / lmt

Lustre Monitoring Tools
GNU General Public License v2.0

lustre 2.12.2 produces message lmtmetric: error reading lforge-OST0000 brw_stats: No such file or directory #39

Closed ldd91 closed 5 years ago

ldd91 commented 5 years ago

I installed lmt 3.2.6 in one of my Lustre environments, which runs Lustre 2.12.0. Everything works there, including ltop. However, I also installed lmt on another Lustre system, which runs version 2.12.2 and uses an InfiniBand network. When I run /usr/sbin/lmtmetric -m mdt there, it works. But when I run /usr/sbin/lmtmetric -m ost on the OSS server, it shows lmtmetric: error reading lustre iblfs-OST0000: No such file or directory. When I run ltop on the manager node, it shows ltop: No live file system data found

ofaaland commented 5 years ago

Hi @ldd91 , I have not tested lmt 3.2.6 against Lustre 2.12.2, so something important may have changed. Is it possible for you to post the output on stderr of

strace -e open,stat,fstat lmtmetric -m ost

in the ticket?

ldd91 commented 5 years ago

This is the output

[root@atlantic-221 ~]# strace -e open,stat,fstat lmtmetric -m ost open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=54517, ...}) = 0 open("/lib64/tls/x86_64/libcerebro.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) stat("/lib64/tls/x86_64", 0x7ffed96f2860) = -1 ENOENT (No such file or directory) open("/lib64/tls/libcerebro.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=6, ...}) = 0 open("/lib64/x86_64/libcerebro.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) stat("/lib64/x86_64", 0x7ffed96f2860) = -1 ENOENT (No such file or directory) open("/lib64/libcerebro.so.1", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=91936, ...}) = 0 open("/lib64/tls/libcerebro_error.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) open("/lib64/libcerebro_error.so.0", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=11368, ...}) = 0 open("/lib64/liblua-5.1.so", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=193864, ...}) = 0 open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=1137016, ...}) = 0 open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=19288, ...}) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0755, st_size=2151672, ...}) = 0 stat("/sys/fs/lustre/obdfilter", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 stat("/etc/sysconfig/64bit_strstr_via_64bit_strstr_sse2_unaligned", 0x7ffed96e35b0) = -1 ENOENT (No such file or directory) stat("/sys/stat", 0x7ffed96e3e20) = -1 ENOENT (No such file or directory) open("/proc/stat", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/meminfo", 0x7ffed96e3f10) = -1 ENOENT (No such file or directory) open("/proc/meminfo", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 
stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/uuid", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/uuid", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/stats", 0x7ffed96e2ec0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0000/stats", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/brw_stats", 0x7ffed96e2e80) = -1 ENOENT (No such file or directory) lmtmetric: error reading lustre ib-lfs-OST0000 brw_stats: No such file or directory stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/filesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/filesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/filestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/filestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/kbytesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 
open("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/kbytesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/kbytestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/kbytestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/num_exports", 0x7ffed96e2e90) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0000/num_exports", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/lock_count", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/lock_count", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/pool/grant_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/pool/grant_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/pool/cancel_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0000_UUID/pool/cancel_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0000/recovery_status", 0x7ffed96e2ea0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0000/recovery_status", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/uuid", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/uuid", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 
open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/stats", 0x7ffed96e2ec0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0002/stats", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/brw_stats", 0x7ffed96e2e80) = -1 ENOENT (No such file or directory) lmtmetric: error reading lustre ib-lfs-OST0002 brw_stats: No such file or directory stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/filesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/filesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/filestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/filestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/kbytesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/kbytesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/kbytestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/kbytestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, 
...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/num_exports", 0x7ffed96e2e90) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0002/num_exports", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/lock_count", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/lock_count", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/pool/grant_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/pool/grant_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/pool/cancel_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0002_UUID/pool/cancel_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0002/recovery_status", 0x7ffed96e2ea0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0002/recovery_status", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/uuid", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/uuid", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 
open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/stats", 0x7ffed96e2ec0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0004/stats", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/brw_stats", 0x7ffed96e2e80) = -1 ENOENT (No such file or directory) lmtmetric: error reading lustre ib-lfs-OST0004 brw_stats: No such file or directory stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/filesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/filesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/filestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/filestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/kbytesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/kbytesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/kbytestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/kbytestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/num_exports", 0x7ffed96e2e90) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0004/num_exports", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/lock_count", {st_mode=S_IFREG|0444, 
st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/lock_count", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/pool/grant_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/pool/grant_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/pool/cancel_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0004_UUID/pool/cancel_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0004/recovery_status", 0x7ffed96e2ea0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0004/recovery_status", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/uuid", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/uuid", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/stats", 0x7ffed96e2ec0) = 
-1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0006/stats", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/brw_stats", 0x7ffed96e2e80) = -1 ENOENT (No such file or directory) lmtmetric: error reading lustre ib-lfs-OST0006 brw_stats: No such file or directory stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/filesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/filesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/filestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/filestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/kbytesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/kbytesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/kbytestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/kbytestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/num_exports", 0x7ffed96e2e90) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0006/num_exports", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/lock_count", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/lock_count", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/pool/grant_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 
open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/pool/grant_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/pool/cancel_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0006_UUID/pool/cancel_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0006/recovery_status", 0x7ffed96e2ea0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0006/recovery_status", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/uuid", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/uuid", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/version", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/stats", 0x7ffed96e2ec0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0008/stats", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/brw_stats", 0x7ffed96e2e80) = -1 ENOENT (No such file or directory) lmtmetric: error reading 
lustre ib-lfs-OST0008 brw_stats: No such file or directory stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/filesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/filesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/filestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/filestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/kbytesfree", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/kbytesfree", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/kbytestotal", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/kbytestotal", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/num_exports", 0x7ffed96e2e90) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0008/num_exports", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/lock_count", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/lock_count", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/pool/grant_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/pool/grant_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/pool/cancel_rate", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 
open("/sys/fs/lustre/ldlm/namespaces/filter-ib-lfs-OST0008_UUID/pool/cancel_rate", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 stat("/sys/fs/lustre/obdfilter/ib-lfs-OST0008/recovery_status", 0x7ffed96e2ea0) = -1 ENOENT (No such file or directory) open("/proc/fs/lustre/obdfilter/ib-lfs-OST0008/recovery_status", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 ost: 2;atlantic-221.unisound.ai;0.568956;98.805783;ib-lfs-OST0000;89466705;89603584;90917613464;90983835132;649351938048;624168555032;0;4;72;0;0;0;0;COMPLETE 1/1 0s remaining;ib-lfs-OST0002;89466688;89603584;90936891912;90983835132;382725885952;252159491847;0;4;49;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0004;89466701;89603584;90869858836;90983835132;368264495104;322765724959;0;4;61;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0006;89466711;89603584;90916900060;90983835132;296974934016;1327194620002;0;4;58;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0008;89466719;89603584;90863260060;90983835132;876284801024;675902515283;0;4;33;0;0;0;0;COMPLETE 0/1 0s remaining; +++ exited with 0 +++

ofaaland commented 5 years ago

Are you sure you didn't make a typo in your original testing?

I ask because it looks like this succeeded:

[root@atlantic-221 ~]# strace -e open,stat,fstat lmtmetric -m ost
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
<redacted>
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
ost: 2;atlantic-221.unisound.ai;0.568956;98.805783;ib-lfs-OST0000;89466705;89603584;90917613464;90983835132;649351938048;624168555032;0;4;72;0;0;0;0;COMPLETE 1/1 0s remaining;ib-lfs-OST0002;89466688;89603584;90936891912;90983835132;382725885952;252159491847;0;4;49;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0004;89466701;89603584;90869858836;90983835132;368264495104;322765724959;0;4;61;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0006;89466711;89603584;90916900060;90983835132;296974934016;1327194620002;0;4;58;0;0;0;0;INACTIVE 0s remaining;ib-lfs-OST0008;89466719;89603584;90863260060;90983835132;876284801024;675902515283;0;4;33;0;0;0;0;COMPLETE 0/1 0s remaining;
+++ exited with 0 +++

Note that the command exited with 0 (success) and shows output for five OSTs: ib-lfs-OST000{0,2,4,6,8}.
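To check which OSTs a given run reported on, the target names can be pulled out of the semicolon-separated output line. This is only a sketch: list_osts is a hypothetical helper name, and it assumes target names have the form <fsname>-OST<4 hex digits>, as they do in the output above.

```shell
# List the distinct OST names found in `lmtmetric -m ost` output on stdin.
# Assumption: target names look like <fsname>-OST<4 hex digits>.
list_osts() {
    grep -o '[A-Za-z0-9_-]*-OST[0-9a-fA-F]\{4\}' | sort -u
}
```

For example, `lmtmetric -m ost 2>/dev/null | list_osts` should print one OST name per line on an OSS node.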

ldd91 commented 5 years ago

Yes, I'm sure I didn't make a typo in my original testing. I ran the command 'lmtmetric -m ost' on the OSS server again and the output is the same:

lmtmetric: error reading ib-lfs-OST0000 brw_stats: No such file or directory
lmtmetric: error reading ib-lfs-OST0002 brw_stats: No such file or directory
lmtmetric: error reading ib-lfs-OST0004 brw_stats: No such file or directory
lmtmetric: error reading ib-lfs-OST0006 brw_stats: No such file or directory
lmtmetric: error reading ib-lfs-OST0008 brw_stats: No such file or directory

ofaaland commented 5 years ago

I do see the

lmtmetric: error reading ib-lfs-OST0000 brw_stats: No such file or directory

errors in your strace output, so I'll look at how to fix that.
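A long strace log like the one above is easier to scan if it is filtered down to the Lustre paths that returned ENOENT. The following is a rough sketch; missing_lustre_paths is a hypothetical helper name, and it assumes strace prints one system call per line. A missing /sys path is not necessarily a problem: in the trace above, stats, num_exports and recovery_status are missing under /sys but are then opened successfully under /proc.

```shell
# Read strace output on stdin and print the distinct Lustre /sys and /proc
# paths whose system call failed with ENOENT.
# Assumption: one system call per line, as strace normally prints.
missing_lustre_paths() {
    grep 'ENOENT' \
        | grep -Eo '"/(sys|proc)/fs/lustre[^"]*"' \
        | tr -d '"' \
        | sort -u
}
```

Since strace writes to stderr, it could be used as `strace -e open,stat,fstat lmtmetric -m ost 2>&1 | missing_lustre_paths`.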

I believe it's a separate problem from whatever resulted in

No live file system data found

when you ran ltop, but we can fix this issue and then see what happens with ltop.

ldd91 commented 5 years ago

Thank you for your timely reply. I am looking forward to your conclusion.

ofaaland commented 5 years ago

I found the relevant code and the reason for the error message. There used to be a brw_stats proc file provided by a module named "obdfilter". This stats file was used to populate the "IOPS" column that ltop shows. This stats file is not provided under Lustre 2.12.2 (I'm not sure about Lustre 2.12.0).

This is not a fatal error. Early versions of lustre did not provide those stats either, so lmt issues the error message but continues to gather the other stats and send them.
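To see on a given OSS whether the file is really absent, one could check each OST directory for brw_stats. This is a minimal sketch, not part of lmt: check_brw_stats is a hypothetical helper, and its default roots are simply the two obdfilter locations that appear in the strace output earlier in this thread.

```shell
# Report, per OST directory, whether a brw_stats file exists.
# Arguments: root directories to scan. With no arguments, the two obdfilter
# locations seen in the strace output are used (an assumption for this sketch).
check_brw_stats() {
    if [ "$#" -eq 0 ]; then
        set -- /sys/fs/lustre/obdfilter /proc/fs/lustre/obdfilter
    fi
    for root in "$@"; do
        for dir in "$root"/*-OST*; do
            [ -d "$dir" ] || continue   # skip unmatched globs and non-directories
            if [ -e "$dir/brw_stats" ]; then
                echo "present $dir/brw_stats"
            else
                echo "missing $dir/brw_stats"
            fi
        done
    done
}
```

Running `check_brw_stats` with no arguments on an OSS node prints one line per OST directory under each root.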

In addition, I set up a test system running Lustre 2.12.2 and lmt 3.2.6, and ltop works properly even though I also see the message

lmtmetric: error reading lforge-OST0000 brw_stats: No such file or directory

Is it possible for you to run

lmtmetric -m ost

on all your OSS nodes, and

lmtmetric -m mdt

on all your MDS nodes, put all the output (including stdout and stderr) in a file, and attach them to the ticket? Thanks.
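Capturing both streams in one file only needs a shell redirection. A small sketch follows; capture_all and the output file name in the usage note are made up for illustration.

```shell
# Run a command and save its stdout and stderr together in one file.
# Usage: capture_all OUTFILE COMMAND [ARGS...]
capture_all() {
    outfile=$1
    shift
    "$@" > "$outfile" 2>&1
}
```

On an OSS node this could be invoked as `capture_all "lmtmetric-ost-$(hostname).txt" /usr/sbin/lmtmetric -m ost`, using the binary path from the original report.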

ldd91 commented 5 years ago

Hi ofaaland, I installed the LMT management server on a server with an Ethernet card, while my Lustre cluster nodes all use InfiniBand, but they are in the same subnet. I don't know if that could cause this problem.

ofaaland commented 5 years ago

On any of your nodes, you should be able to see cerebro sending the metric data, using tcpdump, like this:

tcpdump -i XXX | grep cerebro

where XXX is the name of the interface (e.g. eth0 or ib0) which you configured cerebro to use in /etc/cerebro.conf.

You should see messages like this:

08:32:50.783274 IP YYY.cerebro-send > 239.2.11.72.cerebro-recv: UDP, length 302
08:32:50.825049 IP YYY.cerebro-send > 239.2.11.72.cerebro-recv: UDP, length 72
08:32:51.029782 IP YYY.cerebro-send > 239.2.11.72.cerebro-recv: UDP, length 72
08:32:51.031473 IP YYY.cerebro-send > 239.2.11.72.cerebro-recv: UDP, length 420

where I've replaced my hostnames with YYY.

Run this on one MDS node, one OSS node, and on the node with lmt-server installed. They should all see the same set of messages. If they don't, then your cerebro config or network config may be the problem.
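To make the comparison between nodes easier, the tcpdump text can be reduced to a per-sender packet count. This is a sketch under the assumption that tcpdump prints lines in the format shown above (timestamp, "IP", then the sender with a .cerebro-send suffix); cerebro_senders is a hypothetical helper name.

```shell
# Read tcpdump text output on stdin and print "<sender> <packet count>"
# for every host seen sending from the cerebro-send port.
# Assumption: lines look like
#   08:32:50.783274 IP YYY.cerebro-send > 239.2.11.72.cerebro-recv: UDP, length 302
cerebro_senders() {
    awk '$2 == "IP" && $3 ~ /\.cerebro-send$/ {
             sub(/\.cerebro-send$/, "", $3)   # strip the port name, keep the host
             count[$3]++
         }
         END { for (host in count) print host, count[host] }' | sort
}
```

Something like `timeout 30 tcpdump -l -i XXX 2>/dev/null | cerebro_senders`, run on each of the three nodes, should then list the same senders everywhere if the multicast traffic is reaching all of them.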

ldd91 commented 5 years ago

Hi ofaaland, I ran tcpdump -i ens192 | grep cerebro and it shows nothing. I think it is caused by the network config.

ofaaland commented 5 years ago

Hi @ldd91, do none of them show anything? Or do you see output only on the MDS and OSS nodes?

ldd91 commented 5 years ago

All of them show nothing

ofaaland commented 5 years ago

@ldd91 ,

All of them show nothing

That probably means that the interface you're monitoring with tcpdump is not the one cerebro is using. To see the address you have cerebro configured for, do this:

# grep cerebrod_speak_message_config /etc/cerebro.conf
cerebrod_speak_message_config 0.0.0.0 0 0 192.168.64.0/24

You can then grep for that address in your configured network interfaces

# ip addr | grep 192.168
    inet 192.168.64.1/24 brd 192.168.64.255 scope global eno1

And make sure that the address and netmask specified in cerebro.conf match the address and netmask of a configured interface.

And check to confirm the cerebrod service is running.
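The checks above can be scripted roughly as follows. This is a sketch with assumptions: it treats the last field of the cerebrod_speak_message_config line as the network (as in the example line above), and it relies on `ip route show` listing directly connected networks in CIDR form with "scope link". Both helper names are hypothetical.

```shell
# Print the network (last field) of the cerebrod_speak_message_config line.
# $1: path to the config file (defaults to /etc/cerebro.conf).
cerebro_network() {
    awk '$1 == "cerebrod_speak_message_config" { print $NF }' "${1:-/etc/cerebro.conf}"
}

# Succeed if network $1 (CIDR form) appears as a directly connected route
# in `ip route show` output supplied on stdin.
network_is_local() {
    awk -v net="$1" '$1 == net && /scope link/ { found = 1 }
                     END { exit !found }'
}
```

For example, `ip route show | network_is_local "$(cerebro_network)" && echo "network matches a local interface"`. Whether the daemon is running can be checked with something like `systemctl is-active cerebrod`, assuming the service unit is named cerebrod on your distribution.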

ofaaland commented 5 years ago

In any case, though, this is a cerebro configuration problem, not an LMT problem. So I'm going to close this issue and re-title it to reflect what we found. If you'd like more help, please create an issue at

https://github.com/chaos/cerebro

ofaaland commented 5 years ago

Lustre 2.12 no longer creates the /proc/fs/lustre/obdfilter/brw_stats file. This causes the "file not found" error message. However, the error is not fatal; it only causes the IOPS field not to be populated.