darshan-hpc / darshan

Darshan I/O characterization tool
Other
55 stars 27 forks source link

Darshan-util fails with poorly formatted mount table #995

Open bellenlau opened 4 weeks ago

bellenlau commented 4 weeks ago

Hello,

darhsan-util clis fail with the following message, starting from darshan-parser

Error: poorly formatted mount table in darshan log file.

I am instrumenting an MPI application using intel-oneapi-mpi/2021.10.0 with darshan-runtime/3.4.5. The logfile is generated without error and the darshan-parser works on logfiles generated by instrumenting the same application using openmpi.

Thank you for your help,

Laura

shanedsnyder commented 3 weeks ago

Can you comment on how this relates to the other issue you opened (#994)? You mention in the other issue that some applications work fine with Darshan, but perhaps you ultimately ran into this issue when parsing? Or is this an entirely different application than the ones mentioned in the original issue?

bellenlau commented 3 weeks ago

Hello, this does not realte to the other issue. I am testing two different installations and applications. The software stack is always based on intel-oneapi-mpi, but in the case of this issue I installed darshan/3.4.5 manually and the application tested is a toy code doing some MPI-IO, not a scientific application. In the case of this issue, theres should be no mismatch between the mpi used to configure darshan-runtime and to compile the miniapp.

After further investigation, I noticed that the issue arise when I run the program on two different partitions of the cluster. The simulations are launched from the same area of the cluster, but run in compute nodes from different partitions. Any suggestion regarding how this could affect the measurement?

shanedsnyder commented 3 weeks ago

Okay, thanks for the details.

I can't think of anything obvious after having a closer look at the code that writes/reads that portion of the log file you're getting errors about (the mount table). That portion of the log file is a fixed length, so I was initially wondering if the error is somehow related to this fixed-length memory being exhausted. I hacked Darshan to use a much smaller size mount table log region, but it doesn't seem to ever give me any errors like you're getting.

Is it possible to share the output of the mount command on the nodes that Darshan gives this error on? I don't know how it would happen, but maybe it's formatted in a way that confuses Darshan.

bellenlau commented 3 weeks ago

Yes, please find the output attached. I can share also the darshan logfile if needed.

/dev/mapper/live-rw on / type ext4 (rw,relatime)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
none on /sys/kernel/tracing type tracefs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=263690608k,nr_inodes=65922652,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=40,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13134)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=5242880k)
/dev/nvme0n1p1 on /nvme type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
10.128.82.1@o2ib:10.128.82.2@o2ib:/larchive/work on /leonardo_work type lustre (rw,flock,lazystatfs,noencrypt)
10.128.80.1@o2ib:10.128.80.2@o2ib:/lhomefs/leonardo on /leonardo type lustre (rw,flock,lazystatfs,noencrypt)
10.128.84.1@o2ib:10.128.84.2@o2ib:/lscratch/leonardo on /leonardo_scratch type lustre (rw,flock,lazystatfs,noencrypt)
10.128.80.1@o2ib:10.128.80.2@o2ib:/lhomefs/leonardo on /opt/cuda type lustre (rw,flock,lazystatfs,noencrypt)
10.128.80.1@o2ib:10.128.80.2@o2ib:/lhomefs/leonardo on /opt/intel type lustre (rw,flock,lazystatfs,noencrypt)
10.128.80.1@o2ib:10.128.80.2@o2ib:/lhomefs/leonardo on /etc/nhc type lustre (rw,flock,lazystatfs,noencrypt)
10.128.80.1@o2ib:10.128.80.2@o2ib:/lhomefs/leonardo on /home type lustre (rw,flock,lazystatfs,noencrypt)
/dev/nvme0n1p1 on /tmp type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
tmpfs on /dev/shm type tmpfs (rw,relatime)
shanedsnyder commented 2 weeks ago

Yeah, if you don't mind sharing the log file, that would be helpful, too. I don't see anything obvious in the above mount output, but maybe stepping through how it's parsing this info in a debugger will help me find out what's going on.

bellenlau commented 2 weeks ago

The logfile is attached, thank you for your help. darsh-s8000000000-np16.log

bellenlau commented 1 week ago

I add this piece of information: the report can be generated with Pydarshan; the same error pops up but the html does not seem to present issues.

Thank you,

Laura

carns commented 1 week ago

I found some confusing behavior when trying to reproduce this just now.

If I build Darshan by hand (from origin/main, though I don't think the version is really a factor) then I cannot trigger the problem:

carns-x1-11g ~/w/d/darshan-issue-995-mount-table> ~/working/install/bin/darshan-parser darsh-s8000000000-np16.darshan |head
# darshan log version: 3.41
# compression method: ZLIB
# exe: ./misterio.exe -n misterio.out -s 8000000000
# uid: 33678
# jobid: 1099726
# start_time: 1721201978
# start_time_asci: Wed Jul 17 03:39:38 2024
# end_time: 1721201984
# end_time_asci: Wed Jul 17 03:39:44 2024
# nprocs: 16

... but if I build the same version of Darshan (darshan-util@main) in Spack, then I get the error that you are reporting:

carns-x1-11g ~/w/d/darshan-issue-995-mount-table> darshan-parser darsh-s8000000000-np16.darshan |head
Error: poorly formatted mount table in darshan log file.

I haven't been able to work out what the key difference is, though. I've tried to put the spack-built libzlib-ng and libdarshan-util in the library path for my manual build, but they seem fine there. There is something subtle different about the environment that is causing a problem.

@bellenlau are you building in Spack as well? If so can you share what ldd of your darshan-parser looks like? Mine looks like this:

carns-x1-11g ~/w/d/darshan-issue-995-mount-table> which darshan-parser
/home/carns/working/src/spack/var/spack/environments/mochi-bedrock/.spack-env/view/bin/darshan-parser
carns-x1-11g ~/w/d/darshan-issue-995-mount-table> ldd /home/carns/working/src/spack/var/spack/environments/mochi-bedrock/.spack-env/view/bin/darshan-parser
    linux-vdso.so.1 (0x00007ffe065de000)
    libdarshan-util.so.0 => /home/carns/working/src/spack/opt/spack/linux-ubuntu24.04-skylake/gcc-13.2.0/darshan-util-main-dzh4l6vagqjy7hlfgymov7vctkbjozja/lib/libdarshan-util.so.0 (0x00007936e42a5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007936e4000000)
    libz.so.1 => /home/carns/working/src/spack/opt/spack/linux-ubuntu24.04-skylake/gcc-13.2.0/zlib-ng-2.2.1-zqgjz4orlscjsrb2zc43q4y2oi4myktz/lib/libz.so.1 (0x00007936e4269000)
    /lib64/ld-linux-x86-64.so.2 (0x00007936e42d3000)
carns commented 1 week ago

I'm trying to catch the spack-built darshan-util in a debugger to see what the table looks like that it is trying to parse in darshan_log_get_mounts(), but I'm having a hard time getting a build out of it that includes debugging symbols without optimizations. I'm trying this but its obviously still getting -O2 so I can't figure out much:

spack add darshan-util@main cppflags="-g3 -O0" cflags="-g3 -O0"
bellenlau commented 1 week ago

Hello,

thank you for the insight. I checked with a manually installed darshan and indeed the issue does not arise. Here is the output of ldd

[lbellen1@login02 misterio]$ ldd /leonardo/pub/userinternal/lbellen1/spack-0.21.0-05/install/linux-rhel8-icelake/gcc-8.5.0/darshan-util-3.4.5-vibaclfthya6ury7q6omqgdinzdx35o4/bin/darshan-parser
        linux-vdso.so.1 (0x00007fffb8dcc000)
        libdarshan-util.so.0 => /leonardo/pub/userinternal/lbellen1/spack-0.21.0-05/install/linux-rhel8-icelake/gcc-8.5.0/darshan-util-3.4.5-vibaclfthya6ury7q6omqgdinzdx35o4/lib/libdarshan-util.so.0 (0x00007f1283813000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f1283602000)
        libz.so.1 => /leonardo/prod/spack/05/install/0.21/linux-rhel8-icelake/gcc-8.5.0/zlib-ng-2.1.4-6htiapkoa6fx2medhyabzo575skozuir/lib/libz.so.1 (0x00007f12833dc000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1283006000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1283a35000)
bellenlau commented 1 week ago

Hello,

I tried building with spack but by forcing ^zlib instead of ^zlib-ng and the error does not arise; the issue seems to be triggered by zlib-ng. If you prefer I can report in on the spack github.

Thank you,

Laura

carns commented 1 week ago

Oh, interesting. Thanks for finding that!

Let's debug just a little further to see what exactly the difference is before we report a problem with zlib-ng. It's possible that darshan-utils has a subtle error (like with a string terminator or something) that just happens to work Ok with the original zlib. Or if not we can maybe make a simpler reproducer.

carns commented 6 days ago

FWIW here is a bug report on not being able to build spack packages with debug symbols: https://github.com/spack/spack/issues/45507

In this case I'd like to build darshan-util with debug symbols to confirm what the buffer looks in the Spack build, but either it's broken or I'm missing something.