[POSIX] Pass path to df(1) reference rather than device

matoro commented 11 months ago

Summary

OS: POSIX
Bug fix: no
Type: tests
Fixes:

Description

Otherwise this test fails for environments with active bind-mounts. For example, consider the following:

$ mount test.img testdir
$ mount --rbind /usr/src testdir/usr/src
$ chroot testdir /bin/bash

$ python3
Python 3.11.6 (main, Dec  5 2023, 11:03:00) [GCC 13.2.1 20230826] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil, pprint
>>> pprint.pprint(psutil.disk_partitions(all=False))
[sdiskpart(device='/dev/loop0', mountpoint='/', fstype='ext4', opts='rw,relatime', maxfile=255, maxpath=4096),
 sdiskpart(device='/dev/loop0', mountpoint='/usr/src', fstype='ext4', opts='ro,noatime', maxfile=255, maxpath=4096)]
>>> pprint.pprint([psutil.disk_usage(x.mountpoint) for x in psutil.disk_partitions(all=False)])
[sdiskusage(total=20530814976, used=7411703808, free=12053757952, percent=38.1),
 sdiskusage(total=134679105536, used=42708791296, free=85081747456, percent=33.4)]
>>>

$ df / && df /usr/src
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop0      20049624 7237992  11771248  39% /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root      131522564 41707804  83087644  34% /usr/src

Note that psutil has the correct size data, but the wrong block device. If we pass the path to df(1) instead of the incorrect block device, then df(1) will return results from the correct filesystem.

giampaolo commented 11 months ago

Note that psutil has the correct size data, but the wrong block device.

Why is it wrong? (I don't mean related to this test, but in general) Do you think it should be changed? If yes, how?

matoro commented 11 months ago

Note that psutil has the correct size data, but the wrong block device.

Why is it wrong? (I don't mean related to this test, but in general) Do you think it should be changed? If yes, how?

Because /usr/src is not on /dev/loop0. I see that psutil is just passing along getmntent() here, I'm not sure why getmntent() is returning this wrong block device. /etc/mtab seems to have correct information:

$ grep "^/dev" /etc/mtab
/dev/loop0 / ext4 rw,relatime 0 0
/dev/root /usr/src ext4 ro,noatime 0 0

So maybe that is the correct thing to do? This SO answer says:

There's no portable way to programmatically get the list of mounted filesystems. (getmntent() gets fstab entries, which is not the same thing.)

But I'm not really sure.

giampaolo commented 11 months ago

Hmm, interesting. It must be noted that we use getmntent() on Linux, AIX and SunOS. On BSD* and macOS we use something else, so probably they are fine. This answer suggests that we may use the /sys filesystem to get the correct info.

I'm tempted not to merge your PR since the test failure reliably reproduces the problem, so it's good we have it. I'd rather try to fix the problem itself.

matoro commented 11 months ago

Hmm, interesting. It must be noted that we use getmntent() on Linux, AIX and SunOS. On BSD* and macOS we use something else, so probably they are fine. This answer suggests that we may use the /sys filesystem to get the correct info.

I'm tempted not to merge your PR since the test failure reliably reproduces the problem, so it's good we have it. I'd rather try to fix the problem itself.

That only tells us how to go from major/minor to block name, but what call would we use to go from mountpoint to major/minor? If we switch from getmntent() to something else, we also need a way to iterate over the mountpoints.

What about if we left getmntent() alone for everything except the block device (since it seems correct), and then parsed /etc/mtab for the block device?

Edit: This might not work on SunOS or HP-UX.

matoro commented 11 months ago

coreutils uses /proc/self/mountinfo on Linux: https://github.com/coreutils/gnulib/blob/master/lib/mountlist.c#L480-L481

This appears to have major/minor numbers available, AND true block device names.

grep "/dev" /proc/self/mountinfo
54 57 7:0 / / rw,relatime - ext4 /dev/loop0 rw
56 54 0:5 / /dev rw,nosuid,noexec,relatime - devtmpfs devtmpfs rw,size=10240k,nr_inodes=97648,mode=755
58 56 0:17 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
59 56 0:23 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
60 56 0:24 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw
67 54 254:2 /usr/src /usr/src ro,noatime - ext4 /dev/root rw

But this is again Linux-specific and not for other Unix.

matoro commented 11 months ago

After research here I have come to conclusion that the root of the issue is indeed using /proc/self/mounts instead of /proc/self/mountinfo and while not formally designated as such, the former is deprecated in favor of the latter, which is also strictly a superset of the former. mountinfo was introduced in 2008 specifically to solve this bind-mount problem. GNOME ran into this problem in 2009 for their disks utility and even downstream now considers mountinfo "the authoritative source to check your mounts". Golang uses this file to back their mountinfo package in their stdlib.

Therefore I believe the correct approach to this is to prioritize /proc/self/mountinfo (which needs a new parser due to its differing format) and keep the old /proc/self/mounts logic as a fallback.

giampaolo commented 11 months ago

Can you please open a new issue pointing to this one which contains a lot of useful info?

giampaolo / psutil

[POSIX] Pass path to df(1) reference rather than device #2344

Summary

Description