eudev-project / eudev

Repository for eudev development
GNU General Public License v2.0

inotify_add_watch(6, /dev/loop0p1, 10) failed: No such file or directory #279

Open martinetd opened 5 months ago


Hello,

On our slow ARM board I can reproduce this error message 100% of the time with the following code:

# setup; create a "disk image" with a single partition
truncate -s 100M test; sgdisk --new 1:: test
# reproducer: create and remove loop device immediately
# (note it assumes there was no other loop device and hardcodes index, beware if you have others)
losetup -P -f test; losetup -d /dev/loop0

Which yields:

# losetup -P -f test; losetup -d /dev/loop0
# [  926.651754] udevd[2366]: inotify_add_watch(6, /dev/loop0p1, 10) failed: No such file or directory
[  926.662323] udevd[2366]: inotify_add_watch(6, /dev/loop0p1, 10) failed: No such file or directory

Looking at what happens, udev clearly lost the race: it received the netlink messages saying the devices were created and, judging by the strace output, could even open them, but by the time it tries to add the device to inotify, the kernel has already removed it.
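
For illustration, the same sequence can be replayed on an ordinary file instead of a loop device. This is only a sketch of the mechanism, not udev's code: `errno_after_removal` is a made-up helper, and the file path stands in for the device node. It opens the path, removes it, and only then asks inotify to watch it by name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical demo helper (not part of eudev).
 * Returns the errno reported by inotify_add_watch() for a path that was
 * removed after being opened, 0 if the watch unexpectedly succeeded,
 * or -1 if the setup itself failed. */
static int errno_after_removal(const char *path)
{
    int result = -1;
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        return -1;

    /* Step 1: the node exists and can be opened (udev got this far). */
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd >= 0) {
        /* Step 2: the path disappears while we still hold it open
         * (stands in for the kernel removing the partition node). */
        if (unlink(path) == 0) {
            /* Step 3: the watch is requested by path, so the open fd
             * does not help; the lookup fails. */
            if (inotify_add_watch(ifd, path, IN_CLOSE_WRITE) < 0)
                result = errno;
            else
                result = 0;
        }
        close(fd);
    }
    close(ifd);
    return result;
}
```

Calling it with a path in a writable directory should report ENOENT, which matches the "No such file or directory" text in the log above.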

Our real code actually does a few checks on the device before removing the loop device, but this happens fast enough that udev loses the race fairly reliably. Since the message is broadcast to consoles, it apparently scares users a bit (they think there is a problem when the message would be safe to ignore). We can obviously make the script artificially wait a second there, and that would get rid of it 99% of the time, but slow devices are slow and there is no guarantee that is enough (and the whole process is slow enough that I don't really want to make it slower anyway).

I'm not sure what this inotify watch is for, but given the race I'd think we could just ignore ENOENT errors. At least, that is what I thought until I checked the issues here and found #181, which is unrelated but also complains about ENOENT, so we'd hide that problem if someone reproduces it...

I checked systemd and, as far as I could see, they do not print that error anywhere. I tried with a test file with 95 partitions, and strace caught the same ENOENT errors with no message in the journal or kernel log:

[pid 2082808] inotify_add_watch(8, "/dev/loop0p94", IN_CLOSE_WRITE <unfinished ...>
[pid 2082808] <... inotify_add_watch resumed>) = -1 ENOENT (No such file or directory)

Looking at the code, they have been logging the error at debug level since https://github.com/systemd/systemd/commit/691a596da15cb4171a86c5f95b30ad5ba91b6745 so perhaps we could do the same. (I'm not sure about the TOCTOU it describes, but lowering the level in itself is easy enough...)
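
To make the idea concrete, here is a rough sketch of what "lower the level" could look like if it were limited to ENOENT. It is not eudev's or systemd's actual code: `watch_failure_priority` and `add_watch_quietly` are hypothetical names, and `fprintf` stands in for udev's own logging. Note that filtering on ENOENT alone would also quiet the case from #181.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <syslog.h>

/* Hypothetical helper: pick a syslog priority for a failed watch.
 * ENOENT is what the lost race produces, so it is demoted to debug;
 * every other errno keeps the error level. */
static int watch_failure_priority(int err)
{
    return err == ENOENT ? LOG_DEBUG : LOG_ERR;
}

/* Hypothetical wrapper: try to watch devnode for IN_CLOSE_WRITE.
 * Returns the watch descriptor, or -1 with *priority set to the level
 * the failure would be logged at. errno is preserved for the caller. */
static int add_watch_quietly(int inotify_fd, const char *devnode, int *priority)
{
    int wd = inotify_add_watch(inotify_fd, devnode, IN_CLOSE_WRITE);
    if (wd < 0) {
        int err = errno;
        *priority = watch_failure_priority(err);
        /* Stand-in for the real logger: only non-debug failures are
         * printed, so the ENOENT race no longer reaches the console. */
        if (*priority != LOG_DEBUG)
            fprintf(stderr, "inotify_add_watch(%d, %s) failed: %s\n",
                    inotify_fd, devnode, strerror(err));
        errno = err;
    }
    return wd;
}
```

The alternative, lowering the level for every failure regardless of errno, would be an even smaller change, at the cost of hiding other errors too.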

What do you think?