Closed by lars-t-hansen.
Grubbing through the source and GitHub for the users library that we use for this, I find that it is no longer maintained (https://github.com/ogham/rust-users). It also probably does not handle all possible errors as smoothly as it should: it retries on ERANGE (not enough buffer space) but not on ENOMEM (insufficient memory to allocate the passwd structure), which could be transient, nor on several other errors listed in the man page for getpwuid_r. Given the load this machine was under last night, we could simply have run into a transient failure due to resource exhaustion.
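A minimal sketch of the retry behaviour described above. It assumes we call getpwuid_r directly over FFI rather than through the users library or the libc crate. The Passwd layout and errno values are the Linux ones; the function name username_for_uid, the buffer cap, the retry bound and the backoff are hypothetical choices. Treating EINTR, ENOMEM, ENFILE and EMFILE as transient is also an assumption, based on the error list in the getpwuid_r man page.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};
use std::ptr;
use std::thread;
use std::time::Duration;

// Layout of `struct passwd` on Linux (glibc and musl); uid_t/gid_t are 32-bit there.
// Assumption: other platforms may differ.
#[allow(dead_code)]
#[repr(C)]
struct Passwd {
    pw_name: *mut c_char,
    pw_passwd: *mut c_char,
    pw_uid: u32,
    pw_gid: u32,
    pw_gecos: *mut c_char,
    pw_dir: *mut c_char,
    pw_shell: *mut c_char,
}

extern "C" {
    // Returns 0 on success (or "not found"), otherwise an errno value.
    fn getpwuid_r(
        uid: u32,
        pwd: *mut Passwd,
        buf: *mut c_char,
        buflen: usize,
        result: *mut *mut Passwd,
    ) -> c_int;
}

// errno values on Linux.
const EINTR: c_int = 4;
const ENOMEM: c_int = 12;
const ENFILE: c_int = 23;
const EMFILE: c_int = 24;
const ERANGE: c_int = 34;

const MAX_BUF: usize = 1 << 20; // hypothetical cap on buffer growth
const MAX_RETRIES: u32 = 5; // hypothetical bound on retries for transient errors

/// Ok(Some(name)): found. Ok(None): uid is not in the passwd database.
/// Err(code): the lookup kept failing with errno `code`.
fn username_for_uid(uid: u32) -> Result<Option<String>, c_int> {
    let mut buf: Vec<c_char> = vec![0; 1024];
    let mut retries: u32 = 0;
    loop {
        let mut pwd: Passwd = unsafe { std::mem::zeroed() };
        let mut result: *mut Passwd = ptr::null_mut();
        let rc = unsafe { getpwuid_r(uid, &mut pwd, buf.as_mut_ptr(), buf.len(), &mut result) };
        match rc {
            // Success with a null result pointer means "no such uid".
            0 if result.is_null() => return Ok(None),
            0 => {
                // pw_name points into `buf`, so copy it out before returning.
                let name = unsafe { CStr::from_ptr(pwd.pw_name) };
                return Ok(Some(name.to_string_lossy().into_owned()));
            }
            ERANGE if buf.len() < MAX_BUF => {
                // Buffer too small: double it and try again.
                let new_len = buf.len() * 2;
                buf.resize(new_len, 0);
            }
            EINTR | ENOMEM | ENFILE | EMFILE if retries < MAX_RETRIES => {
                // Possibly transient (e.g. resource exhaustion): back off and retry.
                retries += 1;
                thread::sleep(Duration::from_millis(10 << retries));
            }
            code => return Err(code),
        }
    }
}
```

Keeping Ok(None) separate from Err(code) would let the caller tell a UID that is genuinely missing from one whose lookup failed under load.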
There's a maintained fork of this library called uzers (see https://github.com/ogham/rust-users/issues/54), but given our very simple needs and the MIT license of the code, we might be better off just lifting the code we need and maintaining it ourselves.
I agree that we had better lift out the part we need, with attribution, for easier maintenance.
A couple of possible reasons why we can't map UID to username:
This is a new problem on ML6: just before 2AM on 21 February, many python and perl processes were reported as using the CPU heavily and as being run by user _noinfo_, which means their UID could not be found in the passwd database. To make problems like this diagnosable, we should log the UID as part of the _noinfo_ string.
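A sketch of the proposed logging change. The function name user_label and the label format _noinfo_<uid> are hypothetical. The lookup is passed in as a closure so the sketch does not depend on any particular passwd binding.

```rust
/// Hypothetical helper: returns the user name for `uid`, or a placeholder
/// that still carries the numeric UID when the lookup yields nothing.
fn user_label<F>(uid: u32, lookup: F) -> String
where
    F: Fn(u32) -> Option<String>,
{
    match lookup(uid) {
        Some(name) => name,
        // Embedding the UID makes entries like "_noinfo_1234" traceable later.
        None => format!("_noinfo_{}", uid),
    }
}
```

Keeping the _noinfo_ prefix unchanged means existing filters that match on it would still catch these entries.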