Forgive me if this is a stupid question, but what is the motivation for using `mmap` in `groot`/`riofs` to read files, instead of just a plain `os.Open`? A side effect of using `mmap` is that the resident set size (RSS) tends to keep growing, if the system isn't under enough pressure to drop unused pages. AFAIK this is normally a harmless accounting quirk, but it may become important if RSS is monitored and used for anything.
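To make the comparison concrete, here is a minimal sketch of the two access patterns I have in mind. I'm assuming the `mmap` side goes through something like `golang.org/x/exp/mmap` (I haven't checked that this is exactly what `riofs` does), and `test.root` is just a placeholder name:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/exp/mmap"
)

// readViaMmap maps the whole file into the address space; every page
// touched by ReadAt is faulted in and counted against this process's RSS
// until the kernel decides to reclaim it.
func readViaMmap(name string) ([]byte, error) {
	r, err := mmap.Open(name)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	buf := make([]byte, r.Len())
	_, err = r.ReadAt(buf, 0)
	return buf, err
}

// readViaOpen reads through a plain file descriptor; the data still lands
// in the OS page cache, but the cached pages are not attributed to this
// process's RSS.
func readViaOpen(name string) ([]byte, error) {
	f, err := os.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	st, err := f.Stat()
	if err != nil {
		return nil, err
	}
	buf := make([]byte, st.Size())
	_, err = f.ReadAt(buf, 0)
	return buf, err
}

func main() {
	for _, read := range []func(string) ([]byte, error){readViaMmap, readViaOpen} {
		if _, err := read("test.root"); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
}
```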
Context:
I was just running some microbenchmarks comparing ROOT I/O to `uproot` to `groot`, and I noticed higher-than-expected RSS on the `groot` side. This reminded me that in https://github.com/go-hep/hep/issues/885 there was a test commit (https://github.com/sbinet-hep/hep/commit/839e08fd377b3ab43a7f838e62088a9f08157852) where `os.Open` was used. Testing it out again, this reduces the RSS as I would naively expect, and it doesn't seem to cause any performance regressions, but Linux's disk I/O cache makes this annoyingly hard to test properly. I thought I should just ask before I get too carried away trying to test it, in case I'm overlooking some need to use `mmap` here.
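For reference, RSS can be sampled from inside the process by parsing `VmRSS` out of `/proc/self/status` (a minimal Linux-specific sketch; the helper name is mine). Between runs, the page cache can be dropped with `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`, which is what makes benchmarking this on a shared machine awkward:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// vmRSS returns the kernel's view of this process's resident set size,
// e.g. "12345 kB", as reported in /proc/self/status (see proc(5)).
func vmRSS() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "VmRSS:") {
			return strings.TrimSpace(strings.TrimPrefix(sc.Text(), "VmRSS:")), nil
		}
	}
	return "", sc.Err()
}

func main() {
	rss, err := vmRSS()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("RSS:", rss)
}
```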
My theoretical concern is that high RSS may cause monitoring tools (like the ones used in batch systems at HEP computing sites) to incorrectly flag the program as being over its memory budget and preemptively kill it, before the system is under enough pressure for the `mmap`ed pages to be dropped. Or the job may be rescheduled to a slot with a needlessly large memory allocation, leading to a less efficient use of computing resources (in a way which may be hard to detect). Using `os.Open` for regular file access, and letting the OS disk I/O cache speed up subsequent reads, may avoid triggering false positives in such cases. That being said, most batch systems probably rely on network I/O, which ideally wouldn't be `mmap`ed (if files aren't being accessed through a FUSE mount or something), so this example may be a moot point.