Open kwitaszczyk opened 5 months ago
sort's got several realloc loops which grow the allocation by a fixed number of entries. cscope generates a decently large input file (230MB), so sort calls realloc() many times. This interacts poorly with MRS, which implements realloc() by always allocating a new buffer, copying, and freeing the old one. In particular, the quarantine ends up growing very quickly and triggers many revocation scans.
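To put rough numbers on the difference, here is a small model (not sort's actual code) of a realloc() that never extends in place and so copies the whole old buffer on every call. The function name entries_copied and its parameters are made up for illustration; it only counts entries copied and does not model the quarantine or revocation scans.

```c
#include <stdint.h>

/*
 * Hypothetical model: total entries copied while growing a buffer from
 * `initial` entries to at least `target` entries, assuming every
 * realloc() allocates a new buffer and copies the old contents.
 *
 * step != 0: capacity grows by a fixed `step` entries per call.
 * step == 0: capacity doubles on each call.
 */
static uint64_t
entries_copied(uint64_t target, uint64_t initial, uint64_t step)
{
	uint64_t cap = (initial != 0) ? initial : 1;
	uint64_t copied = 0;

	while (cap < target) {
		copied += cap;	/* old contents are copied to the new buffer */
		cap = (step != 0) ? cap + step : cap * 2;
	}
	return (copied);
}
```

Growing from 1024 to 2^20 entries, entries_copied(1 << 20, 1024, 1024) comes to 536346624 entries copied, while entries_copied(1 << 20, 1024, 0) comes to 1047552. Under an always-copy realloc(), each of those calls also frees an old buffer, which is presumably what feeds the quarantine.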
This behaviour in sort dates back to the original implementation; I guess it's just fast enough that no one's really tried to tune it. Changing all of the loops to grow the allocation size by a factor of 2 rather than just adding 1024 or 128 new entries lets sort finish in a reasonable amount of time. With that change on my Morello system, a make -C sys cscope CSCOPE_ARCHDIR=arm64 takes:
We should try to find out how much of this overhead is caused by MRS having a pessimal realloc() implementation.
For this release, I suppose we could simply commit these changes to sort. I think they are low-risk.
A couple of experiments to try when I have a bit more time (or if someone else would like to take a look):
I think it's broken that sort is O(n^2) when realloc doesn't mask it by extending in place. There's no good reason for that (I guess virtual memory size limits, but we normally live in an overcommit world where address space is free).
I'm 99% sure there's no principled reason for doing it this way. I'll propose changing that upstream.
We could always fall back to a more conservative behaviour if we detect a virtual memory limit, but I'm not sure it's worth the hassle until we see a reason. I've only ever seen that limit used to try and catch memory leaks before they consume all of a system's RAM.
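For what it's worth, a fallback like that could look roughly like the sketch below. This is not taken from sort or from any proposed patch: have_vm_limit and next_capacity are hypothetical helpers, and using getrlimit(RLIMIT_AS) as the way to detect a virtual memory limit is my assumption.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if a finite address-space limit is set. */
static bool
have_vm_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return (false);	/* cannot tell; assume no limit */
	return (rl.rlim_cur != RLIM_INFINITY);
}

/*
 * Hypothetical growth policy, in entries: double the capacity normally,
 * but add only a fixed `step` when `conservative` is set (or when the
 * buffer is still empty). Returns 0 if the new capacity would overflow.
 * The caller still has to check capacity * element size before
 * passing it to realloc().
 */
static size_t
next_capacity(size_t cap, size_t step, bool conservative)
{
	size_t grow = (conservative || cap == 0) ? step : cap;

	if (grow == 0 || cap > SIZE_MAX - grow)
		return (0);
	return (cap + grow);
}
```

A realloc loop would then call next_capacity(cap, 1024, have_vm_limit()) instead of adding a constant, so the doubling path is the default and the old fixed-increment behaviour only applies under a limit.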
The current status is:
On dev (https://github.com/CTSRD-CHERI/cheribsd/commit/bdeff30fb6b1744816f43ed8a3c2f0a133d872c1) running GENERIC-MORELLO-PURECAP, I executed make cscope CSCOPE_ARCHDIR=arm64 in sys/ and I noticed it never finishes with security.cheri.runtime_revocation_default set to 1 regardless of the value of security.cheri.runtime_revocation_async, even though cscope was hybrid. It turned out it was a CheriABI sort process that doesn't finish: