CTSRD-CHERI / cheribsd

FreeBSD adapted for CHERI-RISC-V and Arm Morello.
http://cheribsd.org
Other
162 stars 59 forks source link

sort on a large file never ends with enabled revocations #2131

Open kwitaszczyk opened 2 months ago

kwitaszczyk commented 2 months ago

On dev (https://github.com/CTSRD-CHERI/cheribsd/commit/bdeff30fb6b1744816f43ed8a3c2f0a133d872c1) running GENERIC-MORELLO-PURECAP, I executed make cscope CSCOPE_ARCHDIR=arm64 in sys/ and I noticed it never finishes with security.cheri.runtime_revocation_default set to 1 regardless of the value of security.cheri.runtime_revocation_async, even though cscope was hybrid.

It turned out it was a CheriABI sort process that doesn't finish:

kw543@cheribsd:~ $ sudo sysctl security.cheri.runtime_revocation_default=0
security.cheri.runtime_revocation_default: 1 -> 0
kw543@cheribsd:~ $ time env LC_ALL=C sort -T /tmp/ cscope.1548/cscope.1 >/dev/null
       23.75 real        21.85 user         1.89 sys
kw543@cheribsd:~ $ sudo sysctl security.cheri.runtime_revocation_default=1
security.cheri.runtime_revocation_default: 0 -> 1
kw543@cheribsd:~ $ sudo sysctl security.cheri.runtime_revocation_async=1
security.cheri.runtime_revocation_async: 1 -> 1
kw543@cheribsd:~ $ time env LC_ALL=C sort -T /tmp/ cscope.1548/cscope.1 >/dev/null
load: 1.72  cmd: sort 1767 [vm map (user)] 231.34r 40.89u 105.47s 51% 18350112k
mi_switch+0x1c0 sleepq_switch+0x118 _sx_xlock_hard+0x448 _sx_xlock+0xcc _vm_map_lock+0x40 sys_cheri_revoke+0x88 do_el0_sync+0x61c handle_el0_sync+0x30 
^C      232.82 real        40.89 user       105.56 sys
kw543@cheribsd:~ $ sudo sysctl security.cheri.runtime_revocation_default=1
security.cheri.runtime_revocation_default: 1 -> 1
kw543@cheribsd:~ $ sudo sysctl security.cheri.runtime_revocation_async=0
security.cheri.runtime_revocation_async: 1 -> 0
kw543@cheribsd:~ $ time env LC_ALL=C sort -T /tmp/ cscope.1548/cscope.1 >/dev/null
load: 1.40  cmd: sort 1784 [running] 747.62r 158.63u 586.87s 100% 52613668k
^C
      757.22 real       158.84 user       596.27 sys
markjdb commented 2 months ago

sort's got several realloc loops which grow the allocation by a fixed number of entries. cscope generates a decently large input file (230MB), so sort calls realloc() many times. This interacts poorly with MRS, which implements realloc() by always allocating a new buffer, copying, and freeing the old one. In particular, the quarantine ends up growing very quickly and triggers many revocation scans.

This behaviour in sort dates back to the original implementation, I guess it's just fast enough that no one's really tried to tune it. Changing all of the loops to grow the allocation size by a factor of 2 rather than just adding 1024 or 128 new entries lets sort finish in a reasonable amount of time. With that change on my morello system, a make -C sys cscope CSCOPE_ARCHDIR=arm64 takes:

We should try to find out how much of this overhead is caused by MRS having a pessimal realloc() implementation.

For this release, I suppose we could simply commit these changes to sort. I think they are low-risk.

markjdb commented 2 months ago

A couple of experiments to try when I have a bit more time (or if someone else would like to take a look):

brooksdavis commented 2 months ago

I think it's broken that sort is O(n^2) when realloc doesn't mask it by extending in place. There's no good reason for that (I guess virtual memory size limits, but we normally we live in an overcommit world were address space is free.)

markjdb commented 2 months ago

I think it's broken that sort is O(n^2) when realloc doesn't mask it by extending in place. There's no good reason for that (I guess virtual memory size limits, but we normally we live in an overcommit world were address space is free.)

I'm 99% sure there's no principled reason for doing it this way. I'll propose changing that upstream.

markjdb commented 2 months ago

I think it's broken that sort is O(n^2) when realloc doesn't mask it by extending in place. There's no good reason for that (I guess virtual memory size limits, but we normally we live in an overcommit world were address space is free.)

I'm 99% sure there's no principled reason for doing it this way. I'll propose changing that upstream.

We could always fall back to a more conservative behaviour if we detect a virtual memory limit, but I'm not sure it's worth the hassle until we see a reason. I've only ever seen that limit used to try and catch memory leaks before they consume all of a system's RAM.

markjdb commented 1 month ago

The current status is: