Closed rwatson closed 1 year ago
Are you able to get line numbers for the backtrace out of the kernel dump?
I don’t have a working kgdb on this box. The broken one suggests that the process is kscreenlocker_greet, but it cannot give me any details on the kernel stack trace (e.g., locals). I do have a vmcore, in case I manage to find myself a working kgdb:
* 312 Thread 100575 (PID=1564: kscreenlocker_greet) 0x0000000000000000 in ?? ()
Well, in kqueue_cheri_revoke() we have:
    for (ix = 0; ix <= kq->kq_knhashmask; ix++) {
        if (kqueue_cheri_revoke_list(kq, crc,
            &kq->kq_knhash[ix]))
            goto again;
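but the loop bounds only work if the mask is non-zero. With a mask of 0 the body still runs once for ix = 0, and if the hash table has not been allocated yet, kq_knhash is presumably NULL at that point.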
See PR #1877 for a possible solution.
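To make the loop-bound problem concrete, here is a minimal user-space sketch. Everything in it is a hypothetical stand-in: `toy_kq`, `toy_list`, `bucket_at`, and the two walk functions are invented for illustration and are not the kernel's types or functions. It assumes an unallocated hash table is represented by a NULL pointer with a mask of 0. The guard shown is one possible approach and is not necessarily what PR #1877 does.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real struct kqueue or klist. */
struct toy_list { int unused; };

struct toy_kq {
	struct toy_list *knhash;	/* assumed NULL until the table exists */
	unsigned long knhashmask;	/* assumed 0 until the table exists */
};

/* Return the bucket pointer the walk would pass on, or NULL if no table. */
static const struct toy_list *
bucket_at(const struct toy_kq *kq, unsigned long ix)
{
	return (kq->knhash == NULL ? NULL : &kq->knhash[ix]);
}

/*
 * Same bounds as the excerpt: with a mask of 0 the body still runs for
 * ix = 0. Returns how many NULL list pointers would have been handed to
 * the per-list helper.
 */
static size_t
null_lists_unguarded(const struct toy_kq *kq)
{
	size_t nulls = 0;

	for (unsigned long ix = 0; ix <= kq->knhashmask; ix++) {
		if (bucket_at(kq, ix) == NULL)
			nulls++;
	}
	return (nulls);
}

/* One possible guard: skip the walk entirely while the mask is 0. */
static size_t
null_lists_guarded(const struct toy_kq *kq)
{
	size_t nulls = 0;

	if (kq->knhashmask != 0) {
		for (unsigned long ix = 0; ix <= kq->knhashmask; ix++) {
			if (bucket_at(kq, ix) == NULL)
				nulls++;
		}
	}
	return (nulls);
}
```

With `knhash == NULL` and `knhashmask == 0`, the unguarded walk visits one bucket and that bucket pointer is NULL; the guarded walk visits none. With an allocated table of four buckets and a mask of 3, both walks behave identically.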
I now have a backtrace from kgdb:
get_curthread ()
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h:94
94 /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h: No such file or directory.
(kgdb) bt
#0 get_curthread ()
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h:94
#1 doadump (textdump=textdump@entry=1)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:413
#2 0xffff0000004f151c in kern_reboot (howto=260)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:534
#3 0xffff0000004f1a94 in vpanic (
fmt=fmt@entry=0xffff000000996b20 "data abort in critical section or under mutex", ap=...)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:994
#4 0xffff0000004f17ac in panic (
fmt=0xffff000000996b20 "data abort in critical section or under mutex")
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:918
#5 0xffff00000083d760 in data_abort (td=0xffff000174df2700,
frame=0xffff000174ec70c0, esr=2516582407, far=0, lower=0)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:444
#6 <signal handler called>
#7 0xffff00000049bb00 in kqueue_cheri_revoke_list (
kq=kq@entry=0xffffa00009b9c300, crc=crc@entry=0xffff000174ec7470, list=0x0)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_event.c:2913
#8 0xffff00000049ba80 in kqueue_cheri_revoke (fdp=<optimized out>,
crc=crc@entry=0xffff000174ec7470)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_event.c:2991
#9 0xffff000000477980 in cheri_revoke_hoarders (p=0xffff0001750c5b30,
crc=0xffff000174ec7470)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:71
#10 kern_cheri_revoke (td=<optimized out>, flags=4096,
start_epoch=<optimized out>, crsi=0x0)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:379
#11 sys_cheri_revoke (td=0xffff000174df2700, uap=<optimized out>)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:582
#12 0xffff00000083cd94 in syscallenter (td=0xffff000174df2700)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/../../kern/subr_syscall.c:202
#13 svc_handler (td=0xffff000174df2700, frame=<optimized out>)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:246
#14 do_el0_sync (td=0xffff000174df2700, frame=<optimized out>)
at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:771
#15 <signal handler called>
#16 0x0000000043b13610 in ?? ()
#17 0x0000000043b50660 in ?? ()
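For anyone reading frame #5, the esr and far values can be decoded by hand. The sketch below assumes the standard Arm ESR_ELx layout (EC in bits [31:26]; for data aborts, WnR in ISS bit 6 and DFSC in ISS bits [5:0]). The helper names are hypothetical and are not taken from the CheriBSD sources.

```c
#include <stdint.h>

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned
esr_ec(uint64_t esr)
{
	return ((unsigned)((esr >> 26) & 0x3f));
}

/* Write-not-read flag for data aborts: ISS bit 6 (0 = read, 1 = write). */
static unsigned
esr_wnr(uint64_t esr)
{
	return ((unsigned)((esr >> 6) & 0x1));
}

/* Data fault status code for data aborts: ISS bits [5:0]. */
static unsigned
esr_dfsc(uint64_t esr)
{
	return ((unsigned)(esr & 0x3f));
}
```

Applied to esr=2516582407 (0x96000007), this gives EC 0x25 (data abort taken without a change in exception level, i.e. from the kernel itself), WnR 0 (a read), and DFSC 0x07 (translation fault, level 3). Together with far=0 and list=0x0 in frame #7, that is consistent with a read through a NULL list pointer.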
Right, this looks like the bug I fixed in the aforementioned PR.
Running with our 2023-09-29 demo branch image, I got the following crash while sitting in user-level GDB doing some debugging (although it was idle when it crashed). The desktop was running on the system console, although I was actually debugging from the serial console. I had temporal safety turned on with
security.cheri.runtime_quarantine_default=1
set.