CTSRD-CHERI / cheribsd

FreeBSD adapted for CHERI-RISC-V and Arm Morello.
http://cheribsd.org

Crash in cheri_revoke() -> kqueue_cheri_revoke_list() code relating to revocation #1876

Closed rwatson closed 1 year ago

rwatson commented 1 year ago

Running our 2023-09-29 demo branch image, I hit the following crash while debugging in user-level GDB (GDB itself was idle when the crash happened). The desktop was running on the system console, but I was debugging from the serial console. Temporal safety was enabled via security.cheri.runtime_quarantine_default=1.

root@cheri-blossom:~ # Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex kqueue (kqueue) r = 0 (0xffffa00009b9c300) locked @ /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_event.c:2983
stack backtrace:
#0 0xffff000000565ee0 at witness_debugger+0x5c
#1 0xffff0000005670f0 at witness_warn+0x3e8
#2 0xffff00000083d510 at data_abort+0xec
#3 0xffff000000816010 at handle_el1h_sync+0x10
  x0: 0xffffa00009b9c300
  x1: 0xffff000174ec7470 (cdce_etag + 0x219bb70)
  x2: 0x0000000000000000
  x3: 0x0000000000000ba7
  x4: 0xffffa08f7fd8da80
  x5: 0x0000000000000035
  x6: 0x0000000000000000
  x7: 0x0000000000000000
  x8: 0x0000000000000000
  x9: 0x0000000000000000
 x10: 0x0000000000000000
 x11: 0x0000000000000000
 x12: 0xffff0000010095d0 (w_locklistdata + 0x3a350)
 x13: 0xffff0001750a4450 (cdce_etag + 0x2378b50)
 x14: 0x0000000000010000
 x15: 0x0000000000000001
 x16: 0x0000000000000008
 x17: 0x0000000000000001
 x18: 0xffff000174ec7380 (cdce_etag + 0x219ba80)
 x19: 0xffff0001750a4450 (cdce_etag + 0x2378b50)
 x20: 0xffff000174ec7470 (cdce_etag + 0x219bb70)
 x21: 0xffffa00009b9c300
 x22: 0xffff000000a58074 (console_pausestr + 0x33906)
 x23: 0xffffa00009b9c318
 x24: 0x0000000000000001
 x25: 0x0000000000000100
 x26: 0xffff000174df2700 (cdce_etag + 0x20c6e00)
 x27: 0x0000000000000001
 x28: 0x0000000000000000
 x29: 0xffff000174ec73a0 (cdce_etag + 0x219baa0)
 ddc: 0x0000000000000000 [rwRW,0x0000000000000000-0x0001000000000000]
  sp: 0xffff000174ec7380
  lr: 0xffff00000049ba80 (kqueue_cheri_revoke + 0xc0)
 elr: 0xffff00000049bb00 [rwxRW,0x0000000000000000-0xffffffffffffffff] (kqueue_cheri_revoke_list + 0x18)
spsr: 0x0000000060400045
 far: 0x0000000000000000
 esr: 0x0000000096000007
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff000000170ff0] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff00000017101c] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff0000001710a4] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001710d0] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:32:plane-1] flip_done timed out
panic: data abort in critical section or under mutex
cpuid = 2
time = 1695994726
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x19c
panic() at panic+0x60
data_abort() at data_abort+0x338
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0x96000007
kqueue_cheri_revoke_list() at kqueue_cheri_revoke_list+0x18
kqueue_cheri_revoke() at kqueue_cheri_revoke+0xbc
sys_cheri_revoke() at sys_cheri_revoke+0x540
do_el0_sync() at do_el0_sync+0x5b8
handle_el0_sync() at handle_el0_sync+0x38
--- exception, esr 0x56000000
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff000000170ff0] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff00000017101c] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff0000001710a4] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001710d0] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:32:plane-1] flip_done timed out
Uptime: 14m44s
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff000000170ff0] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff00000017101c] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff0000001710a4] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001710d0] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000171160] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff00000017118c] *ERROR* [PLANE:32:plane-1] flip_done timed out
Dumping 2026 out of 65476 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
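As a reading aid for the register dump above, here is a minimal sketch of how the two esr values in the trace decode. The helper and enum names are hypothetical (they are not the kernel's own macros); the bit positions follow the Arm architecture's ESR_ELx layout.

```c
#include <stdint.h>

/* Hypothetical helper names; field positions follow the Arm ESR_ELx layout. */
static inline unsigned esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3f; } /* exception class */
static inline unsigned esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }   /* 1 = write, 0 = read (data aborts only) */
static inline unsigned esr_dfsc(uint32_t esr) { return esr & 0x3f; }         /* data fault status code */

enum {
    EC_SVC_AARCH64        = 0x15, /* SVC instruction executed in AArch64 state */
    EC_DATA_ABORT_SAME_EL = 0x25  /* data abort taken without a change in exception level */
};

enum {
    DFSC_TRANSLATION_FAULT_L3 = 0x07 /* translation fault, level 3 */
};
```

Under this decoding, esr 0x96000007 is a data abort taken in the kernel, caused by a read that hit a level 3 translation fault; together with far = 0 that points to a NULL pointer read. The later esr 0x56000000 is simply the SVC (system call) entry from user space.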

markjdb commented 1 year ago

Are you able to get line numbers for the backtrace out of the kernel dump?

rwatson commented 1 year ago

I don't have a working kgdb on this box. A non-working kgdb suggests that the process is kscreenlocker_greet, but it cannot give me any details of the kernel stack trace (e.g., locals). I do have a vmcore, should I manage to find a working kgdb:

* 312  Thread 100575 (PID=1564: kscreenlocker_greet)                0x0000000000000000 in ?? ()

markjdb commented 1 year ago

Well, in kqueue_cheri_revoke() we have:

for (ix = 0; ix <= kq->kq_knhashmask; ix++) {
    if (kqueue_cheri_revoke_list(kq, crc,
        &kq->kq_knhash[ix]))
            goto again;
}

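but the loop bounds only work if the mask is non-zero. With kq_knhashmask == 0, the condition ix <= kq->kq_knhashmask still lets the body run once for ix = 0, and if kq_knhash has not been allocated that call passes a NULL list, which is consistent with far = 0 in the crash report.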
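The loop-bound problem can be reproduced in a small user-space sketch. Everything here is a hypothetical stand-in (struct toy_kqueue, struct toy_list, and both visit functions are invented for illustration and are not the kernel's definitions), and the NULL guard shown is only one plausible shape for a fix; it is not taken from PR #1877.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct toy_list { int unused; };

struct toy_kqueue {
    struct toy_list *knhash;   /* assumed NULL until the hash table is allocated */
    unsigned long knhashmask;  /* table size - 1, assumed 0 while unallocated */
};

/*
 * Same bounds as the loop quoted above. Returns the number of buckets visited,
 * or -1 if the loop would hand a NULL list pointer to the per-list routine.
 */
static int
visit_unguarded(const struct toy_kqueue *kq)
{
    int visited = 0;

    assert(kq != NULL);
    for (unsigned long ix = 0; ix <= kq->knhashmask; ix++) {
        if (kq->knhash == NULL)
            return (-1);   /* the kernel would fault on this dereference */
        visited++;
    }
    return (visited);
}

/* One possible guard (an assumption): skip the walk when no table exists. */
static int
visit_guarded(const struct toy_kqueue *kq)
{
    int visited = 0;

    assert(kq != NULL);
    if (kq->knhash == NULL)
        return (0);
    for (unsigned long ix = 0; ix <= kq->knhashmask; ix++)
        visited++;
    return (visited);
}
```

With an unallocated table (knhash == NULL, knhashmask == 0) the unguarded walk reaches the NULL case on its first iteration, while the guarded walk visits nothing. With a four-bucket table (knhashmask == 3) both visit four buckets.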

markjdb commented 1 year ago

See PR #1877 for a possible solution.

rwatson commented 1 year ago

I now have a backtrace from kgdb:

get_curthread ()
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h:94
94      /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h: No such file or directory.
(kgdb) bt
#0  get_curthread ()
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/include/pcpu.h:94
#1  doadump (textdump=textdump@entry=1)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:413
#2  0xffff0000004f151c in kern_reboot (howto=260)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:534
#3  0xffff0000004f1a94 in vpanic (
    fmt=fmt@entry=0xffff000000996b20 "data abort in critical section or under mutex", ap=...)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:994
#4  0xffff0000004f17ac in panic (
    fmt=0xffff000000996b20 "data abort in critical section or under mutex")
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_shutdown.c:918
#5  0xffff00000083d760 in data_abort (td=0xffff000174df2700, 
    frame=0xffff000174ec70c0, esr=2516582407, far=0, lower=0)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:444
#6  <signal handler called>
#7  0xffff00000049bb00 in kqueue_cheri_revoke_list (
    kq=kq@entry=0xffffa00009b9c300, crc=crc@entry=0xffff000174ec7470, list=0x0)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_event.c:2913
#8  0xffff00000049ba80 in kqueue_cheri_revoke (fdp=<optimized out>, 
    crc=crc@entry=0xffff000174ec7470)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_event.c:2991
#9  0xffff000000477980 in cheri_revoke_hoarders (p=0xffff0001750c5b30, 
    crc=0xffff000174ec7470)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:71
#10 kern_cheri_revoke (td=<optimized out>, flags=4096, 
    start_epoch=<optimized out>, crsi=0x0)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:379
#11 sys_cheri_revoke (td=0xffff000174df2700, uap=<optimized out>)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/kern/kern_cheri_revoke.c:582
#12 0xffff00000083cd94 in syscallenter (td=0xffff000174df2700)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/../../kern/subr_syscall.c:202
#13 svc_handler (td=0xffff000174df2700, frame=<optimized out>)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:246
#14 do_el0_sync (td=0xffff000174df2700, frame=<optimized out>)
    at /local/scratch/jenkins/workspace/CheriBSD-pipeline_demo-2023-10@3/cheribsd/sys/arm64/arm64/trap.c:771
#15 <signal handler called>
#16 0x0000000043b13610 in ?? ()
#17 0x0000000043b50660 in ?? ()

markjdb commented 1 year ago

> I now have a backtrace from kgdb:

Right, this looks like the bug I fixed in PR #1877: frame #7 shows kqueue_cheri_revoke_list() being called with list=0x0.