CTSRD-CHERI / cheribsd

FreeBSD adapted for CHERI-RISC-V and Arm Morello.
http://cheribsd.org

panic: Assertion vmap != NULL failed at /usr/src/sys/dev/drm/freebsd/drm_os_freebsd.c:370 #2097

Closed: rwatson closed this issue 1 month ago

rwatson commented 6 months ago

Running with a kernel/userlevel from #2080, I saw this kernel panic when starting an aarch64 Chromium web browser within an otherwise entirely purecap (kernel, userlevel, desktop) environment:

panic: Assertion vmap != NULL failed at /usr/src/sys/dev/drm/freebsd/drm_os_freebsd.c:370

The kernel build was:

FreeBSD cheri-blossom.sec.cl.cam.ac.uk 15.0-CURRENT FreeBSD 15.0-CURRENT #19 c18n_procstat-n268168-8e6f163a2c50: Tue Apr 9 02:29:44 UTC 2024 robert@cheri-blossom.sec.cl.cam.ac.uk:/usr/obj/usr/src/arm64.aarch64c/sys/GENERIC-MORELLO-PURECAP arm64

Async revocation and default enabled c18n are both turned on:

# sysctl security.cheri
security.cheri.ptrace_caps: 0
security.cheri.cloadtags_stride: 4
security.cheri.sealcap: 0x4 [,0x4-0x2000]
security.cheri.runtime_revocation_async: 1
security.cheri.runtime_revocation_every_free_default: 0
security.cheri.runtime_revocation_default: 1
security.cheri.lib_based_c18n_default: 1
security.cheri.bound_legacy_capabilities: 0
security.cheri.abort_on_memcpy_tag_loss: 0
security.cheri.debugger_on_sandbox_syscall: 0
security.cheri.stats.untagged_ptrace_caps: 0
security.cheri.stats.forged_ptrace_caps: 0
security.cheri.stats.syscall_violations: 0
security.cheri.capability_size: 16
security.cheri.cidcap: 0x0 [,0x0-0x8000000000000000]

Console output:

panic: Assertion vmap != NULL failed at /usr/src/sys/dev/drm/freebsd/drm_os_freebsd.c:370
cpuid = 2
time = 1714877899
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x17c
panic() at panic+0x48
drm_cdev_pager_populate() at drm_cdev_pager_populate+0x218
vm_fault_allocate() at vm_fault_allocate+0x434
vm_fault() at vm_fault+0x45c
vm_fault_trap() at vm_fault_trap+0x78
data_abort() at data_abort+0x1b8
do_el0_sync() at do_el0_sync+0xac
handle_el0_sync() at handle_el0_sync+0x30
--- exception, esr 0x92000047

KGDB on the crashdump reports:

(kgdb) bt
#0  get_curthread () at /usr/src/sys/arm64/include/pcpu.h:92
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:411
#2  0xffff00000057ca30 in kern_reboot (howto=18007856)
    at /usr/src/sys/kern/kern_shutdown.c:529
#3  0xffff00000057d028 in vpanic (
    fmt=0xffff000000a852fb [rR,0xffff000000a852fb-0xffff000000a85318] (invalid) "Assertion %s failed at %s:%d", ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:989
#4  0xffff00000057cd10 in panic (
    fmt=0xffff000000a852fb [rR,0xffff000000a852fb-0xffff000000a85318] (invalid) "Assertion %s failed at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:913
#5  0xffff0000001dbe6c in drm_cdev_pager_populate (
    vm_obj=0xffffa080d61241f0 [rwRW,0xffffa080d61241f0-0xffffa080d61243e0] (invalid), pidx=0, fault_type=0, max_prot=<optimized out>, 
    first=0xffff0001943a1db0 [rwRW,0xffff0001943a1db0-0xffff0001943a1db8] (invalid), 
    last=0xffff0001943a1da8 [rwRW,0xffff0001943a1da8-0xffff0001943a1db0] (invalid))
    at /usr/src/sys/dev/drm/freebsd/drm_os_freebsd.c:371
#6  0xffff0000008bc1f8 in vm_pager_populate (object=0x1, pidx=18446462598750848816, 
    fault_type=16901008, 
    first=0xffff00000112c124 <boottrace_enabled> [rwRW,0xffff00000112c124-0xffff00000112c125] (invalid), 
    last=0xffff0000010ab174 <cold> [rwRW,0xffff0000010ab174-0xffff0000010ab178] (invalid), max_prot=<optimized out>) at /usr/src/sys/vm/vm_pager.h:185
#7  vm_fault_populate (
    fs=0xffff0001943a2010 [rwRW,0xffff0001943a2010-0xffff0001943a2110] (invalid))
    at /usr/src/sys/vm/vm_fault.c:668
#8  vm_fault_allocate (
    fs=0xffff0001943a2010 [rwRW,0xffff0001943a2010-0xffff0001943a2110] (invalid))
    at /usr/src/sys/vm/vm_fault.c:1500
#9  0xffff0000008ba7a4 in vm_fault_object (
    fs=0xffff0001943a2010 [rwRW,0xffff0001943a2010-0xffff0001943a2110] (invalid), 
    behindp=<optimized out>, aheadp=<optimized out>)
    at /usr/src/sys/vm/vm_fault.c:1767
#10 vm_fault (map=<optimized out>, vaddr=2261975040, fault_type=0 '\000', 
    fault_flags=<optimized out>, m_hold=<optimized out>)
    at /usr/src/sys/vm/vm_fault.c:1901
#11 0xffff0000008ba198 in vm_fault_trap (
    map=0xffff00019890c140 [rwRW,0xffff00019890c140-0xffff00019890c420] (invalid), 
    vaddr=18446462598750318964, fault_type=0 '\000', fault_flags=16901008, 
    signo=0xffff0001943a22ec [rwRW,0xffff0001943a22ec-0xffff0001943a22f0] (invalid), 
    ucode=0xffff0001943a22e8 [rwRW,0xffff0001943a22e8-0xffff0001943a22ec] (invalid))
    at /usr/src/sys/vm/vm_fault.c:925
#12 0xffff00000093e4a8 in data_abort (
    td=0xffff00019a659c80 [rwRW,0xffff00019a659c80-0xffff00019a65a5f0] (invalid), 
    frame=0xffff0001943a25d0 [rwRW,0xffff00019439d000-0xffff0001943a3000] (invalid), 
    esr=18446462598750848816, far=18446638520598272768, lower=18006308)
    at /usr/src/sys/arm64/arm64/trap.c:485
#13 0xffff00000093d678 in do_el0_sync (
    td=0xffff00019a659c80 [rwRW,0xffff00019a659c80-0xffff00019a65a5f0] (invalid), 
    frame=0xffff0001943a25d0 [rwRW,0xffff00019439d000-0xffff0001943a3000] (invalid))
    at /usr/src/sys/arm64/arm64/trap.c:781
#14 <signal handler called>
#15 0x0000000044046934 in ?? ()
#16 0x000000005e7a3a44 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The process in question was plasmashell.

markjdb commented 6 months ago

So DRM maintains a global linked list of VMA structures that reference mapped GEM objects. The VM object corresponding to a GEM object keeps a pointer to the GEM object in its handle field; during a fault, we look up the VMA using that handle as a key. We crashed because we took a fault but couldn't find the mapping VMA in the global list.
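
For illustration, here is a minimal sketch of that fault-time lookup, using the list and field names from the excerpt quoted below; the actual drm_vmap_find() in drm_os_freebsd.c may differ in detail:

```c
/*
 * Hedged sketch of the fault-time lookup: walk the global VMA list
 * keyed by vm_private_data, the GEM object pointer that the VM
 * object stores in its handle field. Returning NULL here is what
 * trips the "vmap != NULL" assertion in drm_cdev_pager_populate().
 */
static struct vm_area_struct *
drm_vmap_find(void *handle)
{
	struct vm_area_struct *vmap;

	rw_rlock(&drm_vma_lock);
	TAILQ_FOREACH(vmap, &drm_vma_head, vm_entry) {
		if (vmap->vm_private_data == handle)
			break;
	}
	rw_runlock(&drm_vma_lock);
	return (vmap);
}
```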

The VM object and GEM object look fine, i.e., they're valid and haven't been destroyed. The refcount on the VM object is 2. Interestingly, at the time of the panic, a child process of plasmashell was in the middle of exec'ing, so it was busy freeing its mappings and dropping VM object references.

In drm_fstub_do_mmap(), we allocate a VMA and insert it into the global list. However, there is also handling for the case where the key already exists, i.e., the GEM object has already been mapped via another VM object; in that case, we don't insert the new VMA:

                rw_wlock(&drm_vma_lock);
                TAILQ_FOREACH(ptr, &drm_vma_head, vm_entry) {
                        if (ptr->vm_private_data == vm_private_data)
                                break;
                }
                /* check if there is an existing VM area struct */
                if (ptr != NULL) {
                        /* check if the VM area structure is invalid */
                        if (ptr->vm_ops == NULL ||
                            ptr->vm_ops->open == NULL ||
                            ptr->vm_ops->close == NULL) {
                                rv = ESTALE;
                                vm_no_fault = 1;
                        } else {
                                rv = EEXIST;
                                vm_no_fault = (ptr->vm_ops->fault == NULL);
                        }
                } else {
                        /* insert VM area structure into list */
                        TAILQ_INSERT_TAIL(&drm_vma_head, vmap, vm_entry);
                        rv = 0;
                        vm_no_fault = (vmap->vm_ops->fault == NULL);
                }
                rw_wunlock(&drm_vma_lock);

                if (rv != 0) {
                        /* free allocated VM area struct */
                        drm_vmap_free(vmap);
                        /* check for stale VM area struct */
                        if (rv != EEXIST)
                                return (rv);
                }

                /* check if there is no fault handler */
                if (vm_no_fault) {
                        *obj = cdev_pager_allocate(vm_private_data,
                            OBJT_DEVICE, &drm_dev_pg_ops, size, prot,
                            *foff, td->td_ucred);
                } else {
                        *obj = cdev_pager_allocate(vm_private_data,
                            OBJT_MGTDEVICE, &drm_mgtdev_pg_ops, size, prot,
                            *foff, td->td_ucred);
                }

So in the rv = EEXIST case, it looks like we can end up with >1 VM object that references the same GEM object. When one of those VM objects is destroyed (maybe triggered by the child process exec()ing), the corresponding VMA will be removed from the global list, but then a fault on a different referencing VM object will fail to look up a VMA, and we'll trigger the panic that Robert hit. I can't really see any other way it could happen.
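
The teardown side of that failure mode, sketched under the same naming assumptions (the real drm_cdev_pager_dtor() may differ; only the list manipulation is taken from the discussion above):

```c
/*
 * Hedged sketch: when one VM object referencing the GEM object is
 * destroyed, the dtor removes the (shared) VMA from the global list
 * and frees it, so a later fault through another VM object that
 * still references the same GEM object finds nothing.
 */
static void
drm_cdev_pager_dtor(void *handle)
{
	struct vm_area_struct *vmap;

	rw_wlock(&drm_vma_lock);
	TAILQ_FOREACH(vmap, &drm_vma_head, vm_entry) {
		if (vmap->vm_private_data == handle)
			break;
	}
	if (vmap != NULL)
		TAILQ_REMOVE(&drm_vma_head, vmap, vm_entry);
	rw_wunlock(&drm_vma_lock);
	if (vmap != NULL)
		drm_vmap_free(vmap);
}
```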

Thus my questions are:

  1. Under what circumstances does drm_fstub_do_mmap() find an existing GEM object in the global list?
  2. If there is a legitimate explanation for 1, how do we handle it? A refcount on the VMA list entries would help, but some of those fields assume that the mapping is unique. That is, if you have two mappings referencing a GEM object via the VMA list, what meaning do the vm_start and vm_end fields of the VMA have?

@bukinr do you have any idea what causes 1)?
bukinr commented 6 months ago

I am trying to figure this out. I inserted printfs and opened Chromium (10 tabs); so far execution does not reach any of these printf paths.

                /* check if there is an existing VM area struct */
                if (ptr != NULL) {
                        /* check if the VM area structure is invalid */
                        if (ptr->vm_ops == NULL ||
                            ptr->vm_ops->open == NULL ||
                            ptr->vm_ops->close == NULL) {
printf("%s stale\n", __func__);
                                rv = ESTALE;
                                vm_no_fault = 1;
                        } else {
printf("%s eexist\n", __func__);
                                rv = EEXIST;
                                vm_no_fault = (ptr->vm_ops->fault == NULL);
                        }

Note that there are around 300 entries in drm_vma_head when Chromium is open.

rwatson commented 6 months ago

In my experience, sudden termination of the display server can cause unhappiness in the kernel when other applications bump into the mess left behind. I wonder whether something else triggered a bug in the display server... and then plasmashell tripped over it?

bukinr commented 6 months ago

This could be an out-of-memory situation that is not handled correctly. We don't have the IOMMU enabled (the iommu code exists and works). I got another panic during vm_page_reclaim_contig(), which panfrost calls when it can't allocate large chunks of contiguous memory:

WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff0000001768c8] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff0000001768f4] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff00000017697c] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001769a8] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:32:plane-1] flip_done timed out
panic: page 0xffffa08368d80080 is PG_FICTITIOUS or PG_MARKER
cpuid = 2
time = 1715087326
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x34
vpanic() at vpanic+0x13c
panic() at panic+0x64
vm_page_reclaim_contig_domain_ext() at vm_page_reclaim_contig_domain_ext+0xfc4
vm_page_reclaim_contig() at vm_page_reclaim_contig+0x70
panfrost_gem_get_pages() at panfrost_gem_get_pages+0xd8
panfrost_gem_open() at panfrost_gem_open+0x190
drm_gem_handle_create_tail() at drm_gem_handle_create_tail+0x180
panfrost_gem_create_object_with_handle() at panfrost_gem_create_object_with_handle+0x114
panfrost_ioctl_create_bo() at panfrost_ioctl_create_bo+0x38
drm_ioctl_kernel() at drm_ioctl_kernel+0xcc
drm_ioctl() at drm_ioctl+0x194
drm_fstub_ioctl() at drm_fstub_ioctl+0x84
kern_ioctl() at kern_ioctl+0x2e0
user_ioctl() at user_ioctl+0x178
do_el0_sync() at do_el0_sync+0x630
handle_el0_sync() at handle_el0_sync+0x3c
--- exception, esr 0x56000000
KDB: enter: panic
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff0000001768c8] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff0000001768f4] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff00000017697c] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001769a8] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:32:plane-1] flip_done timed out
[ thread pid 1145 tid 101378 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19]
db> 
markjdb commented 4 months ago

> This could be an out-of-memory situation that is not handled correctly. [...] I got another panic during vm_page_reclaim_contig(), which panfrost calls when it can't allocate large chunks of contiguous memory:
>
> panic: page 0xffffa08368d80080 is PG_FICTITIOUS or PG_MARKER

I just hit this on my new morello system, will debug a bit.

markjdb commented 4 months ago

> panic: page 0xffffa08368d80080 is PG_FICTITIOUS or PG_MARKER
>
> I just hit this on my new morello system, will debug a bit.

Well, panfrost_alloc_pages_contig() allocates regular physical memory, then sets PG_FICTITIOUS in the pages for some reason. The code which scans physical memory during defragmentation does not expect this, hence the panic.

I don't really understand why we use an OBJT_MGTDEVICE object for panfrost GEM objects. Those are for containing fictitious pages (i.e., device memory or stolen pages of RAM), but the panfrost driver is allocating regular physical RAM. It should be OBJT_PHYS or OBJT_SWAP; the latter only if we care about being able to remove all mappings of a given page, but I don't see anything in panfrost that relies on that...?
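
As a hedged illustration of that suggestion (not a tested change), the backing object for RAM-backed panfrost GEM pages could be created along these lines:

```c
/*
 * Illustrative only: back a GEM object of 'size' bytes with regular
 * RAM via an OBJT_PHYS object rather than marking the pages
 * PG_FICTITIOUS under OBJT_MGTDEVICE. OBJT_SWAP would be the choice
 * if all mappings of a given page must be removable.
 */
vm_object_t obj;

obj = vm_object_allocate(OBJT_PHYS, OFF_TO_IDX(round_page(size)));
```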

markjdb commented 4 months ago

New panic today:

panic: refcount 0xffffa0808003f708 wraparound
cpuid = 3
time = 1721744621
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x34
vpanic() at vpanic+0x13c
panic() at panic+0x64
__drm_atomic_helper_connector_destroy_state() at __drm_atomic_helper_connector_destroy_state+0xa0
drm_atomic_helper_connector_destroy_state() at drm_atomic_helper_connector_destroy_state+0x18
drm_atomic_state_default_clear() at drm_atomic_state_default_clear+0x80
__drm_atomic_state_free() at __drm_atomic_state_free+0x34
drm_mode_atomic_ioctl() at drm_mode_atomic_ioctl+0x81c
drm_ioctl_kernel() at drm_ioctl_kernel+0xcc
drm_ioctl() at drm_ioctl+0x194
drm_fstub_ioctl() at drm_fstub_ioctl+0x84
kern_ioctl() at kern_ioctl+0x2e0
user_ioctl() at user_ioctl+0x178
do_el0_sync() at do_el0_sync+0x630
handle_el0_sync() at handle_el0_sync+0x3c
paulmetzger commented 4 months ago

> So DRM maintains a global linked list of VMA structures that reference mapped GEM objects. [...] So in the rv = EEXIST case, it looks like we can end up with >1 VM object that references the same GEM object. [...]
>
> Thus my questions are:
>
> 1. Under what circumstances does `drm_fstub_do_mmap()` find an existing GEM object in the global list?
> 2. If there is a legitimate explanation for 1, how do we handle it? [...] what meaning do the `vm_start` and `vm_end` fields of the VMA have?
>
> @bukinr do you have any idea what causes 1)?

@rwatson @markjdb @bukinr I found the bug and have a C program that can reproduce the panic. The C program is here: https://www.cl.cam.ac.uk/~pffm2/cheribsd_issue2097/reproducers/main.c Everything needed to build it is here. The bug manifests only with some interleavings of two processes that concurrently mmap and munmap the same GEM GPU buffer. This patch for sys/dev/drm/freebsd/drm_os_freebsd.c uses two pause() calls to force the interleaving that eventually leads to the panic when the C program is executed: https://www.cl.cam.ac.uk/~pffm2/cheribsd_issue2097/reproducers/drm_os_frebsd.c.patch

Without this patch, the C program also triggers the bug fairly reliably if this sleep() call is commented out: https://github.com/CTSRD-CHERI/pffm2/blob/c50cb472bc278114d82ae328382243c493467278/misc/issue2097/main.c#L48

In summary, the problem is that during an mmap() call that maps a GEM GPU buffer object, this check for whether a corresponding VMA object already exists can be positive (meaning the object exists) even though another thread is in a code path that will free that VMA object. With the interleaving described below, the check in cdev_pager_allocate here and here for whether a corresponding VM object exists is negative. Hence, a new VM object is created whose handle field holds a reference that is later used in an attempt to find a VMA object that has already been freed. Specifically, drm_vmap_find in drm_cdev_pager_populate cannot find the VMA object referenced by the VM object's handle, which then leads to the panic.

Here is the interleaving that causes the panic with the C program mentioned above (a sketch of the reproducer's overall shape follows the list):

  1. The parent process mmaps the GEM GPU buffer and unmaps it with a call to munmap() right after. The munmap code executes up to the beginning of drm_cdev_pager_dtor(); the code in this function and after it will free the VMA and VM object, but has not run yet.
  2. The child process mmaps the GEM buffer and executes the mmap code up to the line after the MPASS check in drm_cdev_pager_ctor (see here). The check here for whether a VMA object for this GEM object exists was positive, so we took the rv = EEXIST code path, which intends to reuse the existing VMA object that @markjdb mentioned. drm_cdev_pager_ctor is then called from the implementation of cdev_pager_allocate, which is called either here or here. In this step we don't create a new VMA object because we found an existing one.
  3. The parent process continues executing the munmap() code, which frees the VMA object that was found in step 2; the corresponding VM object is also freed.
  4. The child process continues executing the mmap() code. The check in cdev_pager_allocate for an existing VM object fails, and a new VM object is created. The handle field of this new VM object is set to a pointer to the mmapped GEM object. VMA objects also hold this pointer in their vm_private_data field, and drm_vmap_find in drm_cdev_pager_populate uses it to find VMA objects.
  5. The child process attempts to write to the mmapped GEM GPU buffer, which causes drm_cdev_pager_populate to run. drm_cdev_pager_populate passes the GEM object pointer held in the handle field of the VM object to drm_vmap_find. drm_vmap_find can't find the VMA object because it was freed in step 3 and returns NULL, which in turn causes the panic.
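
Under the assumptions flagged in the comments (the device path, buffer-creation step, and mmap offset are placeholders, not the actual reproducer's contents), the overall shape of such a two-process reproducer is roughly:

```c
/*
 * Hedged sketch of the reproducer's shape only; the real program is
 * linked above. How the GEM buffer is created and where its mmap
 * offset comes from are assumptions.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	int fd;
	off_t off = 0;		/* would come from a DRM/panfrost ioctl */
	size_t len = 4096;

	fd = open("/dev/dri/card0", O_RDWR);
	if (fd < 0)
		exit(1);
	/* ... create a GEM buffer and obtain its mmap offset here ... */

	if (fork() == 0) {
		/* Child: map the same GEM object and fault on it. */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, off);
		if (p != MAP_FAILED)
			p[0] = 1;	/* drives drm_cdev_pager_populate() */
		_exit(0);
	}
	/* Parent: map and immediately unmap, racing the child's mmap. */
	void *q = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, off);
	if (q != MAP_FAILED)
		munmap(q, len);	/* may free the VMA the child just found */
	wait(NULL);
	return (0);
}
```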

I don't have a fully thought-through fix yet. A reference counter for VMA objects, as @markjdb suggested, might work.

markjdb commented 4 months ago

> @rwatson @markjdb @bukinr I found the bug and have a C program that can reproduce the panic.

Thank you for the thorough investigation. The links to your reproducers don't work, however - could you please update them?

The DRM code currently assumes that there is a 1-1 mapping between VMAs and GEM objects, which is wrong since it's apparently possible for multiple processes to share mappings of a GEM object (and presumably it's possible for a single process to map a GEM object multiple times).

I think the solution is to rework drm_vma_head. What we want is a list of mapped GEM objects (effectively a list of unique vm_private_data pointers), and for each entry, 1) a reference count, 2) the associated VM object (there should only be one), and perhaps 3) a list of VMAs mapping that object.
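
A hedged sketch of what such a list entry might look like; every name here is invented for illustration and is not from an actual patch:

```c
/*
 * One entry per mapped GEM object, keyed by the unique
 * vm_private_data pointer, replacing the flat per-VMA list.
 */
struct drm_gem_mapping {
	void		*handle;	/* unique vm_private_data key */
	u_int		 refcount;	/* one reference per mapping */
	vm_object_t	 vm_obj;	/* the single backing VM object */
	TAILQ_HEAD(, vm_area_struct) vmas;	/* VMAs mapping this object */
	TAILQ_ENTRY(drm_gem_mapping) link;	/* entry in the global list */
};
```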

Note that there are several DRM object types potentially affected by this change, corresponding to: drm_vm_dma_ops, drm_vm_ops, drm_vm_shm_ops, drm_vm_sg_ops, panfrost_gem_vm_ops.

I'll try to implement this today.

paulmetzger commented 4 months ago

I updated the links to the reproducer. They were pointing into a private repository. I also updated the paragraph that starts with "In summary,..." to make it a bit clearer.

I added a reference counter for VMA objects and corresponding locking to drm_cdev_pager_ctor(), which seems to have fixed the bug. A patch file is attached. It contains some other debugging code that I added, and IIRC the RW_RECURSE flag in drm_stub_init() isn't necessary. I tested it with my reproducer and with the OpenGL traces that previously triggered the panic. However, changing the code so that a separate VMA object is created every time a GEM object is mmapped sounds like a better approach.

Note: I removed the patch file because I noticed a mistake in it.

paulmetzger commented 4 months ago

Here is the patch that I mentioned yesterday, which adds a reference counter for the VMA objects. I'd be happy to test it with the latest version of CheriBSD (I'm using a fork of the 2022 release of CheriBSD with some backported DRM fixes) and to create a PR if we want to use it in the interim.
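
Conceptually, the ctor side of that approach looks something like the sketch below; the vm_refcnt field, the error value, and the exact placement are assumptions, since the real change is in the attached patch:

```c
/*
 * Hedged sketch of the refcount approach: while holding the list
 * lock, take a reference on any existing VMA so that a concurrent
 * munmap() cannot free it out from under the new VM object. The
 * matching dtor would drop the reference and free the VMA only when
 * the count reaches zero.
 */
static int
drm_cdev_pager_ctor(void *handle, vm_ooffset_t size, vm_prot_t prot,
    vm_ooffset_t foff, struct ucred *cred, u_short *color)
{
	struct vm_area_struct *vmap;

	rw_wlock(&drm_vma_lock);
	TAILQ_FOREACH(vmap, &drm_vma_head, vm_entry) {
		if (vmap->vm_private_data == handle)
			break;
	}
	if (vmap != NULL)
		vmap->vm_refcnt++;
	rw_wunlock(&drm_vma_lock);
	return (vmap != NULL ? 0 : ESTALE);
}
```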

markjdb commented 1 month ago

At long last, I believe that the panics reported here are now fixed in dev.