So DRM maintains a global linked list of VMA structures that reference mapped GEM objects. The VM object corresponding to a GEM object keeps a pointer to the GEM object in its `handle` field; during a fault, we look up the VMA using that handle as a key. We crashed because we took a fault but couldn't find the mapping VMA in the global list.

The VM object and GEM object look fine, i.e., they're valid and haven't been destroyed. The refcount on the VM object is 2. Interestingly, at the time of the panic, a child process of plasmashell was in the middle of exec'ing, so it was busy freeing its mappings and dropping VM object references.
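For reference, here is a minimal sketch of what that fault-time lookup amounts to, pieced together from the description above and from names (`drm_vmap_find`, `drm_cdev_pager_populate`) that come up later in this thread; the bodies are my assumptions, not the actual drm_os_freebsd.c code:

```c
/*
 * Hypothetical sketch, not the real source: the VM object's handle is
 * the same pointer each VMA stores in vm_private_data, so the fault
 * path walks the global list looking for a match.
 */
static struct vm_area_struct *
drm_vmap_find(void *handle)
{
	struct vm_area_struct *vmap;

	rw_rlock(&drm_vma_lock);
	TAILQ_FOREACH(vmap, &drm_vma_head, vm_entry) {
		if (vmap->vm_private_data == handle)
			break;
	}
	rw_runlock(&drm_vma_lock);
	return (vmap);		/* NULL if no VMA references this handle */
}

/* In the pager populate (fault) handler: */
vmap = drm_vmap_find(vm_obj->handle);
MPASS(vmap != NULL);		/* the assertion that fired in this panic */
```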
In `drm_fstub_do_mmap()`, we allocate a VMA and insert it into the global list. But apparently we handle the case where the key already exists, i.e., the GEM object has already been mapped via more than one VM object. In this case, we don't insert the VMA:
```c
	rw_wlock(&drm_vma_lock);
	TAILQ_FOREACH(ptr, &drm_vma_head, vm_entry) {
		if (ptr->vm_private_data == vm_private_data)
			break;
	}
	/* check if there is an existing VM area struct */
	if (ptr != NULL) {
		/* check if the VM area structure is invalid */
		if (ptr->vm_ops == NULL ||
		    ptr->vm_ops->open == NULL ||
		    ptr->vm_ops->close == NULL) {
			rv = ESTALE;
			vm_no_fault = 1;
		} else {
			rv = EEXIST;
			vm_no_fault = (ptr->vm_ops->fault == NULL);
		}
	} else {
		/* insert VM area structure into list */
		TAILQ_INSERT_TAIL(&drm_vma_head, vmap, vm_entry);
		rv = 0;
		vm_no_fault = (vmap->vm_ops->fault == NULL);
	}
	rw_wunlock(&drm_vma_lock);

	if (rv != 0) {
		/* free allocated VM area struct */
		drm_vmap_free(vmap);
		/* check for stale VM area struct */
		if (rv != EEXIST)
			return (rv);
	}

	/* check if there is no fault handler */
	if (vm_no_fault) {
		*obj = cdev_pager_allocate(vm_private_data,
		    OBJT_DEVICE, &drm_dev_pg_ops, size, prot,
		    *foff, td->td_ucred);
	} else {
		*obj = cdev_pager_allocate(vm_private_data,
		    OBJT_MGTDEVICE, &drm_mgtdev_pg_ops, size, prot,
		    *foff, td->td_ucred);
	}
```
So in the `rv = EEXIST` case, it looks like we can end up with >1 VM object that references the same GEM object. When one of those VM objects is destroyed (maybe triggered by the child process exec()ing), the corresponding VMA will be removed from the global list, but then a fault on a different referencing VM object will fail to look up a VMA, and we'll trigger the panic that Robert hit. I can't really see any other way it could happen.
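To make that concrete, here is an assumed sketch (not the actual code) of the destruction side; the real function, `drm_cdev_pager_dtor()`, is named later in this thread. Because the EEXIST path above never inserts a second VMA, removing the single shared entry leaves any surviving VM object with a handle that no longer resolves to a VMA:

```c
/*
 * Assumed sketch of the cdev pager destructor path.  When the VM
 * object that "owns" the list entry is destroyed, the one VMA keyed
 * by this handle is unlinked and freed, even if another VM object
 * still references the same GEM object.
 */
static void
drm_cdev_pager_dtor(void *handle)
{
	struct vm_area_struct *vmap;

	rw_wlock(&drm_vma_lock);
	TAILQ_FOREACH(vmap, &drm_vma_head, vm_entry) {
		if (vmap->vm_private_data == handle)
			break;
	}
	if (vmap != NULL)
		TAILQ_REMOVE(&drm_vma_head, vmap, vm_entry);
	rw_wunlock(&drm_vma_lock);

	if (vmap != NULL) {
		if (vmap->vm_ops != NULL && vmap->vm_ops->close != NULL)
			vmap->vm_ops->close(vmap);
		drm_vmap_free(vmap);
	}
}
```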
Thus my questions are:

1. Under what circumstances does `drm_fstub_do_mmap()` find an existing GEM object in the global list?
2. If there is a legitimate explanation for 1, how do we handle it? A refcount on the VMA list entries would help, but some of those fields assume that the mapping is unique. That is, if you have two mappings referencing a GEM object via the VMA list, what meaning do the `vm_start` and `vm_end` fields of the VMA have?

@bukinr do you have any idea what causes 1)?

I am trying to figure it out. I inserted printfs and opened Chromium (10 tabs); so far it does not reach any of these printf paths:
```c
	/* check if there is an existing VM area struct */
	if (ptr != NULL) {
		/* check if the VM area structure is invalid */
		if (ptr->vm_ops == NULL ||
		    ptr->vm_ops->open == NULL ||
		    ptr->vm_ops->close == NULL) {
			printf("%s stale\n", __func__);
			rv = ESTALE;
			vm_no_fault = 1;
		} else {
			printf("%s eexist\n", __func__);
			rv = EEXIST;
			vm_no_fault = (ptr->vm_ops->fault == NULL);
		}
```
Note there are around 300 entries in `drm_vma_head` when Chromium is open.
In my experience, sudden termination of the display server can cause unhappiness in the kernel when other applications bump into the mess left behind. I wonder if it could be the case that something else triggered a bug in the display server... and then plasmashell tripped over it?
It could be an out-of-memory situation that is not handled correctly. We don't have the IOMMU enabled (the iommu code exists and works). I got another panic during vm_page_reclaim_contig(), which panfrost calls when it can't allocate large chunks of contiguous memory:
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff0000001768c8] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff0000001768f4] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff00000017697c] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001769a8] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:32:plane-1] flip_done timed out
panic: page 0xffffa08368d80080 is PG_FICTITIOUS or PG_MARKER
cpuid = 2
time = 1715087326
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x34
vpanic() at vpanic+0x13c
panic() at panic+0x64
vm_page_reclaim_contig_domain_ext() at vm_page_reclaim_contig_domain_ext+0xfc4
vm_page_reclaim_contig() at vm_page_reclaim_contig+0x70
panfrost_gem_get_pages() at panfrost_gem_get_pages+0xd8
panfrost_gem_open() at panfrost_gem_open+0x190
drm_gem_handle_create_tail() at drm_gem_handle_create_tail+0x180
panfrost_gem_create_object_with_handle() at panfrost_gem_create_object_with_handle+0x114
panfrost_ioctl_create_bo() at panfrost_ioctl_create_bo+0x38
drm_ioctl_kernel() at drm_ioctl_kernel+0xcc
drm_ioctl() at drm_ioctl+0x194
drm_fstub_ioctl() at drm_fstub_ioctl+0x84
kern_ioctl() at kern_ioctl+0x2e0
user_ioctl() at user_ioctl+0x178
do_el0_sync() at do_el0_sync+0x630
handle_el0_sync() at handle_el0_sync+0x3c
--- exception, esr 0x56000000
KDB: enter: panic
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !list_empty(&lock->head) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_modeset_lock.c:268
WARNING !drm_modeset_is_locked(&crtc->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:617
WARNING !drm_modeset_is_locked(&dev->mode_config.connection_mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:667
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
WARNING !drm_modeset_is_locked(&plane->mutex) failed at /home/br/cheri/cheribsd/sys/dev/drm/core/drm_atomic_helper.c:892
<3>[drm: 0xffff0000001768c8] *ERROR* [CRTC:33:crtc-0] hw_done timed out
<3>[drm: 0xffff0000001768f4] *ERROR* [CRTC:33:crtc-0] flip_done timed out
<3>[drm: 0xffff00000017697c] *ERROR* [CONNECTOR:35:HDMI-A-1] hw_done timed out
<3>[drm: 0xffff0000001769a8] *ERROR* [CONNECTOR:35:HDMI-A-1] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:31:plane-0] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:31:plane-0] flip_done timed out
<3>[drm: 0xffff000000176a38] *ERROR* [PLANE:32:plane-1] hw_done timed out
<3>[drm: 0xffff000000176a64] *ERROR* [PLANE:32:plane-1] flip_done timed out
[ thread pid 1145 tid 101378 ]
Stopped at kdb_enter+0x48: str xzr, [x19]
db>
I just hit this on my new morello system, will debug a bit.
Well, `panfrost_alloc_pages_contig()` allocates regular physical memory, then sets PG_FICTITIOUS in the pages for some reason. The code which scans physical memory during defragmentation does not expect this, hence the panic.

I don't really understand why we use an OBJT_MGTDEVICE object for panfrost GEM objects. They are for containing fictitious pages (i.e., device memory or stolen pages of RAM), but the panfrost driver is allocating regular physical RAM. It should be OBJT_PHYS or OBJT_SWAP, the latter if we care about being able to remove all mappings of a given page, but I don't see anything in panfrost that relies on that...?
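As a rough illustration of that suggestion (my assumption, untested, and not something the driver does today), the GEM pages could be backed by an ordinary VM object instead of a managed cdev pager object:

```c
/*
 * Untested sketch: allocate a plain physically backed VM object for a
 * panfrost GEM object instead of cdev_pager_allocate(...,
 * OBJT_MGTDEVICE, ...).  "size" is a placeholder for the GEM object size.
 */
vm_object_t obj;

/* OBJT_PHYS: unmanaged RAM pages that are never paged out */
obj = vm_object_allocate(OBJT_PHYS, OFF_TO_IDX(round_page(size)));

/*
 * OBJT_SWAP would be the choice if we need managed pages so that all
 * mappings of a given page can be removed later.
 */
```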
New panic today:
panic: refcount 0xffffa0808003f708 wraparound
cpuid = 3
time = 1721744621
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x34
vpanic() at vpanic+0x13c
panic() at panic+0x64
__drm_atomic_helper_connector_destroy_state() at __drm_atomic_helper_connector_destroy_state+0xa0
drm_atomic_helper_connector_destroy_state() at drm_atomic_helper_connector_destroy_state+0x18
drm_atomic_state_default_clear() at drm_atomic_state_default_clear+0x80
__drm_atomic_state_free() at __drm_atomic_state_free+0x34
drm_mode_atomic_ioctl() at drm_mode_atomic_ioctl+0x81c
drm_ioctl_kernel() at drm_ioctl_kernel+0xcc
drm_ioctl() at drm_ioctl+0x194
drm_fstub_ioctl() at drm_fstub_ioctl+0x84
kern_ioctl() at kern_ioctl+0x2e0
user_ioctl() at user_ioctl+0x178
do_el0_sync() at do_el0_sync+0x630
handle_el0_sync() at handle_el0_sync+0x3c
@rwatson @markjdb @bukinr I found the bug and have a C program that can reproduce the panic. The C program is here: https://www.cl.cam.ac.uk/~pffm2/cheribsd_issue2097/reproducers/main.c Everything needed to build it is here. The bug manifests itself only with some interleavings of two processes that concurrently mmap and munmap the same GEM GPU buffer. This patch for sys/dev/drm/freebsd/drm_os_freebsd.c uses two pause() calls to force the interleaving that eventually leads to the panic when the C program is executed: https://www.cl.cam.ac.uk/~pffm2/cheribsd_issue2097/reproducers/drm_os_frebsd.c.patch

The C program also triggers the bug fairly reliably without this patch if this sleep() call is commented out: https://github.com/CTSRD-CHERI/pffm2/blob/c50cb472bc278114d82ae328382243c493467278/misc/issue2097/main.c#L48
In summary, the problem is that during an mmap() call that maps a GEM GPU buffer object, this check for whether a corresponding VMA object already exists can be positive (meaning the object exists) even though another thread is in a code path that will free the VMA object. With the interleaving described below, the check in `cdev_pager_allocate` here and here for whether a corresponding VM object exists is negative. Hence, a new VM object is created with a reference in its `handle` field that is later used in an attempt to find a VMA object that was already freed. In more detail, `drm_vmap_find` in `drm_cdev_pager_populate` cannot find the VMA object via the reference in the VM object's `handle`, which then leads to the panic.
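In case the links go stale, here is a rough outline of the kind of program involved; this is not the actual main.c, and the dumb-buffer ioctls are only a stand-in for however the GEM buffer is really created. Two processes share a DRM file descriptor and mmap() the same buffer's fake offset; one munmap()s while the other faults on its mapping:

```c
/* Hypothetical outline of a reproducer; error checking omitted. */
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>		/* pulls in drm.h / drm_mode.h */

int
main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	struct drm_mode_create_dumb creq = { .width = 64, .height = 64, .bpp = 32 };
	struct drm_mode_map_dumb mreq = { 0 };

	ioctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);	/* create a GEM buffer */
	mreq.handle = creq.handle;
	ioctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);	/* get its fake mmap offset */

	if (fork() == 0) {
		/* Child: map and immediately unmap, racing the parent. */
		void *p = mmap(NULL, creq.size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, mreq.offset);
		munmap(p, creq.size);
		_exit(0);
	}

	/* Parent: map the same buffer and fault on it. */
	volatile char *q = mmap(NULL, creq.size, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, mreq.offset);
	q[0] = 1;		/* fault -> drm_cdev_pager_populate() */
	wait(NULL);
	return (0);
}
```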
Here is the interleaving that causes the panic with the C program that I mentioned above:

1. The first process munmap()s the GEM buffer and reaches `drm_cdev_pager_dtor()`. The code in this function and the code that's executed after will free the VMA and VM object, but isn't executed yet.
2. The second process mmap()s the same GEM buffer and reaches `drm_cdev_pager_ctor` (see here). The check here for whether a VMA object for this GEM object exists was positive, and we took the `rv = EEXIST` code path that intends to reuse the existing VMA object that @markjdb mentioned. `drm_cdev_pager_ctor` is then called in the implementation of `cdev_pager_allocate`, which is called either here or here. In this step we don't create a new VMA object because we found an existing one. Meanwhile, the first process resumes and frees the VMA and VM object.
3. The check in `cdev_pager_allocate` for whether a VM object already exists fails, and a new VM object is created. The `handle` field of this new VM object is set to a pointer to the mmap()ed GEM object. VMA objects also hold this pointer in their `vm_private_data` field, and `drm_vmap_find` in `drm_cdev_pager_populate` uses this pointer to find VMA objects.
4. A page fault on the new mapping causes `drm_cdev_pager_populate` to be executed. `drm_cdev_pager_populate` passes the pointer to the GEM object, held in the `handle` field of the VM object, to `drm_vmap_find`. `drm_vmap_find` can't find the VMA object because it was freed in step 2 and returns NULL, which in turn causes the panic.

I don't have a fully thought-through idea for a fix yet. Maybe a ref counter for VMA objects, like @markjdb suggested, would fix it.
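For what it's worth, a minimal sketch of what such a reference count could look like using FreeBSD's refcount(9); the field name and placement are my assumptions, not the actual patch:

```c
#include <sys/refcount.h>

/*
 * Assumed sketch: each VMA list entry carries a reference count, so the
 * rv = EEXIST path takes a reference on the existing entry instead of
 * assuming it stays alive, and the destructor only unlinks and frees
 * the entry when the last reference is dropped.
 */
struct vm_area_struct {
	/* ... existing fields ... */
	u_int	vm_refcnt;	/* hypothetical new field */
};

/* mmap path, under drm_vma_lock: */
refcount_init(&vmap->vm_refcnt, 1);	/* newly inserted entry */
refcount_acquire(&ptr->vm_refcnt);	/* reused entry (EEXIST path) */

/* destructor path, under drm_vma_lock: */
if (refcount_release(&vmap->vm_refcnt)) {
	TAILQ_REMOVE(&drm_vma_head, vmap, vm_entry);
	drm_vmap_free(vmap);
}
```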
@rwatson @markjdb @bukinr I found the bug and have a C program that can reproduce the panic.
Thank you for the thorough investigation. The links to your reproducers don't work, however - could you please update them?
The DRM code currently assumes that there is a 1-1 mapping between VMAs and GEM objects, which is wrong since it's apparently possible for multiple processes to share mappings of a GEM object (and presumably it's possible for a single process to map a GEM object multiple times).
I think the solution is to rework `drm_vma_head`. What we want is a list of mapped GEM objects (effectively a list of unique `vm_private_data` pointers), and for each entry: 1) a reference count, 2) the associated VM object (there should only be one), and perhaps 3) a list of VMAs mapping that object.
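A hedged sketch of what such an entry might look like (my reading of the proposal, with made-up names; not an implemented design):

```c
#include <sys/queue.h>

/*
 * Sketch only: one entry per mapped GEM object, keyed by the
 * vm_private_data pointer, rather than one entry per VMA.
 */
struct drm_obj_entry {
	void		*vm_private_data;		/* lookup key (GEM object) */
	u_int		refcnt;				/* mappings referencing it */
	vm_object_t	vm_obj;				/* the single VM object */
	TAILQ_HEAD(, vm_area_struct) vma_list;		/* VMAs mapping the object */
	TAILQ_ENTRY(drm_obj_entry) link;		/* linkage on the global list */
};

static TAILQ_HEAD(, drm_obj_entry) drm_obj_head =
    TAILQ_HEAD_INITIALIZER(drm_obj_head);
```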
Note that there are several DRM object types potentially affected by this change, corresponding to `drm_vm_dma_ops`, `drm_vm_ops`, `drm_vm_shm_ops`, `drm_vm_sg_ops`, and `panfrost_gem_vm_ops`.
I'll try to implement this today.
I updated the links to the reproducer. They were pointing into a private repository. I also updated the paragraph that starts with "In summary,..." to make it a bit clearer.
I added a refcounter for VMA objects and corresponding locking to `drm_cdev_pager_ctor()`, which seems to have fixed the bug. A patch file is attached. It contains some other debugging code that I added, and IIRC the RW_RECURSE flag in `drm_stub_init()` isn't necessary. I tested it with my reproducer and with the OpenGL traces with which I encountered the panic before. However, changing the code so that a separate VMA object is created every time a GEM object is mmap()ed sounds better.
Note: I removed the patch file because I noticed a mistake in it.
Here is the patch that I mentioned yesterday, which adds a reference counter for the VMA objects. I'd be happy to test it with the latest version of CheriBSD (I'm using a fork of the 2022 release of CheriBSD with some backported DRM fixes) and create a PR if we want to use it in the interim.
At long last, I believe that the panics reported here are now fixed in `dev`.
Running with a kernel/userlevel from #2080, I saw this kernel panic when starting an aarch64 Chromium web browser within an otherwise entirely purecap (kernel, userlevel, desktop) environment:
panic: Assertion vmap != NULL failed at /usr/src/sys/dev/drm/freebsd/drm_os_freebsd.c:370
The kernel build was:
FreeBSD cheri-blossom.sec.cl.cam.ac.uk 15.0-CURRENT FreeBSD 15.0-CURRENT #19 c18n_procstat-n268168-8e6f163a2c50: Tue Apr 9 02:29:44 UTC 2024 robert@cheri-blossom.sec.cl.cam.ac.uk:/usr/obj/usr/src/arm64.aarch64c/sys/GENERIC-MORELLO-PURECAP arm64
Async revocation and default enabled c18n are both turned on:
Console output:
KGDB on the crashdump reports:
The process in question was plasmashell.