Initial sDDF GPU Design

This PR adds an initial sDDF protocol design for unaccelerated 2D GPUs. It contains implementations of:

gpu protocol: queues, events
gpu virtualiser
gpu virtio driver for qemu-virt-aarch64
an example gpu system which has a dummy client scanning out different sections of an image

Note:

~~The gpu example in this PR does not work on macos QEMU due to the udmabuf framework requirement, which only exists on linux.~~ (edit: BLOB feature is now conditionally compiled. By default it is off. You can turn on blob resources by specifying BLOB=1 in the Makefile args).
The example requires sudo privileges to simulate on QEMU, as udmabuf requires sudo to access. (edit: only true when BLOB=1 is specified in the Makefile args.)
Zig build for the example is broken. The build also makes use of imagemagick's convert tool to figure out the image resolution, which I never figured out how to get working with zig's build tool. Even if the resolution was hardcoded, running the example leads to a VMfault (there's still a bug in my build system there somewhere). There was a stack overflow problem with zig, fixed by increasing stack size. Avoided the imagemagick convert tool issue by not allowing user to supply their own image in zig.
udmabuf expects each entry in the scatter-gather list to be page aligned. QEMU only warns when you create a blob resource with an unaligned memory size, and otherwise lets the request succeed. I believe this is not the correct behaviour: it should fail the request when it cannot create a proper memory backing. ~~But for now I've page aligned the blob resource memory in the client to avoid this issue.~~ The driver now does the aligning instead.
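
As an illustration of the alignment step mentioned in the last note, rounding a blob size up to a page boundary could look roughly like the sketch below. The 4 KiB page size and the names `EXAMPLE_PAGE_SIZE` and `example_page_align_up` are assumptions made for this example, not identifiers from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; not taken from the PR. */
#define EXAMPLE_PAGE_SIZE 0x1000u

static_assert((EXAMPLE_PAGE_SIZE & (EXAMPLE_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Hypothetical helper: round a size in bytes up to the next page boundary. */
static uint64_t example_page_align_up(uint64_t size)
{
    const uint64_t mask = (uint64_t)EXAMPLE_PAGE_SIZE - 1;
    return (size + mask) & ~mask;
}
```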

Protocol Design

The sDDF GPU protocol design is quite similar to virtio-gpu. The protocol introduces the concept of 2D resources, each describing a 2D image that can be scanned out to a display. Clients can enqueue requests to create and manipulate these resources. Each 2D resource has its own private memory and, additionally, a separate memory backing that clients can modify directly. Clients can then make requests to update the private memory from the attached backing. Efficient driver implementations will often place this private memory in device memory. Clients can also request to destroy resources and to attach/detach client memory to and from them. The client is expected to bookkeep and manage the memory that it attaches to resources.

The protocol also provides blob resources that can be manipulated similarly. Unlike 2D resources, blob resources do not assume a pixel format and have to be cast to a framebuffer object (becoming something similar to a 2D resource) before they can be scanned out to a display. Blob resources also have no private memory; they rely only on the client-allocated memory backing, which enables potentially zero-copy communication.

Queues

The queues consist of a request queue and a response queue, implemented internally as ring buffers.

Each request has an id that is matched with a corresponding response id.
Requests to the gpu involve creating a resource and manipulating its state, therefore all requests except GET_DISPLAY_INFO require passing in a resource id to identify the resource to manipulate.
Scanout requests require passing in a scanout id, which is determined from the response data of a GET_DISPLAY_INFO request.
An entry in the response queue will indicate whether the request has finished its processing or failed with a status code.
Requests can be enqueued asynchronously. There is no need to wait for the completion of the previous request before enqueuing the next.
Requests are fenced, meaning that a success response cannot be enqueued until the request has completed processing. Non-fenced requests are not as relevant for 2D operations, though they may be useful for 3D.
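
To make the id matching and asynchronous enqueueing concrete, here is a minimal single-threaded sketch of such a queue. All names (`ringq_entry_t`, `ringq_enqueue`, `ringq_dequeue`) and the capacity are hypothetical and are not the PR's actual definitions. A real shared-memory queue would also need memory barriers between producer and consumer, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RINGQ_CAPACITY 8u /* assumed; must be a power of two */

/* One entry; `value` stands in for a request code or a response status. */
typedef struct {
    uint32_t id;
    uint32_t value;
} ringq_entry_t;

typedef struct {
    uint32_t head; /* index of the next entry to dequeue */
    uint32_t tail; /* index of the next free slot */
    ringq_entry_t slots[RINGQ_CAPACITY];
} ringq_t;

/* Returns false when the queue is full. */
static bool ringq_enqueue(ringq_t *q, ringq_entry_t e)
{
    assert(q != NULL);
    if (q->tail - q->head == RINGQ_CAPACITY) {
        return false;
    }
    q->slots[q->tail % RINGQ_CAPACITY] = e;
    q->tail++;
    return true;
}

/* Returns false when the queue is empty. */
static bool ringq_dequeue(ringq_t *q, ringq_entry_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->head == q->tail) {
        return false;
    }
    *out = q->slots[q->head % RINGQ_CAPACITY];
    q->head++;
    return true;
}
```

A client would enqueue several requests back to back, then later match each dequeued response to its request by comparing the `id` fields.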

Requests

Clients can make the following requests via the request queue:
- GET_DISPLAY_INFO
  - Request for system scanout information. This includes the number of scanouts and their display resolutions.
  - Args:
    - mem_offset: Offset into data region where display info response would be written to.
- RESOURCE_CREATE_2D
  - Create a 2D resource, specifying its shape and format, and assign it a resource id.
  - Args:
    - resource_id: Assign id to resource.
    - width: Width in pixels of resource rectangle.
    - height: Height in pixels of resource rectangle.
    - format: Pixel format of resource.
- RESOURCE_UNREF
  - Destroy a resource.
  - Args:
    - resource_id: Resource id to be destroyed.
- RESOURCE_ATTACH_BACKING
  - Assign a contiguous segment of client memory to the resource.
  - Args:
    - resource_id: Id of the resource to which the backing is attached.
    - mem_offset: Offset into data region of the memory backing to assign to resource.
    - mem_size: Size in bytes of the memory backing to assign to resource.
- RESOURCE_DETACH_BACKING
  - Detach the segment of memory from a resource.
  - Args:
    - resource_id: Id of the resource from which the backing is detached.
- SET_SCANOUT
  - Set a 2D resource to a scanout.
  - Args:
    - resource_id: Id of resource.
    - scanout_id: Id of scanout.
    - rect: Rectangle within resource private memory for data to be scanned out from.
- TRANSFER_TO_2D
  - Transfer from the attached memory of a 2D resource to the resource's private memory.
  - Args:
    - resource_id: Id of resource for transfer
    - rect: Rectangle in resource's private memory where data is transferred to. Size of transfer is inferred from this rectangle.
    - mem_offset: Offset into the resource's attached memory backing from which data is transferred.
- RESOURCE_FLUSH
  - Flush a resource to its assigned scanout. For 2D resources, this flushes the resource's private memory to the scanout. For blob resources that have been set to a scanout (and thus cast to a framebuffer object), this flushes the attached backing to the scanout.
  - Args:
    - resource_id: Id of resource to flush to scanout.
    - rect: Rectangle within the resource to flush to scanout.
- RESOURCE_CREATE_BLOB
  - Create a blob resource. A blob resource does not assume any format and only takes in a contiguous segment of memory as its backing.
  - Args:
    - resource_id: Assign id to blob resource.
    - mem_offset: Offset in data region to the memory backing to assign to blob resource.
    - mem_size: Size in bytes of the memory backing to assign to blob resource.
- SET_SCANOUT_BLOB
  - Set scanout of blob resources can be thought of as casting the resource into a framebuffer object. This lets the driver know how to interpret the resource for scanout to a display.
  - Args:
    - resource_id: Id of blob resource.
    - scanout_id: Id of scanout.
    - width: Width in pixels of blob resource rectangle.
    - height: Height in pixels of blob resource rectangle.
    - format: Pixel format of blob resource.
    - stride: Bytes from one row of pixels to the next in blob resource rectangle.
    - offset: Offset into blob resource memory, where data to the blob resource rectangle begins.
    - rect: Rectangle within the blob resource for data to be scanned out from.
Requesting SET_SCANOUT or SET_SCANOUT_BLOB with resource id=0 will disable the scanout. No resources can be created with id=0 as it is reserved for this purpose.
Blob resources support swapping in memory backing during runtime. This can be done by requesting RESOURCE_ATTACH_BACKING and RESOURCE_DETACH_BACKING on the resource.
Blob resources can be flushed to a scanout using RESOURCE_FLUSH, just like 2D resources.
Blob resources cannot be used with TRANSFER_TO_2D or SET_SCANOUT; doing so results in an error.
2D resources cannot be used with SET_SCANOUT_BLOB; doing so results in an error.
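
The compatibility rules above can be summarised as a small check. This is only a sketch of the rules as stated here: the enum and function names are hypothetical, and the PR's actual validation code may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { RULES_RES_2D, RULES_RES_BLOB } rules_res_kind_t;

typedef enum {
    RULES_GET_DISPLAY_INFO,
    RULES_RESOURCE_CREATE_2D,
    RULES_RESOURCE_UNREF,
    RULES_RESOURCE_ATTACH_BACKING,
    RULES_RESOURCE_DETACH_BACKING,
    RULES_SET_SCANOUT,
    RULES_TRANSFER_TO_2D,
    RULES_RESOURCE_FLUSH,
    RULES_RESOURCE_CREATE_BLOB,
    RULES_SET_SCANOUT_BLOB,
} rules_code_t;

/* Resource id 0 is reserved for disabling a scanout, so it can never be created. */
static bool rules_can_create(uint32_t resource_id)
{
    return resource_id != 0;
}

/* Whether an existing resource of `kind` may be the target of request `code`. */
static bool rules_request_allowed(rules_res_kind_t kind, rules_code_t code)
{
    assert(kind == RULES_RES_2D || kind == RULES_RES_BLOB);
    switch (code) {
    case RULES_SET_SCANOUT:
    case RULES_TRANSFER_TO_2D:
        return kind == RULES_RES_2D;   /* blob resources are rejected */
    case RULES_SET_SCANOUT_BLOB:
        return kind == RULES_RES_BLOB; /* 2D resources are rejected */
    case RULES_RESOURCE_UNREF:
    case RULES_RESOURCE_ATTACH_BACKING:
    case RULES_RESOURCE_DETACH_BACKING:
    case RULES_RESOURCE_FLUSH:
        return true;                   /* valid for both kinds */
    default:
        /* Creation requests and GET_DISPLAY_INFO do not target an existing resource. */
        return false;
    }
}
```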

Initialisation

As part of initialisation, clients MUST first make a GET_DISPLAY_INFO request. A client is not considered initialised until a GET_DISPLAY_INFO request has been responded to successfully. The client is then expected to use the scanout information from the GET_DISPLAY_INFO response for further operation.

Events

GPU devices need to notify the client when changes happen in the hardware. These events are synchronised by atomic accesses in shared memory. There is currently only one event type, the display_info event.

Display info event

This event is triggered when a display is plugged in or removed, or when a display changes its resolution.
A GPU virtualiser can also trigger this event when it wishes to change the scanout emulation view it's presenting to its clients. This can be thought of as a runtime virtualiser policy change.
The client must make a GET_DISPLAY_INFO request to get the latest scanout information. Requests that are in flight between a display info event being triggered and the client receiving the response to the corresponding GET_DISPLAY_INFO are considered stale.
Stale requests due to a new display info event are benign in the current design, and thus nothing more is done to them (they may or may not be failed by the virtualiser or driver).
A display info event must be cleared BEFORE enqueueing a GET_DISPLAY_INFO request. This ensures that a new display info event still reaches the client if the hardware changes the display info while a GET_DISPLAY_INFO request is in flight, in which case the client has to make another GET_DISPLAY_INFO request.
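
The clear-before-enqueue ordering can be sketched with C11 atomics as follows. The event layout and the names `evt_flags_t`, `evt_raise_display_info` and `evt_take_display_info` are assumptions for illustration; the PR's actual shared-memory layout is not shown in this description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared event state with a single display_info flag. */
typedef struct {
    atomic_bool display_info;
} evt_flags_t;

/* Device or virtualiser side: signal that display info has changed. */
static void evt_raise_display_info(evt_flags_t *ev)
{
    assert(ev != NULL);
    atomic_store(&ev->display_info, true);
}

/*
 * Client side: atomically read and clear the flag. The caller enqueues
 * GET_DISPLAY_INFO only after this returns true, so the clear always
 * happens before the request is enqueued. An event raised while that
 * request is in flight sets the flag again and is seen on the next call.
 */
static bool evt_take_display_info(evt_flags_t *ev)
{
    assert(ev != NULL);
    return atomic_exchange(&ev->display_info, false);
}
```

If the client instead cleared the flag after enqueueing, an event raised in between could be wiped out and the client would keep using outdated scanout information.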

Blob resources and private resources

There are two types of resources a client can create. 2D resources assume a format consisting of a width, height, and pixel format that describes a 2D image. Blob resources do not assume a format and only have memory with an associated size.

Blob resources do not have private memory: the GPU scans out directly from the memory attached to the resource, allowing zero-copy communication. This holds when the system has integrated graphics without its own VRAM; for a dedicated GPU, one copy from main memory to device memory is still necessary.
2D resources will typically have their private memory in device memory, thus requiring a transfer operation from the attached backing that the client has access to, to that private memory. This is not an inefficiency for dedicated GPUs where a copy is necessary, but it is an inefficiency for environments without dedicated VRAM which is typical of integrated GPUs.
Note that technically, we could create blob resources in device memory and introduce memory map and unmap requests for clients, allowing zero-copy for dedicated GPUs. VirtIO allows this, but for some reason only if 3D operations are supported. Providing this feature in sDDF GPU, which only supports 2D, would make interfacing with virtIO more difficult. This is not the only reason: it is also trickier to implement on Microkit, which only allows static mapping of memory regions, meaning unwanted static mappings into VRAM from each client.

Request reordering

Requests that operate on the same resource cannot be reordered amongst each other. This applies to ALL requests except for GET_DISPLAY_INFO.
Any request that comes before a RESOURCE_FLUSH request cannot be moved after it, and any request that comes after it cannot be moved before it.
Other than the previous conditions, the driver is free to reorder any requests.
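
One way to read these rules is as a predicate over two adjacent requests. The sketch below uses hypothetical types (`order_req_t`, `order_may_swap`) and treats RESOURCE_FLUSH as a barrier for every request, including GET_DISPLAY_INFO, which is my literal reading of the second rule rather than something the text states explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ORDER_GET_DISPLAY_INFO,
    ORDER_RESOURCE_FLUSH,
    ORDER_OTHER, /* any other request that names a resource */
} order_code_t;

typedef struct {
    order_code_t code;
    uint32_t resource_id; /* ignored for GET_DISPLAY_INFO */
} order_req_t;

/* Returns true if two adjacent requests may be swapped by the driver. */
static bool order_may_swap(const order_req_t *first, const order_req_t *second)
{
    assert(first != NULL && second != NULL);
    /* Nothing may cross a RESOURCE_FLUSH in either direction. */
    if (first->code == ORDER_RESOURCE_FLUSH || second->code == ORDER_RESOURCE_FLUSH) {
        return false;
    }
    /* GET_DISPLAY_INFO names no resource, so the same-resource rule does not apply. */
    if (first->code == ORDER_GET_DISPLAY_INFO || second->code == ORDER_GET_DISPLAY_INFO) {
        return true;
    }
    /* Requests on the same resource must keep their relative order. */
    return first->resource_id != second->resource_id;
}
```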

Example Operation: Creating a framebuffer and configuring a scanout

With 2D resources:

Enqueue a GET_DISPLAY_INFO request as part of initialisation to determine how many scanouts exist and which to use.
Create a 2D resource using RESOURCE_CREATE_2D, specify its width, height and pixel format.
Attach a memory backing to the resource using RESOURCE_ATTACH_BACKING, specifying an offset and size in the client data region.
Set the scanout using SET_SCANOUT by passing in the scanout's scanout_id obtained previously. Specify a rectangle within the resource to scanout from.
Transfer the memory from the attached backing to resource memory using TRANSFER_TO_2D. Specify an offset from the memory backing to transfer from, and a rectangle within resource memory to transfer to.
Flush the resource to the scanout using RESOURCE_FLUSH, specifying a rectangle within the resource to flush.
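
The 2D steps after GET_DISPLAY_INFO can be sketched as a list of requests built in order. The request layout below is hypothetical and only carries the arguments listed under Requests; it is not the PR's actual struct. The sketch assumes a 4-byte pixel format, a full-resource rectangle for every step, and that `scanout_id` came from an earlier GET_DISPLAY_INFO response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SEQ_RESOURCE_CREATE_2D,
    SEQ_RESOURCE_ATTACH_BACKING,
    SEQ_SET_SCANOUT,
    SEQ_TRANSFER_TO_2D,
    SEQ_RESOURCE_FLUSH,
} seq_code_t;

typedef struct { uint32_t x, y, width, height; } seq_rect_t;

/* Hypothetical flat request; unused fields stay zero. */
typedef struct {
    uint32_t id;
    seq_code_t code;
    uint32_t resource_id, scanout_id;
    uint32_t width, height, format;
    uint64_t mem_offset, mem_size;
    seq_rect_t rect;
} seq_req_t;

#define SEQ_LEN 5
#define SEQ_BYTES_PER_PIXEL 4u /* assumed 32-bit pixel format */

static void seq_build_2d(seq_req_t out[SEQ_LEN], uint32_t resource_id,
                         uint32_t scanout_id, uint32_t width, uint32_t height,
                         uint32_t format, uint64_t backing_offset)
{
    assert(resource_id != 0); /* id 0 is reserved for disabling a scanout */
    const seq_rect_t full = { 0, 0, width, height };
    const uint64_t size = (uint64_t)width * height * SEQ_BYTES_PER_PIXEL;

    out[0] = (seq_req_t){ .code = SEQ_RESOURCE_CREATE_2D, .resource_id = resource_id,
                          .width = width, .height = height, .format = format };
    out[1] = (seq_req_t){ .code = SEQ_RESOURCE_ATTACH_BACKING, .resource_id = resource_id,
                          .mem_offset = backing_offset, .mem_size = size };
    out[2] = (seq_req_t){ .code = SEQ_SET_SCANOUT, .resource_id = resource_id,
                          .scanout_id = scanout_id, .rect = full };
    /* mem_offset here is relative to the attached backing, not the data region. */
    out[3] = (seq_req_t){ .code = SEQ_TRANSFER_TO_2D, .resource_id = resource_id,
                          .rect = full, .mem_offset = 0 };
    out[4] = (seq_req_t){ .code = SEQ_RESOURCE_FLUSH, .resource_id = resource_id,
                          .rect = full };
    for (size_t i = 0; i < SEQ_LEN; i++) {
        out[i].id = (uint32_t)i; /* request ids are later matched against response ids */
    }
}
```

Because all five requests name the same resource, the reordering rules guarantee they are processed in this order, so the client can enqueue them all without waiting for intermediate responses.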

With blob resources:

Enqueue a GET_DISPLAY_INFO request as part of initialisation to determine how many scanouts exist and which to use.
Create a blob resource using RESOURCE_CREATE_BLOB, specifying the memory size.
Set the scanout using SET_SCANOUT_BLOB by passing in the scanout's scanout_id obtained previously. Specify the pixel format, width, height of the resource, and a rectangle within the resource to scanout from. Also specify the alignment requirements of the resource with its stride and offset.
Flush the resource to the scanout using RESOURCE_FLUSH, specifying a rectangle within the resource to flush.

Compatibility with 3D

Modern 3D GPUs function much like state machines, so drivers need to validate and/or pre-process and post-process client command stream data before it can be passed to the device. This means that a 3D GPU driver must have access to the client's data and thus needs to be trusted. For 2D GPU drivers, there is no command stream to validate, and framebuffer objects are standardised with pixel formats, so 2D GPU drivers do not need the client data mapped in.
Blob resources do not assume a format despite the current sDDF design specifying only framebuffer as its object type. The generality of this design interfaces better with 3D where resources can be more than one type: e.g. vertex buffers, shaders, textures etc.

GPU Virtualiser

The GPU virtualiser has two roles: it translates offsets from clients into IO addresses, and it virtualises the device's scanout information for clients. The current implementation does the simplest thing: it forwards the same view of the device scanouts to all clients. Under this implementation, a client's scanout id is identical to the device's true scanout id.

The virtualiser remaps resource ids when forwarding client requests to a driver to maintain unique resource ids between clients.
The virtualiser has a data region it shares with the driver. Whilst it forwards IO addresses from the client data region for most client requests, this region is needed by the virtualiser to request display info from the driver using GET_DISPLAY_INFO. The virtualiser needs to request this information upon each display info event, and multiplex the scanout view to clients (although the current implementation is giving an identity mapping).
Client GET_DISPLAY_INFO requests are not delivered to the driver and are instead answered entirely by the virtualiser. The virtualiser returns an identity map of scanout info.
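
As a rough illustration of the two translation duties, the sketch below remaps resource ids by giving each client a fixed slice of the driver's id space and converts a data-region offset into an IO address with a bounds check. The fixed-slice scheme, the limits and every name here are assumptions; this description does not say how the PR's virtualiser actually performs the remapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRT_MAX_CLIENTS 4u             /* assumed */
#define VIRT_RESOURCES_PER_CLIENT 1024u /* assumed */

typedef struct {
    uint64_t io_base; /* IO address of the start of the client's data region */
    uint64_t size;    /* size of the data region in bytes */
} virt_region_t;

/*
 * Map a client's resource id to an id that is unique across clients.
 * Id 0 maps to 0 so that "disable scanout" keeps its meaning.
 */
static bool virt_remap_resource_id(uint32_t client, uint32_t client_id, uint32_t *driver_id)
{
    assert(driver_id != NULL);
    if (client >= VIRT_MAX_CLIENTS || client_id >= VIRT_RESOURCES_PER_CLIENT) {
        return false;
    }
    *driver_id = (client_id == 0) ? 0 : client * VIRT_RESOURCES_PER_CLIENT + client_id;
    return true;
}

/* Translate an offset and length in the client's data region to an IO address. */
static bool virt_offset_to_io_addr(const virt_region_t *region, uint64_t offset,
                                   uint64_t len, uint64_t *io_addr)
{
    assert(region != NULL && io_addr != NULL);
    if (offset > region->size || len > region->size - offset) {
        return false; /* range falls outside the data region */
    }
    *io_addr = region->io_base + offset;
    return true;
}
```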

Assumptions on request failure

Bookkeeping requests from multiple clients in the GPU virtualiser is complex, and becomes even more complex when these requests fail, due to the asynchronous nature of the request/response queues. I've assumed that, other than the requests which create a resource, requests should never fail under normal circumstances. If they do, there is something catastrophically wrong with the driver or the device, which would render recovery of bookkeeping state meaningless. This simplifies the virtualiser drastically by avoiding complex recovery logic upon request failure. The only exception is when requests are rendered stale by display info events, which thankfully, on careful inspection of the logic, does not require any complex recovery logic.

GPU VirtIO Driver

Implementation of the virtIO GPU v1.2 specification with support for VIRTIO_GPU_F_RESOURCE_BLOB feature

Future Work

Move towards a 3D GPU protocol. This interface needs to be generic across different 3D APIs (OpenGL, Vulkan).
Implement a cursor queue, this is an optimisation for desktop environments to allow for more responsive cursor movement.
Consider using scatter gather for resource memory allocations. Would need a more thorough look at use cases to see whether this is necessary.
Investigate how to do blob resources with device memory instead of system memory to improve performance on dedicated GPUs with VRAM.
Do an exhaustive investigation on the possible failure conditions on each request to validate assumptions made in the virtualiser.
Remove legacy 2D resources and only support blob resources. They are kept for now because they map more cleanly to virtio-gpu.
