PoisonNinja / mint

A simple, fast, and modern kernel
BSD 3-Clause "New" or "Revised" License

Memory subsystem rewrite #5

Closed PoisonNinja closed 7 years ago

PoisonNinja commented 7 years ago

Intended to resolve #4.

In the beginning

When I initially wrote Mint, I was a n00b when it came to memory management (Honestly, I probably still am). So, I looked to other operating systems to see how they did memory management.

The first step was to build a physical memory manager. In my previous attempts at writing a kernel, I used a simple bitmap. However, bitmap allocation is O(n), which gets horrendous as memory grows. This time, I wanted something closer to what major operating systems do, and I discovered the buddy allocator. It performs reasonably well, reduces external fragmentation, and can allocate large contiguous blocks.

I ended up looking at Linux's implementation and based Mint's allocator on that design. The Mint buddy allocator uses bitmaps and stacks to keep track of memory, and allocation is O(1), which is really nice.
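To illustrate the core buddy idea (this is a minimal sketch, not Mint's actual code): requests are rounded up to a power-of-two "order" of pages, and a block's buddy is found by flipping a single address bit, which is what makes splitting and coalescing cheap.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Smallest order whose block size (PAGE_SIZE << order) fits the request. */
static unsigned int size_to_order(size_t size)
{
    unsigned int order = 0;
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* The buddy of a block (relative to the zone base) is found by flipping
 * the bit corresponding to its order, so merging free blocks is O(1). */
static uintptr_t buddy_of(uintptr_t addr, unsigned int order)
{
    return addr ^ (PAGE_SIZE << order);
}
```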

Building on top of the physical memory manager, I built a virtual memory manager to map and unmap pages. This was pretty trivial, and I was able to implement it relatively fast.

Last, I built a slab allocator on top of the buddy allocator. Why do we need a slab allocator? The buddy allocator works in multiples of the page size (4096 bytes on x86_64), so any request smaller than that still gets a whole page. As you can imagine, that's incredibly wasteful: imagine allocating 4096 bytes for an 8-byte object. The slab allocator requests a page from the physical memory manager, divides it into smaller chunks, and hands those chunks out to code that needs them.
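As a rough sketch of the chunking idea (hypothetical code, not Mint's slab implementation): a page is carved into fixed-size objects and a free list is threaded through the free objects themselves, so no external metadata is needed.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Free objects are linked through their own first word, so the object
 * size must be at least sizeof(struct slab_object). */
struct slab_object {
    struct slab_object *next;
};

struct slab {
    struct slab_object *free_list;
};

/* Carve a fresh page into object_size chunks and push them on the free list. */
static void slab_init(struct slab *slab, void *page, size_t object_size)
{
    slab->free_list = NULL;
    for (size_t off = 0; off + object_size <= PAGE_SIZE; off += object_size) {
        struct slab_object *obj = (struct slab_object *)((char *)page + off);
        obj->next = slab->free_list;
        slab->free_list = obj;
    }
}

/* Pop one chunk off the free list; returns NULL when the slab is exhausted. */
static void *slab_alloc(struct slab *slab)
{
    struct slab_object *obj = slab->free_list;
    if (obj)
        slab->free_list = obj->next;
    return obj;
}
```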

I was pretty satisfied with this system. It was fast, efficient, and didn't waste space.

A problem appears

To understand the problem, you have to understand how the buddy allocator works. The allocator uses a free stack that stores its metadata in the free page itself. But if you have any experience with operating systems, you know that accessing an arbitrary free page is a problem once paging is enabled. On x86_64, Linux solves this by mapping a huge chunk of physical memory (a 64 TiB direct mapping with 4-level paging) into the virtual address space at a known offset. To access physical memory, you simply add the offset to your desired address.
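The direct-map trick boils down to one helper like the following (the offset value here is hypothetical; the real one depends on the kernel's virtual memory layout):

```c
#include <stdint.h>

/* Hypothetical base of the direct physical mapping. */
#define PHYS_MAP_BASE 0xFFFF800000000000ULL

/* Convert a physical address to the virtual address it is visible at. */
static inline void *phys_to_virt(uint64_t phys)
{
    return (void *)(uintptr_t)(phys + PHYS_MAP_BASE);
}
```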

Mint does this too, and addresses given out by the slab allocator also use this scheme. After all, the slab allocator is built on top of the physical allocator.

The problem is that you can't map all of physical memory into the virtual address space on 32-bit (let alone 16-bit) systems. With the current scheme, a 32-bit port would be impossible!

The solution

This PR completely rewrites the memory subsystem from the ground up.

Physical memory manager

While we still use the buddy allocator, the stack algorithm has been slightly tweaked. We still store the metadata inside the free page itself, but instead of relying on the physical mapping offset (which, by the way, no longer exists), we map the page we want to access at a known virtual address that we control. That way, we can reach any physical page through that virtual address, as long as we update the page tables correctly.
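The access pattern looks roughly like this (a hedged sketch: the window address and the map_page/unmap_page helpers are assumed names, not Mint's actual API):

```c
#include <stdint.h>

/* Hypothetical fixed virtual window reserved for touching one physical
 * page at a time. */
#define PHYS_WINDOW 0xFFFFFFFF40000000ULL

/* Assumed helpers that edit the current page tables for a single page. */
void map_page(uintptr_t virt, uint64_t phys);
void unmap_page(uintptr_t virt);

/* Read the next-free pointer stored inside a free physical page. */
static uint64_t read_free_page_link(uint64_t phys_page)
{
    map_page(PHYS_WINDOW, phys_page);
    uint64_t next = *(volatile uint64_t *)PHYS_WINDOW;
    unmap_page(PHYS_WINDOW); /* the stale TLB entry must be invalidated too */
    return next;
}
```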

The performance impact is fairly small, but there is a definite cost from all the TLB flushes. We currently reload CR3, which flushes the entire TLB as a side effect; we would get much better performance by using invlpg to invalidate only the affected page.
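For reference, single-page invalidation is one instruction wrapped in standard GCC/Clang inline assembly:

```c
#include <stdint.h>

/* Invalidate only the TLB entry covering one virtual address,
 * instead of reloading CR3 and flushing everything. */
static inline void tlb_invalidate_page(uintptr_t virt)
{
    asm volatile("invlpg (%0)" : : "r"(virt) : "memory");
}
```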

Virtual memory manager

The virtual memory manager has been rewritten to use the fractal (recursive) mapping trick. In the past, we accessed the page tables directly through the physical memory map offset; since that offset no longer exists, we need another way to reach them.

The fractal mapping trick maps the PML4 into one of its own slots, which means the tables reachable from that PML4 appear at known virtual addresses. A helper function has been added to calculate where each table lives, so working with them is straightforward.
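The address calculation works roughly like this, assuming a hypothetical recursive slot of 510 (the slot Mint actually reserves, and its helper names, may differ):

```c
#include <stdint.h>

/* Assumed recursive slot and its sign-extended base address. */
#define RECURSIVE_SLOT 510UL
#define RECURSIVE_BASE 0xFFFFFF0000000000ULL

/* Virtual address of the page-table entry (PTE) that maps 'virt'.
 * Going through the recursive slot consumes one level of the walk,
 * so each index shifts down one level and the PT index becomes the
 * offset into the final page. */
static uint64_t *pte_address(uint64_t virt)
{
    uint64_t pml4 = (virt >> 39) & 0x1FF;
    uint64_t pdpt = (virt >> 30) & 0x1FF;
    uint64_t pd   = (virt >> 21) & 0x1FF;
    uint64_t pt   = (virt >> 12) & 0x1FF;

    return (uint64_t *)(RECURSIVE_BASE |
                        (pml4 << 30) | (pdpt << 21) | (pd << 12) |
                        (pt * sizeof(uint64_t)));
}
```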

The mapping and unmapping functions use the fractal mapping technique, and the approach scales down to 32-bit systems as well.

Heap

The old slab allocator relied on the physical memory offset, which no longer exists. It has been replaced with an actual heap implementation, liballoc.

liballoc was ported from SpoonOS and performs reasonably well given the time and resource constraints inside an operating system. The kernel interface for allocation (kmalloc) and freeing (kfree) remains identical, but the addresses returned have changed slightly.
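From a caller's perspective nothing changes; a trivial example, assuming the conventional kmalloc(size)/kfree(ptr) signatures (the struct here is made up for illustration):

```c
#include <stddef.h>

/* Kernel heap interface mentioned above (signatures assumed). */
void *kmalloc(size_t size);
void kfree(void *ptr);

struct example {
    int a;
    int b;
};

static void heap_usage_example(void)
{
    struct example *e = kmalloc(sizeof(*e));
    if (!e)
        return;
    e->a = 1;
    e->b = 2;
    kfree(e);
}
```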

New stuff

valloc

Ever wanted to manage a virtual region the same way you manage a physical memory region? Now you can! valloc manages a section of virtual memory for you, making sure code doesn't accidentally hand out the same virtual address twice.

The PCI subsystem uses this to allocate space for PCI device mapping.
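A hypothetical sketch of how such a region manager might be used; the function names and signatures below are assumptions, not Mint's actual API:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed interface: reserve and release ranges of a managed virtual region. */
uintptr_t valloc_alloc(size_t size);
void valloc_free(uintptr_t virt, size_t size);

/* e.g. reserving virtual space before mapping a PCI BAR into it. */
static uintptr_t reserve_mmio_window(size_t bar_size)
{
    uintptr_t virt = valloc_alloc(bar_size);
    /* The caller would then map the device's physical BAR at 'virt'. */
    return virt;
}
```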

DMA buffer allocation

A new DMA buffer allocation subsystem has been created, since drivers can no longer rely on the physical memory offset. When allocating DMA memory, use dma_alloc, which returns a struct containing a pointer to the virtual buffer and a pointer to the physical buffer.

The virtual buffer is intended for use by kernel code, and the physical buffer is intended to be passed to the device itself (e.g. into an AHCI PRDT). The buffer is guaranteed to be located in ISA-DMA-compatible memory (below 16 MB), so it is also suitable for legacy ISA devices.
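Based on the description above, usage looks roughly like the following sketch; the struct layout, field names, and whether dma_alloc returns by value are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shape of the returned descriptor. */
struct dma_buffer {
    void *virt;     /* for kernel code to read and write */
    uint64_t phys;  /* to program into the device (e.g. an AHCI PRDT) */
};

struct dma_buffer dma_alloc(size_t size);

static void dma_usage_example(void)
{
    struct dma_buffer buf = dma_alloc(4096);
    /* Fill the buffer through buf.virt, then hand buf.phys to the device. */
    (void)buf;
}
```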

PoisonNinja commented 7 years ago

Let's hold off on merging this until at least the invlpg issue has been rectified.