Mellvik / TLVC

Tiny Linux for Vintage Computers

[kernel] Experimental - trampoline interrupt vectors removed from heap #33

Closed - Mellvik closed this 7 months ago

Mellvik commented 7 months ago

This experimental change eliminates the use of kernel heap space for interrupt vectors. In earlier days these vectors were allocated at boot time and ended up at the beginning of the heap, where they could do little damage. Since IRQ allocation was changed to happen on device open instead of device init, these vectors can end up anywhere in the heap, contributing to fragmentation, which is damaging in tight situations. In addition, given the overhead of 12 bytes per heap allocation, the 6-byte vectors don't really belong on the heap.

This change allocates local storage for 8 (#define INT_VECTOR_MAX 8) such vectors in arch/i86/kernel/irq.c and allows additional vectors to be allocated on the heap the old way. The number 8 is chosen because it covers most 'normal' situations. Exceptions are when 3 or 4 serial lines plus networking plus the new interrupt-driven directHD are used concurrently.

This code can be enabled/disabled via an #ifdef in irq.c: #ifdef USE_LOCAL_IRQVEC
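
Roughly, the allocation path looks like this (a simplified sketch, not the exact code - the struct, array and function names below are made up, heap_alloc's signature is assumed, and HEAP_TAG_INTHAND stands in for whatever tag the INT entries actually use):

  /* Sketch: a small static pool of trampoline vector slots in irq.c,
   * falling back to the old heap allocation when the pool is exhausted. */
  struct int_vector { unsigned char code[6]; };  /* stand-in for the real 6-byte trampoline */

  extern void *heap_alloc(unsigned int size, unsigned int tag);  /* existing allocator; signature assumed */

  #define INT_VECTOR_MAX 8

  static struct int_vector int_vec_pool[INT_VECTOR_MAX];
  static unsigned char int_vec_used[INT_VECTOR_MAX];

  static struct int_vector *int_vec_alloc(void)
  {
  #ifdef USE_LOCAL_IRQVEC
      int i;

      for (i = 0; i < INT_VECTOR_MAX; i++) {
          if (!int_vec_used[i]) {
              int_vec_used[i] = 1;
              return &int_vec_pool[i];
          }
      }
  #endif
      /* pool full (or feature disabled): allocate from the heap the old way */
      return (struct int_vector *)heap_alloc(sizeof(struct int_vector), HEAP_TAG_INTHAND);
  }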

ghaerr commented 7 months ago

Getting these allocations out of the kernel near heap seems like a good idea, especially if a device driver is opened some time after boot and stays open.

I'm curious about any further details you may have on the heap fragmentation you observed: are other allocations sitting alongside those with mismatched lifespans? Or does allocation sizing seem to be the bigger issue? I know the serial, pty and tty drivers can require large character queue allocations that might not be able to be met in certain system use cases.

Perhaps the near kernel heap might be better arranged (somehow) by sorting large vs small, or short-lived vs long-lived, allocations into separate arenas.

Mellvik commented 7 months ago

Hi @ghaerr, as is often the case, it started with an itch. I was experimenting with allocating net packet buffers from the heap instead of statically, as you suggested a while back. The L1 cache was still quite big, 14 buffers initially, so I ran into serious heap trouble and started paying close attention to what was going on, using meminfo. That led to a vastly improved understanding of both memory allocation in general and heap allocation/usage.

Long story short, when starting and stopping the network, maybe switching between net interfaces which I do quite a bit for testing, fragmentation became an issue. The following is after just one net restart:

  HEAP   TYPE  SIZE    SEG   TYPE    SIZE  CNT
  9ffe   SEG     16   2476   free     160    0
  a01a   INT      6
  a02c   INT      6
  a03e   INT      6
  a050   BUFH 10240
  c85c   BUFH   480
  ca48   SEG     16   2a80   CSEG    7056    1
  ca64   SEG     16   2480   BUF    24576    1
  ca80   INT      6
  ca92   SEG     16   12c6   DSEG    5872    1
  caae   INT      6
  cac0   TTY   1024
  cecc   TTY     80
  cf28   SEG     16   2c39   CSEG    4096    1
  cf44   SEG     16   1435   free    1040    0
  cf60   SEG     16   4af2   DSEG    5680    1
  cf7c   SEG     16   3ea0   DSEG   25232    1
  cf98   SEG     16   44c9   free   25232    0
  cfb4   SEG     16   3877   DSEG   25232    1
  cfd0   BUFH  1536
  d5dc   free  4790
  e89e   SEG     16   2d39   CSEG   46048    2
  e8ba   SEG     16   511b   CSEG   18032    1
  e8d6   SEG     16   5582   free   39904    0
  e8f2   SEG     16   63a0   DSEG   57824    1
  e90e   INT      6
  e920   SEG     26   71be   free  188448    0
  e946   TTY     80
  e9a2   TTY     80
  e9fe   BUFH  1536
  f00a   BUFH  1536
  f616   BUFH  1536
  fc22   SEG     16   50a4   free    1904    0
  fc3e   SEG     16   4c55   CSEG   14032    1
  fc5a   SEG     16   4fc2   CSEG    3616    1
  fc76   SEG     16   5f40   DSEG   14672    1
  fc92   SEG     16   62d5   DSEG    2400    1
  fcae   SEG     16   636b   free     848    0
  fcca   free   822
  Heap/free   24590/ 5612 Total mem  511904
  Memory usage  499KB total,  248KB used,  251KB free
tlvc15# 

... which is not too bad since I took down the number of L1 buffers, but it gets worse fast. It's not that the INT vectors are so bad - they're just an easy target, being ludicrously small - and they do make a difference.

I've been pondering the idea of pooling or sorting some data types in order to optimise heap usage. Having spent quite some time looking at heap behaviour as the load varies and the process ids run into the hundreds or maybe thousands, I'm leaning towards having SEG descriptors gravitate to the high end and letting everything else be (and keeping the INTs out of it). I think that will eliminate more than 50% of all heap fragmentation. When the ends 'meet' we'll have an almost non-functional system anyway.

I'm inclined to increase heap space btw, I haven't looked up where to do that - you would probably know?

--M

ghaerr commented 7 months ago

I'm inclined to increase heap space btw, I haven't looked up where to do that

The near heap's size can be specified using SETUP_HEAPSIZE in config.h, but when left unspecified, which is the default, it defaults to the maximum available (shown in your meminfo listing as ~24k). The kernel data segment is allocated by default as a DS-addressable 64k segment just past the end of the kernel code segment. Its size on your system indicates around 40k of static data declared within the kernel, leaving 24k for the heap. The largest static users of the data segment are the task array and inode array; the system.map file can be inspected to see where we might eliminate more (like the network buffers, which I see you've just done). Of course, whether statically or dynamically allocated, everything comes out of the kernel data segment, so no real space is saved when the system is being fully utilized.
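
For reference, that would be something like the following in config.h (the value is only an example - since the default already takes all leftover DS space, specifying it can only cap the heap, not grow it):

  /* config.h: optionally cap the near heap; when left undefined (the default)
   * the heap gets whatever DS space is left over after static kernel data. */
  #define SETUP_HEAPSIZE 16384    /* example: limit the near heap to 16K */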

I'm leaning towards having SEG descriptors gravitate to the high end

Back when MFLD rewrote the main memory allocator and then wrote the near kernel allocator, there was a big discussion as to whether the near heap should hold the references to the main memory allocations. Long story short, each allocated main memory segment descriptor lives in its own tiny 16-byte SEG allocation in the near heap. Every one of these allocations is effectively just as tiny as the IRQ allocations, and as can be seen, they definitely fragment the near heap.

Looking through your posted memory layout, I see the initial 10240 allocation for 10 L1 buffers, and the 480-byte allocation of 24 buffer headers @ 20 bytes each for the 24 EXT buffers. The 1024/80-byte allocation is a serial port open, and the 80/80 TTY is probably tty1 or a pty. I am guessing the remaining four 1536-byte BUFHs are network buffers (are you allocating four per open, or is that something different?) (BTW perhaps a new HEAP_TAG_NETWORK in heap.h and meminfo.c modified to show "NETW" or something like that for network-allocated buffers).

That leaves just a single 4790-byte free space and a single 822 bytes free space at the end of heap.

After having read a lot about memory allocations lately, I would say perhaps trying to keep two or three heap arenas, and then allocate by either "long-lived", "small size" or "short-lived/temp" might be a way to go. If we kept it to two arenas, small vs large, we could use the same near heap allocation code extended to use two fixed-size heap headers without worrying much about introducing algorithmic errors to the allocator.

If allocations were sorted just by size, and the IRQ and SEG allocations were kept to a fixed-size 16-bytes in the "small" arena, and everything else in the large arena, that would allow IRQ and SEG allocations to change frequently and not interfere with each other or the larger allocations. The question would then be whether the serial and/or tty/pty open buffers would fragment the possibly continually changing network and/or pipe (=80 bytes) buffers.
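
Very roughly, what I have in mind is something like this (illustrative only - the struct and function names are made up, standing in for the existing allocator parameterized by an arena descriptor rather than a single global heap):

  /* Sketch of the two-arena idea: two independent heaps managed by the same
   * allocator code, with the arena chosen by request size. */
  struct heap_arena { char *start; unsigned int size; /* + existing free-list state */ };

  extern void  heap_arena_init(struct heap_arena *a, char *start, unsigned int size);
  extern void *heap_arena_alloc(struct heap_arena *a, unsigned int size, unsigned int tag);

  #define SMALL_ALLOC_MAX   16      /* 16-byte SEG descriptors, 6-byte IRQ vectors */
  #define SMALL_ARENA_SIZE  2048    /* fixed carve-out; example value only */

  static struct heap_arena small_arena;   /* tiny, frequently churning allocations */
  static struct heap_arena large_arena;   /* buffers, TTY queues, pipes, ... */

  void heap_init_arenas(char *start, unsigned int total)
  {
      heap_arena_init(&small_arena, start, SMALL_ARENA_SIZE);
      heap_arena_init(&large_arena, start + SMALL_ARENA_SIZE, total - SMALL_ARENA_SIZE);
  }

  void *heap_alloc(unsigned int size, unsigned int tag)
  {
      if (size <= SMALL_ALLOC_MAX) {
          void *p = heap_arena_alloc(&small_arena, size, tag);
          if (p)
              return p;
          /* small arena full: spill into the large arena rather than fail */
      }
      return heap_arena_alloc(&large_arena, size, tag);
  }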

Mellvik commented 6 months ago

I'm leaning towards having SEG descriptors gravitate to the high end

Back when MFLD rewrote the main memory allocator and then wrote the near kernel allocator, there was a big discussion as to whether the near heap should hold the references to the main memory allocations. Long story short, each allocated main memory segment descriptor lives in its own tiny 16-byte SEG allocation in the near heap. Every one of these allocations is effectively just as tiny as the IRQ allocations, and as can be seen, they definitely fragment the near heap.

I think size is less important than lifespan in this context. If we disregard my special case of starting and stopping the network a lot, the heap dynamics are pipes, seg descriptors and - at least in my case - tty buffers. With the int vectors gone and the seg descriptors allocated from the top, I think we'd have quite an optimal heap. In particular, moving the early (after boot) allocated seg descriptors out of the way would be a great optimization.

(BTW perhaps a new HEAP_TAG_NETWORK in heap.h and meminfo.c modified to show "NETW" or something like that for network-allocated buffers).

Yes, that's useful, I'll add that right away.
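
Presumably something along these lines (the tag value is a placeholder that must not collide with the existing HEAP_TAG_* defines, and the meminfo side is shown as a simple lookup - the real code may be structured differently):

  /* heap.h: a new tag for network buffer allocations (placeholder value) */
  #define HEAP_TAG_NETWORK 0x08

  /* meminfo.c: map the tag to the displayed TYPE string */
  static const char *heap_tag_name(unsigned int tag)
  {
      switch (tag) {
      case HEAP_TAG_NETWORK: return "NETW";
      /* ... existing SEG/INT/BUFH/TTY/free cases stay as they are ... */
      default:               return "????";
      }
  }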

After having read a lot about memory allocations lately, I would say perhaps trying to keep two or three heap arenas, and then allocate by either "long-lived", "small size" or "short-lived/temp" might be a way to go. If we kept it to two arenas, small vs large, we could use the same near heap allocation code extended to use two fixed-size heap headers without worrying much about introducing algorithmic errors to the allocator.

If allocations were sorted just by size, and the IRQ and SEG allocations were kept to a fixed-size 16-bytes in the "small" arena, and everything else in the large arena, that would allow IRQ and SEG allocations to change frequently and not interfere with each other or the larger allocations. The question would then be whether the serial and/or tty/pty open buffers would fragment the possibly continually changing network and/or pipe (=80 bytes) buffers.

Great discussion, @ghaerr. I think evidence/experience points toward a small/large distinction, small being < 20 bytes. If those small allocations could get away with a smaller (than 12-byte) descriptor per allocation, that may just be the way to go. Still, I'm tempted to keep the INT vectors out using the current scheme because they're long-lived compared to (most) seg descriptors. Which makes the sorting very simple: SEG descriptors = high heap, otherwise 'normal'.

Are we on the same page?

ghaerr commented 6 months ago

Yes, good discussion. I don't really know for sure what might be the best way to ensure the most heap memory is available, depending on varying user operations.

Are we on the same page?

Well, certainly from the standpoint that experimentation will be needed. Two big problems remain:

I think size is less important than lifespan in this context.

When talking about lifespan (other than pipes), how does one quantify the expected lifespan of a kernel resource? This would have to be a fixed number that's passed to the heap allocator everywhere it's called (or the HEAP_TAG could be used, if that turns out to make sense).

I think evidence/experience points toward a small/large distinction

SEG descriptors = high heap

Large vs small would be easy, except that there is no way to allocate from "high heap" without completely rewriting the allocator, and experience has shown that to be very tricky. We found a number of subtle bugs over the years in the current two allocators. In addition, I don't know how to manage the "middle ground" where they might meet. My suggestion was to use the existing allocator, extended to use two (fixed-size) arenas for large vs small allocations. This could be done fairly easily without replacing the allocator algorithm.

Mellvik commented 6 months ago

Agreed, we cannot quantify lifespan (expected or real) in any meaningful way other than educated guesswork, which is fine. I was thinking along much simpler lines:

allocate SEG descriptors from the top

just move the topOfHeap marker up and down as required, easy reuse of freed slots

IOW, not a general mechanism at all, rather the opposite, SEG only.

I haven't looked at the code so I'm admittedly speculating blindly. From what I've seen from the meminfo listings while experimenting these past few days, such a mechanism could have significant potential if a simple implementation is possible.

Not mentioned before, but I'm hoping less fragmentation will make larger pipes less painful when memory (heap) gets tight. The current limit makes pipes painfully slow.

ghaerr commented 6 months ago

allocate SEG descriptors from the top

just move the topOfHeap marker up and down as required, easy reuse of freed slots

I'm not sure I'm following you here: the kernel heap doesn't have a "top" or UNIX-style "break". It's a fixed size (based on leftover available memory in the 64K-addressable DS segment), ~24k in your case. There's no way to "allocate from top", by which I assume you mean "increase or decrease the heap break" as UNIX applications and UNIX C malloc do. The way the ELKS kernel near heap works is described above, and it uses a best-fit algorithm within the (24k) arena. Or are you saying something different that I'm not getting?
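
Schematically, an allocation works something like this (not the actual allocator code, which keeps more state than shown; the struct fields, names and tag value are approximations):

  /* Best-fit within the fixed arena: walk every block, remember the smallest
   * free block that is still large enough; the caller then marks it used and
   * splits off any excess into a new free block. */
  #define HEAP_TAG_FREE 0             /* placeholder value */

  struct heap_hdr {
      struct heap_hdr *next;          /* blocks chained across the whole arena */
      unsigned int size;              /* payload size */
      unsigned char tag;              /* HEAP_TAG_FREE, SEG, TTY, ... */
  };

  static struct heap_hdr *heap_first; /* set up once at boot; the arena never grows */

  static struct heap_hdr *best_fit(unsigned int size)
  {
      struct heap_hdr *h, *best = 0;

      for (h = heap_first; h; h = h->next) {
          if (h->tag == HEAP_TAG_FREE && h->size >= size
              && (!best || h->size < best->size))
              best = h;
      }
      return best;                    /* NULL if nothing big enough is free */
  }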

will make larger pipes less painful when memory (heap) gets tight. The current limit makes pipes painfully slow.

The current pipe kernel buffer was set at a very small 80 bytes because IIRC we were almost out of space for larger allocations, and my initial testing didn't show the degradation you're speaking of. But then again I didn't test much with different sizes. For a single pipe between two processes, I didn't think the overhead of switching between two processes was really that big, given all the other slowness of ELKS I/O in the first place. Do you have some examples that show how slow ELKS pipes are? I'd like to test and come up with a faster implementation, and determine an optimal pipe buffer size.

Another thought on pipe buffers would be for the kernel to inspect the heap and see whether a larger pipe buffer size might automatically be allocated when there are sufficient resources... ?

Mellvik commented 6 months ago

allocate SEG descriptors from the top

just move the topOfHeap marker up and down as required, easy reuse of freed slots

I'm not sure I'm following you here: the kernel heap doesn't have a "top" or UNIX-style "break". It's a fixed size (based on leftover available memory in the 64K-addressable DS segment), ~24k in your case. There's no way to "allocate from top", by which I assume you mean "increase or decrease the heap break" as UNIX applications and UNIX C malloc do. The way the ELKS kernel near heap works is described above, and it uses a best-fit algorithm within the (24k) arena. Or are you saying something different that I'm not getting?

Indeed - we're on very different pages on this one. Instead of explaining in detail what I mean, I'll implement it (let the sh.. hit the fan so to speak), and we can take it from there.

will make larger pipes less painful when memory (heap) gets tight. The current limit makes pipes painfully slow.

The current pipe kernel buffer was set at a very small 80 bytes because IIRC we were almost out of space for larger allocations, and my initial testing didn't show the degradation you're speaking of. But then again I didn't test much with different sizes. For a single pipe between two processes, I didn't think the overhead of switching between two processes was really that big, given all the other slowness of ELKS I/O in the first place. Do you have some examples that show how slow ELKS pipes are? I'd like to test and come up with a faster implementation, and determine an optimal pipe buffer size.

I've run into this performance problem repeatedly (always involving pipes) and made a note of it w/o investigating. It feels like performance drops by an order of magnitude. I'll look closer into it as part of this 'heap project'.

Another thought on pipe buffers would be for the kernel to inspect the heap and see whether a larger pipe buffer size might automatically be allocated when there are sufficient resources... ?

I was pondering the same yesterday, only the other way around: attempt to allocate (say) 512-byte pipes, then make repeated attempts with lower values if that fails. It's probably the right thing to do.
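
Roughly like this (sketch only - the size constants, the helper name and the HEAP_TAG_PIPE value are placeholders, and heap_alloc() returning NULL on failure is assumed):

  /* Try a generous pipe buffer first, then back off toward the current
   * 80-byte size if the heap is tight. */
  extern void *heap_alloc(unsigned int size, unsigned int tag);  /* existing allocator; signature assumed */

  #define HEAP_TAG_PIPE 0x07    /* placeholder; use whatever tag pipes already get */
  #define PIPE_BUF_MAX  512     /* first attempt */
  #define PIPE_BUF_MIN   80     /* current fixed size, used as the floor */

  static char *pipe_buf_alloc(unsigned int *actual)
  {
      unsigned int size = PIPE_BUF_MAX;
      char *buf;

      for (;;) {
          buf = heap_alloc(size, HEAP_TAG_PIPE);
          if (buf) {
              *actual = size;   /* the pipe code must then use the real size, not a constant */
              return buf;
          }
          if (size == PIPE_BUF_MIN)
              return 0;         /* not even 80 bytes free: fail the pipe() call */
          size >>= 1;
          if (size < PIPE_BUF_MIN)
              size = PIPE_BUF_MIN;
      }
  }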