akkartik / mu

Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.
http://akkartik.name/akkartik-convivial-20200607.pdf

Redo how SubX lays out code/data segments in memory #29

Closed akkartik closed 5 years ago

akkartik commented 5 years ago

Background

Right now SubX has two ways to determine the address of a code/data segment: a segment header can either give an explicit address or name the segment.

== 0x0a0001000
...
== code
...

Either way, the first segment defined must be code and contain instructions.

This approach has long seemed a mess. It's hard to explain, and it got this way through a certain amount of historical evolution. I started out expecting programs to specify addresses because that made writing tests easier, and later found it useful to name segments so that I could keep code and related data close together. However, I couldn't easily retire the old approach because so many tests relied on specific instructions being at specific addresses.

The named segments approach 'works' for a simple ELF binary with two code/data segments, but hasn't really been used with more than two segments even though it seems to support them. As I revise my mental model of how the heap works in the presence of ASLR, this mess has been slowing me down.

New plan

Programs always specify the starting address of segments. But they only have to do so once, the first time the segment is encountered.

== code 0x0a0001000
...
== code
...

This is a single uniform syntax that should help write small tests and also deal with ASLR. Code segments no longer have to come first. The name code is special.
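To make the proposed rule concrete, here is a minimal sketch in Python of how a translator might group lines by segment under the new header syntax. It is not the SubX implementation (which lives in C++ and in SubX itself). The function name assign_segments, the data segment's address, and the placeholder line contents are all hypothetical. Two behaviors are assumptions on my part rather than anything decided above: a segment's first header without an address is rejected, and so is a later header that repeats an address.

```python
def assign_segments(lines):
    """Group source lines by segment under the proposed header rule.

    Header forms handled (from the proposal):
        == <name> <hex address>   first time a segment is encountered
        == <name>                 any later continuation of that segment

    Returns {name: {"start": int, "lines": [str, ...]}} in first-seen order.
    """
    segments = {}
    current = None
    for raw in lines:
        words = raw.split()
        if not words:
            continue  # skip blank lines
        if words[0] != "==":
            if current is None:
                raise ValueError("content before any segment header: %r" % raw)
            segments[current]["lines"].append(raw.strip())
            continue
        if len(words) not in (2, 3):
            raise ValueError("malformed segment header: %r" % raw)
        name = words[1]
        if name not in segments:
            # Assumption: the first header for a segment must carry an address.
            if len(words) != 3:
                raise ValueError("segment %r needs a starting address" % name)
            segments[name] = {"start": int(words[2], 16), "lines": []}
        elif len(words) == 3:
            # Assumption: giving the address a second time is an error.
            raise ValueError("segment %r already has a starting address" % name)
        current = name
    return segments


if __name__ == "__main__":
    source = [
        "== data 0x0b0001000",   # hypothetical address; data may now come first
        "data-line-1",
        "== code 0x0a0001000",
        "code-line-1",
        "== data",
        "data-line-2",
        "== code",
        "code-line-2",
    ]
    for name, seg in assign_segments(source).items():
        print(name, hex(seg["start"]), seg["lines"])
```

Interleaved headers for the same name accumulate into one segment, which is what lets code and its related data sit next to each other in the source while still being laid out contiguously per segment.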

Migration

I've been reluctant to make changes here because we have to change both the C++ version and how we bootstrap SubX in SubX. But I think it's doable if we start with SubX in SubX. Currently assort.subx and dquotes.subx fail tests if I add starting addresses to segment headers. The first task is to make them pass. In particular, assort.subx will require some changes to the table data structure.

Now that I've made this plan, it looks like it's not needed for building allocate out of mmap. So I'm going to set this plan aside and hopefully deal with allocate first.