SubX in SubX: computing addresses for labels

akkartik commented 5 years ago

This branch is work in progress on implementing SubX in SubX. The big-picture plan is to divide up the work into the following phases:

$ cat file1.subx file2.subx ... |dquotes |assort |pack |survey |hex > a.elf

Each phase is an ELF binary constructed out of a corresponding .subx file.

hex.subx converts a sequence of bytes in hex into a binary. (done)
survey.subx will turn labels into addresses, and also slap on ELF headers. It's called 'survey' because the act of imposing a coordinate system on a program feels somewhat like surveying terrain and determining various distances and heights. (in progress here)
pack.subx converts bitfields in instructions into whole bytes in the right order. For example, 0/mod 1/rm32 1/r32 is turned into the single byte 08. (done)
assort.subx just reads in many segments starting with == and containing lines of code or data, and concatenates all segments with the same name. (done)
dquotes.subx moves string literals in the code segment to the data segment. (done)

This branch is where the survey phase is being built.

To run its tests (and so see how far along things are):

$ ./subx translate 0*.subx apps/subx-common.subx apps/survey.subx -o apps/survey  &&  \
      ./subx run apps/survey test

Contact me if you'd like to contribute. Commit access freely given.

akkartik commented 5 years ago

@charles-l I just spun up this PR for survey.subx. No failing tests yet. Only mildly interesting thing to look at may be the ELF header definitions.

akkartik commented 5 years ago

@charles-l I'm still cleaning up #32 so haven't written any tests here yet. If you're looking for something to do, could you create a function here called emit-elf-header that prints out the Elf_header string? For now convert can consist just of emit-elf-header and then cating stdin to stdout.

Any tests I think of for this seem pretty silly, so let's just test it manually at the commandline.

akkartik commented 5 years ago

I'm thinking hard about how to build this phase, now that dquotes is fully merged (yay!)

There seem to be three tasks to perform here (and I'm constantly wondering whether to split them up into multiple phases, but they seem to need some common offset tracking as they read stdin):

Assign a start address to each segment. 1a. User programs provide an approximate start address for segments. But we have to perturb it slightly so the offset at which a segment starts in the ELF binary has the same alignment as the start address for the segment when loading to memory and running.
Assign a couple of offsets to each label (whether in code or in data): 2a. Global file offset. 2b. Local segment offset.
Replace labels with addresses or displacements. If you know the starting address and segment offset for a label, you can compute its address.

Since labels can be defined after they're used, there need to be two passes. The first performs tasks 1 and 2. The second performs task 3.

We need to drop labels at some point. And we can't rewind the input fd at the moment. Perhaps it's cleaner to not add seek syscalls for now.

Ok, so very high-level pseudocode.

Data structures shared between pass 1 and pass 2.

segments: table from segment name to (starting address, starting offset, size in bytes)
labels: table from label name to segment offset
tmp: copy of stdin in memory

Pass 1

var file-offset = 0, segment-offset = 0
while reading words from stdin
  if word is a label
    x : (address number) = insert(labels, name)
    *x = segment-offset
    continue
  emit word to tmp
  if word == '=='
    name = read word from stdin
    seg : (address segment) = insert(segments, name)
    emit word to tmp
    name = read word from stdin
    emit name to tmp
    addr = read word from stdin
    seg->starting-address = parse-hex-int(addr)
    seg->starting-offset = file-offset
    segment-offset = 0
  else
    width = compute-width(word)
    segment-offset += width
    file-offset += width

compute-width usually returns 1, unless word has metadata /imm32, /disp16, etc.

Pass 2

emit-elf-header(segments)
emit-elf-program-headers(segments)
var file-offset = 0, segment-offset = 0
var curr-seg-offset = 0, curr-seg-address = 0
while reading word from tmp
  if (word == "==") skip line
  datum = next-token(word, '/')
  var value = 0, width = 1
  if is-hex-int?(datum)
    value = parse-hex-int(datum)
  else if has-metadata?("/disp8") or has-metadata?("/disp32")
    value = get(labels, datum) - curr-offset
  else if has-metadata?("/imm8") or has-metadata?("/imm32")
    ???
  width = compute-width(word)
  emit-hex(value, width)
  segment-offset += width
  file-offset += width

Ugh, this is complicated.

akkartik commented 5 years ago

One small piece we can nibble off today is to extract the logic for compute-width from emit in pack.subx. Does that make sense? You'd have to move some code into subx-common.subx. Let me know if that seems interesting to you.

charles-l commented 5 years ago

The only place in pack.subx that I see the width of a value being computed is in the switch in the convert-data function. Is that what you're referring to?

akkartik commented 5 years ago

Ah yes, that's the one. Thanks!

akkartik commented 5 years ago

I think what may stop me spinning my wheels is a helper like check-ints-equal called check-int-arrays-equal that accepts a string, parses it as a list of numbers and compares the other arg with it. That'll help me write clean tests for pass 1 in isolation.

Just an idea to throw out there. I'll probably work on it unless you really want to.

akkartik commented 5 years ago

Excellent!

charles-l commented 5 years ago

OK -- implemented compute-width. Not sure if needs more specifiers, though. Only supports {disp,imm}{8,16,32} right now...

akkartik commented 5 years ago

That is the whole set. I'm pretty inconsistent there as well; sometimes I support {disp,imm}{8,32} and sometimes disp8/16/32 and imm8/32.

akkartik commented 5 years ago

Any preference on what you'd like to do next?

I've been (veeery sloooowly) building out primitives for checking the trace, so that I can write the survey tests like this:

# . check-trace-contains(Trace-stream, "label 'x' is at address 0x1079")
# . check-trace-contains(Trace-stream, "segment 'code' starts at address 0x74")
# . check-trace-contains(Trace-stream, "segment 'code' has size 0x5")
# . check-trace-contains(Trace-stream, "segment 'data' starts at address 0x1079")

That feels more useful than printing out say a series of bytes for the ELF header, and then writing a test that I printed out that same series of bytes.

More details on the trace.

akkartik commented 5 years ago

I just committed the current state of layer 56. There are test failures. But I'll make two sub-PRs of this one like last time, and you can pick one if you like.

akkartik commented 5 years ago

Ok, the current state is that two functions in layer 56 need implementing to fix failing tests:

next-line-matches?: takes a stream of bytes t and a string line. Checks if the whole line starting at t->read (and until the next newline) matches 'line' and returns the result.
skip-next-line: updates t->read to the start of the next line (after the next newline).

If you wanna build one of these just let me know. There's a skeleton for them already in the code.

charles-l commented 5 years ago

ok, I'll take a look at adding skip-next-line next.

akkartik commented 5 years ago

What editor do you use, @charles-l? Also, what is your preferred mode of receiving messages besides GitHub? :)

charles-l commented 5 years ago

What editor do you use, @charles-l?

the only true editor: vim ;)

Also, what is your preferred mode of receiving messages besides GitHub?

Email works -- I had my (broken) website's email in the commit log before, but I've since updated it to my standard email address that I regularly check.

akkartik commented 5 years ago

Excellent, and excellent.

akkartik commented 5 years ago

Current status:

Initial set of tests is written. We may need more, but we may also not.
All functions have detailed pseudocode.

So now we just need to crank through all the pseudocode and turn it into code.

akkartik commented 5 years ago

Excellent!

I'm still seeing one test failure for these functions: test-skip-next-line

charles-l commented 5 years ago

I'm still seeing one test failure for these functions: test-skip-next-line

Whoops -- I inadvertently changed the data for the test when I added next-line-matches?, but that actually caught a bug in the implementation for skip-next-line. Fixed now.

akkartik commented 5 years ago

Hmm, I'm still seeing failures and also a hang. Can you share what command you're running to run the tests?

charles-l commented 5 years ago

oh, I was just running the individual tests:

zsh run_one_test.sh 056trace.subx 'test-next-line-matches?' 
zsh run_one_test.sh 056trace.subx 'skip-next-line'

I was assuming the other tests were failing because they haven't been fully implemented yet. But now that I look, there is code for them... I'll take a look at debugging the failing tests (from ./subx translate 0*.subx -o a.elf && ./subx --trace run a.elf test).

akkartik commented 5 years ago

Oh you're right, my mistake in the last comment. I misread and thought test-skip-next-line was still failing.

My general approach is to run a single test for debugging purposes, but then run all tests (using the command you mentioned) before I commit. Is the change in tests what I expected? Here the hang is mildly surprising. But I take it back, your commits are fine so far. Someone :) just needs to get the remaining tests passing. I'd hoped fixing the helpers would get them working. I'll debug it as well.

akkartik commented 5 years ago

(One little trick I use at times like this is to print out test names as they run, by uncommenting out this line in the test harness layer.)

akkartik commented 5 years ago

Never mind, scratch that. I'm not fully awake.

Each test in layer 56 passes by itself. But somehow running all the tests in layer 56 hangs:

./subx translate 049memory_layout.subx 05[0-6]*.subx -o a.elf && ./subx run a.elf test

akkartik commented 5 years ago

I ran a binary search on the output of:

grep '^test' 056trace.subx |sed 's,\(.*\):,e8/call \1/disp32,'

I started out with this new Entry block in layer 56:

Entry:
    e8/call test-trace-single/disp32
    e8/call test-trace-appends/disp32
    e8/call test-trace-empty-line/disp32
    e8/call test-next-line-matches?/disp32
    e8/call test-skip-next-line/disp32
    e8/call test-trace-scan-first/disp32
    e8/call test-trace-scan-skips-lines-until-found/disp32
    e8/call test-trace-second-scan-starts-where-first-left-off/disp32
    e8/call test-trace-scan-failure-leaves-read-index-untouched/disp32
    # syscall(exit, Num-test-failures)
    8b/copy                         0/mod/indirect  5/rm32/.disp32            .             .           3/r32/EBX   Num-test-failures/disp32          # copy
    b8/copy-to-EAX  1/imm32/exit
    cd/syscall  0x80/imm8

By the end, it turns out just running the last 2 tests causes the hang -- but after they have both passed:

Entry:
    e8/call test-trace-second-scan-starts-where-first-left-off/disp32
    e8/call test-trace-scan-failure-leaves-read-index-untouched/disp32
    # syscall(exit, Num-test-failures)
    8b/copy                         0/mod/indirect  5/rm32/.disp32            .             .           3/r32/EBX   Num-test-failures/disp32          # copy
    b8/copy-to-EAX  1/imm32/exit

I'll continue debugging it, but feel free to poke at it as well. May be fun.

akkartik commented 5 years ago

The trace shows an infinite loop in trace-scan.

$ subx --debug translate 049memory_layout.subx 05[0-6]*.subx -o a.elf  &&  subx --dump --debug --trace run a.elf test |grep '== label'
...
   4 run: == label $trace-scan:loop
   5 run: == label next-line-matches?
   5 run: == label $next-line-matches?:loop
   5 run: == label $next-line-matches?:loop
   5 run: == label $next-line-matches?:end

I'll investigate further. May well be a bug in my caller.

akkartik commented 5 years ago

Yeah, there's a logic bug in my trace-scan. Thanks!

akkartik commented 5 years ago

Ok, "standard library" is now good. Back to apps/survey.subx.

There are 6 functions that need implementing:

compute-offsets
compute-addresses
emit-output
- emit-headers
- emit-elf-header
- emit-elf-program-header-entry

I suggest we start on the first two. They are self-contained and have failing tests. emit-output is kinda persnickety and doesn't seem to admit useful tests in isolation; the real test for getting it right will be running the generated ELF binaries.

We're getting into some complex functions now. But the end is in sight!

charles-l commented 5 years ago

I'll start working on compute-offsets

akkartik commented 5 years ago

Perfect. I noticed yesterday that the tests for compute-addresses are not complete. So I'll complete them and then work on them.

akkartik commented 4 years ago

These failing tests all check the global trace, but I just realized I don't yet have a good way to write lines containing numbers into the trace. Sorry! Working on that next.

charles-l commented 4 years ago

i implemented is label -- as far as I can tell, this is just a check for label definitions, i.e. words that have ':' at the end. Let me know if it needs to do something else as well...

akkartik commented 4 years ago

Yeah, that's pretty much all there is to it.

I think there's a couple of existing places I can use this helper. Thanks!

I'll send out the tracing helpers tonight.

akkartik commented 4 years ago

Sorry it took a while, but you now have a helper to emit the traces needed in compute-offsets:

trace-slsls: takes a _s_tring, s_l_ice, _s_tring, s_l_ice, _s_tring (5 args), and concatenates them into a single line in the trace.

Let me know if it makes sense. I'm using a rudimentary manual name mangling to maintain multiple variants of a single overloaded function. It has a test so you can see example usage.

akkartik commented 4 years ago

I just fixed an error in the pseudocode for compute-offsets: https://github.com/akkartik/mu/pull/34/commits/1c576e99d756199142ce9ed98046636b58917243

akkartik commented 4 years ago

I'm done with compute-addresses. Though I'm sure we'll find bugs when we put all the pieces together at the end.

akkartik commented 4 years ago

I use is-label? in one place so far. But I'll resist the temptation to mess with it. We have bigger fish to fry and I don't want to risk merge conflicts.

charles-l commented 4 years ago

OK, so it's partially implemented -- the segment block isn't complete yet. Also it has zero testing (i.e. this is just a first pass of the translation without running it), so it's probably totally wrong -- reading through with a fresh set of eyes will probably catch a plethora of issues.

I'm going to take a look at it tomorrow, so I'll let you know if I'm working on the $compute-offsets:segment:. Once that's done, I'm not really sure how easy it'll be to split up the testing work -- I haven't looked closely at the tests, but if they go through different paths, we can split up work by segment/label offsets or something like that.

akkartik commented 4 years ago

I just performed a long-overdue merge with master into this branch, partly to deal with the duplication of next-word.

If possible, do a git pull before you do any more work. Let me know if you get stuck with a massive merge conflict. I'll help get you unblocked.

akkartik commented 4 years ago

Mostly a note to myself: we should create a get in addition to get-or-insert. get-or-insert should always clear a newly inserted value if it doesn't already exist. get should abort if it can't find the key.

Most of the calls to get-or-insert should really be calls to get. That'll reduce our debugging burden in some situations.

akkartik commented 4 years ago

Woohoo!

akkartik commented 4 years ago

@charles-l One thing we should try to stay in sync on is how we check booleans: https://github.com/akkartik/mu/commit/7af8f4e8f6f296f2ebdd11c0a46b0a6a55bcf2d9

charles-l commented 4 years ago

@charles-l One thing we should try to stay in sync on is how we check booleans: 7af8f4e

Ah, good point. I'll do that from here on out.

akkartik commented 4 years ago

I just went over your compute-offsets, @charles-l. You may be interested in the last batch of commits I just pushed ^^ as a sort of code review. I mention a couple of tricks.

Doing this risks some merge conflicts, but I figured I should take the time to read the code since you took the trouble to git push it :) As always, I can help with merge conflicts. Feel free to just create your work in a new branch and upload it.

compute-offsets is still not working, and after some of my changes above it gets stuck in an infinite loop. If you haven't been debugging it already, I can dig deeper into it. Let me know!

akkartik commented 4 years ago

survey.subx is almost done! Remaining tasks:

debug the infinite loop in test-compute-offsets.
final integration testing of test-convert-computes-addresses.

🎉 Much appreciation to you, @charles-l.

charles-l commented 4 years ago

@akkartik oh, cool, thanks for doing those tweaks. Those simplifications are good -- I'll write code in the future using those functions/conventions :+1:

Got test 2 to pass by fixing the label names. Just have to make the segment stuff work properly and we should be good to go.

BTW thanks for getting rid of those memory bugs too -- they were giving me some trouble.

charles-l commented 4 years ago

Hrmm... I think the stack is getting screwed up somehow because the write-trace call doesn't work if test 5 passes...

akkartik commented 4 years ago

I actually intended label offsets to be relative to segment. Segment offsets are a little easier to work with than file offsets to get to the final goal of an absolute address.

akkartik commented 4 years ago

Woohoo!

The final bug was something I introduced in 538f24c296f6ddd0688c3d312db08cfe48919cdd :/ No idea what I was thinking.

akkartik / mu

SubX in SubX: computing addresses for labels #34