Closed akkartik closed 4 years ago
@charles-l I just spun up this PR for survey.subx. No failing tests yet. Only mildly interesting thing to look at may be the ELF header definitions.
@charles-l I'm still cleaning up #32 so haven't written any tests here yet. If you're looking for something to do, could you create a function here called emit-elf-header
that prints out the Elf_header
string? For now convert
can consist just of emit-elf-header
and then cat
ing stdin to stdout.
Any tests I think of for this seem pretty silly, so let's just test it manually at the commandline.
I'm thinking hard about how to build this phase, now that dquotes is fully merged (yay!)
There seem to be three tasks to perform here (and I'm constantly wondering whether to split them up into multiple phases, but they seem to need some common offset tracking as they read stdin):
Since labels can be defined after they're used, there need to be two passes. The first performs tasks 1 and 2. The second performs task 3.
We need to drop labels at some point. And we can't rewind the input fd at the moment. Perhaps it's cleaner to not add seek
syscalls for now.
Ok, so very high-level pseudocode.
Data structures shared between pass 1 and pass 2.
segments
: table from segment name to (starting address, starting offset, size in bytes)labels
: table from label name to segment offsettmp
: copy of stdin in memoryPass 1
var file-offset = 0, segment-offset = 0
while reading words from stdin
if word is a label
x : (address number) = insert(labels, name)
*x = segment-offset
continue
emit word to tmp
if word == '=='
name = read word from stdin
seg : (address segment) = insert(segments, name)
emit word to tmp
name = read word from stdin
emit name to tmp
addr = read word from stdin
seg->starting-address = parse-hex-int(addr)
seg->starting-offset = file-offset
segment-offset = 0
else
width = compute-width(word)
segment-offset += width
file-offset += width
compute-width
usually returns 1, unless word
has metadata /imm32
, /disp16
, etc.
Pass 2
emit-elf-header(segments)
emit-elf-program-headers(segments)
var file-offset = 0, segment-offset = 0
var curr-seg-offset = 0, curr-seg-address = 0
while reading word from tmp
if (word == "==") skip line
datum = next-token(word, '/')
var value = 0, width = 1
if is-hex-int?(datum)
value = parse-hex-int(datum)
else if has-metadata?("/disp8") or has-metadata?("/disp32")
value = get(labels, datum) - curr-offset
else if has-metadata?("/imm8") or has-metadata?("/imm32")
???
width = compute-width(word)
emit-hex(value, width)
segment-offset += width
file-offset += width
Ugh, this is complicated.
One small piece we can nibble off today is to extract the logic for compute-width
from emit
in pack.subx. Does that make sense? You'd have to move some code into subx-common.subx. Let me know if that seems interesting to you.
The only place in pack.subx
that I see the width of a value being computed is in the switch in the convert-data
function. Is that what you're referring to?
Ah yes, that's the one. Thanks!
I think what may stop me spinning my wheels is a helper like check-ints-equal
called check-int-arrays-equal
that accepts a string, parses it as a list of numbers and compares the other arg with it. That'll help me write clean tests for pass 1 in isolation.
Just an idea to throw out there. I'll probably work on it unless you really want to.
Excellent!
OK -- implemented compute-width
. Not sure if needs more specifiers, though. Only supports {disp,imm}{8,16,32}
right now...
That is the whole set. I'm pretty inconsistent there as well; sometimes I support {disp,imm}{8,32}
and sometimes disp8/16/32
and imm8/32
.
Any preference on what you'd like to do next?
I've been (veeery sloooowly) building out primitives for checking the trace, so that I can write the survey tests like this:
# . check-trace-contains(Trace-stream, "label 'x' is at address 0x1079")
# . check-trace-contains(Trace-stream, "segment 'code' starts at address 0x74")
# . check-trace-contains(Trace-stream, "segment 'code' has size 0x5")
# . check-trace-contains(Trace-stream, "segment 'data' starts at address 0x1079")
That feels more useful than printing out say a series of bytes for the ELF header, and then writing a test that I printed out that same series of bytes.
I just committed the current state of layer 56. There are test failures. But I'll make two sub-PRs of this one like last time, and you can pick one if you like.
Ok, the current state is that two functions in layer 56 need implementing to fix failing tests:
next-line-matches?
: takes a stream of bytes t
and a string line
. Checks if the whole line starting at t->read
(and until the next newline) matches 'line' and returns the result.
skip-next-line
: updates t->read
to the start of the next line (after the next newline).
If you wanna build one of these just let me know. There's a skeleton for them already in the code.
ok, I'll take a look at adding skip-next-line
next.
What editor do you use, @charles-l? Also, what is your preferred mode of receiving messages besides GitHub? :)
What editor do you use, @charles-l?
the only true editor: vim ;)
Also, what is your preferred mode of receiving messages besides GitHub?
Email works -- I had my (broken) website's email in the commit log before, but I've since updated it to my standard email address that I regularly check.
Excellent, and excellent.
Current status:
So now we just need to crank through all the pseudocode and turn it into code.
Excellent!
I'm still seeing one test failure for these functions: test-skip-next-line
I'm still seeing one test failure for these functions:
test-skip-next-line
Whoops -- I inadvertently changed the data for the test when I added next-line-matches?
, but that actually caught a bug in the implementation for skip-next-line
. Fixed now.
Hmm, I'm still seeing failures and also a hang. Can you share what command you're running to run the tests?
oh, I was just running the individual tests:
zsh run_one_test.sh 056trace.subx 'test-next-line-matches?'
zsh run_one_test.sh 056trace.subx 'skip-next-line'
I was assuming the other tests were failing because they haven't been fully implemented yet. But now that I look, there is code for them... I'll take a look at debugging the failing tests (from ./subx translate 0*.subx -o a.elf && ./subx --trace run a.elf test
).
Oh you're right, my mistake in the last comment. I misread and thought test-skip-next-line
was still failing.
My general approach is to run a single test for debugging purposes, but then run all tests (using the command you mentioned) before I commit. Is the change in tests what I expected? Here the hang is mildly surprising. But I take it back, your commits are fine so far. Someone :) just needs to get the remaining tests passing. I'd hoped fixing the helpers would get them working. I'll debug it as well.
(One little trick I use at times like this is to print out test names as they run, by uncommenting out this line in the test harness layer.)
Never mind, scratch that. I'm not fully awake.
Each test in layer 56 passes by itself. But somehow running all the tests in layer 56 hangs:
./subx translate 049memory_layout.subx 05[0-6]*.subx -o a.elf && ./subx run a.elf test
I ran a binary search on the output of:
grep '^test' 056trace.subx |sed 's,\(.*\):,e8/call \1/disp32,'
I started out with this new Entry
block in layer 56:
Entry:
e8/call test-trace-single/disp32
e8/call test-trace-appends/disp32
e8/call test-trace-empty-line/disp32
e8/call test-next-line-matches?/disp32
e8/call test-skip-next-line/disp32
e8/call test-trace-scan-first/disp32
e8/call test-trace-scan-skips-lines-until-found/disp32
e8/call test-trace-second-scan-starts-where-first-left-off/disp32
e8/call test-trace-scan-failure-leaves-read-index-untouched/disp32
# syscall(exit, Num-test-failures)
8b/copy 0/mod/indirect 5/rm32/.disp32 . . 3/r32/EBX Num-test-failures/disp32 # copy
b8/copy-to-EAX 1/imm32/exit
cd/syscall 0x80/imm8
By the end, it turns out just running the last 2 tests causes the hang -- but after they have both passed:
Entry:
e8/call test-trace-second-scan-starts-where-first-left-off/disp32
e8/call test-trace-scan-failure-leaves-read-index-untouched/disp32
# syscall(exit, Num-test-failures)
8b/copy 0/mod/indirect 5/rm32/.disp32 . . 3/r32/EBX Num-test-failures/disp32 # copy
b8/copy-to-EAX 1/imm32/exit
I'll continue debugging it, but feel free to poke at it as well. May be fun.
The trace shows an infinite loop in trace-scan
.
$ subx --debug translate 049memory_layout.subx 05[0-6]*.subx -o a.elf && subx --dump --debug --trace run a.elf test |grep '== label'
...
4 run: == label $trace-scan:loop
5 run: == label next-line-matches?
5 run: == label $next-line-matches?:loop
5 run: == label $next-line-matches?:loop
5 run: == label $next-line-matches?:end
I'll investigate further. May well be a bug in my caller.
Yeah, there's a logic bug in my trace-scan
. Thanks!
Ok, "standard library" is now good. Back to apps/survey.subx
.
There are 6 functions that need implementing:
compute-offsets
compute-addresses
emit-output
emit-headers
emit-elf-header
emit-elf-program-header-entry
I suggest we start on the first two. They are self-contained and have failing tests. emit-output
is kinda persnickety and doesn't seem to admit useful tests in isolation; the real test for getting it right will be running the generated ELF binaries.
We're getting into some complex functions now. But the end is in sight!
I'll start working on compute-offsets
Perfect. I noticed yesterday that the tests for compute-addresses
are not complete. So I'll complete them and then work on them.
These failing tests all check the global trace, but I just realized I don't yet have a good way to write lines containing numbers into the trace. Sorry! Working on that next.
i implemented is label -- as far as I can tell, this is just a check for label definitions, i.e. words that have ':' at the end. Let me know if it needs to do something else as well...
Yeah, that's pretty much all there is to it.
I think there's a couple of existing places I can use this helper. Thanks!
I'll send out the tracing helpers tonight.
Sorry it took a while, but you now have a helper to emit the traces needed in compute-offsets
:
trace-slsls
: takes a _s_tring, s_l_ice, _s_tring, s_l_ice, _s_tring (5 args), and concatenates them into a single line in the trace.Let me know if it makes sense. I'm using a rudimentary manual name mangling to maintain multiple variants of a single overloaded function. It has a test so you can see example usage.
I just fixed an error in the pseudocode for compute-offsets
: https://github.com/akkartik/mu/pull/34/commits/1c576e99d756199142ce9ed98046636b58917243
I'm done with compute-addresses
. Though I'm sure we'll find bugs when we put all the pieces together at the end.
I use is-label?
in one place so far. But I'll resist the temptation to mess with it. We have bigger fish to fry and I don't want to risk merge conflicts.
OK, so it's partially implemented -- the segment block isn't complete yet. Also it has zero testing (i.e. this is just a first pass of the translation without running it), so it's probably totally wrong -- reading through with a fresh set of eyes will probably catch a plethora of issues.
I'm going to take a look at it tomorrow, so I'll let you know if I'm working on the $compute-offsets:segment:
. Once that's done, I'm not really sure how easy it'll be to split up the testing work -- I haven't looked closely at the tests, but if they go through different paths, we can split up work by segment/label offsets or something like that.
I just performed a long-overdue merge with master into this branch, partly to deal with the duplication of next-word
.
If possible, do a git pull
before you do any more work. Let me know if you get stuck with a massive merge conflict. I'll help get you unblocked.
Mostly a note to myself: we should create a get
in addition to get-or-insert
. get-or-insert
should always clear a newly inserted value if it doesn't already exist. get
should abort if it can't find the key.
Most of the calls to get-or-insert
should really be calls to get
. That'll reduce our debugging burden in some situations.
Woohoo!
@charles-l One thing we should try to stay in sync on is how we check booleans: https://github.com/akkartik/mu/commit/7af8f4e8f6f296f2ebdd11c0a46b0a6a55bcf2d9
@charles-l One thing we should try to stay in sync on is how we check booleans: 7af8f4e
Ah, good point. I'll do that from here on out.
I just went over your compute-offsets
, @charles-l. You may be interested in the last batch of commits I just pushed ^^ as a sort of code review. I mention a couple of tricks.
Doing this risks some merge conflicts, but I figured I should take the time to read the code since you took the trouble to git push
it :) As always, I can help with merge conflicts. Feel free to just create your work in a new branch and upload it.
compute-offsets
is still not working, and after some of my changes above it gets stuck in an infinite loop. If you haven't been debugging it already, I can dig deeper into it. Let me know!
survey.subx is almost done! Remaining tasks:
test-compute-offsets
.test-convert-computes-addresses
.🎉 Much appreciation to you, @charles-l.
@akkartik oh, cool, thanks for doing those tweaks. Those simplifications are good -- I'll write code in the future using those functions/conventions :+1:
Got test 2 to pass by fixing the label names. Just have to make the segment stuff work properly and we should be good to go.
BTW thanks for getting rid of those memory bugs too -- they were giving me some trouble.
Hrmm... I think the stack is getting screwed up somehow because the write-trace
call doesn't work if test 5 passes...
I actually intended label offsets to be relative to segment. Segment offsets are a little easier to work with than file offsets to get to the final goal of an absolute address.
Woohoo!
The final bug was something I introduced in 538f24c296f6ddd0688c3d312db08cfe48919cdd :/ No idea what I was thinking.
This branch is work in progress on implementing SubX in SubX. The big-picture plan is to divide up the work into the following phases:
Each phase is an ELF binary constructed out of a corresponding
.subx
file.0/mod 1/rm32 1/r32
is turned into the single byte08
. (done)==
and containing lines of code or data, and concatenates all segments with the same name. (done)This branch is where the survey phase is being built.
To run its tests (and so see how far along things are):
Contact me if you'd like to contribute. Commit access freely given.