akkartik / mu

Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.
http://akkartik.name/akkartik-convivial-20200607.pdf
Other
1.38k stars 47 forks source link

SubX in SubX: Transforming uses of string literals (prerequisite B) #27

Closed akkartik closed 5 years ago

akkartik commented 5 years ago

One of two primitives remaining to be built in dquotes.subx.

For more context, see the parent branch.

akkartik commented 5 years ago

Amazing!

Hmm, it occurs to me that this won't work if a string literal contains /. Need to think about that. Can you merge this PR to the dquotes branch and create a test case for that? There's some tokenization happening there, and we should try to come up with the right primitive to share between tokenizing words and skipping to their metadata.

charles-l commented 5 years ago

Hmm, it occurs to me that this won't work if a string literal contains /. Need to think about that. Can you merge this PR to the dquotes branch and create a test case for that?

Ah, yeah -- good point. I'll do that next.

Also, the code in this branch segfaults when ./apps/dquotes test is executed natively. Didn't have a chance to investigate that yet.

akkartik commented 5 years ago

Thanks for the report! I ran into the same problem in https://github.com/akkartik/mu/pull/26, and I think it's because of an overflow in the data segment. I need to grow it with brk. Will do that tonight.

akkartik commented 5 years ago

Didn't have much luck last night. It seems like data segment overflow because it goes away if I reduce the size of text+data, either by commenting out some random tests, or by reducing say the size of the buffers in subx-common.subx. But it's possible it's just a regular out-of-bounds access that manifests differently in native runs. One idea to falsify the latter idea is to add padding after every single global definition, see if that fixes things. Another would be to replace tests with copies of some hitherto innocuous test.

akkartik commented 5 years ago

Oh, I forgot to say: I tried adding a call to brk but it didn't cause any improvement.

I'd like to try sbrk as well but it's not a syscall, it usually comes from libc.

I'm trying to remind myself of whether the start of the data segment actually makes it to the ELF binary. I think it must since we refer to variables as /imm32, I believe? I need to check these questions.

akkartik commented 5 years ago

The start of the data segment is definitely specified in the ELF binary:

$ readelf -l apps/pack
Elf file type is EXEC (Executable file)
Entry point 0x9002d94
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset     VirtAddr     PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000074   0x09000074   0x09000074 0x05de7 0x05de7 R E 0x1000
  LOAD           0x005e5b **0x0a005e5b** 0x0a005e5b 0x02f81 0x02f81 RW  0x1000

So my brk syscall really should work:

--- a/subx/apps/pack.subx
+++ b/subx/apps/pack.subx
@@ -42,6 +42,10 @@ Entry:  # run tests if necessary, convert stdin if not
     # . check result
     3d/compare-EAX-and  1/imm32
     75/jump-if-not-equal  $run-main/disp8
+    # . syscall(brk, 0x0affffff)
+    bb/copy-to-EBX  0x0affffff/imm32
+    b8/copy-to-EAX  0x2d/imm32/brk
+    cd/syscall  0x80/imm8
     # . run-tests()
     e8/call  run-tests/disp32
     8b/copy                         0/mod/indirect  5/rm32/.disp32            .             .           3/r32/EBX   Num-test-failures/disp32          # copy *Num-test-failures to EBX

I think I need to check the return value from brk.

akkartik commented 5 years ago

Output of strace:

$ strace ./apps/pack test
execve("./apps/pack", ["./apps/pack", "test"], 0x7ffe3f17a028 /* 82 vars */) = 0
strace: [ Process PID=20286 runs in 32 bit mode. ]
brk(0xaffffff)                          = 0xaffffff
write(2, ".", 1.)                        = 1
...
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xa0090dc} ---
+++ killed by SIGSEGV +++

So the brk seems to have succeeded. As a bonus, strace seems to be telling me the address that couldn't be dereferenced, and it's far lower than the end of the data segment.

A segmentation violation with si_code SEGV_MAPERR (0x1) is likely a null pointer dereference, an access of non-existent memory such as 0xfffffc0000004000, or malloc and free problems. Heap corruption or process exceeding its runtime limits (man getrlimit) in the case of malloc and double free or free of non-allocated address in the case of free. Look at the si_errno element for more clues.

(https://unix.stackexchange.com/a/71268)

akkartik commented 5 years ago

I'm still stumped by the segfault in the native run, and it's unfortunately kept me from working on anything else.

I'm showing others this example program in C: https://mastodon.social/@akkartik/102056044040765522

charles-l commented 5 years ago

Wow, that is weird. That's the opposite of what I understand the heap and sbrk is supposed to do...

charles-l commented 5 years ago
...
0x602000 0x622ffc
0x602000 0x622ffd
0x602000 0x622ffe
0x602000 0x622fff
0x602000 0x623000

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400657 in main ()
(gdb) info proc mappings
process 25010
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /tmp/a.out
            0x600000           0x601000     0x1000        0x0 /tmp/a.out
            0x601000           0x602000     0x1000     0x1000 /tmp/a.out
            0x602000           0x623000    0x21000        0x0 [heap]

If I understand this output correctly, it looks like that program works on my machine...

akkartik commented 5 years ago

Indeed. Thanks for the info proc mappings trick.

charles-l commented 5 years ago

In case it helps with tracking down the issue, here's my setup:

> uname -a
Linux gimli 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
> cc --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
> lsb_release -a                                                                                                                          
No LSB modules are available.
Distributor ID: Devuan
Description:    Devuan GNU/Linux 2.0 (ascii)
Release:    2.0
Codename:   ascii
akkartik commented 5 years ago

Can you tell me more about the machine it works on?

akkartik commented 5 years ago

Ah, I see your comment.

akkartik commented 5 years ago

clang is maybe the big difference..

charles-l commented 5 years ago

oh wait, I think it's ASLR.

Try disabling that and running it:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
charles-l commented 5 years ago

heap/stack diagram From: https://blog.holbertonschool.com/hack-the-virtual-memory-malloc-the-heap-the-program-break/

akkartik commented 5 years ago

Whoa! I thought of ASLR a dozen times but didn't notice the brk offset in that picture.

How does it even work. All you can do in brk is provide an upper bound to grow the heap until. How do you even get the lower bound?! Hmm.

akkartik commented 5 years ago

I've seen docs like https://pax.grsecurity.net/docs/aslr.txt say the heap directly follows the text segment.

akkartik commented 5 years ago

I guess we should just use mmap. That's probably what malloc does under the hood, judging by this random doc: https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4

akkartik commented 5 years ago

But brk is probably usable somehow in the presence of ASLR. I'm missing something.

difranco commented 5 years ago

I'm pretty sure mmap is a strict generalization of sbrk on modern systems. Read about it in the documentation of some artisanal alternative malloc or maybe garbage collector… will try to find it.

On Tue, May 7, 2019, 20:05 Kartik Agaram notifications@github.com wrote:

But brk is probably usable somehow in the presence of ASLR. I'm missing something.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/akkartik/mu/pull/27#issuecomment-490329441, or mute the thread https://github.com/notifications/unsubscribe-auth/AABLB3JEYC7XJD6ZFUHJS7TPUI7QZANCNFSM4HJFKNHQ .

akkartik commented 5 years ago

I don't think that's true. You can't grow an mmapd region.

akkartik commented 5 years ago

I can't find a good authoritative source, but from reading the internet it seems like modern malloc implementations use mmap under the hood, and probably manage non-contiguous chunks received from the OS. So growing a region isn't essential. So you were mostly right, @difranco.

difranco commented 5 years ago

I'm still not sure that this is what I had in mind, but seems to contain the relevant information: (s)brk vs mmap differ mainly in potential for address space fragmentation since former extends contiguous address space bound while latter gives pointer to region at arbitrary location: https://locklessinc.com/technical_allocator.shtml

On Wed, May 8, 2019, 09:26 Kartik Agaram notifications@github.com wrote:

I can't find a good authoritative source, but from reading the internet it seems like modern malloc implementations use mmap under the hood, and probably manage non-contiguous chunks received from the OS. So growing a region isn't essential. So you were mostly right, @difranco https://github.com/difranco.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/akkartik/mu/pull/27#issuecomment-490555905, or mute the thread https://github.com/notifications/unsubscribe-auth/AABLB3MHCB4ND2P2A7UD2ALPUL5LXANCNFSM4HJFKNHQ .

charles-l commented 5 years ago

Hmm... don’t you just assume the heap size is zero with the initial sbrk and grow it from there (even if there is some heap space going down)?

That’s what it’s seemed like to me from what I’ve seen online, but there’s a surprising lack of documentation on the relationship between ASLR and brk...

akkartik commented 5 years ago

The problem is that sbrk isn't a syscall. It's provided by libc. Someday I'll check the libc sources to figure out how it's implemented in the presence of ASLR. But that seems low-priority since malloc likely doesn't need it.

SubX already provides a wrapper around mmap called new-segment (layer 53). My plan is to modify it to return an allocation descriptor (layer 69) that allocate can directly use. Hold on, let me throw out the half-done changes I have so far.

akkartik commented 5 years ago

What's been slowing me down for the last couple of days: https://github.com/akkartik/mu/issues/29

akkartik commented 5 years ago

Ok, allocate now builds on new-segment (#28). Now checking if this branch runs natively, @charles-l.

akkartik commented 5 years ago

In the process I ran into another bug: #30. I found it easier this time around to just make an issue and move on, without scope-creeping myself into paralysis for the umpteenth time.

akkartik commented 5 years ago

Hmm, apps/assort is failing tests natively on Travis even though it passes on my Linux server. No segfault, though, so some progress..

https://travis-ci.org/akkartik/mu/jobs/530988661

akkartik commented 5 years ago

Turns out the culprit is #30! mmap sometimes returns addresses with the most significant bit set, and SubX currently only supports signed comparison. As a result my dependency-injected wrapper for the write syscall thinks the stream argument is a file descriptor, and chaos ensues.

I've started aborting on failed writes to make issues like this easier to catch.

But yeah, #30. It's going to take some time to fix because I have to go over all supported instructions and start doing right by the carry flag. I'll open a new branch for it. Let me know if you'd be interested in contributing to it.

akkartik commented 5 years ago

I've created PR #31. Progress so far: https://github.com/akkartik/mu/pull/31/commits/8c1a69089bcb0be8ee869a3f83c3151a7083e27c

akkartik commented 5 years ago

The segfault should be fixed! Do a git pull and try it out.

akkartik commented 5 years ago

The branch of #26 still throws an error. I think moving the heap around has uncovered an older bug. Now top of my list.

akkartik commented 5 years ago

26 is now unblocked: https://github.com/akkartik/mu/pull/26/commits/2f49a27504fe302c1df8e55f7a1b63571495182c

Phew! Back to building out dquotes.subx.

charles-l commented 5 years ago

Whoo! I’ll check it out when I get back from work.

charles-l commented 5 years ago

hmm... the dquotes branch is still segfaulting for me... might be because the tests don't pass yet...

akkartik commented 5 years ago

Let me check. There's been a lot of merging downstream. But dquotes-1 should work?

akkartik commented 5 years ago

Sorry, yes you're right. I fixed one bug in the last couple of days on dquotes-1 which hasn't yet made it to dquotes. I didn't notice because I forgot with all the bouncing between branches that test_apps doesn't actually test dquotes.subx. Since master isn't affected, I'd like to avoid cherry-picking into an upstream branch, and wait until I wrap up #26 before merge everything back into dquotes in one fell swoop.

charles-l commented 5 years ago

Ah, gotcha. Yeah, looks like dquotes-1 is working now.

akkartik commented 5 years ago

I've been starting a new thread over at https://github.com/akkartik/mu/commit/d4a2442688#comments, in case you weren't notified for some reason.

charles-l commented 5 years ago

Oh, yeah, I didn't seem to get notifications for that. Thanks for letting me know -- I'll take a look.