Closed akkartik closed 5 years ago
Amazing!
Hmm, it occurs to me that this won't work if a string literal contains /
. Need to think about that. Can you merge this PR to the dquotes
branch and create a test case for that? There's some tokenization happening there, and we should try to come up with the right primitive to share between tokenizing words and skipping to their metadata.
Hmm, it occurs to me that this won't work if a string literal contains
/
. Need to think about that. Can you merge this PR to the dquotes branch and create a test case for that?
Ah, yeah -- good point. I'll do that next.
Also, the code in this branch segfaults when ./apps/dquotes test
is executed natively. Didn't have a chance to investigate that yet.
Thanks for the report! I ran into the same problem in https://github.com/akkartik/mu/pull/26, and I think it's because of an overflow in the data segment. I need to grow it with brk
. Will do that tonight.
Didn't have much luck last night. It seems like data segment overflow because it goes away if I reduce the size of text+data, either by commenting out some random tests, or by reducing say the size of the buffers in subx-common.subx. But it's possible it's just a regular out-of-bounds access that manifests differently in native runs. One idea to falsify the latter idea is to add padding after every single global definition, see if that fixes things. Another would be to replace tests with copies of some hitherto innocuous test.
Oh, I forgot to say: I tried adding a call to brk
but it didn't cause any improvement.
I'd like to try sbrk
as well but it's not a syscall, it usually comes from libc.
I'm trying to remind myself of whether the start of the data segment actually makes it to the ELF binary. I think it must since we refer to variables as /imm32
, I believe? I need to check these questions.
The start of the data segment is definitely specified in the ELF binary:
$ readelf -l apps/pack
Elf file type is EXEC (Executable file)
Entry point 0x9002d94
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000074 0x09000074 0x09000074 0x05de7 0x05de7 R E 0x1000
LOAD 0x005e5b **0x0a005e5b** 0x0a005e5b 0x02f81 0x02f81 RW 0x1000
So my brk
syscall really should work:
--- a/subx/apps/pack.subx
+++ b/subx/apps/pack.subx
@@ -42,6 +42,10 @@ Entry: # run tests if necessary, convert stdin if not
# . check result
3d/compare-EAX-and 1/imm32
75/jump-if-not-equal $run-main/disp8
+ # . syscall(brk, 0x0affffff)
+ bb/copy-to-EBX 0x0affffff/imm32
+ b8/copy-to-EAX 0x2d/imm32/brk
+ cd/syscall 0x80/imm8
# . run-tests()
e8/call run-tests/disp32
8b/copy 0/mod/indirect 5/rm32/.disp32 . . 3/r32/EBX Num-test-failures/disp32 # copy *Num-test-failures to EBX
I think I need to check the return value from brk
.
Output of strace:
$ strace ./apps/pack test
execve("./apps/pack", ["./apps/pack", "test"], 0x7ffe3f17a028 /* 82 vars */) = 0
strace: [ Process PID=20286 runs in 32 bit mode. ]
brk(0xaffffff) = 0xaffffff
write(2, ".", 1.) = 1
...
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xa0090dc} ---
+++ killed by SIGSEGV +++
So the brk
seems to have succeeded. As a bonus, strace
seems to be telling me the address that couldn't be dereferenced, and it's far lower than the end of the data segment.
A segmentation violation with si_code SEGV_MAPERR (0x1) is likely a null pointer dereference, an access of non-existent memory such as 0xfffffc0000004000, or malloc and free problems. Heap corruption or process exceeding its runtime limits (man getrlimit) in the case of malloc and double free or free of non-allocated address in the case of free. Look at the si_errno element for more clues.
I'm still stumped by the segfault in the native run, and it's unfortunately kept me from working on anything else.
I'm showing others this example program in C: https://mastodon.social/@akkartik/102056044040765522
Wow, that is weird. That's the opposite of what I understand the heap and sbrk
is supposed to do...
...
0x602000 0x622ffc
0x602000 0x622ffd
0x602000 0x622ffe
0x602000 0x622fff
0x602000 0x623000
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400657 in main ()
(gdb) info proc mappings
process 25010
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x400000 0x401000 0x1000 0x0 /tmp/a.out
0x600000 0x601000 0x1000 0x0 /tmp/a.out
0x601000 0x602000 0x1000 0x1000 /tmp/a.out
0x602000 0x623000 0x21000 0x0 [heap]
If I understand this output correctly, it looks like that program works on my machine...
Indeed. Thanks for the info proc mappings
trick.
In case it helps with tracking down the issue, here's my setup:
> uname -a
Linux gimli 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
> cc --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
> lsb_release -a
No LSB modules are available.
Distributor ID: Devuan
Description: Devuan GNU/Linux 2.0 (ascii)
Release: 2.0
Codename: ascii
Can you tell me more about the machine it works on?
Ah, I see your comment.
clang is maybe the big difference..
oh wait, I think it's ASLR.
Try disabling that and running it:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Whoa! I thought of ASLR a dozen times but didn't notice the brk offset
in that picture.
How does it even work. All you can do in brk
is provide an upper bound to grow the heap until. How do you even get the lower bound?! Hmm.
I've seen docs like https://pax.grsecurity.net/docs/aslr.txt say the heap directly follows the text segment.
I guess we should just use mmap
. That's probably what malloc
does under the hood, judging by this random doc: https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4
But brk
is probably usable somehow in the presence of ASLR. I'm missing something.
I'm pretty sure mmap is a strict generalization of sbrk on modern systems. Read about it in the documentation of some artisanal alternative malloc or maybe garbage collector… will try to find it.
On Tue, May 7, 2019, 20:05 Kartik Agaram notifications@github.com wrote:
But brk is probably usable somehow in the presence of ASLR. I'm missing something.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/akkartik/mu/pull/27#issuecomment-490329441, or mute the thread https://github.com/notifications/unsubscribe-auth/AABLB3JEYC7XJD6ZFUHJS7TPUI7QZANCNFSM4HJFKNHQ .
I don't think that's true. You can't grow an mmap
d region.
I can't find a good authoritative source, but from reading the internet it seems like modern malloc
implementations use mmap
under the hood, and probably manage non-contiguous chunks received from the OS. So growing a region isn't essential. So you were mostly right, @difranco.
I'm still not sure that this is what I had in mind, but seems to contain the relevant information: (s)brk vs mmap differ mainly in potential for address space fragmentation since former extends contiguous address space bound while latter gives pointer to region at arbitrary location: https://locklessinc.com/technical_allocator.shtml
On Wed, May 8, 2019, 09:26 Kartik Agaram notifications@github.com wrote:
I can't find a good authoritative source, but from reading the internet it seems like modern malloc implementations use mmap under the hood, and probably manage non-contiguous chunks received from the OS. So growing a region isn't essential. So you were mostly right, @difranco https://github.com/difranco.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/akkartik/mu/pull/27#issuecomment-490555905, or mute the thread https://github.com/notifications/unsubscribe-auth/AABLB3MHCB4ND2P2A7UD2ALPUL5LXANCNFSM4HJFKNHQ .
Hmm... don’t you just assume the heap size is zero with the initial sbrk and grow it from there (even if there is some heap space going down)?
That’s what it’s seemed like to me from what I’ve seen online, but there’s a surprising lack of documentation on the relationship between ASLR and brk...
The problem is that sbrk
isn't a syscall. It's provided by libc. Someday I'll check the libc sources to figure out how it's implemented in the presence of ASLR. But that seems low-priority since malloc
likely doesn't need it.
SubX already provides a wrapper around mmap
called new-segment
(layer 53). My plan is to modify it to return an allocation descriptor (layer 69) that allocate
can directly use. Hold on, let me throw out the half-done changes I have so far.
What's been slowing me down for the last couple of days: https://github.com/akkartik/mu/issues/29
Ok, allocate
now builds on new-segment
(#28). Now checking if this branch runs natively, @charles-l.
In the process I ran into another bug: #30. I found it easier this time around to just make an issue and move on, without scope-creeping myself into paralysis for the umpteenth time.
Hmm, apps/assort
is failing tests natively on Travis even though it passes on my Linux server. No segfault, though, so some progress..
Turns out the culprit is #30! mmap
sometimes returns addresses with the most significant bit set, and SubX currently only supports signed comparison. As a result my dependency-injected wrapper for the write
syscall thinks the stream argument is a file descriptor, and chaos ensues.
I've started aborting on failed writes to make issues like this easier to catch.
But yeah, #30. It's going to take some time to fix because I have to go over all supported instructions and start doing right by the carry flag. I'll open a new branch for it. Let me know if you'd be interested in contributing to it.
I've created PR #31. Progress so far: https://github.com/akkartik/mu/pull/31/commits/8c1a69089bcb0be8ee869a3f83c3151a7083e27c
The segfault should be fixed! Do a git pull
and try it out.
The branch of #26 still throws an error. I think moving the heap around has uncovered an older bug. Now top of my list.
Phew! Back to building out dquotes.subx.
Whoo! I’ll check it out when I get back from work.
hmm... the dquotes branch is still segfaulting for me... might be because the tests don't pass yet...
Let me check. There's been a lot of merging downstream. But dquotes-1 should work?
Sorry, yes you're right. I fixed one bug in the last couple of days on dquotes-1
which hasn't yet made it to dquotes
. I didn't notice because I forgot with all the bouncing between branches that test_apps doesn't actually test dquotes.subx. Since master
isn't affected, I'd like to avoid cherry-picking into an upstream branch, and wait until I wrap up #26 before merge everything back into dquotes
in one fell swoop.
Ah, gotcha. Yeah, looks like dquotes-1
is working now.
I've been starting a new thread over at https://github.com/akkartik/mu/commit/d4a2442688#comments, in case you weren't notified for some reason.
Oh, yeah, I didn't seem to get notifications for that. Thanks for letting me know -- I'll take a look.
One of two primitives remaining to be built in dquotes.subx.
For more context, see the parent branch.