Oleg-N-Cher / OfrontPlus

Oberon family of languages to C translator for ARM, x64 and x86 architectures
Other
57 stars 11 forks source link

Crash during GC on Mac OS (clang-1000.10.44.4) #87

Closed sgreenhill closed 4 years ago

sgreenhill commented 4 years ago

Hi,

I'm just trying ofront+ for the first time, using the current github version. It works fine until it needs to expand the heap, and then segfaults. Programs using less than about 200K of heap work fine. Same program runs fine under VOC.

Easiest way to reproduce the problem is:

MODULE TestHeap2;

VAR
  a : POINTER TO ARRAY OF CHAR;

BEGIN
  NEW(a, 1024*1024);
END TestHeap2.

Compiled with:

ofront+ -88 -2 -m TestHeap2.Mod
gcc $OCFLAGS TestHeap2.c -o TestHeap2 -lOfront

where $OCFLAGS are the paths to the library directories. Same result if compiled (like the compiler itself) like this:

gcc $OCFLAGS -m64 -g -fvisibility=hidden -fomit-frame-pointer -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables TestHeap2.c -o TestHeap2 -lOfront -dl

Looking at the trace below, I think the problem is in Heap.MarkStack:

  PROCEDURE MarkStack(n: ADDRESS; VAR cand: ARRAY OF ADDRESS);
    VAR
      frame: S.PTR;
      nofcand: INTEGER;
      inc, sp, p, stack0: ADDRESS;
      align: RECORD ch: SHORTCHAR; p: S.PTR END;
  BEGIN
    IF n > 0 THEN MarkStack(n-1, cand);  (* flush register windows by means of recursive calls *)
      IF n > 100 THEN RETURN END   (* prevent tail recursion optimization *)
    END;
    IF n = 0 THEN
      nofcand := 0; sp := S.ADR(frame);
      stack0 := SystemMainStackFrame();
      (* check for minimum alignment of pointers *)
      inc := S.ADR(align.p) - S.ADR(align);
      IF uLT(stack0, sp) THEN inc := -inc END;
      WHILE sp # stack0 DO
        S.GET(sp, p);     (* <------------ CRASH HERE *)
        IF uLE(heapMin, p) & uLT(p, heapMax) THEN
          IF nofcand = LEN(cand) THEN HeapSort(nofcand, cand); MarkCandidates(nofcand, cand); nofcand := 0 END;
          cand[nofcand] := p; INC(nofcand)
        END;
        INC(sp, inc)
      END;
      IF nofcand > 0 THEN HeapSort(nofcand, cand); MarkCandidates(nofcand, cand) END
    END
  END MarkStack;

The most likely way this can go wrong is if inc has the wrong sign, or if the stack frames are not perfectly aligned. This would allow sp to move past stack0 without triggering the test sp # stack0. Probably would be safer to do an unsigned comparison at this point. For example, uLE(sp,stack0) or uGE depending on the stack direction.

From below, you can see that the stack is between 0x00007ffeefbff764 and 0x00007ffeefbeb4f0, and its testing a location "sp" outside this range: 0x7ffeefc00000. To me it looks suspicious that SYSTEM_MainStackFrame ends with a 4 when the alignment size is probably 8. Is it possible this value could be slightly wrong? Perhaps it should be forced to an alignment boundary?

ofront $lldb TestHeap2
(lldb) target create "TestHeap2"
Current executable set to 'TestHeap2' (x86_64).
(lldb) run
Process 48462 launched: '/Users/stewart/work/oberon/ofront/TestHeap2' (x86_64)
Process 48462 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7ffeefc00000)
    frame #0: 0x0000000100001e32 TestHeap2`Heap_MarkStack + 82
TestHeap2`Heap_MarkStack:
->  0x100001e32 <+82>: movq   (%rbx), %rbp
    0x100001e35 <+85>: cmpq   %rbp, 0x16ec(%rip)        ; Heap_heapMin
    0x100001e3c <+92>: ja     0x100001e77               ; <+151>
    0x100001e3e <+94>: cmpq   0x16eb(%rip), %rbp        ; Heap_heapMax
Target 0: (TestHeap2) stopped.
(lldb) register read
General Purpose Registers:
       rax = 0x0000000100003501  TestHeap2`Heap_freeList + 65
       rbx = 0x00007ffeefc00000
       rcx = 0x00000001000f3809
       rdx = 0x00000001000031d1  Heap_ModuleDesc__desc + 225
       rdi = 0x0000000000000000
       rsi = 0x00007ffeefbebd40
       rbp = 0x0000633733343562
       rsp = 0x00007ffeefbeb4f0
        r8 = 0x0000000000000000
        r9 = 0x00000001000f35e0
       r10 = 0x0000000000000000
       r11 = 0x0000000000000007
       r12 = 0x0000000000000001
       r13 = 0x0000000000000008
       r14 = 0x00007ffeefbebd40
       r15 = 0x00007ffeefbff764
       rip = 0x0000000100001e32  TestHeap2`Heap_MarkStack + 82
    rflags = 0x0000000000010283
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

(lldb) print SYSTEM_MainStackFrame
(void *) $0 = 0x00007ffeefbff764
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7ffeefc00000)
  * frame #0: 0x0000000100001e32 TestHeap2`Heap_MarkStack + 82
    frame #1: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #2: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #3: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #4: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #5: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #6: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #7: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #8: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #9: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #10: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #11: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #12: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #13: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #14: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #15: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #16: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #17: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #18: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #19: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #20: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #21: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #22: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #23: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #24: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #25: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #26: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #27: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #28: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #29: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #30: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #31: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #32: 0x0000000100001dfe TestHeap2`Heap_MarkStack + 30
    frame #33: 0x0000000100001ac8 TestHeap2`Heap_GC + 146
    frame #34: 0x00000001000017e4 TestHeap2`Heap_NEWREC + 255
    frame #35: 0x00000001000016b6 TestHeap2`Heap_NEWBLK + 36
    frame #36: 0x0000000100001307 TestHeap2`SYSTEM_NEWARR + 326
    frame #37: 0x0000000100000e3f TestHeap2`main + 95
    frame #38: 0x00007fff535f1015 libdyld.dylib`start + 1
    frame #39: 0x00007fff535f1015 libdyld.dylib`start + 1
(lldb) ^D
ofront $gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Oleg-N-Cher commented 4 years ago

Dear Stewart,

I was looking for a good way to get the address of the top of the stack, but I didn't know that under 64 bit OS an argument could be placed at address that unaligned into 64 bits. I thank you for this observation.

Most recently, I made a commit that fixed incorrect calculation of the top of the stack under Windows with a custom entry point (see Mod/Lib/crt1.c), that was used to reduce the size of the binary executable file.

Please try to change in SYSTEM.c

void SYSTEM_INIT(INTEGER argc, void *argv)
{
  SYSTEM_MainStackFrame = &argc;

to:

  SYSTEM_MainStackFrame = __builtin_frame_address(0);

If you can suggest some more correct way to calculate the aligned address of top of stack, please let me know.

P.S. I have my own mini-mac, but it is currently under repair.

Oleg-N-Cher commented 4 years ago

I'm not quite sure that the function __builtin_frame_address is in Clang...

So I am now more inclined to calculate top of stack address for Linux in the same way as in voc.

But it is likely that this non-aligned stack problem may be present under Windows x64 too. Can we use &argv instead of &argc ?

Oleg-N-Cher commented 4 years ago

Of course, "top of stack" here is a very conditional concept. The address just must be higher than any local pointer placed on the stack. All this is required for the garbage collector.

sgreenhill commented 4 years ago

Thanks Oleg!

I can confirm that both &argv and __builtin_frame_address(0) work fine under Clang.

Looks like &argc is just an unlucky choice, since it is 4-bytes wide and is therefore not aligned on a multiple of the pointer size.

The real bug is in the statement WHILE sp # stack0 DO which assumes assumes that the stack top and bottom are aligned on a multiple of the pointer size. While this might usually be true, it probably can't be guaranteed.

$cat test.c
#include <stdio.h>

int main(int argc, char ** argv) {
    printf("argc: %p\n", &argc);
    printf("argv: %p\n", &argv);
    printf("__builtin_frame_address: %p\n", __builtin_frame_address(0));
}

$gcc test.c -o test
$./test
argc: 0x7ffee868682c
argv: 0x7ffee8686820
__builtin_frame_address: 0x7ffee8686830
Oleg-N-Cher commented 4 years ago

Ah, I chose &argc because it is higher on the stack than &argv and closer to the result of __builtin_frame_address(0). But this is probably not the case on all platforms.

      IF uLT(stack0, sp) THEN inc := -inc END;
      WHILE sp # stack0 DO

This code implies working on architectures where the stack grows up instead of down. This is why the inequality sp # stack0 comparison was chosen here instead of the unsigned comparison on more or less.

I'm still wondering what we can do with it elegantly.

P.S. I am open for new suggestions on Ofront+, so we can discuss these questions here, and feel free to e-mail me: allot { at } bk.ru I'm not as conservative as Norayr, and I eventually want to get a convenient tool for developing in the real world, and not a beautiful, but useless thing.

You can see how binding to WinAPI for Ofront+ is done now:

https://github.com/Oleg-N-Cher/XDev/blob/master/WinDev/Lib/Mod/WinAPI.Def

If this were needed for voc, it would require a lot of very routine work.

Also please look at these bindings:

https://github.com/Oleg-N-Cher/NesDev/blob/master/Lib/Mod/NesLib.Mod https://github.com/Oleg-N-Cher/XDev/blob/master/ZXDev/Lib/Mod/Tasks.Def

A variety of ways are shown here.

sgreenhill commented 4 years ago

One option is to split the code into two loops: one which increments sp while less than stack0, and one which decrements sp while greater than stack0. It wouldn't be elegant, but would be safer and just as fast.

Alternatively, round both stack0 and sp to multiples of the increment. For example, assuming positive sp, and that pointers have an alignment of "inc": sp := sp - sp MOD inc; stack0 := stack0 - stack0 MOD inc; Do this before the loop, and sp must eventually hit stack0 with the current test.

Thanks for your details. I will contact you if I have any issues. For now I am doing some experiments with ofront+ and have ported some hand-written interfaces. It should be easy to migrate from the "code procedure" to "extern declaration" model. From there I may look at adapting my translator (H2O) to make some more extensive automatic translations. It would be good to eventually support the same functionality in VOC. It don't think it should be too hard, especially with your implementation as a reference. I will study the compiler and try to understand how it works.

Looking at my collection of interfaces, I have a version of WinApi from 2002 which eventually became part of the VisualOberon project by Tim Teulings. The ZXDev bindings are certainly interesting. The "Def" modules are Oberon interfaces to the Z80 assembley codes? Can the C compiler generate code that is compact enough to be useful?There's not much space on some of those old machines. It makes me feel a bit nostalgic. My first job was writing 6510 assembley for the C64.

Oleg-N-Cher commented 4 years ago

Dear Stewart,

I've already made the commit, and now I'm thinking about it and I'm starting to doubt it.

All the entire mechanism for searching for local pointers on the stack is based on the fact that all pointers are necessarily aligned. If there is at least one non-aligned garbage collected pointer on the stack, this mechanism will be broken.

Therefore, stack0 and sp are always already aligned, because they are calculated by getting address of a pointer to a function's argument on the stack. This means that the additional correction is not necessary.

What do you think of this?

P.S. I use the extension .Def to indicate that these modules are not code modules, but only definitions (foreign interfaces).

Old machines really are very limited, but I use Ofront+ for developing simple games for retro-computers. Of course, these games never reach the level of games developed directly in assembler, but this is also a kind of hobby, don't you think? :-)

I have tried using the SDCC and cc65 compilers with Ofront+, and this tandem generates satisfactory code quality.

Here are a few links to these games:

https://zx.oberon.org/dash.htm https://github.com/Oleg-N-Cher/Dash https://zx.oberon.org/durak.htm https://github.com/Oleg-N-Cher/DarkWoods https://zx.oberon2.ru/forum/viewtopic.php?f=5&t=4&start=10#p1332

sgreenhill commented 4 years ago

I think the rule is that objects must be aligned at a multiple of their size. This prevents them from overlapping word boundaries, which would otherwise make fetches inefficient. In the experiment below it seems to be true for both Clang-10, and gcc-9, although spacing is sometimes weird, which I can't explain. So maybe rather than rounding it is sufficient to just take the address of a pointer variable?

If pointers were not so aligned, Heap_MarkStack would not work at all as it is.

Excellent! Boulderdash was a favourite of mine. Never played Dark Woods, but it looks fun :-)

$cat test.c
#include <stdio.h>

void test(int a, char * p1, char c, char d, char * p2, char e, int f, char *p3, char *p4) {
    printf("a:  %p\n", &a);
    printf("p1: %p\n", &p1);
    printf("c:  %p\n", &c);
    printf("d:  %p\n", &d);
    printf("p2: %p\n", &p2);
    printf("e:  %p\n", &e);
    printf("f:  %p\n", &f);
    printf("p3: %p\n", &p3);
    printf("p4: %p\n", &p4);
}

int main(int argc, char ** argv) {
    test(1, "one", '2', '2', "two", '3', 3, "three", "four");
}

$gcc test.c -o test
$./test 
a:  0x7ffeef68c75c
p1: 0x7ffeef68c750
c:  0x7ffeef68c74f
d:  0x7ffeef68c74e
p2: 0x7ffeef68c740
e:  0x7ffeef68c73f
f:  0x7ffeef68c790
p3: 0x7ffeef68c798
p4: 0x7ffeef68c7a0
$gcc-9 test.c -o test9
$./test9
a:  0x7ffeec3c078c
p1: 0x7ffeec3c0780
c:  0x7ffeec3c0788
d:  0x7ffeec3c077c
p2: 0x7ffeec3c0770
e:  0x7ffeec3c0778
f:  0x7ffeec3c07a0
p3: 0x7ffeec3c07a8
p4: 0x7ffeec3c07b0
Oleg-N-Cher commented 4 years ago

Josef Templ wrote:

Dear Oleg,

sorry for my late answer. I do remember that I also encountered problems with getting the stack base when switching C compiler optimizations on, depending on the platform. In general, using the address of a variable within 'main' was the only portable way I have found and using &argv avoids the need for an additional local variable. Using gcc specific features was not an option for me. Now, if the C compiler does very aggressive optimizations, it may inline 'main' and if argv is then no longer a local variable of 'main' but for example a global variable it lives at a completely different address outside of the stack and it is useless for obtaining the stack base address. This must be avoided and switching off compiler optimizations for the main module usually helps, as far as I remember.

  • Josef
jtempl commented 4 years ago

Oleg, your call of SYSTEM_INIT() in the macro __INIT in the file SYSTEM.h has a bug. Compare it to the ofront version and you will see that an & operator is missing. Note that in ofront the 'address' of the argv argument is passed, not its value. This is expressed by the parameter name argvadr. That explains the wrong stack base address.

Oleg-N-Cher commented 4 years ago

Dear Josef, there is no bug. I did it on purpose.

Ofront uses the argv variable address from the function main(argv, argv). I can't do it the same way, because for reducing size of executable I'm using a custom version of crt1.c, and the entry point replaced from main() to WinMain() that has not such argument argv. So I have to simulate argv using a function __getmainargs. Look at this source and you'll understand everything:

https://github.com/Oleg-N-Cher/OfrontPlus/blob/master/Mod/Lib/crt1.c

So now Ofront+ uses address not from argv of the main() function, but from argv of the SYSTEM_INIT:

void SYSTEM_INIT(INTEGER argc, void *argv)

Of course, this address is somewhat deeper in the stack, but it will still work as SYSTEM_MainStackFrame. It is an address of a pointer, so it is aligned. I'm sure this will be a safe solution, since SYSTEM_INIT is called very close to the top of the stack:

int main(int argc, char **argv)
{
    __INIT(argc, argv);
    ...

We still have a way to use the function __builtin_frame_address(0) to get this address.

I will be happy to hear your opinion.

Oleg-N-Cher commented 4 years ago

I hope I've explained everything clearly? My custom argv is a variable not on the stack. Because of this, I had problems with the garbage collector, which passed about 2 megabytes of stack and called SIGSEGV - because GC passed the stack from the bottom and not to the top of the stack, but to the area of local variables in the heap. So my entry point doesn't have argv as an argument passed on the stack. And I have to find another way. And I found a very similar working method - to use &argv of the SYSTEM_INIT.

Oleg-N-Cher commented 4 years ago

Here's what my situation looked like with taking argv address where argv was a variable on the heap, not on the stack:

void SYSTEM_INIT(INTEGER argc, void *argvadr)
{ int adr;
  SYSTEM_MainStackFrame = argvadr; printf("argvadr = %d\n", (SYSTEM_ADRINT)argvadr);
  SYSTEM_MainStackFrame = __builtin_frame_address(0);
  printf("__builtin_frame_address = %d\n", (SYSTEM_ADRINT)__builtin_frame_address(0));
  printf("&adr = %d\n", (SYSTEM_ADRINT)&adr);

image

By the way, for completeness: we could take the address of the local variable adr declared in SYSTEM_INIT as SYSTEM_MainStackFrame. But &adr is even deeper on the stack than &argv of SYSTEM_INIT.

jtempl commented 4 years ago

You must be very careful when selecting a stack base address because if you miss a single pointer candidate the garbage collector may crash your program. Therefore you must use an address outside of SYSTEM_INIT, i.e. from the caller of SYSTEM_INIT. Then all potential pointer candidates are covered. It is not sufficient to be somewhere close to the stack base, you must be there. You can use a local variable in your WinMain function if you don't have any parameter available.

Oleg-N-Cher commented 4 years ago

What do you think about this decision?

int main(int argc, char **argv)
{
    __INIT(argc, argv, __builtin_frame_address(0)); // <-- No risk of a missed pointer,
    // because no one GC-pointer has been reserved in stack at this place, main() has not local variables at all.
    // All pointers (as local variables and as arguments) will be reserved later.
void SYSTEM_INIT(INTEGER argc, void *argv, void *stackframe)
{
  SYSTEM_MainStackFrame = stackframe; // MUST be aligned
  SYSTEM_ArgCount = argc;
  SYSTEM_ArgVector = argv;

By the way, what are the reasons to avoid using the function __builtin_frame_address? Stewart found out that it works on macOS too.

Oleg-N-Cher commented 4 years ago

I certainly wouldn't want to overcomplicate code generation with a call __INIT(argc, argv, __builtin_frame_address(0)) But what else can I do?

Oleg-N-Cher commented 4 years ago

In principle, it can be written in SYSTEM.h as the macro:

#define __INIT(argc, argv) static void *m; SYSTEM_INIT(argc, argv, __builtin_frame_address(0))

Oleg-N-Cher commented 4 years ago

This commit takes into account the expressed wishes. For UNIX systems, the address is taken in the old, proven way, by &argv of main(). And for WinMain entry point, we use the function __builtin_frame_address(0).

sgreenhill commented 4 years ago

That sounds good to me. Are you using GCC on Windows, or does the Microsoft compiler also implement __builtin_frame_address?

Oleg-N-Cher commented 4 years ago

Yes, I use MINGW.

I have just found out that using function __builtin_frame_address has not any dependencies on WinAPI libraries, from which I conclude that this function unwraps into several in-line machine code commands without external dependencies.

I found some links for an analog of the function __builtin_frame_address from Microsoft.

https://stackoverflow.com/questions/1291446/visual-c-versions-of-gcc-functionshttps://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/s975zw7k(v=vs.90) • https://docs.microsoft.com/en-us/cpp/intrinsics/returnaddress?view=vs-2019

jtempl commented 4 years ago

__builtin_frame_address is a GCC specific function or macro, as far as I know. That's why I would try to avoid it. The C standard dos not have such a feature. On some compiler it may not be available.

Oleg-N-Cher commented 4 years ago

Well, in this case we use this function exclusively for a specific usage of the WinMain entry point.

If a compiler uses the main() entry point, the code with __builtin_frame_address will not be used. And as far as I understand, Ofront generally only uses the main() entry point.

I think I can close this issue.