UU-ComputerScience / uhc

compiled code is segfaulting #32

Open orchid-hybrid opened 9 years ago

orchid-hybrid commented 9 years ago

Hello

I have built UHC from source code using GHC on a 64-bit computer:

$ uhc --version
ehc-1.1.7.2, revision master@05180a3aff, timestamp 20141121 +0000 030619

I then compiled the demo with uhc hello.hs. This seems to work and creates output files, but when I run the result, it crashes:

(gdb) r
Program received signal SIGSEGV, Segmentation fault.
0x000000000041c748 in gb_InitTables ()
(gdb) bt
#0  0x000000000041c748 in gb_InitTables ()
#1  0x00000000004065a0 in UHC_Base_initModule ()
#2  0x0000000000404935 in main ()

I looked around and found that gb_InitTables is in the bytecode interpreter's interpreter.c file, but I don't know what might be going wrong.

atzedijkstra commented 9 years ago

Hi, on what OS/GHC combination are you running? For example, can you provide the output from ./configure followed by a clean build?

orchid-hybrid commented 9 years ago

Hello, I'm using 64-bit Arch Linux with GHC 7.8.3, Shuffle 0.1.3.1 and uuagc 0.9.51.

$ uname -a
Linux arch 3.17.1-1-ARCH #1 SMP PREEMPT Wed Oct 15 15:04:35 CEST 2014 x86_64 GNU/Linux
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.3
$ shuffle --version
0.1.3.1
$ uuagc --version  
Attribute Grammar compiler / HUT project. Version 0.9.51

I can change to any other version if it helps!

$ git clone https://github.com/UU-ComputerScience/uhc --depth 1
$ cd uhc/EHC/
$ ./configure > configure.txt
$ make > make.txt 2>&1

Here are the two files: https://gist.github.com/orchid-hybrid/5df8e23a5d13d6f87db3

If I can give any more useful info just ask!

atzedijkstra commented 9 years ago

Hi,

I cannot find anything suspicious in the dumps... A next step is to turn on tracing, which is available only in variant 99 of uhc, so please run:

make 99/ehc
make 99/ehclib
install/99/bin/ehc --gen-trace=1 -Operwholecore hw
./hw

where hw is your hello world program. This should give you an execution trace I'd like to see.

Atze

            - Atze -

Atze Dijkstra, Department of Information and Computing Sciences. /|\ Utrecht University, PO Box 80089, 3508 TB Utrecht, Netherlands. / | \ Tel.: +31-30-2534118/1454 | WWW : http://www.cs.uu.nl/~atze . /--| \ Fax : +31-30-2513971 .... | Email: atze@uu.nl ............... / |___\

orchid-hybrid commented 9 years ago

Sure Atze. Here is the trace from the same program demo/hello.hs which again segfaults: https://gist.github.com/orchid-hybrid/b2b1c07b7b58fae0ae9b

atzedijkstra commented 9 years ago

Hi,

Internal setup seems ok, but it never gets to run. Further digging needs to be done on your platform, as I cannot reproduce it in my environment and I don't have a virtual image for your platform readily available. If you have time for it, could you please figure out where in main_GB_Run in file src/rts/rts.cc (and/or perhaps in main_GB_Init1) it crashes?

Atze

orchid-hybrid commented 9 years ago

Hello,

I didn't see main_GB_Run or main_GB_Init1 in the backtraces, and I'm not sure how to find the information about the crash that would be useful. If there is anything you'd like me to try, I would be happy to.

I did attempt to investigate this further. I tried adding the -g compile flag in various places to get line numbers in the backtrace, but I couldn't work out how to get line numbers for gb_InitTables, which is compiled from interpreter.c:

(gdb) bt
#0  0x00000000004203c2 in gb_InitTables ()
#1  0x00000000004078eb in UHC_Base_initModule (
    modTbl=0x648940 <Main_moduleEntries>, modTblInx=0)
    at install/99/lib/pkg//uhcbase-1.1.7.2/99/bc/plain/UHC/Base.c:57350
#2  0x00000000004058d9 in main (argc=1, argv=0x7fffffffe528)
    at demo/hello.c:834

Using the layout asm view in gdb, I could see that it crashes on the instruction at +1426:

0x4203c2 <gb_InitTables+1426>   paddq  -0x10(%r8),%xmm1
0x4203c8 <gb_InitTables+1432>   paddq  -0x20(%r8),%xmm0
0x4203ce <gb_InitTables+1438>   movaps %xmm1,-0x10(%rsi)
0x4203d2 <gb_InitTables+1442>   movaps %xmm0,-0x20(%rsi)
0x4203d6 <gb_InitTables+1446>   cmp    %rdi,%rbx
0x4203d9 <gb_InitTables+1449>   jae    0x420471 <gb_InitTables+1601>
0x4203df <gb_InitTables+1455>   movdqa %xmm3,%xmm0
0x4203e3 <gb_InitTables+1459>   jmp    0x420384 <gb_InitTables+1364>
0x4203e5 <gb_InitTables+1461>   lea    0x8(%rbx,%r15,1),%rdx
0x4203ea <gb_InitTables+1466>   mov    %rdx,(%rbx)
0x4203ed <gb_InitTables+1469>   jmpq   0x420110 <gb_InitTables+736>

I tried the build and the hello world demo on my friend's computer, which also runs Arch Linux, and the crash occurred at a different location, +1112:

0x41c748 <gb_InitTables+1112>   paddq  (%r12),%xmm0
0x41c74e <gb_InitTables+1118>   paddq  %xmm5,%xmm1
0x41c752 <gb_InitTables+1122>   add    $0x20,%r12
0x41c756 <gb_InitTables+1126>   paddq  -0x10(%r12),%xmm1
0x41c75d <gb_InitTables+1133>   movaps %xmm0,-0x20(%rcx)
0x41c761 <gb_InitTables+1137>   paddq  %xmm4,%xmm6
0x41c765 <gb_InitTables+1141>   movaps %xmm1,-0x10(%rcx)
0x41c769 <gb_InitTables+1145>   cmp    %r10,%r15
0x41c76c <gb_InitTables+1148>   jae    0x41c7c3 <gb_InitTables+1235>
0x41c76e <gb_InitTables+1150>   movdqa %xmm6,%xmm0
0x41c772 <gb_InitTables+1154>   jmp    0x41c722 <gb_InitTables+1074>

Finally, I tried it on an Ubuntu machine and there was no problem.

So I am a little worried this is a problem with Arch itself rather than the compiler. If so, I am very sorry to take up your time on it! I will report back if I find anything about that.

atzedijkstra commented 9 years ago

Hi,

I am afraid that in a similar situation I'd have to fall back on adding tracing printf statements etc. Indeed, installing/running uhc on Ubuntu, both 32-bit and 64-bit, is ok, as is unix/linux in general. If you can find the cause, please let me know, but again, I am afraid that it is probably something very silly, like a small glitch in the C compiler (this has happened before, when it turned out that an expression had to be rewritten so that gcc would generate different (correct) code).
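A minimal sketch of that kind of printf tracing, assuming the RTS C sources are edited by hand. The macro TRACE_POINT and the function traced_sum are hypothetical illustrations, not part of UHC. Writing to stderr, which is not fully buffered by default, means a line printed just before a segfault is not lost in a stdout buffer (stdout is typically fully buffered when redirected to a file).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracing helper, not part of UHC: counts the trace points
   reached so the last one printed before a crash is easy to identify. */
static unsigned long trace_hits = 0;

/* stderr is not fully buffered, so each line is written out immediately
   and survives a subsequent segfault. */
#define TRACE_POINT(label)                                      \
    do {                                                        \
        trace_hits++;                                           \
        fprintf(stderr, "TRACE %lu %s:%d %s\n",                 \
                trace_hits, __FILE__, __LINE__, (label));       \
    } while (0)

/* Example of instrumenting a function step by step. */
static int traced_sum(const int *xs, int n) {
    int total = 0;
    TRACE_POINT("traced_sum: start");
    for (int i = 0; i < n; i++) {
        total += xs[i];
    }
    TRACE_POINT("traced_sum: end");
    return total;
}
```

Sprinkling TRACE_POINT between the statements of a suspect function and running the program shows the last trace line reached before the crash.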

regards, Atze

orchid-hybrid commented 9 years ago

I see!

I tried adding debug prints to each line of main_GB_Init1 and main_GB_Run in src/rts/rts.cc. Every print in main_GB_Init1 appeared, but none of those in main_GB_Run did, so I don't think execution reaches main_GB_Run.

main_GB_Init1 0
main_GB_Init1 1
main_GB_Init1 2
main_GB_Init1 3
main_GB_Init1 4
main_GB_Init1 5
*** module Main
  *** entry UHC.Handle.newEmptyBuffer
...
Segmentation fault (core dumped)

atzedijkstra commented 9 years ago

Perhaps comparing the trace output generated on your OS with that from (say) Ubuntu will give a clue as to where things break. Your output suggests that setting up the tables for modules etc. is going wrong. 64-bit Linux machines (I think) have roughly the same memory layout, so differences should not be that difficult to spot, apart from obvious offset differences.

phile314 commented 9 years ago

I ran into a very similar problem today. I upgraded the OS (Linux, Fedora) a few days ago, and since then all produced executables crash. The trace (https://gist.github.com/phile314/826024d9bd5f72134fa0) looks similar to the one already in this bug report.

I am using an amd64 machine. UHC worked on Fedora 20 with gcc 4.8.3; now, with Fedora 21 and gcc 4.9.2, the produced executables crash.

@atzedijkstra we can discuss it at the meeting tomorrow.

atzedijkstra commented 9 years ago

I fear that the only way to find out where the problem originates is to have traces from both a working and a crashing OS environment available for comparison; otherwise it is too much of a (time-consuming) needle-in-a-haystack search. Now that we have at least two known environments, one working and one crashing, would it be doable to set both up as VMs and obtain traces for the same program run? My initial guess is that the problem is not the instruction just before the crash but a difference in the setup of the runtime environment, tables, etc. It might be that gcc makes something different of the generated code...

A

phile314 commented 9 years ago

I tried to debug the generated executables, but that hasn't really helped so far. Interestingly, inside the debugger (gdb) the C executable crashes everywhere. I can also see in the debugger that the two versions are not optimized in exactly the same way.

The crash seems to happen at interpreter.c:1887, backtrace:

#0  gb_InitTables (byteCodes=, byteCodesSz=, cafGlEntryIndices=0xd80000 , 
    cafGlEntryIndicesSz=, globalEntries=, globalEntriesSz=, 
    consts=0xd7fbc0 , gcStackInfos=0xc1a9c0 , linkChainInds=0x12b4680, 
    callinfos=0xc20a40 , callinfosSz=2350, functionInfos=0xc1b200 , functionInfosSz=2053, 
    bytePool=0xbf8380  "UHC.Base._'Dict_Constructor", linkChainOffset=122553, impModules=0x12b4680, impModulesSz=0, 
    expNode=0x12b4880 , expNodeSz=604, expNodeOffs=0xd84c40 , 
    modTbl=0xbf7380 , modTblInx=0) at build/99/rts/bc/bc/interpreter.c:1887
#1  0x000000000040793e in UHC_Base_initModule (modTbl=0xbf7380 , modTblInx=0)
    at install/99/lib/pkg//uhcbase-1.1.8.4/99/bc/plain/UHC/Base.c:103203
#2  0x000000000040592c in main (argc=1, argv=0x7fffffffe3f8) at Test.c:890

(To get debug information for the RTS, one can use the --with-gcc-ehc-options=-g configure argument.)

The traces are quite big (~30MB), so opening them in the browser may not be the best idea:

Working f20 trace: http://files.314.ch/trace_B_f20.txt
Working f20 trace, cut off after the point where the f21 version crashes: http://files.314.ch/trace_B_f20_cut.txt
Crashing f21 trace: http://files.314.ch/trace_B_f21.txt

I also compiled UHC using gcc 3.4 on my notebook, and then the generated executables work.

@atze Out of curiosity, what GCC version are you using?

atzedijkstra commented 9 years ago

Ok,

At the interpreter.c:1887 location I (a long time ago) inserted code to work around a gcc/Ubuntu bug which I did not understand at the time (and still don't). It might well be that fixing that piece of code solves the problem, as gcc may have been fixed since. An #ifdef on the compiler version around it, then...

This is the code:

            {
                WPtr p = loc+1;
                int j ; // this must be int, otherwise gcc under ubuntu 11.10 makes following code crash. dont ask why, I do not know
                for ( j = 0 ; j < info ; j++ ) {
                    WPtr pp = &p[j+1];
                    p[j] += (Word)pp ;
                }
            }

Changing 'int j' into 'Word j' might fix it... Just guessing...

A

phile314 commented 9 years ago

I will give your suggestion a try next week.

If it doesn't fix the problem, I am probably just going to use an older GCC version. I am not that keen on spending much more time on this right now... It's probably also a good idea to mention this in the README if it stays broken, but let's first see if the fix works.

To summarize what we know right now:

GCC <= 4.8 works
GCC == 4.9 is broken
LLVM/clang may work
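Following the earlier suggestion of an #ifdef on the compiler version, here is a sketch of how such a gate could be written with the predefined macros __GNUC__, __GNUC_MINOR__ and __clang__. The macro HYPOTHETICAL_GCC_4_9_OR_LATER and the function compiler_path are illustrative names only, and the 4.9 boundary simply mirrors the observations summarized above, not a confirmed gcc bug.

```c
#include <assert.h>

/* Hypothetical macro name, not part of UHC. clang also defines __GNUC__
   for compatibility, so it has to be excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#define HYPOTHETICAL_GCC_4_9_OR_LATER 1
#else
#define HYPOTHETICAL_GCC_4_9_OR_LATER 0
#endif

/* Reports which code path this build would take; in real code the
   workaround itself would sit inside the #if branch. */
static const char *compiler_path(void) {
#if HYPOTHETICAL_GCC_4_9_OR_LATER
    return "gcc >= 4.9: use workaround";
#else
    return "other compiler: original code";
#endif
}
```

Because the check happens at preprocessing time, a build with clang (for example via ./configure --with-gcc=/usr/bin/clang) would take the second branch.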

orchid-hybrid commented 9 years ago

I tried changing int j; to Word j; in src/rts/bc/interpreter.cc in a fresh git clone, then compiled and tested it. Sorry to say, it still crashes in the same way. I have gcc 4.9.2.

atzedijkstra commented 9 years ago

Ok, was just a guess...

I am installing ubuntu as a VM, will try to see what happens there now...

A

phile314 commented 9 years ago

Not really a solution, but it might be helpful for other people: using LLVM/clang, the generated executables work and don't segfault.

clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-redhat-linux-gnu
Thread model: posix

configure: ./configure --with-gcc=/usr/bin/clang

orchid-hybrid commented 9 years ago

I confirm that compiling with clang allows me to build and run hello world successfully with ./install/101/bin/ehc demo/hello.hs.

Thank you!

ghost commented 7 years ago

So, about that one switch branch in interpreter.c... something looks really off about all those +1s.

This is a more complete sketch of the situation:

typedef uint64_t Word64;
typedef Word64 Word;
typedef Word *WPtr;
Word info = somehowInitialised();
WPtr loc = somehowInitialised();

*loc = info ;
{
    WPtr p = loc+1;
    int j;
    for (j = 0; j < info; j++) {
        WPtr pp = &p[j+1];
        p[j] += (Word)pp ;
    }
}

After eliminating the typedefs, p, and the array notation, I get the following bit of code:

uint64_t info = somehowInitialised();
uint64_t *loc = somehowInitialised();
*loc = info;
int j;

for (j = 0; j < info; j++) {
    uint64_t *pp = loc + 1 + j + 1; // (sic)
    *(loc + 1 + j) += (uint64_t)pp;
}

I get the impression that the extra + 1 (from p[j+1]) should not be there. However, I don't know where to find documentation on the meaning of GB_LinkChainKind_Offsets, nor are the variable names of any help, so I can't really check it. I've tried a few permutations of solving possible off-by-one errors here, but none of them resolve the fault.
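A small harness to check that the rewrite above is a faithful transformation of the original loop. Both functions are hypothetical stand-ins for the code in gb_InitTables. They use an unsigned 64-bit index instead of int, and they operate on an ordinary uint64_t array, which is properly aligned, so neither has the alignment problem. The test values assume, purely for illustration, that each entry holds a byte offset relative to the word that follows it.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: p = loc + 1, then p[j] += (Word)&p[j+1]. */
static void fixup_array_form(uint64_t *loc, uint64_t info) {
    uint64_t *p = loc + 1;
    for (uint64_t j = 0; j < info; j++) {
        uint64_t *pp = &p[j + 1];
        p[j] += (uint64_t)(uintptr_t)pp;
    }
}

/* Rewritten form: the same loop with explicit pointer arithmetic on loc. */
static void fixup_pointer_form(uint64_t *loc, uint64_t info) {
    for (uint64_t j = 0; j < info; j++) {
        uint64_t *pp = loc + 1 + j + 1;
        *(loc + 1 + j) += (uint64_t)(uintptr_t)pp;
    }
}
```

With loc[0] = info = 3 and entries 0, 8 and 16 in loc[1..3], both versions leave each loc[1 + j] equal to the address of loc[2 + j] plus the original entry, so the extra + 1 makes every entry relative to the word after it.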

ghost commented 7 years ago

Well, I think I found the problem after some debugging: gb_InitTables exhibits undefined behaviour. In particular, some part of UHC commits the following sin (says C99 in Appendix J.2):

Conversion between two pointer types produces a result that is incorrectly aligned (6.3.2.3).
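To make this concrete: on mainstream platforms you can test whether a byte address is suitably aligned by converting it to uintptr_t and masking the low bits. The result of that conversion is implementation-defined, but this works with GCC and clang on x86-64. The helpers is_aligned and word_aligned_at below are hypothetical; word_aligned_at checks a byte address before any conversion to a word pointer, mirroring a cast such as (WPtr)(&byteCodes[linkChainOffset]) in interpreter.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if p lies on a multiple of `alignment` bytes.
   `alignment` must be a power of two (1, 2, 4, 8, 16, ...). */
static int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: would &bytes[offset] be a valid address for a
   uint64_t access? Checked on the byte pointer, before any cast. */
static int word_aligned_at(const unsigned char *bytes, size_t offset) {
    return is_aligned(bytes + offset, _Alignof(uint64_t));
}
```

For a 16-byte-aligned buffer buf, buf + 8 passes an 8-byte check but fails a 16-byte one, which is exactly the gap between what a Word pointer promises and what PADDQ requires.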

The segfault occurs in a PADDQ instruction, which requires 128-bit alignment for memory operands. When I disable loop vectorisation for gb_InitTables (by adding __attribute__((optimize("no-tree-vectorize"))) to gb_InitTables and making no other source changes), the problem disappears: GCC now emits code that uses ADD, which has no alignment requirement unless alignment checking is explicitly enabled. And indeed, when I enable alignment checking, the program crashes with a bus error:

(gdb) b gb_InitTables 
Breakpoint 1 at 0xc5c70: file build/101/rts/bc/bc/interpreter.c, line 1799.
(gdb) run
Starting program: /tmp/a 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, gb_InitTables (byteCodes=0x555555856800 <UHC_Base_bytecode> "\376\377\341\002", byteCodesSz=238997, 
    cafGlEntryIndices=0x555555890da0 <UHC_Base_cafGlEntryIndices>, cafGlEntryIndicesSz=324, 
    globalEntries=0x5555558a4060 <UHC_Base_globalEntries>, globalEntriesSz=1642, consts=0x5555558a73c0 <UHC_Base_constants>, 
    gcStackInfos=0x555555891b20 <UHC_Base_gcStackInfos>, linkChainInds=0x555555a52040 <Unsafe_Coerce_expNode_size>, 
    callinfos=0x5555558969e0 <UHC_Base_callinfos>, callinfosSz=2193, functionInfos=0x555555892340 <UHC_Base_functionInfos>, 
    functionInfosSz=1642, bytePool=0x555555836d40 <UHC_Base_bytePool> "UHC.Base.primAsinDouble", linkChainOffset=2, 
    impModules=0x555555a52040 <Unsafe_Coerce_expNode_size>, impModulesSz=0, expNode=0x555555a52200 <UHC_Base_expNode>, 
    expNodeSz=529, expNodeOffs=0x5555558912c0 <UHC_Base_expNode_offs>, modTbl=0x555555836260 <a_moduleEntries>, modTblInx=0)
    at build/101/rts/bc/bc/interpreter.c:1799
1799    {
(gdb) set $ps |= (1<<18)
(gdb) continue
Continuing.

Program received signal SIGBUS, Bus error.
gb_InitTables (byteCodes=0x555555856800 <UHC_Base_bytecode> "\376\377\341\002", byteCodesSz=<optimized out>, 
    cafGlEntryIndices=0x555555890da0 <UHC_Base_cafGlEntryIndices>, cafGlEntryIndicesSz=324, globalEntries=<optimized out>, 
    globalEntriesSz=<optimized out>, consts=0x5555558a73c0 <UHC_Base_constants>, 
    gcStackInfos=0x555555891b20 <UHC_Base_gcStackInfos>, linkChainInds=0x555555a52040 <Unsafe_Coerce_expNode_size>, 
    callinfos=0x5555558969e0 <UHC_Base_callinfos>, callinfosSz=2193, functionInfos=0x555555892340 <UHC_Base_functionInfos>, 
    functionInfosSz=1642, bytePool=0x555555836d40 <UHC_Base_bytePool> "UHC.Base.primAsinDouble", linkChainOffset=2, 
    impModules=0x555555a52040 <Unsafe_Coerce_expNode_size>, impModulesSz=0, expNode=0x555555a52200 <UHC_Base_expNode>, 
    expNodeSz=529, expNodeOffs=0x5555558912c0 <UHC_Base_expNode_offs>, modTbl=0x555555836260 <a_moduleEntries>, modTblInx=0)
    at build/101/rts/bc/bc/interpreter.c:1817
1817            FunctionInfo_Inx off = callinfos[i].functionInfoModOff ;

So, yeah... UHC doesn't properly align its pointers.

(To note: /tmp/a was compiled from a source file containing main :: IO (); main = return () by ehc-1.1.9.6, revision master@6eb59da933, timestamp 20170720 +0000 215924.)
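For reference, a sketch of the per-function workaround described above, guarded so that only GCC sees the attribute. GCC's optimize attribute takes -f options without the -f prefix, so "no-tree-vectorize" corresponds to -fno-tree-vectorize; the GCC manual notes that this attribute is meant for debugging rather than production code. The macro NO_VECTORIZE and the function add_next_address are hypothetical, and the loop only has the same shape as the one in gb_InitTables. This hides the symptom; the misaligned pointer itself remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC only: disable tree vectorisation for the marked function.
   Other compilers (including clang) get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Same loop shape as the fixup loop: each word gets the address of the
   following word added to it. With vectorisation off for this function,
   GCC should not turn it into PADDQ-based code. */
static NO_VECTORIZE void add_next_address(uint64_t *p, size_t n) {
    for (size_t j = 0; j < n; j++) {
        p[j] += (uint64_t)(uintptr_t)&p[j + 1];
    }
}
```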

atzedijkstra commented 7 years ago

Alignment may well be the problem. For the 64-bit backend, all size-dependent codegen and/or C macros assume 64-bit alignment, not 128-bit. It will be in the internal tables (for initializing interpreter info) holding 64-bit values (likely pointers) where 64-bit alignment might have to be enforced. It may also be that you have to look at the codegen part which generates these tables; internal alignment might also have to be 128-bit (currently alignment is enforced at the same size as the word size).
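A minimal sketch of enforcing a stronger alignment on a table in C11, assuming the generated tables are ordinary static C arrays; example_table and table_pair are hypothetical stand-ins, not UHC's generated code. Aligning a table's start only helps if every offset later used to form a word pointer into it is also a multiple of the required alignment, and whether 16-byte alignment is needed or plain 8-byte word alignment would suffice is not settled here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated table of 64-bit words.
   alignas(16) asks the compiler to start the array on a 16-byte boundary,
   stronger than the 8-byte alignment uint64_t normally gets on x86-64. */
static alignas(16) const uint64_t example_table[4] = { 1, 2, 3, 4 };

/* Returns a pointer to a pair of words; requiring an even index keeps
   a 128-bit access starting there aligned as well. */
static const uint64_t *table_pair(size_t index) {
    assert(index % 2 == 0 && index + 1 < 4);
    return &example_table[index];
}
```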

I am afraid I can (at this moment) not help you (am on vacation, and have no replica of your environment).

-- Atze Dijkstra

ghost commented 7 years ago

128-bit alignment is enforced by the compiler based on the types the programmer promises.

The PADDQ loop is only the main course of the whole copy buffet. The aperitif makes sure the Word-aligned pointer (64-bit on my machine) is advanced to a 128-bit aligned pointer. The dessert is Duff's device.

The problem is in the assumptions that underlie this line:

WPtr loc = (WPtr)(&byteCodes[ linkChainOffset ]) ;

Note that byteCodes has type GP_BytePtr. It is only byte-aligned, with no further promises. Unless the bytecode language has syntactic restrictions on alignment (JVM bytecode, for example, has this for jump tables) and you adjust the pointer to have the right alignment, any intended Word-sized memory access has to be implemented explicitly as Byte-sized accesses. Casting a byte pointer up to a word pointer is asking for problems, because the compiler will likely serve you the wrong aperitif.
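A common, standard-conforming way to do such byte-level access in C is memcpy into or out of a properly typed local. It has no alignment requirement, and GCC and clang typically compile a fixed 8-byte memcpy into a single move. This is a swapped-in technique, not code from UHC: load_word, store_word and fixup_bytes are hypothetical, and fixup_bytes sketches the interpreter.c loop rewritten to work on a byte pointer such as &byteCodes[linkChainOffset]. Any code that later reads these words would need the same treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit word from an arbitrary (possibly unaligned) byte address. */
static uint64_t load_word(const unsigned char *src) {
    uint64_t w;
    memcpy(&w, src, sizeof w);
    return w;
}

/* Write a 64-bit word to an arbitrary (possibly unaligned) byte address. */
static void store_word(unsigned char *dst, uint64_t w) {
    memcpy(dst, &w, sizeof w);
}

/* Hypothetical byte-pointer version of:
     *loc = info;
     for (j = 0; j < info; j++) p[j] += (Word)&p[j+1];   with p = loc + 1
   `loc` need not be 8-byte aligned. */
static void fixup_bytes(unsigned char *loc, uint64_t info) {
    const size_t W = sizeof(uint64_t);
    store_word(loc, info);                              /* *loc = info */
    for (uint64_t j = 0; j < info; j++) {
        unsigned char *entry = loc + (j + 1) * W;       /* &p[j]   */
        uint64_t next = (uint64_t)(uintptr_t)(entry + W); /* &p[j+1] */
        store_word(entry, load_word(entry) + next);
    }
}
```

For example, with unsigned char raw[41] and loc = raw + 1 (deliberately not guaranteed to be 8-byte aligned), fixup_bytes(loc, 3) works regardless of where raw happens to start, whereas casting loc to uint64_t * could produce a misaligned pointer.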