Closed jasigal closed 6 years ago
Of course, no problem!
Glad to see it is actually reproducible (on some architectures). Would you be open to a PR with my suggestion above? I don't have one at the moment, but if in your opinion it's worth it (since who knows how long it could take to get a fix in LLVM) I'd be up for trying. If so, what's the best way to add it if you know off the top of your head?
I think this is a bug in the object code loader (at least for ELF).
The following works for me; if you could confirm that would be great:
main.c

#include <stdio.h>
#include <stdint.h>

void map_1bf1a791bb25369d(int64_t start, int64_t end, double* out, const double* in);

int main(int argc, char** argv)
{
    double input = 1.0;
    double result;
    map_1bf1a791bb25369d(0, 1, &result, &input);
    printf("result = %.16lf\n", result);
    return 0;
}
Save the generated .ll or .s from Accelerate and compile with:
$ clang-5.0 -O3 -lm -Wall main.c map_1bf1a791bb25369d.s
$ ./a.out
result = 0.2384058440442351
objdump -d should still show the offending instruction (here).
I can confirm that the code above works. Guess I should stop being so certain about compiler bugs when I don't know what I'm talking about 🤔 Does this mean the problem is most likely on the Accelerate side?
NB: One thing I noticed when working with this issue is that changing the -code-model option to llc had an effect on whether or not addressing relative to %rip was used. From my understanding that's an "obvious" statement, and I don't know if it's useful.
haha. I mean technically you were correct, just about the wrong compiler 😅
I am guessing that the bug is somewhere in (or very closely related to) this file.
I've never actually experimented with the different -code-model options; I don't know if that would be useful either.
How important of an issue does this appear to be? I could attempt to poke around to see if I can figure out what's going wrong in the ELF object loader, but I really don't have any experience in the area. If this issue rates high enough for you or another developer/maintainer to tackle, I doubt I'd be useful help. However, if it doesn't rate high enough or everyone has other time commitments, I'm happy to attempt a fix! One of the core specs of the project I'm working on is CPU support, so it's not an unproductive use of my time.
If you want to take a look that would be great! If you can't make heads or tails of it don't feel bad about giving up though, it could be a bit hairy.
Looking at the -ddump-ld output, I'm curious why the second line there doesn't have a symbol name like the first line does. Figuring that out might be a good place to start looking.
[ 0.117] ld: .LCPI0_0: local symbol in section 4 at 0x00
[ 0.117] ld: : local symbol in section 4 at 0x00
[ 0.117] ld: map_1bf1a791bb25369d: local symbol in section 2 at 0x00
[ 0.117] ld: tanh: external symbol found at 0x00007fd29327a970
[ 0.117] ld: section 2: Mem: 0x000000000-0x000000064 .text
[ 0.117] ld: section 4: Mem: 0x000000068-0x000000078 .rodata.cst8
[ 0.117] ld: relocation: 0x0024 to symbol 1 in section 2, type=R_X86_64_PC32, value=.LCPI0_0-4
[ 0.118] ld: relocation: 0x0038 to symbol 2 in section 2, type=R_X86_64_32S, value=+8
[ 0.118] ld: relocation: 0x003d to symbol 4 in section 2, type=R_X86_64_PC32, value=tanh-4
Yeah, I'd noticed that as well. When I compiled the LLVM IR manually, the symbols showed up with nm, but they didn't show (except in the debug output) when autogenerated.
I'm currently looking at the generated assembly for the following combinations of options to llc-5.0:

-relocation-model: pic, static (default)
-code-model: small (default), medium, large, kernel
It appears that adding the PIC option (or kernel and PIC) emits promising assembly, and additionally the large code model seems to produce code which looks correct (although reading it seems this has a performance cost).
My plan at the moment is to experiment with the different -relocation-model options. My reason for this approach is that I feel it will be quicker than understanding all the assumptions in the ELF loading code (unless you have that written up somewhere!). Would a fix produced this way be accepted by you, or would you consider it too much of an ad-hoc fix?
Possibly scratch this approach: it seems the PIC options involve R_X86_64_PLT32 relocation types, which aren't handled.
Does changing the code/relocation model really affect the generated assembly that much? (I am concerned that it wouldn't address the underlying issue, but it is possible there are better options for what we are trying to do with the result.)
I'll have a look at the ELF loader and see what is wrong there. It isn't really documented, I guess, but it doesn't really do anything special either.
When PIC is not enabled:
generated:no-PIC:kernel.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000025 R_X86_64_32S .rodata.cst8
0000000000000039 R_X86_64_32S .rodata.cst8+0x0000000000000008
000000000000003e R_X86_64_PC32 tanh-0x0000000000000004
generated:no-PIC:large.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_64 .rodata.cst8
0000000000000034 R_X86_64_64 tanh
generated:no-PIC:medium.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000001e R_X86_64_64 .rodata.cst8
0000000000000049 R_X86_64_PC32 tanh-0x0000000000000004
generated:no-PIC:small.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000024 R_X86_64_PC32 .LCPI0_0-0x0000000000000004
0000000000000038 R_X86_64_32S .rodata.cst8+0x0000000000000008
000000000000003d R_X86_64_PC32 tanh-0x0000000000000004
When PIC is enabled:
generated:PIC:kernel.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000024 R_X86_64_PC32 .LCPI0_0-0x0000000000000004
0000000000000037 R_X86_64_PC32 .LCPI0_1-0x0000000000000004
000000000000003c R_X86_64_PLT32 tanh-0x0000000000000004
generated:PIC:large.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_64 .rodata.cst8
0000000000000034 R_X86_64_64 tanh
generated:PIC:medium.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000001e R_X86_64_64 .rodata.cst8
0000000000000049 R_X86_64_PLT32 tanh-0x0000000000000004
generated:PIC:small.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000024 R_X86_64_PC32 .LCPI0_0-0x0000000000000004
0000000000000037 R_X86_64_PC32 .LCPI0_1-0x0000000000000004
000000000000003c R_X86_64_PLT32 tanh-0x0000000000000004
And there are some small changes (but still involving the unhandled relocation types) when -relocation-model is rwpi or ropi.
I think I've figured out some of the issues. I don't have time at the moment to write everything down (and I don't have it completely figured out), but I should have time tomorrow.
One issue is this function, which overwrites the old symbol value with the new one. It assumes that all addresses are 32 bits by casting the PC pointer to a Word32 pointer. This causes, for example, R_X86_64_64 relocations to be resolved incorrectly when the upper 32 bits are needed.
Another issue is that the R_X86_64_32S relocations are done incorrectly, and I believe they can't be done correctly with the current object loading method. This source describes the assumptions of the small (default) code model as:

    the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

(although for some reason the pointers for vfnmsub213sb seem 32 bit, albeit the source references gcc). However, when I run my program, the segment created by the backend is Segment 4096 0x00000042001db000, which is well above the 32-bit address range. Hence, even the jump table is out of range. From what I can see there isn't a way to ask Haskell to allocate a pinned byte array (generated through mallocPlainForeignPtrAlignedBytes) in a specific memory range, so I don't know how to solve the absolute 32-bit relocation issues.
@tmcdonnel , are you going to be at POPL next week?
@tmcdonell even (sorry, typing from my phone)
@ohad I will not be at POPL this year ):
@jasigal okay, I should (finally) have a chance to look into this more this afternoon. sorry for the delay 😓
I still think small is fine here; we just need the IP-relative jumps to be within ±2 GB. Anything outside that is going to be a function call for us (e.g. to shared library functions like tanh, memset, ...), which (if necessary) will go via the jump tables. Anyway, I'll investigate further...
@tmcdonell no worries --- I won't be there either this year (visa refused :( ) --- but @adscib will be there, so if you two think we should lobby the Haskell architects for some changes, we can try to do that ;)
@jasigal could you see if this fix works for you? https://github.com/tmcdonell/accelerate-llvm/commit/77078de7241ddd2e5fd6620fbcea2cfdc02cc2a6
err, wait, never mind. that worked for me for a while then stopped :\
Yeah, I can confirm this doesn't work for me ☹️
@tmcdonell I agree that IP-relative jumps should be fine. Those, like the one for tanh, seem to work fine as is; it's the absolute relocations that want to reference memory addresses outside the unsigned 32-bit range which I think are the issue.
Using mmap to allocate memory explicitly in the lower 2 GB of the address space allows my program to run; see commit jasigal/accelerate-llvm@ae5bf3e176383e9d4de0c98c48adac02d222a568 and the one before it on the fix/#409 branch. I'm going to run the tests now to check that the changes to the relocation function don't break anything.
Well, all the tests passed! (How long are they supposed to take, anyway? The accelerate-llvm-native tests take ~20 minutes on my computer, which has an 8-core i7.)
Yeah, the tests take ages, unfortunately. Last I checked I think it was spending all its time in either (a) generating the random data; or (b) running the reference implementations (which mostly just use lists). I would like to speed it up though...
I fixed this by setting relocation-model=dynamic-no-pic, which I think is more appropriate to how we are using the result. I pulled in your mmap patch too though (it might not be necessary to have both fixes, but if nothing else I think it's fair you get your name on a patch (: )
Thanks for your help on this!
No worries, and thanks for the fix, everything seems to be working now! 😄
Hello, I am submitting a bug report for accelerate-llvm-native (based on what I saw, issues for it were moved/resubmitted/double-submitted here?). I've created a gist describing the issue and containing the relevant files and output, in addition to the information here. I believe this bug may actually be with LLVM itself, and I plan to file a bug there as well, but I think there is a workaround that can be done in accelerate-llvm-native in the meantime.

Description
The following Accelerate code SIGSEGV's by trying to access an invalid address.
This issue first occurred while I was writing some simple numeric functions for automatic differentiation and the bug caused my code to unpredictably crash.
Expected behaviour

The expected behaviour is that the program outputs 0.23840584404423515 (= 1 + tanh(-1)).

Current behaviour
Running the program currently produces “stack exec -- accelerate-llvm-b…” terminated by signal SIGSEGV (Address boundary error).

Possible solution
See the gist for a detailed description, but a workaround would be to turn off (or give the ability to turn off) the FP contract option (-fp-contract for llc and opt). Compiling the generated LLVM IR by hand without this option produced an object which, when placed in Accelerate's cache, produces the expected behaviour.

Steps to reproduce
1. Download the files (accelerate-llvm-bug.cabal, stack.yaml, and Main.hs) and place them in the same folder (e.g. download the ZIP and unzip it).
2. Run stack build.
3. Run stack exec accelerate-llvm-bug.
Your environment

Haskell package versions are all from Stack LTS 10.2:
accelerate-1.1.1.0
accelerate-llvm-1.1.0.0
accelerate-llvm-native-1.1.0.1

Misc. versions of other tools:
Thank you, and don't hesitate to ask for more info!