Open Quuxplusone opened 10 years ago
Attached test-setcond-fp.out
(25887 bytes, application/octet-stream): mcjit lli execution of test-setcond-fp.ll with debug
Looks like Pavel may have hit something similar - see discussion at https://reviews.llvm.org/D32295
Greg Clayton also hit this recently, presumably also due to the ProcessAllSections flag.
I don't know ELF well, but this sounds like a broken relocation, or at the very least a busted assertion. If this is supposed to be a relative relocation then we should be making the assertion about the delta (since that's what's going to be written into the target location) rather than any of the inputs (which might be high addresses).
Looking at http://refspecs.linuxbase.org/elf/x86_64-abi-0.98.pdf , these don't
look like relative relocations. Possibly DWARF is using absolute relocations
and abusing an assumption that the DWARF sections will be linked as if at
address zero?
Roping echristo and dblaikie in, in case they happens to know anything about
this off the top of their heads. :)
Also, this looks like it may be related to
https://bugs.llvm.org/show_bug.cgi?id=15356 . From the discussion on that bug:
>> I believe you should be able to work around that problem by compiling with
>> large memory model, which prevents generation of the relocation in question.
>
>
> This doesn't seem to be the case.
>
> $ cat test.s
> test:
> .cfi_startproc
> .cfi_endproc
> $ as test.s -o test.o; readelf -r test.o
> R_X86_64_PC32
> $ llvm-mc -filetype=obj test.s -o test.o; readelf -r test.o
> R_X86_64_PC32
> $ llvm-mc -filetype=obj -code-model=large test.s -o test.o; readelf -r test.o
> R_X86_64_PC32
>
> That is, this particular relocation is always R_X86_64_PC32, regardless of
code model.
I am not sure this is the same problem we are talking about. in #2, we are
relocating the .eh_frame section relative to the .text section.
What I am speaking about is a relocation fully contained in the .debug_frame
section. The .debug_frame in my example has the following bytes:
00000: 14 00 00 00 ff ff ff ff 04 00 08 00 01 78 10 0c .............x..
00010: 07 08 90 01 00 00 00 00 1c 00 00 00 00 00 00 00 ................
00020: 00 00 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 ................
00030: 41 0e 10 86 02 43 0d 06 A....C..
Which, if I am not mistaken, disassemble to this:
Contents of the .debug_frame section:
00000000 0000000000000014 ffffffff CIE
Version: 4
Augmentation: ""
Pointer Size: 8
Segment Size: 0
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
00000018 000000000000001c 00000000 FDE cie=00000000
pc=0000000000000000..000000000000000f
DW_CFA_advance_loc: 1 to 0000000000000001
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 3 to 0000000000000004
DW_CFA_def_cfa_register: r6 (rbp)
Now, according to my analysis, the relocation entry is both relative to this
section (Relocations.begin()->first == 8, which is the index of the debug_frame
section), and it is being applied the the debug_frame section:
(const llvm::RelocationEntry &) RE = 0x00000000007c0918: {
SectionID = 0x00000008
Offset = 0x000000000000001c
RelType = 0x0000000a
Addend = 0x0000000000000000
= {
SymOffset = 0x0000000000000000
Sections = (SectionA = 0x00000000, SectionB = 0x00000000)
}
IsPCRel = 0x00
Size = 0x00000000
IsTargetThumbFunc = 0x00
}
Offset 0x1c is the location of the CIE pointer in the FDE entry. This sort of
makes sense as the Relocations Addend is zero, and the CIE is located at offset
zero from the start of the section. However, the dwarf spec says that this is a
relative offset of the CIE, and not an absolute value (presumably exactly for
the reason of avoiding relocations). So, I believe that either this relocation
should not be emitted altogether, or it should be some relative relocation to
make it independent of the actual load address of the section.
At least that's my interpretation of the data -- I can't say that I am a
linking or a debug info expert though...
Sorry - I quoted the wrong part of PR15356. In the first comment on that bug Andy Kaylor mentions running into assertions on the R_X86_64_32/R_X86_64_32S relocations, which are the same ones we're hitting here.
I was trying to wrap my head around why we'd have these intra-section relocations in the debug info at all, but I guess it's for fixing up FDE sections when they're merged in the linker. It's still not clear to me why R_X86_64_32/R_X86_64_32S would be the right relocation for this though.
Offset 0x1c is the location of the CIE pointer in the FDE entry. This sort of makes sense as the Relocations Addend is zero, and the CIE is located at offset zero from the start of the section. However, the dwarf spec says that this is a relative offset of the CIE, and not an absolute value (presumably exactly for the reason of avoiding relocations). So, I believe that either this relocation should not be emitted altogether, or it should be some relative relocation to make it independent of the actual load address of the section.
At least that's my interpretation of the data -- I can't say that I am a linking or a debug info expert though...
I think you're right. I'm going to ask around over here and see any anyone knows DWARF/ELF well enough to give a definitive answer, but if that's the case the solution is probably to recognize these relocations when we see them in FDE sections (debug_info, eh_frame) and treat them as a relative relocations instead.
I was seeing a relocation on the first FDEs offset field that points to the offset of the CIE for this FDE. This section offset is known to be relative to the start of the ".debug_frame" section. Not sure why there is a relocation on this offset? At least we do need or want it applied in the debugger...
The .debug_frame is almost the same as EH-frame. They have the same contents except CIE IDs are encoded differently: DWARF uses 0xffffffff and EH-frame uses 0. The FDE's CIE_pointer is a relative offset when in EH-frame and just the offset into the .debug_frame section for .debug_frame. Maybe this is where things are going wrong? We have a relocation when there shouldn't be any?!?
Ak, so it looks like there is already support for *not* emitting those
relocations, controlled by MCAsmInfo::DwarfUsesRelocationsAcrossSections.
However, currently that is only being set to false on darwin. Hard-coding that
flag to false makes my lldb test pass, but unsurprisingly makes some llvm tests
fail (although not as many as one would expect). It also makes the debug info
produced by clang completely broken, as lldb can no longer debug any executable
produced by the patched compiler.
So, I guess the fix for this needs to be elsewhere. It looks like the
"DwarfUsesRelocationsAcrossSections" flag is there to account for different
interpretations of dwarf spec across platforms and MCJIT needs to be fixed to
link these sections more like the platform's native linker, which I guess means
linking them as if their load address was 0, as Lang suggested in #2 (for a
linked executable, readelf reports load address of the debug sections as zero,
and no relocations, which I guess means they got fully resolved at link time
against that address).
What is not clear to me is whether this behaviour can be completely hidden
within MCJIT, or does it need more help from lldb (in the form of explicitly
setting the load address of these sections to zero).
Any thoughts?
So this isn't an "across section" relocation. The FDE, which is in the .debug_frame, has a .debug_frame offset to a CIE which is also in the .debug_frame section. So I repeat my question: why do we have a relocation here? Nothing ever needs to be fixed up when emitting the .debug_frame section for DWARF. Something might need to be fixed up in EH frame as the FDE has a relative offset, but not in the .debug_frame case. Why is a setting named DwarfUsesRelocationsAcrossSections() controlling in section stuff?
There should be a relocation in .o files for when the DWARF needs gets eventually get linked, but when we have a final executable ELF file, there doesn't need to be one. Any fixup applied to this will fail and make things wrong.
(In reply to Greg Clayton from comment #8)
> So this isn't an "across section" relocation. The FDE, which is in the
> .debug_frame, has a .debug_frame offset to a CIE which is also in the
> .debug_frame section. So I repeat my question: why do we have a relocation
> here? Nothing ever needs to be fixed up when emitting the .debug_frame
> section for DWARF. Something might need to be fixed up in EH frame as the
> FDE has a relative offset, but not in the .debug_frame case. Why is a
> setting named DwarfUsesRelocationsAcrossSections() controlling in section
> stuff?
The reason I started talking about cross-section relocations was that after
getting rid of this one, I was running assertion failures on other relocations,
and these were real cross-section relocations (e.g. the debug_info pointer in
debug_pubnames). So I think this needs to be solved at a higher level, and if
we link these sections as if their load-address will be zero then the FDE intra-
section problem will go away as well.
Even cross section relocations don't really need to be applied in the DWARF when everything is fully linked in a dylib or executable, but they are needed in a .o files. All section references refer to offsets. No relocation should be moving any offsets into other sections in DWARF. They are understood to be offsets into other sections. When would we ever want to take a 32 bit offset to another section and add the 64 bit address that the section is loaded at, and expect it to truncate to 32 bit and be the same?
(In reply to Greg Clayton from comment #11)
> Even cross section relocations don't really need to be applied in the DWARF
> when everything is fully linked in a dylib or executable, but they are
> needed in a .o files. All section references refer to offsets. No relocation
> should be moving any offsets into other sections in DWARF. They are
> understood to be offsets into other sections. When would we ever want to
> take a 32 bit offset to another section and add the 64 bit address that the
> section is loaded at, and expect it to truncate to 32 bit and be the same?
Right, I agree with that. But here, it's the MCJIT who is playing the role of
the linker producing an "elf executable" out of an ".o file", which is why I
said in #7 that this needs to be fixed there.
I think we're in agreement here, actually. We just need to find how to fix the
problem.
(In reply to labath from comment #12)
> (In reply to Greg Clayton from comment #11)
> > Even cross section relocations don't really need to be applied in the DWARF
> > when everything is fully linked in a dylib or executable, but they are
> > needed in a .o files. All section references refer to offsets. No relocation
> > should be moving any offsets into other sections in DWARF. They are
> > understood to be offsets into other sections. When would we ever want to
> > take a 32 bit offset to another section and add the 64 bit address that the
> > section is loaded at, and expect it to truncate to 32 bit and be the same?
>
> Right, I agree with that. But here, it's the MCJIT who is playing the role
> of the linker producing an "elf executable" out of an ".o file", which is
> why I said in #7 that this needs to be fixed there.
>
> I think we're in agreement here, actually. We just need to find how to fix
> the problem.
Maybe the fix could actually be to set "DwarfUsesRelocationsAcrossSections" to
false, but only when producing an object file for consumption in MCJIT ? MCJIT
is producing an "so" file, so the sections don't need to be linked with
anything else (I hope).
Lang?
As long as we only enable this for MCJIT when we know we don't want these relocations, then disabling it might help. But I fear we will run into a never ending source of new relocations popping up if we don't do it right. Can we identify certain relocation types that the MCJIT should just not apply and ignore based on a relocation type? Seems like any 32 bit relocations where an address needs to be added and where the truncated 32 bit value needs to be the same as addr + addend is flawed for MCJIT scenarios. Probably even in other sections like EH frame...
Lange: any ideas on the right solution here? Are we truly hacking .o files to run as if they were executables?
> Maybe the fix could actually be to set "DwarfUsesRelocationsAcrossSections"
to false,
> but only when producing an object file for consumption in MCJIT ? MCJIT is
producing
> an "so" file, so the sections don't need to be linked with anything else (I
hope).
>
> Lang?
MCJIT is producing (and should be able to ingest) raw *.o files - we want to
avoid special-casing the pipeline config for MCJIT where we can.
> Lange: any ideas on the right solution here? Are we truly hacking .o files to
run as
> if they were executables?
We truly are. :)
> Can we identify certain relocation types that the MCJIT should just not apply
and
> ignore based on a relocation type?
We could have RuntimeDyldELF skip relocations for .debug sections entirely
(while still passing them to the memory manager) if LLDB knows how to fix up
unrelocated debug sections. Or we could have it skip known "weird" relocations
within debug sections.
Alternatively we could teach RuntimeDyldELF how to apply the necessary
relocations and help clients to map debug sections to '0' (by supplying a
utility to help with that, or by making it the default behaviour for known
debug sections).
Attached RuntimeDyld.patch
(1362 bytes, text/plain): A possible fix
(In reply to Lang Hames from comment #15)
> MCJIT is producing (and should be able to ingest) raw *.o files - we want to
> avoid special-casing the pipeline config for MCJIT where we can.
That makes sense.
> We could have RuntimeDyldELF skip relocations for .debug sections entirely
> (while still passing them to the memory manager) if LLDB knows how to fix up
> unrelocated debug sections. Or we could have it skip known "weird"
> relocations within debug sections.
>
> Alternatively we could teach RuntimeDyldELF how to apply the necessary
> relocations and help clients to map debug sections to '0' (by supplying a
> utility to help with that, or by making it the default behaviour for known
> debug sections).
I like the last solution the most. I think the fix could be extremely simple
even -- it seems that (on linux, at least) all sections which do not have the
SHF_ALLOC flag have a load address of zero. So maybe the RuntimeDyld just needs
to emulate this behavior, which is what my patch in #16 attempts to do. The
patch could be beautified a bit, but I'm showing it here as a proof of concept
mostly. It may also be worth checking whether this is correct for COFF targets
(Mach-O targets should be unaffected as "isRequiredForExecution seems to return
always true there").
What do you think?
Looks good to me. :)
It would be good to add an llvm-rtdyld testcase to very that applying this relocation to a debug section behaves as if the section had been explicitly mapped to zero.
(In reply to Lang Hames from comment #18)
> Looks good to me. :)
>
> It would be good to add an llvm-rtdyld testcase to very that applying this
> relocation to a debug section behaves as if the section had been explicitly
> mapped to zero.
Ok, I'll get on it (it might take a while, as I'm pretty busy this week).
(In reply to labath from comment #19)
> (In reply to Lang Hames from comment #18)
> > Looks good to me. :)
> >
> > It would be good to add an llvm-rtdyld testcase to very that applying this
> > relocation to a debug section behaves as if the section had been explicitly
> > mapped to zero.
>
> Ok, I'll get on it (it might take a while, as I'm pretty busy this week).
Fix with the test is in D32899, although I am not sure how much you will like
the test vector. :)
test-setcond-fp.out
(25887 bytes, application/octet-stream)RuntimeDyld.patch
(1362 bytes, text/plain)