Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Duplicate definition of symbol error to do with mergable comdat constant symbols on Windows/COFF #39046

Closed Quuxplusone closed 4 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR40074
Status RESOLVED FIXED
Importance P normal
Reported by Machiel van Hooren (jcmac@machiel.info)
Reported on 2018-12-18 03:45:26 -0800
Last modified on 2020-01-22 15:03:10 -0800
Version trunk
Hardware PC Windows NT
CC 1101.debian@gmail.com, frank.laub@intel.com, lhames@gmail.com, llvm-bugs@lists.llvm.org, v.churavy@gmail.com
Fixed by commit(s)
Attachments coff-comdat-constants-workaround.patch (8587 bytes, text/plain)
Blocks
Blocked by
See also
It happens when the same literal is defined twice, each in a different module
and after the second module has been added to the RTDyldObjectLinkingLayer.

So for example in some module:

define float @someFunction() {
entry:
  ret float 2.000000e+00
}

In some other module:
define float @someOtherFunction() {
entry:
  ret float 2.000000e+00
}

When materializing the second function, you will get this error:

JIT session error: Duplicate definition of symbol '__real@40000000'

The symbol names come from comdat symbols that are created in
TargetLoweringObjectFileImpl.cpp
TargetLoweringObjectFileCOFF::getSectionForConstant

For COFF, I have the OverrideObjectFlagsWithResponsibilityFlags and
AutoClaimResponsibilityForObjectSymbols set to true on the
RTDyldObjectLinkingLayer or my function symbols are not found at all.

The same code compiles and runs just fine on Linux/Elf, even with the COFF
workarounds enabled there as well.

A workaround I found is to set HasCOFFComdatConstants to false in
MCAsmInfoCOFF.cpp
Quuxplusone commented 5 years ago

This bug is still reproducible with ORCv2

Quuxplusone commented 5 years ago
Hi Machiel,

Oh -- this is awful. :)

Does it manifest with the same error on ORCv2? Or with a different error?

The relevant parts of RTDyldObjectLinkingLayer/RuntimeDyld's algorithm look
something like this:

  (1) scan the object file, noting weak definitions in a side table
  (2) find out which weak definitions we are responsible for
  (3) perform the link
  (4) if auto-claim is on, add all provided definitions to the definition set
  (5) register definitions

The issue likely has something to do with the fact that determine
responsibility for materializing weak defs in (2), but don't claim it until
(4). Though I would have expected this to lead to a missing definition error,
rather than a duplicate.

We probably need to add a new API to MaterializationResponsibility:
"defineMaterializingWeak". Whereas "defineMaterializing" generates an error on
duplicates, "defineMaterializingWeak" would silently continue without adding
duplicates to the symbol table.

Could you include the target triple that you're seeing in your test cases so
that I can make sure I match your setup?
Quuxplusone commented 5 years ago
Hi Lang,

Yes, on ORCv2 it prints the same error message to the console and then crashes
shortly afterwards.
My target tripple is x86_64-pc-windows-msvc.

The crash happens in RTDyldObjectLinkingLayer::onObjEmit when it tries to
destroy a MemoryBuffer (the ObjBuffer).

Let me know if you need anything else. I could create a minimal repro case
using the Kaleidoscope examples if you want.
Quuxplusone commented 4 years ago

I have marked this bug as a release blocker because it causes a crash when using ORC on Windows in a very trivial case.

Quuxplusone commented 4 years ago

Attached coff-comdat-constants-workaround.patch (8587 bytes, text/plain): Work around duplicate defs from COFF constant pool entries.

Quuxplusone commented 4 years ago

Setting priority back down. This isn’t ideal, but it’s not a release blocker: ORCv1 and MCJIT are still both in LLVM10.

Confirming the earlier analysis: __real@40000000 is (a COMDAT “any” symbol) for a constant pool entry that is created by MC. We’re failing when two materialization units create the same symbol late in the pipeline and both try to register it.

An ideal solution to this would involve teaching the JIT, libObject, and the JIT-linker about COMDAT symbols. That’s not going to happen for a while though.

In the mean time I have attached a patch for an possible workaround. This patch modifies defineMaterializing to support defining new weak symbols (we’ll want to take this part either way), then adds a hack to RTDyldObjectLinkingLayer to detect COFF objects, look for newly defined symbols, and then mark any newly defined symbols in COMDAT sections as weak.

Machiel — Could you see if this patch fixes your issue? Unfortunately I don’t have a windows machine to test execution of COFF objects.

— Lang.

Quuxplusone commented 4 years ago
(In reply to Lang Hames from comment #6)
> Setting priority back down. This isn’t ideal, but it’s not a release
> blocker: ORCv1 and MCJIT are still both in LLVM10.
>
> ....
>
> Machiel — Could you see if this patch fixes your issue? Unfortunately I
> don’t have a windows machine to test execution of COFF objects.
>
> — Lang.

Roger on it not being a release blocker. I'm not really familiar with the
criteria for what should warrant a release blocker, sorry about that.

I have tested your patch and everything seems to work fine so great success!
I've also verified that the issue still exists in unpatched ORC.
Quuxplusone commented 4 years ago

Hi Machiel,

Can you confirm that 84217ad66115cc31b184374a03c8333e4578996f fixes this bug?

It looks like this might be related to http://bugs.llvm.org/PR44337 -- I want to make sure I figure out what has been fixed and what's still broken.

-- Lang.

Quuxplusone commented 4 years ago
Hi Lang,

I've tested my repro case on Windows with both debug and release builds without
issue, using the unmodified current master branch.

I have also tested the IR posted in http://bugs.llvm.org/PR44337 by loading the
IR from a file, adding the module to a dylib, then getting the address to the
'calculate' function and executing it.
This also works without problems, although there is an access violation when
executing the compiled IR due to the usage of @cachedValue. (store double %4,
double* @cachedValue, align 8). Removing that line will produce valid output.

Machiel
Quuxplusone commented 4 years ago

I must add that I do not use LLJIT but my own JIT implementation using ORC.

Quuxplusone commented 4 years ago

Great! Thank you for checking this out Machiel.

Closing as fixed by 84217ad6611.