Open OwenTrokeBillard opened 10 months ago
I am trying to make Enzyme compatible with Clang + MSVC. There are some stubborn linker errors. It fails to link the LLVM Support library. Does this look familiar to anyone?
FAILED: Enzyme/LLVMEnzyme-18.dll
cmd.exe /C "cd . && C:\source\llvm-project\build\bin\clang++.exe -fuse-ld=lld-link -nostartfiles -nostdlib -Wall -fno-rtti -Werror=unused-variable -Werror=dangling-else -O2 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Wl,--gc-sections -shared -o Enzyme\LLVMEnzyme-18.dll -Xlinker /MANIFEST:EMBED -Xlinker /implib:Enzyme\LLVMEnzyme-18.lib -Xlinker /pdb:Enzyme\LLVMEnzyme-18.pdb -Xlinker /version:0.0 Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysisPrinter.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CApi.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CacheUtility.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CallDerivatives.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/DiffeGradientUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/DifferentialUseAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/Enzyme.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/EnzymeLogic.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/FunctionUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/GradientUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/InstructionBatcher.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/MustExitScalarEvolution.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/PreserveNVVM.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceGenerator.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceInterface.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/Utils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/SCEV/ScalarEvolutionExpander.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeTree.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeAnalysisPrinter.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/RustDebugInfo.cpp.obj C:/source/llvm-project/build/lib/LLVMDemangle.lib -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames && cd ."
lld-link: warning: ignoring unknown argument '--gc-sections'
lld-link: error: undefined symbol: void __cdecl llvm::deallocate_buffer(void *, unsigned int, unsigned int)
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'InactiveGlobals''(void))
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'MPIInactiveCommAllocators''(void))
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'KnownInactiveFunctionInsts''(void))
>>> referenced 485 more times
(many more)
Enzyme/enzyme/CMakeLists.txt
:
-fPIC
compiler flag. This is not needed or supported on Windows.cmake_policy(SET CMP0091 NEW)
. This fixes the C runtime library link mismatch.add_compile_definitions(_ALLOW_KEYWORD_MACROS)
. This fixes the macro redefinition error.
$env:CC = "C:\path\to\llvm\bin\clang"
$env:CXX = "C:\path\to\llvm\bin\clang++"
$env:RC = "C:\path\to\llvm\bin\llvm-rc"
cd enzyme
mkdir build
cd build
cmake -G Ninja .. -DLLVM_DIR=C:/source/llvm-project/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE:STRING=Release
ninja
Looks like unfortunately no one is too familiar with this issue. Do you want to join the next Enzyme meeting, so we can all try to debug this together? It would be Wednesday, 16:00 ET on Zoom: https://mit.zoom.us/j/96000853439
@ZuseZ4 Sure, it would be my pleasure. Thank you for the invite.
Building a clang that can accept enzyme on windows with MSVC is tricky because not everything that enzyme needs from clang/opt is exported in the export table.
I won't be much help until maybe tomorrow, but after a very cursory glance, you'll have to link against LLVMSupport.lib
(which should be handled automatically by LLVM's CMake modules, and you may be doing this already). You may also get better results using clang-cl
instead of clang++
, but that's tertiary to the issue at hand where a symbol isn't being found due to the support library being excluded from the linking process for some reason.
My apologies for the generic answer. I will be back in less than 24 hours with a (hopefully) better explanation and (again, hopefully), a solution – if no one else has solved it by then.
I'm updating my debug build of Clang+LLVM to 17.0.6. Once that build finishes, I'll clone the repo and take a much closer look. In the meantime, I'll look through the CMakeLists.txt
to see if there is anything super obvious that I didn't catch yesterday.
Okay, I think I see what's happening. I'm still building LLVM at the moment, but after taking a closer look at enzyme/Enzyme/CMakeLists.txt
:
If you're linking against LLVM.dll
(which is what the comment indicates), there is a very good chance you're running into a fundamental limitation of PE files: the exports table size is limited to 16 bits, and there are far more than 65536 symbols in LLVM. The one you're missing is likely in that list. Unfortunately, this means you'll have to link directly against LLVM's support library, and maybe even statically.
It should be as simple as adding Support
under LINK_COMPONENTS
, but I'm not sure how LLVM_LINK_LLVM_DYLIB
changes linking logic. LLVMSupport.dll
may not exist if LLVM was configured to build LLVM.dll
. If it fails, the static library should still be there, and you'll have to link against that.
llvm_map_components_to_libnames(llvmLibs Support)
target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR} PRIVATE ${llvmLibs})
If that doesn't work, try manually linking without having LLVM "help" you:
target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR}
PRIVATE LLVMSupport)
# Or:
target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR}
PRIVATE LLVMSupport.lib)
I haven't looked at the Enzyme source code, but if it exposes anything from the support library, you'll have to do PUBLIC
linkage instead.
I gave a talk on building LLVM plugins for Windows under Visual Studio. You may not be building directly with MSVC, but even so, you may run into other problems getting add_llvm_library
to work with the toolchain -- and I briefly covered that. In my demo repo, I'm using add_library
directly -- though the "normal" way would be to use add_llvm_pass_plugin
. See lines 1 through 19.
If you do run into problems, you may have to change the function you use for defining your library targets -- specifically for Windows + MSVC.
@jvstech Thank you for looking at this issue. The linker errors persist with your suggestions, but your insights are still informative. I found your talk especially helpful and well presented. Your demo repo will also be a great resource.
We investigated LLVM.dll
and found that it does not exist. I believe this is because LLVM_LINK_LLVM_DYLIB
is not supported on Windows. Linking the libraries statically sounds reasonable, and I believe (or rather hope) this is what the build is attempting to do.
In your presentation you discuss many tricky environmental settings that must be configured correctly. It is plausible that one such issue is causing the linker error. If your environment is working already, it would be a great help if you could attempt to build Enzyme with Clang + MSVC. Though, you have done a great deal already and have given us plenty to work with.
@OwenTrokeBillard Not a problem. Happy to help in any way I can. I wasn't sure if LLVM.dll
was a thing, myself. I knew LLVM-C.dll
could be built, but obviously, that serves a different purpose.
It took a while (I had to re-configure LLVM with LLVM_ENABLE_PLUGINS
), but I got to the error you described. I was able to get past it by
MODULE
to SHARED
in the add_llvm_library
definitions, andtarget_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR} PUBLIC LLVMSupport)
That said, once I got past that missing symbol, there were several other libraries that that had to be added. Additionally, there were symbols the linker insisted were missing (from LLVMSupport.lib
, LLVMAnalysis.lib
, and LLVMCore.lib
, although I believe there were others), yet they were clearly defined in their associated .lib files, which could be verified with dumpbin /symbols
. For example, one of the "missing" functions was llvm::Function::BuildLazyArguments
. And yet:
─▷ dumpbin /symbols C:\LLVM\src\llvm\build\llvm-release\install\lib\LLVMCore.lib | select-string 'BuildLazyArguments'
<snip>
15F 00000000 SECT4B notype () External | ?BuildLazyArguments@Function@llvm@@AEBAXXZ (private: void __cdecl llvm::Function::BuildLazyArguments(void)const )
<snip>
It's very clearly listed with its defined section and with external linkage. The only thing I can think of is add_llvm_library
is doing something else to the build commands (indirectly) to cause those symbols to not be visible by the linker. And that's a very, very rough guess.
In the meantime, I'm going to try adding this function to the CMakeLists.txt
file and change all the add_llvm_library
uses to it:
function(add_enzyme_library targetName)
cmake_parse_arguments(ELIB
""
""
"LINK_COMPONENTS"
${ARGN})
if ((WIN32 OR MSVC) AND NOT CYGWIN AND NOT MINGW)
add_library(${targetName} SHARED ${ELIB_UNPARSED_ARGUMENTS})
if (ELIB_LINK_COMPONENTS)
llvm_map_components_to_libnames(ENZYME_WINDOWS_LLVM_LIBS ${ELIB_LINK_COMPONENTS})
endif()
if (LLVM_LINK_COMPONENTS)
list(APPEND ELIB_LINK_COMPONENTS ${LLVM_LINK_COMPONENTS})
endif()
if (ELIB_LINK_COMPONENTS)
target_link_libraries(${targetName} PUBLIC ${ENZYME_WINDOWS_LLVM_LIBS})
endif()
else()
add_llvm_library(${targetName} MODULE ${ELIB_UNPARSED_ARGUMENTS})
endif()
endfunction()
I'll let you know if I get any further.
By the way, you may be interested in this draft pull request with a prototype for building LLVM as DLLs on Windows. It's not done yet, but there is effort being made to get there.
AH HA! I figured it out!
And, specifically:
Do not do this. It's forbidden by the C++ standard because those access specifiers are part of the ABI. I know that's what the _ALLOW_KEYWORD_MACROS
definition is for, but unfortunately, that causes this particular error.
The giveaway was running lld-link --verbose
during linking. Note the error:
lld-link: error: undefined symbol: public: __cdecl llvm::BasicBlock::BasicBlock(class llvm::LLVMContext &, class llvm::Twine const &, class llvm::Function *, class llvm::BasicBlock *)
However, with verbose linker output, we can also see this:
lld-link: Loaded LLVMCore.lib(BasicBlock.cpp.obj) for private: __cdecl llvm::BasicBlock::BasicBlock(class llvm::LLVMContext &, class llvm::Twine const &, class llvm::Function *, class llvm::BasicBlock *)
The only difference is that the found symbol is private:
while the missing symbol is public:
.
I've removed the #define private public
and #undef private
code for further testing. I'm assuming there's a reason for doing it (and I assume that's to have access to private class state), but there must be a better work-around.
Thank you a lot for helping with this, having MSCV support will save us from workarounds on the Rust side.
@wsmoses Do you remember why this line was added?
It is needed for older LLVMs to extend scalar evolution with support for must exit semantics. This was integrated into LLVM, but versions before the import require it.
We can #if llvmversjon gate the define. For the rust folks that should be fine since your LLVM is recent regardless and it isn’t needed.
On Sat, Jan 20, 2024 at 8:21 PM Manuel Drehwald @.***> wrote:
Thank you a lot for helping with this, having MSCV support will save us from extra effort on the Rust side.
@wsmoses https://github.com/wsmoses Do you remember why this line was added?
— Reply to this email directly, view it on GitHub https://github.com/EnzymeAD/Enzyme/issues/1607#issuecomment-1902469799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXHQYZOTWPR5IT373VDYPRURXAVCNFSM6AAAAABBTZO4M6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBSGQ3DSNZZHE . You are receiving this because you were mentioned.Message ID: @.***>
Or actually looking at it, maybe not since the integration clearly is gated on llvm 16 and adding it on the other side. What fails to build when removing it?
On Sat, Jan 20, 2024 at 8:39 PM Billy Moses @.***> wrote:
It is needed for older LLVMs to extend scalar evolution with support for must exit semantics. This was integrated into LLVM, but versions before the import require it.
We can #if llvmversjon gate the define. For the rust folks that should be fine since your LLVM is recent regardless and it isn’t needed.
On Sat, Jan 20, 2024 at 8:21 PM Manuel Drehwald @.***> wrote:
Thank you a lot for helping with this, having MSCV support will save us from extra effort on the Rust side.
@wsmoses https://github.com/wsmoses Do you remember why this line was added?
— Reply to this email directly, view it on GitHub https://github.com/EnzymeAD/Enzyme/issues/1607#issuecomment-1902469799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXHQYZOTWPR5IT373VDYPRURXAVCNFSM6AAAAABBTZO4M6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBSGQ3DSNZZHE . You are receiving this because you were mentioned.Message ID: @.***>
Looks like the problem is that the exit limit type in scalar evolution is marked private in LLVM but used by us.
This should hopefully be a simple patch to llvm upstream.
2024-01-21T01:54:09.3771619Z /home/runner/work/Enzyme/Enzyme/enzyme/Enzyme/MustExitScalarEvolution.h:63:34: error: ‘using ExitLimitCacheTy = class llvm::ScalarEvolution::ExitLimitCache’ is private within this context 2024-01-21T01:54:09.3774131Z
On Sat, Jan 20, 2024 at 8:41 PM Billy Moses @.***> wrote:
Or actually looking at it, maybe not since the integration clearly is gated on llvm 16 and adding it on the other side. What fails to build when removing it?
On Sat, Jan 20, 2024 at 8:39 PM Billy Moses @.***> wrote:
It is needed for older LLVMs to extend scalar evolution with support for must exit semantics. This was integrated into LLVM, but versions before the import require it.
We can #if llvmversjon gate the define. For the rust folks that should be fine since your LLVM is recent regardless and it isn’t needed.
On Sat, Jan 20, 2024 at 8:21 PM Manuel Drehwald @.***> wrote:
Thank you a lot for helping with this, having MSCV support will save us from extra effort on the Rust side.
@wsmoses https://github.com/wsmoses Do you remember why this line was added?
— Reply to this email directly, view it on GitHub https://github.com/EnzymeAD/Enzyme/issues/1607#issuecomment-1902469799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXHQYZOTWPR5IT373VDYPRURXAVCNFSM6AAAAABBTZO4M6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBSGQ3DSNZZHE . You are receiving this because you were mentioned.Message ID: @.***>
@jvstech Thank you for thoroughly investigating this. I never would have suspected redefining access specifiers was the culprit. You have saved me weeks of toiling.
Just building a patched LLVM+Enzyme to make sure there are no other issues.
Looks like the problem is that the exit limit type in scalar evolution is marked private in LLVM but used by us. This should hopefully be a simple patch to llvm upstream.
There are more issues than just that. There are several direct uses of private member variables and functions.
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(123,18): error: 'DT' is a private member of 'llvm::ScalarEvolution'
123 | if (!Latch || !DT.dominates(ExitingBlock, Latch))
| ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(176,18): error: 'howFarToZero' is a private member of 'llvm::ScalarEvolution'
176 | ExitLimit EL = howFarToZero(getMinusSCEV(LHS, RHS), L, ControlsOnlyExit);
| ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(361,10): error: 'computeExitCountExhaustively' is a private member of 'llvm::ScalarEvolution'
361 | return computeExitCountExhaustively(L, ExitCond, ExitIfTrue);
| ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(500,20): error: 'howManyGreaterThans' is a private member of 'llvm::ScalarEvolution'
500 | ExitLimit EL = howManyGreaterThans(LHS, RHS, L, IsSigned, ControlsExit,
| ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(515,10): error: 'computeShiftCompareExitLimit' is a private member of 'llvm::ScalarEvolution'
515 | return computeShiftCompareExitLimit(ExitCond->getOperand(0),
| ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(959,9): error: 'canIVOverflowOnLT' is a private member of 'llvm::ScalarEvolution'
959 | if (canIVOverflowOnLT(RHS, Stride, IsSigned) && !isUBOnWrap())
| ^
and so on.
This is one of those situations where I would make a copy of the required source and mark those members protected:
until a patch can be made to expose equivalent logic in LLVM proper. You won't be able to use the same names directly lest you end up with symbol collisions, but that can be worked around by either renaming the needed classes/structs/free functions, or wrapping everything in a custom namespace { ... }
. It'll require changes to MustExitScalarEvolution
regardless.
It's not perfect, but it's a sterile bandage for the time being.
@jvstech Thank you for thoroughly investigating this. I never would have suspected redefining access specifiers was the culprit. You have saved me weeks of toiling.
Not a problem. It's been a long weekend and I've had some spare cycles, so I'm happy to help.
Oh darn, yeah I think we'd just want to replace all private linkages with protected there, which also hopefully wouldn't be that bad.
However the long term solution is that MustExitScalarEvolution should just be upstreamed.
I'm think I fixed this but could someone with a windows system test this out to confirm? I made the (very minor) required change to LLVM in this branch, and the changes to the cmake txt listed in comment here
I'm testing on fedora with
cmake .. -GNinja -DLLVM_DIR=$HOME/Workspace/llvm-project/build/lib/cmake/llvm -DLLVM_EXTERNAL_LIT=(which lit) -DCMAKE_CPP_COMPILER=clang-cl
clang-cl
is supposed to be compatible with MSVC, so I expect it should build. let me know if it doesn't, or if you need more information on my end
@skewballfox I personally don’t have access to a windows machine. However GitHub actions offer windows runners.
@jedbrown iirc you had a windows machine, if you have some time, could you verify this builds please? Let's try without Rust for now, that should be easy to add later.
cc @matinraayai who was starting to look at some of the linkage upstreaming
@skewballfox once the linkages are fixed in upstream LLVM, the correct fix to Enzyme is to remove the define private public
things.
once the linkages are fixed in upstream
is there an issue I can look at to get a bit more context? sorry new to contributing to llvm/enzyme
once the linkages are fixed in upstream
is there an issue I can look at to get a bit more context? sorry new to contributing to llvm/enzyme
No there isn't an issue (besides this one), but essentially your LLVM commit needs to be merged upstream into LLVM. Supposing your fix exists in LLVM, the correct thing to do in Enzyme is to remove these defines (https://github.com/EnzymeAD/Enzyme/blob/26c43370e4d56730d0b00e4e3b2705b28f7df375/enzyme/Enzyme/Enzyme.cpp#L29). Or rather, the correct thing to do is to version gate them so the defines only happen on a version of LLVM without the fix
No there isn't an issue (besides this one), but essentially your LLVM commit needs to be merged upstream into LLVM.
I submitted a PR with the one line change to LLVM. I'll update you if the status changes
Hey, anyone know of anything projects downstream of enzyme making use of MustExitScalarEvolution
? I upstreamed the code from the function overrides into LLVM, and it passes LLVM's existing test, but MESE was never tested enzyme side
So I started looking throught the commit history and associated issues for MustExitScalarEvolution.cpp
to try to find code where this is being called. Can Anyone explain how @turbo
is leading to mustexit being called in #1691 ?
Alternatively, anyone have ideas, suggestions, or links that could be useful for writing test for MESE's code?
@turbo
comes from the Julia side, I wouldn't try running tests from Julia.
@wsmoses Can you help him here or on the llvm pr to get this code tested? ripgrep didn't reveal any meaningful tests tailored towards this code
I'll try to take a look in a few days, but as you saw we didn't have any explicit unit tests for the new functionality so you're going to need to write them by hand
Probably that involves forking the existing SE tests that I referenced in the LLVM PR, adding a flag to parse the must exit functionality, and then testing the new functionality behaves as expected
It would be beneficial for Enzyme to build with Clang + MSVC on Windows. One use case is Rust integration. See https://github.com/EnzymeAD/rust/issues/54.
Enzyme is presently cross-compiled for Windows with Clang + MinGW. With any luck, the only change required is updating
CMakeLists.txt
for Clang + MSVC compatibility.