B-Lang-org / bsc

Bluespec Compiler (BSC)

[Bluesim] Simulation executable fails with `undefined symbol: _Z21vcd_write_scope_startP9tSimStatePKc` #704

Closed: Vekhir closed this issue 1 month ago

Vekhir commented 1 month ago

Hi, when I compile a BSV program using bsc with the Bluesim backend and run the resulting simulation executable, the following error occurs:

Error: dlopen: ./out.so: undefined symbol: _Z21vcd_write_scope_startP9tSimStatePKc
    invoked from within
"sim load $model_name $top_module"
    (file "/opt/bluespec/lib/tcllib/bluespec/bluesim.tcl" line 188)

Solution

The GCC option -ffat-lto-objects needs to be added to the compiler flags when building bsc with LTO enabled (See also here).

Investigation

From my testing, the issue occurs consistently and is repeatable. The missing symbol is always the same for the given example, though it varies depending on the program (e.g. with a different program, _ZN8WideDataD1Ev was missing; the error is otherwise identical). Using c++filt to decode the symbols yields vcd_write_scope_start(tSimState*, char const*) and WideData::~WideData(). This only affects the Bluesim backend; the Verilog backend works as expected.
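As a small sketch of the demangling step, the two mangled names quoted in this report can be passed to c++filt (part of GNU binutils). The helper name `demangle` and the optional `nm` check on `./out.so` are illustrative additions, not commands taken from the report.

```shell
# Demangle a single C++ symbol name (requires c++filt from binutils).
demangle() {
  c++filt "$1"
}

sym_vcd=$(demangle _Z21vcd_write_scope_startP9tSimStatePKc)
sym_wide=$(demangle _ZN8WideDataD1Ev)
echo "$sym_vcd"
echo "$sym_wide"

# Optional (assumption: out.so from the failing build is in the current
# directory): list the symbols out.so expects other libraries to provide.
if [ -f ./out.so ]; then
  nm -D --undefined-only ./out.so | c++filt
fi
```

The two echoed lines should match the decoded names given above.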

~The issue seems correlated with a syb upgrade. Are you able to reproduce the error when building with syb 0.7.2.4?~

Edit: Further investigation points to LTO (link-time optimisation) being the cause. Disabling LTO via the !lto option in the PKGBUILD fixes the issue.
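For reference, a PKGBUILD fragment that disables LTO for a single package might look like the following. This is a minimal sketch; the rest of the PKGBUILD is omitted and any other entries already in the options array would need to be kept.

```shell
# PKGBUILD fragment: ask makepkg not to add its LTO flags for this package.
options=('!lto')
```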

Steps to reproduce

  1. Copy examples/smoke_test/FibOne.bsv into a local directory.
  2. Run bsc -u -sim -simdir . -bdir . -info-dir . -keep-fires -p %/Libraries -g mkFibOne FibOne.bsv
  3. Run bsc -e mkFibOne -sim -o ./out -simdir . -p %/Libraries -bdir . -keep-fires
  4. Run the simulation executable: ./out

System information

OS: Arch Linux
Kernel: 6.7.4.arch1-1
GHC: 9.0.2-3
GCC: 13.2.1-5
glibc: 2.39-1
haskell-split: 0.2.5
bsc: e6f95a7c47ac884dc74f843c4fc8fa29881b7407 (current master)

As a note, the above versions were obtained via the Arch Linux Archive. The last known working snapshot is 2024-02-07, while the first known failing snapshot is 2024-02-08.

Appendix

error.log

quark17 commented 1 month ago

In my experience, this error occurs when you are using a BSC that was compiled with a different C++ compiler version (or different C++ compiler flags) than the one you are currently using.

When you use BSC to generate a Bluesim executable, BSC generates C++ code, compiles that code, and then links it with the Bluesim kernel library in BSC. The error occurs when the C++ compiler you are currently using is unable to find the symbols in the pre-compiled Bluesim kernel library, because the kernel library was encoded with a different ABI version than what your current C++ compiler is expecting.

This can happen when you compile BSC with GCC v3 and then try to run BSC with GCC v4 (for example). Or it can happen if you've given a flag to GCC to tell it to use a different ABI than its default (either when compiling BSC or when running BSC). For example, the following flag:

 -D_GLIBCXX_USE_CXX11_ABI=0

My suggestion is to double-check that you are using a BSC that was compiled in the same environment that you are running in, and check that the above flag has not been provided.

I don't believe that GHC packages like syb have any connection to this error.

Vekhir commented 1 month ago

I've done some further investigation that points to a pacman update that I missed being at fault. It is likely stripping the package too aggressively.

The environment is always consistent between compilation and testing the issue. Updating, rebuilding, reinstalling, and rebooting didn't help. The flag you are pointing to seems to be for compatibility, so I don't think it's enabled on Arch, which generally uses the latest features available. The syb update was a coincidence. Perhaps I should've asked whether you'd seen the error before, but you've answered that nonetheless, so thanks.

Anyway, seems to be an Arch specific packaging issue, therefore I'm closing this.

Vekhir commented 1 month ago

Vekhir said: I've done some further investigation that points to a pacman update that I missed being at fault.

Yep, that update enabled LTO by default. Disabling LTO fixes the issue.

quark17 said: When you use BSC to generate a Bluesim executable, BSC generates C++ code, compiles that code, and then links it with the Bluesim kernel library in BSC. The error occurs when the C++ compiler you are currently using is unable to find the symbols in the pre-compiled Bluesim kernel library, because the kernel library was encoded with a different ABI version than what your current C++ compiler is expecting.

I guess LTO falls under different ABI. To make this compatible with LTO, bsc would also need to pass in -flto to the compiler, wouldn't it? Probably easier to just disallow LTO.

quark17 commented 1 month ago

If you want to pass additional flags to the C or C++ compiler, you can do so by setting BSC_CFLAGS or BSC_CXXFLAGS in the environment. You can also set BSC_OPTIONS to add flags to the BSC command line. So if you need BSC to pass -flto, you could do that by setting BSC_CXXFLAGS=-flto or by setting BSC_OPTIONS="-Xc++ -flto".
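A minimal sketch of the two alternatives in a shell session, reusing the link command from the reproduction steps. The guard around bsc is only there so the snippet does nothing harmful on a machine without BSC installed.

```shell
# Alternative 1: pass -flto only to the C++ compiler that bsc invokes.
export BSC_CXXFLAGS="-flto"

# Alternative 2 (use instead of alternative 1): add options to the bsc
# command line itself.
# export BSC_OPTIONS="-Xc++ -flto"

# Re-run the link step from the reproduction steps, if bsc is installed.
if command -v bsc >/dev/null 2>&1; then
  bsc -e mkFibOne -sim -o ./out -simdir . -p %/Libraries -bdir . -keep-fires \
    || echo "bsc link step failed" >&2
fi
```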

However, I'm still unclear why there's a mismatch between the build of BSC and your execution of BSC. If LTO was enabled by default, it should apply to both the build of BSC and the execution of BSC, and no additional flag would be needed. If BSC was built with an explicit -flto flag, then yes you would need to also provide the -flto flag when you run BSC.

quark17 commented 1 month ago

BSC will also consult CFLAGS and CXXFLAGS from the environment. So you may also want to check whether that is being set/unset. You can use that to specify additional flags, but they are of course standard variables that will affect calls to the C/C++ compiler elsewhere; using BSC_CFLAGS and BSC_CXXFLAGS allows you to specify flags only when invoking BSC.

Vekhir commented 1 month ago

I checked out your suggestions, but none fixed the issue. I don't think there's a problem with the environment. The section "The environment" below describes the general conventions on Arch and goes into further detail about what I've done in this particular case.

I opened an issue that might be related over at pacman/pacman#150. Packages get stripped of debug symbols, maybe that also removes necessary functions.

The environment

By convention, Arch packages are built in a clean chroot. It's not enforced, but it's best practice. This is done for reproducibility and repeatability, as the build environment is always the same between builds and between systems, depending solely on the build instructions within the PKGBUILD. Additionally, this means the packager should ensure that the resulting package works no matter the environment on the final machine; the package only installs files and dependencies. The user is of course free to do whatever the program allows.

During testing, I have installed the package (and FibOne.bsv) into the chroot and run the tests there, so the environment is exactly identical. This is a bit hacky, but it works. It has to work for debugging purposes. After testing, the chroot is easily cleaned again.

So, running this bash code (original):

build(){
  #cd "$srcdir/bsc"
  # Show the compiler flags that makepkg exported into the build environment
  echo "CFLAGS=$CFLAGS"
  echo "CXXFLAGS=$CXXFLAGS"
  # Remove artifacts from previous runs (-f: ignore files that do not exist)
  rm -f FibOne.bo mkFibOne.ba mkFibOne.cxx mkFibOne.h mkFibOne.o model_mkFibOne.cxx model_mkFibOne.h model_mkFibOne.o out out.so
  bsc -v
  bsc -u -sim -simdir . -bdir . -info-dir . -keep-fires -p %/Libraries -g mkFibOne FibOne.bsv
  bsc -e mkFibOne -sim -o ./out -simdir . -p %/Libraries -bdir . -keep-fires
  ./out
  # Stop here so the real package build below is skipped during testing
  exit
  #make GHC="ghc -dynamic" GHCJOBS=4 GHCRTSFLAGS='+RTS -M8G -A128m -RTS' install-src
  #make install-doc
}

results in:

==> Starting build()...
CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/bluespec-git/src=/usr/src/debug/bluespec-git -flto=auto
CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/bluespec-git/src=/usr/src/debug/bluespec-git -flto=auto
Bluespec Compiler, version 2024.01-17-ge6f95a7c (build e6f95a7c)
This is free software; for source code and copying conditions, see
https://github.com/B-Lang-org/bsc

Invoking command line:
bsc -v

checking package dependencies
compiling FibOne.bsv
code generation for mkFibOne starts
Elaborated module file created: mkFibOne.ba
All packages are up to date.
Bluesim object created: ./mkFibOne.{h,o}
Bluesim object created: ./model_mkFibOne.{h,o}
Simulation shared library created: out.so
Simulation executable created: ./out
Error: dlopen: ./out.so: undefined symbol: _Z21vcd_write_scope_startP9tSimStatePKc
    invoked from within
"sim load $model_name $top_module"
    (file "/opt/bluespec/lib/tcllib/bluespec/bluesim.tcl" line 188)
==> ERROR: A failure occurred in build().

Notably, *FLAGS contains -flto=auto and bsc -v doesn't show anything special. Adding export BSC_OPTIONS="-Xc++ -flto=auto" (or a bare -flto, similar result) at the top yields:

CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/bluespec-git/src=/usr/src/debug/bluespec-git -flto=auto
CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/bluespec-git/src=/usr/src/debug/bluespec-git -flto=auto
Bluespec Compiler, version 2024.01-17-ge6f95a7c (build e6f95a7c)
This is free software; for source code and copying conditions, see
https://github.com/B-Lang-org/bsc

Invoking command line:
bsc -Xc++ -flto=auto -v

checking package dependencies
compiling FibOne.bsv
code generation for mkFibOne starts
Elaborated module file created: mkFibOne.ba
All packages are up to date.
Bluesim object created: ./mkFibOne.{h,o}
Bluesim object created: ./model_mkFibOne.{h,o}
Simulation shared library created: out.so
Simulation executable created: ./out
Error: dlopen: ./out.so: undefined symbol: _Z21vcd_write_scope_startP9tSimStatePKc
    invoked from within
"sim load $model_name $top_module"
    (file "/opt/bluespec/lib/tcllib/bluespec/bluesim.tcl" line 188)
==> ERROR: A failure occurred in build().

Even though the invoked command line shows that bsc receives the LTO flag, it still fails.

Vekhir commented 1 month ago

@quark17 The solution is to add -ffat-lto-objects to CXXFLAGS when building bsc. At runtime, nothing has to be done.
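In a PKGBUILD, that could be sketched roughly as follows. Only the flag handling is shown; appending the flag to CFLAGS as well is an assumption made here for symmetry and is not stated in this thread.

```shell
# Sketch of a PKGBUILD build() function; only the flag handling is shown.
build() {
  # Emit both LTO bytecode and regular object code for each object file.
  CFLAGS="$CFLAGS -ffat-lto-objects"      # assumption: not required by the thread
  CXXFLAGS="$CXXFLAGS -ffat-lto-objects"
  export CFLAGS CXXFLAGS
  # The package's real build command (its make invocation) would follow here.
}
```

Because the flags are appended, whatever makepkg already placed in the variables (such as -flto=auto) is preserved.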

quark17 commented 1 month ago

If I understand correctly, from reading someone else's issue here, the strip utility mangles the index of static archives (.a files) that are compiled for LTO, because strip doesn't know about LTO. That is consistent with your observation that disabling either strip or LTO avoids the problem.

One workaround is to use the -ffat-lto-objects flag. But note that this doubles the size of the file, because each object then includes both an LTO and a non-LTO version. Another fix seems to be running ranlib after strip to rebuild the index of the archive, which would not increase the size of the files. Then again, I guess one advantage of fat objects is that they work with compiles both with and without LTO, since the file contains both versions.
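The strip-then-ranlib sequence could be sketched as below. The function name `rebuild_archive_index` and the use of `--strip-debug` are assumptions for illustration, and whether this actually restores the missing symbols for LTO archives is not confirmed in this thread.

```shell
# rebuild_archive_index ARCHIVE
# Strips debug symbols from a static archive, then regenerates its symbol index.
rebuild_archive_index() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "no such archive: $archive" >&2
    return 1
  fi
  strip --strip-debug "$archive" || return 1
  # Rebuild the archive's symbol index after strip has modified the members.
  ranlib "$archive" || return 1
  # Print the archive index so the result can be inspected.
  nm -s "$archive"
}
```

It would be called with the path of the static library in question, for example `rebuild_archive_index ./libexample.a`, where the path is a placeholder.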

Vekhir commented 1 month ago

I've done some benchmarking. With bsc built using the -ffat-lto-objects flag, the bsc -e step is about 1% faster (0.3s out of 25s, but outside variance) when BSC_OPTIONS="-Xc++ -flto" is set than when it is not. The other two steps are within runtime variance. I believe having both versions is OK, as it doesn't require any intervention (like setting variables) to start compiling. Those who want LTO can always enable it.