GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/

Building from source (sdist) succeeds, but import fails, missing dbgout symbol #1313

Closed ccoulombe closed 1 month ago

ccoulombe commented 1 month ago

Howdy!

Building from source succeeds, but once the wheel is installed, importing it fails with this missing symbol:

ImportError: _galsim.cpython-312-x86_64-linux-gnu.so: undefined symbol: dbgout

This happens on a Linux system with GCC 12.3 when building with offloading to a GPU. The symbol dbgout is indeed undefined in the built library:

(2771) [coulombc@node1 GalSim]$ nm -D --demangle --undefined-only  build/lib.linux-x86_64-cpython-311/galsim/_galsim.cpython-311-x86_64-linux-gnu.so | fgrep -i dbgout
                 U dbgout

However, building without offloading (by tweaking the setup to disable it) works, and I can import galsim without errors.

The two issues on GitHub where dbgout is mentioned suggest that this symbol comes from in-house debugging. Searching the source files led me to Std.h, where it is declared extern.

What puzzles me is that -DNDEBUG is defined in both cases, but I guess that DEBUGLOGGING is somehow defined when building with offloading. Here is a compilation line from the offloading build:

ccache gcc -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=x86-64-v3 -fno-math-errno -fPIC -O2 -ftree-vectorize -march=x86-64-v3 -fno-math-errno -fPIC -fPIC -Iinclude -Iinclude/galsim -Iinclude -Iinclude/galsim -I/tmp/coulombc/2771/lib/python3.11/site-packages/numpy/core/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcc12/fftw/3.3.10/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/eigen/3.4.0/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python-build-bundle/2024a/lib/python3.11/site-packages/pybind11/include -c src/CorrelatedNoise.cpp -o build/temp.linux-x86_64-cpython-311/src/CorrelatedNoise.o -O2 -std=c++11 -fvisibility=hidden -fopenmp -foffload=nvptx-none -DGALSIM_USE_GPU
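
One way to test the guess about DEBUGLOGGING directly is to ask the preprocessor. The following is a minimal sketch, not part of GalSim: the helper name macro_report is hypothetical, and the macro names are only the ones mentioned in this thread. Compiled with the same flags as the failing build, it would show which of them are actually visible in a translation unit.

```cpp
#include <string>

// Hypothetical diagnostic helper (not part of GalSim): reports which of the
// macros discussed in this thread are defined for the current translation unit.
std::string macro_report() {
    std::string report;
#ifdef NDEBUG
    report += "NDEBUG=defined ";
#else
    report += "NDEBUG=undefined ";
#endif
#ifdef DEBUGLOGGING
    report += "DEBUGLOGGING=defined ";
#else
    report += "DEBUGLOGGING=undefined ";
#endif
#ifdef GALSIM_USE_GPU
    report += "GALSIM_USE_GPU=defined";
#else
    report += "GALSIM_USE_GPU=undefined";
#endif
    return report;
}
```

Printing macro_report() from a small test program built once with and once without -fopenmp -foffload=nvptx-none -DGALSIM_USE_GPU would confirm or rule out a difference in the defines between the two builds.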

Reproduce

# With a GCC that supports offloading
git clone --depth 1 https://github.com/GalSim-developers/GalSim.git -b v2.6.0 && cd GalSim
python setup.py build bdist_wheel
nm -D --demangle --undefined-only  build/lib.linux-x86_64-cpython-311/galsim/_galsim.cpython-311-x86_64-linux-gnu.so | fgrep -i dbgout

Any hints? Thanks!

ccoulombe commented 1 month ago

To add: DEBUGLOGGING is never actually defined in the source:

(2771) [coulombc@node1 GalSim]$ git grep -e '^#define DEBUGLOGGING' 
(2771) [coulombc@node1 GalSim]$ 

The defines that do exist are all commented out.

For now, I'll build without GPU offloading.

rmjarvis commented 1 month ago

This feels like a compiler bug to me. If DEBUGLOGGING is not turned on, then all uses of dbgout are guarded by if (false) which the compiler should trivially optimize away.

But I think I have a way around it. Can you try git checkout '#1313', and see if that branch compiles and links successfully for you?
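
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of guard described above. It assumes a simplified layout and is not GalSim's actual Std.h; demo_dbgout, DEMO_DEBUGLOGGING, DEMO_DBG_RUNTIME, DEMO_DBG and describe are all hypothetical names. It also shows one possible way to sidestep the problem, namely keeping the symbol out of the disabled branch entirely; the thread does not say what the actual fix was.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an in-house debug stream.
// Declared here but deliberately never defined in this file.
extern std::ostream* demo_dbgout;

// Variant A (shown for comparison only, not used below): the guard is a
// runtime constant, so the reference to demo_dbgout is still compiled. It
// vanishes from the object file only if the compiler discards the dead branch.
#define DEMO_DBG_RUNTIME if (false) (*demo_dbgout)

// Variant B: the preprocessor decides. When DEMO_DEBUGLOGGING is off, the
// disabled branch never mentions demo_dbgout, so no undefined symbol can be
// emitted regardless of what the optimizer does.
#ifdef DEMO_DEBUGLOGGING
#define DEMO_DBG (*demo_dbgout)
#else
#define DEMO_DBG if (false) std::cerr
#endif

// Example function with a debug statement that is a no-op by default.
std::string describe(int value) {
    DEMO_DBG << "describe called with " << value << '\n';
    std::ostringstream out;
    out << "value=" << value;
    return out.str();
}
```

With variant B, running nm on the resulting object should show no undefined reference to demo_dbgout unless DEMO_DEBUGLOGGING is defined, which is the property the failing build lacked for dbgout.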

ccoulombe commented 1 month ago

Sure, I will by the end of the day today! Thanks for looking into this so quickly.

ccoulombe commented 1 month ago

:tada: Works :) Thanks for the fix. I'll use this to patch the wheels, unless you are about to release 2.6.1? @rmjarvis

rmjarvis commented 1 month ago

Great. Thanks. I'll release 2.6.1 shortly.

rmjarvis commented 1 month ago

@ccoulombe Version 2.6.1 has been released on PyPI. Let me know if you still have any problems with it.