ContinuumIO / anaconda-issues

Anaconda issue tracking

Possible C++ ABI incompatibilities with build 1 of libcxxabi-4.0.1 on macOS #10423

Open anjos opened 5 years ago

anjos commented 5 years ago

Actual Behavior

One of our packages (https://gitlab.idiap.ch/bob/bob.learn.boosting) contains C++ code bound to Python via its own APIs - we don't use boost::python or anything like it. The C++ code is compiled using CMake, while the Python bindings are compiled using the normal setuptools/distutils framework. The builds are completely integrated within a call to setup.py install, which is called via conda-build. We do builds for Linux and MacOS routinely, but this problem only shows on MacOS.

Recently, we started to observe segmentation faults in this library, without any change in the code. After careful inspection, we realized that the problem was inside a std::vector<> that was created within the C++ code (compiled by CMake), and then manipulated via code compiled via setuptools. From experience, this type of problem occurs when the ABI changes between libraries exchanging complex objects (such as std::vector). We believe there is something strange going on with the latest version of libcxxabi and friends (build 1). After downgrading libcxxabi to build 0, the problem stops occurring.

Thread on our gitlab: https://gitlab.idiap.ch/bob/bob.learn.boosting/issues/2

Expected Behavior

Compiled code via setuptools or cmake should be ABI compatible and the exchange of C++ objects possible between such binaries.

Steps to Reproduce

Reproducing this problem requires experimenting with both cmake- and setuptools-based compilations, which is not trivial, so it is difficult to provide a small, self-contained example. Here is how to compile and reproduce the problem with the original package that shows the issue (on a MacOS machine - ours is a 10.13 system, with a 10.9 SDK installed in /opt/MacOSX10.9.sdk):

$ git clone https://gitlab.idiap.ch/bob/bob.admin #help scripts and conda_build_config.yaml
$ git clone https://gitlab.idiap.ch/bob/bob.learn.boosting #package with the problem
$ conda activate base
(base) $ cd bob.learn.boosting
# the next line will install the environment to compile the package
(base) $ ../bob.admin/conda/conda-bootstrap.py --python=3.6 bug-environment
(base) $ conda activate bug-environment
(bug-environment) $ buildout #this will build the package
(bug-environment) $ ./bin/nosetests -sv #crash will occur
(bug-environment) $ conda install libcxxabi=4.0.1=hebd6815_0
...
  added / updated specs:
    - libcxxabi==4.0.1=hebd6815_0

The following packages will be DOWNGRADED:

    cctools:       895-1            defaults --> 895-h7512d6f_0   defaults
    clang:         4.0.1-1          defaults --> 4.0.1-h662ec87_0 defaults
    clangxx:       4.0.1-1          defaults --> 4.0.1-hc9b4283_0 defaults
    compiler-rt:   4.0.1-hcfea43d_1 defaults --> 4.0.1-h5487866_0 defaults
    ld64:          274.2-1          defaults --> 274.2-h7c2db76_0 defaults
    libcxx:        4.0.1-hcfea43d_1 defaults --> 4.0.1-h579ed51_0 defaults
    libcxxabi:     4.0.1-hcfea43d_1 defaults --> 4.0.1-hebd6815_0 defaults
    llvm:          4.0.1-1          defaults --> 4.0.1-hc748206_0 defaults
    llvm-lto-tapi: 4.0.1-1          defaults --> 4.0.1-h6701bc3_0 defaults
...
(bug-environment) $ git clean -ffdx #to clean-up buggy build
(bug-environment) $ buildout #will rebuild and now it will work
(bug-environment) $ ./bin/nosetests -sv #tests pass
Anaconda or Miniconda version: 4.5.11
Operating System: MacOS 10.13.6
conda info
     active environment : bug
    active env location : /Users/andre/conda/envs/bug
            shell level : 2
       user config file : /Users/andre/.condarc
 populated config files : /Users/andre/.condarc
                          /Users/andre/conda/envs/bug/.condarc
          conda version : 4.5.11
    conda-build version : 3.16.3
         python version : 3.6.7.final.0
       base environment : /Users/andre/conda  (writable)
           channel URLs : https://www.idiap.ch/software/bob/conda/label/beta/osx-64
                          https://www.idiap.ch/software/bob/conda/label/beta/noarch
                          https://www.idiap.ch/software/bob/conda/osx-64
                          https://www.idiap.ch/software/bob/conda/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/osx-64
                          https://repo.anaconda.com/pkgs/pro/noarch
          package cache : /Users/andre/conda/pkgs
                          /Users/andre/.conda/pkgs
       envs directories : /Users/andre/conda/envs
                          /Users/andre/.conda/envs
               platform : osx-64
             user-agent : conda/4.5.11 requests/2.20.1 CPython/3.6.7 Darwin/17.7.0 OSX/10.13.6
                UID:GID : 501:20
             netrc file : None
           offline mode : False
conda list --show-channel-urls
# packages in environment at /Users/andre/conda/envs/bug:
#
# Name                    Version                   Build  Channel
alabaster                 0.7.12                   py36_0    defaults
appnope                   0.1.0            py36hf537a9a_0    defaults
asn1crypto                0.24.0                   py36_0    defaults
babel                     2.6.0                    py36_0    defaults
backcall                  0.1.0                    py36_0    defaults
blas                      1.0                         mkl    defaults
bob-devel                 2018.12.11               py36_0    https://www.idiap.ch/software/bob/conda
bob.blitz                 2.0.17b0         py36h295839d_0    https://www.idiap.ch/software/bob/conda/label/beta
bob.buildout              2.1.8b0          py36h528b5c5_0    https://www.idiap.ch/software/bob/conda/label/beta
bob.core                  2.2.2b0          py36h295839d_0    https://www.idiap.ch/software/bob/conda/label/beta
bob.extension             3.1.1b0          py36h5c6ceda_0    https://www.idiap.ch/software/bob/conda/label/beta
bob.io.base               3.0.5b0          py36hb71bad4_0    https://www.idiap.ch/software/bob/conda/label/beta
boost                     1.65.1                   py36_4    defaults
bzip2                     1.0.6                h1de35cc_5    defaults
ca-certificates           2018.03.07                    0    defaults
cctools                   895                  h7512d6f_0    defaults
certifi                   2018.11.29               py36_0    defaults
cffi                      1.11.5           py36h6174b99_1    defaults
chardet                   3.0.4                    py36_1    defaults
clang                     4.0.1                h662ec87_0    defaults
clang_osx-64              4.0.1               h1ce6c1d_11    defaults
clangxx                   4.0.1                hc9b4283_0    defaults
clangxx_osx-64            4.0.1               h22b1bf0_11    defaults
click                     6.7              py36hec950be_0    defaults
click-plugins             1.0.3                    py36_1    defaults
cmake                     3.12.2               haff7e42_0    defaults
compiler-rt               4.0.1                h5487866_0    defaults
coverage                  4.5.1            py36h1de35cc_0    defaults
cryptography              2.4.1            py36ha12b0ac_0    defaults
decorator                 4.3.0                    py36_0    defaults
docutils                  0.14             py36hbfde631_0    defaults
expat                     2.2.6                h0a44026_0    defaults
hdf5                      1.10.1               ha036c08_1    defaults
icu                       58.2                 h4b95b61_1    defaults
idna                      2.7                      py36_0    defaults
imagesize                 1.1.0                    py36_0    defaults
intel-openmp              2019.1                      144    defaults
ipdb                      0.11             py36h30e596e_0    https://www.idiap.ch/software/bob/conda
ipython                   7.2.0            py36h39e3cac_0    defaults
ipython_genutils          0.2.0            py36h241746c_0    defaults
jedi                      0.13.1                   py36_0    defaults
jinja2                    2.10                     py36_0    defaults
ld64                      274.2                h7c2db76_0    defaults
libblitz                  1.0.1                hd7a9176_0    https://www.idiap.ch/software/bob/conda
libboost                  1.65.1               hcc95346_4    defaults
libcurl                   7.62.0               h051b688_0    defaults
libcxx                    4.0.1                h579ed51_0    defaults
libcxxabi                 4.0.1                hebd6815_0    defaults
libedit                   3.1.20170329         hb402a30_2    defaults
libffi                    3.2.1                h475c297_4    defaults
libgfortran               3.0.1                h93005f0_2    defaults
libiconv                  1.15                 hdd342a3_7    defaults
libssh2                   1.8.0                ha12b0ac_4    defaults
llvm                      4.0.1                hc748206_0    defaults
llvm-lto-tapi             4.0.1                h6701bc3_0    defaults
markupsafe                1.1.0            py36h1de35cc_0    defaults
mkl                       2018.0.3                      1    defaults
mkl_fft                   1.0.6            py36hb8a8100_0    defaults
mkl_random                1.0.1            py36h5d10147_1    defaults
mr.developer              1.38                     py36_0    https://www.idiap.ch/software/bob/conda
ncurses                   6.1                  h0a44026_1    defaults
nose                      1.3.7                    py36_2    defaults
numpy                     1.15.1           py36h6a91979_0    defaults
numpy-base                1.15.1           py36h8a80b8c_0    defaults
openssl                   1.1.1a               h1de35cc_0    defaults
packaging                 18.0                     py36_0    defaults
parso                     0.3.1                    py36_0    defaults
pexpect                   4.6.0                    py36_0    defaults
pickleshare               0.7.5                    py36_0    defaults
pkg-config                0.29.2               h3efe00b_8    defaults
prompt_toolkit            2.0.7                    py36_0    defaults
ptyprocess                0.6.0                    py36_0    defaults
py-boost                  1.65.1           py36h1439ea1_4    defaults
pycparser                 2.19                     py36_0    defaults
pygments                  2.2.0            py36h240cd3f_0    defaults
pyopenssl                 18.0.0                   py36_0    defaults
pyparsing                 2.3.0                    py36_0    defaults
pysocks                   1.6.8                    py36_0    defaults
python                    3.6.7                haf84260_0    defaults
pytz                      2018.7                   py36_0    defaults
readline                  7.0                  h1de35cc_5    defaults
requests                  2.19.1                   py36_0    defaults
rhash                     1.3.6                ha12b0ac_0    defaults
scipy                     1.1.0            py36h28f7352_1    defaults
setuptools                40.2.0                   py36_0    defaults
six                       1.11.0                   py36_1    defaults
snowballstemmer           1.2.1            py36h6c7b616_0    defaults
sphinx                    1.8.1                    py36_0    defaults
sphinx_rtd_theme          0.4.1                    py36_0    defaults
sphinxcontrib             1.0                      py36_1    defaults
sphinxcontrib-websupport  1.1.0                    py36_1    defaults
sqlite                    3.25.3               ha441bb4_0    defaults
tk                        8.6.8                ha441bb4_0    defaults
traitlets                 4.3.2            py36h65bd3ce_0    defaults
urllib3                   1.23                     py36_0    defaults
wcwidth                   0.1.7            py36h8c6ec74_0    defaults
xz                        5.2.4                h1de35cc_4    defaults
zc.buildout               2.12.2                   py36_0    https://www.idiap.ch/software/bob/conda
zc.recipe.egg             2.0.7            py36h6217847_0    https://www.idiap.ch/software/bob/conda
zlib                      1.2.11               h1de35cc_3    defaults
anjos commented 5 years ago

I'm suspecting the following patch at the llvm-suite parent package: 0001-If-libc-abi-library-is-given-use-it-to-reexport.patch

This patch may affect cmake-based builds and is outdated following a discussion here: https://reviews.llvm.org/D53797

In particular, the following quote before dismissing this patch is worrying: I don't think we ever want to re-export the current system's libc++abi -- we should always use an explicit list of exported symbols.

Could you either update the patch or instruct me how to rebuild this package to test it?

183amir commented 5 years ago

@mingwandroid Could you please give us some pointers here? As far as I understand, two packages that differ only in build number should be API/ABI compatible, especially for something like libc++abi.

mingwandroid commented 5 years ago

Could you please give us some pointers here? As far as I understand, two packages that differ only in build number should be API/ABI compatible, especially for something like libc++abi.

No, that is only true for packages that implement semver, and even then there are times when it's necessary to break the ABI between patch releases (changing some build option can cause this).

I'm too busy to look into this until later, but in this case that patch number does represent an ABI break due to a fix we needed for exceptions. It's not clear what's going on here, but I suspect it's to do with mixing the system libc++ with ours and passing objects between them, which is not supported.

anjos commented 5 years ago

Here is the outcome of a few experiments I conducted:

  1. Build and test with version 4.0.1-0: works
  2. Build and test with version 4.0.1-1: doesn't work
  3. Build with version 4.0.1-1, test with version 4.0.1-0: works
  4. Build with version 4.0.1-0, test with version 4.0.1-1: doesn't work

So, the problem really seems related to the runtime of version 4.0.1-1, since once we deploy version 4.0.1-0, the problems go away, even if our binary is compiled against 4.0.1-1.
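The four combinations above lend themselves to scripting. What follows is a minimal sketch, not the script actually used for these experiments: it only prints the commands for each build/test combination, so it is safe to run anywhere. The build strings come from the downgrade log in the original report; the function name `plan_matrix` and the `-y` flag are assumptions added for illustration.

```shell
#!/bin/sh
# Dry-run planner: prints the commands for each build/runtime combination.
# Build strings are taken from the conda output in the original report.
BUILD0="libcxxabi=4.0.1=hebd6815_0"
BUILD1="libcxxabi=4.0.1=hcfea43d_1"

# plan_matrix is a hypothetical helper name, not part of conda or bob.
plan_matrix() {
    for build_spec in "$BUILD0" "$BUILD1"; do
        for test_spec in "$BUILD0" "$BUILD1"; do
            echo "# build with $build_spec, test with $test_spec"
            echo "conda install -y $build_spec"
            echo "git clean -ffdx"
            echo "buildout"
            # Only switch the runtime if it differs from the build-time one.
            if [ "$build_spec" != "$test_spec" ]; then
                echo "conda install -y $test_spec"
            fi
            echo "./bin/nosetests -sv"
        done
    done
}

plan_matrix
```

Printing the plan rather than executing it keeps the sketch harmless; piping the output to `sh` inside the affected environment would be one way to actually run it.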

I then rebuilt the "llvm-suite" from scratch (only took 7 hours on my laptop...) to remove this patch. I'll call this "version 4.0.1-2". After installing version 4.0.1-2, the problems go away again and everything works as expected. So, the patch in question is really at the center of the issue!

mingwandroid commented 5 years ago

Thanks @anjos,

The patch is redundant relative to the latest llvm/clang master, but we're building llvm/clang 4.0.1 here where it is not redundant. This patch is essential for C++ exceptions to work correctly. Pinging @isuruf.

The most sensible way forward that I can see is for us to update our macOS compilers to a very recent one and add some tests to the compiler packages for both C++ exceptions and this issue.

Can someone make the smallest possible reproducer for it though? That would be super useful.

I'll try to prioritize it as soon as we get such a reproducer. It's still not 100% clear to me, despite the evidence presented, that this isn't a bug in the code in question (though I admit that is less likely).

mingwandroid commented 5 years ago

The libc++ team make guarantees about ABI compatibility that they appear not to be meeting :-(

isuruf commented 5 years ago

Yeah, this looks like a problem of mixing libc++ libraries. Can you do export DYLD_PRINT_LIBRARIES="1" and rerun your script to figure out which libc++abi.dylib and libc++.dylib are loaded?
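As a rough illustration of how that output could be inspected, here is a minimal sketch that labels each libc++ or libc++abi load as coming from the system or from elsewhere. It assumes dyld's `dyld: loaded: <path>` line format and paths without spaces; the helper name `classify_cxx_loads` is hypothetical.

```shell
#!/bin/sh
# Reads DYLD_PRINT_LIBRARIES output on stdin and labels each libc++ /
# libc++abi load as coming from the system (/usr/lib) or from elsewhere.
# Assumes lines of the form "dyld: loaded: <path>" and no spaces in paths.
classify_cxx_loads() {
    awk '
        /^dyld: loaded: .*libc\+\+/ {
            path = $3
            origin = (path ~ /^\/usr\/lib\//) ? "system" : "other"
            print origin, path
        }
    '
}

# Usage on macOS (dyld prints to stderr, hence the redirection):
#   DYLD_PRINT_LIBRARIES=1 python test.py 2>&1 | classify_cxx_loads
```

Seeing both a `system` and an `other` entry for the same library name means two copies are loaded into the process, which is the situation being asked about here.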

anjos commented 5 years ago

Here is output of the program when that variable is set.

A few notes:

Is this possibly an error in our own build instructions? Not sure how to make setuptools link against the conda version of libc++ explicitly.

load.txt

isuruf commented 5 years ago

@anjos, try doing

export LDFLAGS="-L<path_to_conda_env>/lib -Wl,-rpath,<path_to_conda_env>/lib"

so that setuptools links against libc++ inside the conda env.
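A minimal sketch of the same idea with the placeholder filled in from the active environment. It assumes `CONDA_PREFIX`, which `conda activate` sets to the environment's root; the helper name `conda_ldflags` and the fallback path are hypothetical.

```shell
#!/bin/sh
# Builds the linker flags for a given conda environment prefix.
# conda_ldflags is a hypothetical helper, not a conda command.
conda_ldflags() {
    prefix=$1
    printf '%s\n' "-L${prefix}/lib -Wl,-rpath,${prefix}/lib"
}

# CONDA_PREFIX is set by `conda activate`; fall back to a placeholder so the
# sketch still runs outside an activated environment.
LDFLAGS=$(conda_ldflags "${CONDA_PREFIX:-/path/to/conda/env}")
export LDFLAGS
echo "$LDFLAGS"
```

The `-L` part tells the linker where to find the environment's libc++ at build time, while the `-rpath` part records where to look for `@rpath/...` libraries at run time.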

183amir commented 5 years ago

@isuruf Isn't this flag automatically exported when you activate the compilers?

mingwandroid commented 5 years ago

In general, linking to /usr/lib/libc++.dylib means that the system compilers got used instead of ours.

Typically that happens when you neglect to pass --host=${HOST} to configure. (HOST is set by the compiler activation scripts, I wish I'd picked a less common name though, CONDA_HOST for example, so if we need to change this at some point I apologise in advance).

183amir commented 5 years ago

As far as I can understand from the discussions in this issue, it looks like there are two problems:

Am I right?

mingwandroid commented 5 years ago

The libc++ 4.0.1-1 package is not abi compatible with the system one. It's probably not abi compatible with the older conda package either. Maybe we can remove it from the channel index until we have more information?

It's the system compilers you need to stop using here! We have no evidence to suggest our packages are not abi compatible between build numbers, but that's irrelevant, since the deps are exactly the same so the solver will always pick the newer build number.

edit: ignore this comment!

mingwandroid commented 5 years ago

You need to run otool -l on all the packages involved here and find those that link to /usr/lib/libc++.dylib and fix that.

edit: .. and this one.

mingwandroid commented 5 years ago

Looking at the txt file, although it is loading two sets of libc++ dylibs, I'm not sure that's the issue here.

The system one gets loaded from Python through the fact that some of Apple's system libs are written in C++ (in fact the actual dynamic loader is written in C++, but it must be statically linked), so I'm thinking conflicting libc++'s is a red herring now.

mingwandroid commented 5 years ago

So my latest crazy theory is that the libc++ ABI 'leaks' into the headers.

Does anyone have a stacktrace for the segfault? Can you try to recompile the package in which the crashing function lives and also the package of the caller of that function?

anjos commented 5 years ago

I scanned the whole setup.

I found only a single dylib linking to /usr/lib/libc++.1.dylib:

/Users/andre/conda/envs/bug/lib/libiomp5_db.dylib
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.1.0)

This is also present in the load.txt file I submitted before, so somehow, our runtime is loading it.

I'll run more tests, but it may be related to this library.
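A scan of this kind could look roughly like the sketch below. It is an assumption about how such a scan might be done, not the exact commands used here; the helper names `system_cxx_refs` and `scan_prefix` are hypothetical, and `scan_prefix` needs `otool`, so it only works on macOS.

```shell
#!/bin/sh
# Filters `otool -L` output (stdin) down to references to the system libc++
# or libc++abi under /usr/lib.
system_cxx_refs() {
    grep -E '^[[:space:]]+/usr/lib/libc\+\+(abi)?(\.[0-9]+)?\.dylib'
}

# Walks a directory tree and prints every .dylib/.so whose load commands
# reference the system libc++. Requires otool (macOS only).
scan_prefix() {
    find "$1" -type f \( -name '*.dylib' -o -name '*.so' \) |
    while IFS= read -r lib; do
        # grep exits non-zero when nothing matches; that is not an error here.
        refs=$(otool -L "$lib" 2>/dev/null | system_cxx_refs || true)
        if [ -n "$refs" ]; then
            printf '%s\n%s\n' "$lib" "$refs"
        fi
    done
}

# Usage (macOS): scan_prefix /Users/andre/conda/envs/bug/lib
```

The filter is kept separate from the directory walk so that it can be applied to saved `otool -L` output as well.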

mingwandroid commented 5 years ago

Very interesting ...

Any idea what package that comes from? Can you run find /Users/andre/conda/pkgs -name libiomp5_db.dylib?

mingwandroid commented 5 years ago

It's from MKL, can you try updating to 2019.1?

anjos commented 5 years ago

Ran the mkl=2019.1 update, here is the list of packages updated:

The following packages will be REMOVED:

    bob-devel:  2018.12.11-py36_0     https://www.idiap.ch/software/bob/conda

The following packages will be UPDATED:

    mkl:        2018.0.3-1            defaults                                --> 2019.1-144            defaults
    mkl_fft:    1.0.6-py36hb8a8100_0  defaults                                --> 1.0.6-py36h27c97d8_0  defaults
    mkl_random: 1.0.1-py36h5d10147_1  defaults                                --> 1.0.2-py36h27c97d8_0  defaults
    numpy:      1.15.1-py36h6a91979_0 defaults                                --> 1.15.4-py36hacdab7b_0 defaults
    numpy-base: 1.15.1-py36h8a80b8c_0 defaults                                --> 1.15.4-py36h6575580_0 defaults
    scipy:      1.1.0-py36h28f7352_1  defaults                                --> 1.1.0-py36h1410ff5_2  defaults

The problem persists, and that library is still linked to /usr/lib/libc++.1.dylib. It is the only one in the whole stack. Looking closely, I realise the library does not come from the mkl package, but rather from intel-openmp-2019.1-144, which is installed anyway with both versions 2018 and 2019 of mkl.

I changed that dylib file manually for a quick test using the following command:

$ install_name_tool -change /usr/lib/libc++.1.dylib "@rpath/libc++.1.dylib" /Users/andre/conda/envs/bug/lib/libiomp5_db.dylib

otool -L on that library now shows me:

$ otool -L ~/conda/pkgs/intel-openmp-2019.1-144/lib/libiomp5_db.dylib
/Users/andre/conda/pkgs/intel-openmp-2019.1-144/lib/libiomp5_db.dylib:
    @rpath/libiomp5_db.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 120.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1225.1.1)

Scanning all libraries, I no longer see any linked specifically against /usr/lib/libc++.1.dylib. Re-running the example I have at hand still gives me the crash, though.
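For reference, the manual rewrite above can be generalised. The following is a minimal sketch that only prints the `install_name_tool -change` commands it would run, derived from `otool -L` output; the helper name `repoint_cmds` is hypothetical, and, as noted above, repointing this library did not resolve the crash.

```shell
#!/bin/sh
# Reads `otool -L` output for file $1 on stdin and prints (does not run) the
# install_name_tool commands that would repoint system libc++ / libc++abi
# references to @rpath. repoint_cmds is a hypothetical helper name.
repoint_cmds() {
    awk -v file="$1" '
        # Line 1 of otool -L output is the file name itself; skip it.
        NR > 1 && $1 ~ /^\/usr\/lib\/libc\+\+/ {
            n = split($1, parts, "/")
            printf "install_name_tool -change %s @rpath/%s %s\n", $1, parts[n], file
        }
    '
}

# Usage (macOS): otool -L "$lib" | repoint_cmds "$lib"
```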

If I check my own libraries (the ones within the package itself), I see they don't link against the system's libc++.1.dylib:

$ otool -L bob/learn/boosting/*.so
bob/learn/boosting/_library.cpython-36m-darwin.so:
    @rpath/libbob_learn_boosting.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libboost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
bob/learn/boosting/version.cpython-36m-darwin.so:
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
$ otool -L bob/learn/boosting/*.dylib
bob/learn/boosting/libbob_learn_boosting.dylib:
    @rpath/libbob_learn_boosting.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libbob_io_base.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libboost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)

So, it looks sane to me w.r.t. linking. Nevertheless:

$ DYLD_PRINT_LIBRARIES="1" ./bin/python test.py  2>&1 | grep c++
dyld: loaded: /usr/lib/libc++abi.dylib
dyld: loaded: /usr/lib/libc++.1.dylib
dyld: loaded: /usr/lib/libc++abi.dylib
dyld: loaded: /usr/lib/libc++.1.dylib
# these come from our conda-build as you can see above
dyld: loaded: /Users/andre/conda/envs/bug/lib/libc++.1.dylib
dyld: loaded: /Users/andre/conda/envs/bug/lib/libc++abi.1.dylib

Now, just running python itself, from the environment:

$ DYLD_PRINT_LIBRARIES="1" python -c 'exit()'  2>&1 | grep c++
dyld: loaded: /usr/lib/libc++abi.dylib
dyld: loaded: /usr/lib/libc++.1.dylib

So, this is not related to our code at all, and Python from the defaults channel seems to be loading the C++ libraries from the system.

anjos commented 5 years ago

Here is a minimal example to test the system library loading from scratch:

$ conda create -n pytest python=3
$ conda activate pytest
(pytest) $ DYLD_PRINT_LIBRARIES="1" python -c 'exit()'  2>&1 | grep c++
dyld: loaded: /usr/lib/libc++abi.dylib
dyld: loaded: /usr/lib/libc++.1.dylib
anjos commented 5 years ago

Here is the reasoning why that happens:

  1. Python links against /usr/lib/libSystem.B.dylib
  2. /usr/lib/libSystem.B.dylib links against /usr/lib/system/libxpc.dylib
  3. /usr/lib/system/libxpc.dylib links against /usr/lib/libobjc.A.dylib
  4. /usr/lib/libobjc.A.dylib links against /usr/lib/libc++abi.dylib

Not sure this is wrong per se. Comments welcome.

Edit: now re-reading the stack @mingwandroid has already commented on this, so please ignore the request for comments.
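The chain listed above can also be traced mechanically. Here is a minimal sketch, assuming the linkage has first been flattened into "binary dependency" pairs, one per line; the helper name `link_chain` is hypothetical, and the search is a plain breadth-first walk over those pairs.

```shell
#!/bin/sh
# Reads "binary dependency" pairs on stdin (one edge per line) and prints the
# first chain found from $1 to $2 using a breadth-first search.
link_chain() {
    awk -v start="$1" -v target="$2" '
        { deps[$1] = deps[$1] " " $2 }
        END {
            head = 1; tail = 1
            queue[tail++] = start
            seen[start] = 1
            while (head < tail) {
                node = queue[head++]
                if (node == target) {
                    # Walk the parent links back to the start.
                    chain = node
                    while (node != start) {
                        node = parent[node]
                        chain = node " -> " chain
                    }
                    print chain
                    exit
                }
                n = split(deps[node], next_nodes, " ")
                for (i = 1; i <= n; i++) {
                    child = next_nodes[i]
                    if (!(child in seen)) {
                        seen[child] = 1
                        parent[child] = node
                        queue[tail++] = child
                    }
                }
            }
        }
    '
}

# Edges could be produced on macOS with something like:
#   otool -L "$f" | awk -v f="$f" 'NR > 1 { print f, $1 }'
# and then queried with:
#   link_chain /path/to/python /usr/lib/libc++abi.dylib < edges.txt
```

Nothing is printed when no chain exists, which makes the helper easy to use in a conditional.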

anjos commented 5 years ago

So, at least in macOS 10.13, my current understanding is that anything linking against /usr/lib/libSystem.B.dylib will end up with /usr/lib/libc++abi.dylib on its linkage list.

Edit: ignore this as well.

anjos commented 5 years ago

More information: I created a macOS 10.9 machine and compiled my software there, from scratch. The problem persists, as well as all indicators as defined above. So, we can exclude cross-compilation issues.

anjos commented 5 years ago

@anjos, try doing

export LDFLAGS="-L<path_to_conda_env>/lib -Wl,-rpath,<path_to_conda_env>/lib"

so that setuptools links against libc++ inside the conda env.

I double-checked our setup and this is exactly what is executed. The compilation line for setuptools-built bindings looks like this:

x86_64-apple-darwin13.4.0-clang++ -bundle -undefined dynamic_lookup -isysroot /opt/MacOSX10.9.sdk -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-rpath,/Users/gitlab/conda/envs/bug/lib -L/Users/gitlab/conda/envs/bug/lib -isysroot /opt/MacOSX10.9.sdk -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-rpath,/Users/gitlab/conda/envs/bug/lib -L/Users/gitlab/conda/envs/bug/lib -Wl,-export_dynamic -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -O0 -g -DBOB_DEBUG -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.9 -arch x86_64 build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/main.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/loss_function.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/jesorsky_loss.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/weak_machine.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/stump_machine.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/lut_machine.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/boosted_machine.o build/temp.macosx-10.9-x86_64-3.6/bob/learn/boosting/lut_trainer.o -L/Users/gitlab/bob.learn.boosting/build/lib.macosx-10.9-x86_64-3.6/bob/learn/boosting -L/Users/gitlab/conda/envs/bug/lib -L/Users/gitlab/bob.learn.boosting/src/bob.core/bob/core -L/Users/gitlab/bob.learn.boosting/src/bob.io.base/bob/io/base -lbob_learn_boosting -lbob_core -lbob_io_base -lboost_system -lblitz -o build/lib.macosx-10.9-x86_64-3.6/bob/learn/boosting/_library.cpython-36m-darwin.so

otool -L on it shows me everything looks great:

otool -L bob/learn/boosting/*.so
bob/learn/boosting/_library.cpython-36m-darwin.so:
    @rpath/libbob_learn_boosting.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libboost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
bob/learn/boosting/version.cpython-36m-darwin.so:
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)

Check full (recent) log here: https://gitlab.idiap.ch/bob/bob.learn.boosting/-/jobs/152778

anjos commented 5 years ago

Instead, here is something that seems to "fix it":

$ install_name_tool -change @rpath/libc++.1.dylib /usr/lib/libc++.1.dylib bob/learn/boosting/_library.cpython-36m-darwin.so
$ ./bin/python test.py
1.0  #does not crash!

So, it is really an ABI incompatibility between my generated bindings and the linked libraries.

anjos commented 5 years ago

Thanks @anjos,

The patch is redundant relative to the latest llvm/clang master, but we're building llvm/clang 4.0.1 here where it is not redundant. This patch is essential for C++ exceptions to work correctly. Pinging @isuruf.

The most sensible way forward that I can see is for us to update our macOS compilers to a very recent one and add some tests to the compiler packages for both C++ exceptions and this issue.

Can someone make the smallest possible reproducer for it though? That would be super useful.

I'll try to prioritize it as soon as we get such a reproducer. It's still not 100% clear to me, despite the evidence presented, that this isn't a bug in the code in question (though I admit that is less likely).

@mingwandroid: I'd be happy to test them in my setup. Minimally, we'd only need to have a build that either excludes or updates the patch below as per my initial suggestion.

I'm suspecting the following patch at the llvm-suite parent package: 0001-If-libc-abi-library-is-given-use-it-to-reexport.patch

This patch may affect cmake-based builds and is outdated following a discussion here: https://reviews.llvm.org/D53797

In particular, the following quote before dismissing this patch is worrying: I don't think we ever want to re-export the current system's libc++abi -- we should always use an explicit list of exported symbols.

Could you either update the patch or instruct me how to rebuild this package to test it?

anjos commented 5 years ago

I continued testing by tweaking compilation flags, but nothing seems to fix this. The more I look at it, the more it looks like a binary issue with the library pointed out above.

@mingwandroid: I'm not sure how to provide you the smallest possible reproducer. I tried to explain how to reproduce the problem on the original report.

@isuruf, @mingwandroid: I'm afraid this is breaking our whole software stack and I'm out of ideas on where to look further. Could you please consider rebuilding libcxx/abi with the improved patch as suggested above?

isuruf commented 5 years ago

@anjos, can you try libcxx-8.0.0 and libcxxabi-8.0.0 from conda-forge channel?

anjos commented 5 years ago

Using a new version of the ABI library implies uninstalling clang=4.0.1, which makes it hard to recompile the package, so my testing may be biased.

The following changes were applied to my software stack:

The following packages will be REMOVED:

  cctools-895-1
  clang-4.0.1-1
  clang_osx-64-4.0.1-h1ce6c1d_11
  clangxx-4.0.1-1
  clangxx_osx-64-4.0.1-h22b1bf0_11
  ld64-274.2-1
  llvm-lto-tapi-4.0.1-1

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2019.1.23-0 --> conda-forge::ca-certificates-2019.3.9-hecc5488_0
  libcxx                 pkgs/main::libcxx-4.0.1-hcfea43d_1 --> conda-forge::libcxx-8.0.0-2
  libcxxabi           pkgs/main::libcxxabi-4.0.1-hcfea43d_1 --> conda-forge::libcxxabi-8.0.0-2
  openssl              pkgs/main::openssl-1.1.1b-h1de35cc_1 --> conda-forge::openssl-1.1.1b-h01d97ff_2

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi                                         pkgs/main --> conda-forge
  llvm                              pkgs/main::llvm-4.0.1-1 --> pkgs/free::llvm-3.3-0

A simple run after the package is upgraded (but using the previously compiled code) still produces the crash. So I cannot vouch for the new library - but again, I could not recompile the code from scratch.

isuruf commented 5 years ago

You need to recompile though. Can you create a new environment and force install the two packages without deps?

anjos commented 5 years ago

OK, using --no-deps gets me there:

## Package Plan ##

  environment location: /Users/andre/conda/envs/learn-dev

  added / updated specs:
    - libcxx[version='>=8']
    - libcxxabi[version='>=8']

The following packages will be UPDATED:

  libcxx                 pkgs/main::libcxx-4.0.1-hcfea43d_1 --> conda-forge::libcxx-8.0.0-2
  libcxxabi           pkgs/main::libcxxabi-4.0.1-hcfea43d_1 --> conda-forge::libcxxabi-8.0.0-2

I recompiled the package with the stack above, but the segmentation fault still occurs.

isuruf commented 5 years ago

Otool output is still the same as in https://github.com/ContinuumIO/anaconda-issues/issues/10423#issuecomment-448581732 ?

anjos commented 5 years ago
$ otool -L bob/learn/boosting/*.so
bob/learn/boosting/_library.cpython-36m-darwin.so:
    @rpath/libbob_learn_boosting.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libboost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
bob/learn/boosting/version.cpython-36m-darwin.so:
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
anjos commented 5 years ago

So - yes. Exactly the same.

isuruf commented 4 years ago

On conda-forge, we decided not to ship libc++abi and only ship libc++, which should fix this issue.

anjos commented 4 years ago

We build packages against the defaults channel - not conda-forge's. Could you be more descriptive of the fix?

isuruf commented 4 years ago

defaults::libcxx's libc++.dylib links with libc++abi.dylib from defaults::libcxxabi package, but conda-forge::libcxx's libc++.dylib links with /usr/lib/libc++abi.dylib from macosx.
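One way to check which of the two situations applies to an installed libc++ is sketched below. It is only an illustration under the assumption that `otool -L` output for the library is available; the helper name `cxxabi_origin` and the `system`/`other`/`none` labels are hypothetical.

```shell
#!/bin/sh
# Reads `otool -L` output for a libc++ dylib on stdin and reports where its
# libc++abi dependency points: "system" for /usr/lib, "other" otherwise,
# "none" if no libc++abi dependency is listed.
cxxabi_origin() {
    awk '
        # Line 1 of otool -L output is the inspected file name; skip it.
        NR > 1 && $1 ~ /libc\+\+abi/ {
            if ($1 ~ /^\/usr\/lib\//) print "system", $1
            else print "other", $1
            found = 1
        }
        END { if (!found) print "none" }
    '
}

# Usage (macOS, inside an activated environment):
#   otool -L "$CONDA_PREFIX/lib/libc++.1.dylib" | cxxabi_origin
```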

anjos commented 4 years ago

So, are you proposing we stick with conda-forge instead of defaults? Or suggesting that defaults will adopt conda-forge's linking strategy in a next release of libcxx?

isuruf commented 4 years ago

Can you check that it works with conda-forge? defaults will probably adopt the same strategy, but I have no say in that.

anjos commented 4 years ago

@isuruf: I can confirm that using the following list of packages from conda-forge makes my environment work, even without a recompilation:

ca-certificates           2019.9.11            hecc5488_0    conda-forge
cctools                   921                  h5ba7a2e_4    conda-forge
certifi                   2019.6.16                py36_1    conda-forge
clang                     9.0.0                h28b9765_1    conda-forge
clang_osx-64              9.0.0                h22b1bf0_3    conda-forge
clangxx                   9.0.0                         1    conda-forge
clangxx_osx-64            9.0.0                h22b1bf0_3    conda-forge
compiler-rt               9.0.0                hce3ea14_0    conda-forge
ld64                      409.12               h3c32e8a_4    conda-forge
libcxx                    9.0.0                h89e68fa_1    conda-forge
libllvm9                  9.0.0                h770b8ee_2    conda-forge
llvm                      9.0.0                         2    conda-forge
llvm-lto-tapi             4.0.1                         1    conda-forge
openssl                   1.1.1c               h01d97ff_0    conda-forge
tapi                      1000.10.8            h770b8ee_3    conda-forge

These packages were installed once I did conda install -c conda-forge libcxx=9. (Note: libcxxabi==4.0.1 from defaults was lingering after the install command above; I had to remove it manually.)

It would be nice to see (some of) these on defaults.

@mingwandroid: is there an ETA for an update of libcxx on defaults?

mingwandroid commented 4 years ago

When it's ready, but I think in the meantime we could remove this lib and make a new release of libcxx. Pinging @msarahan and @jjhelmus.