conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

Concerns about static vs dynamic TLS in libgomp #1551

Open h-vetinari opened 2 years ago

h-vetinari commented 2 years ago

I just ran into some very strange errors in https://github.com/conda-forge/staged-recipes/pull/16888

tensorflow_addons/metrics/tests/matthews_correlation_coefficient_test.py:21: in <module>
    from sklearn.metrics import matthews_corrcoef as sklearn_matthew
../[...]/lib/python3.8/site-packages/sklearn/__init__.py:83: in <module>
    from .utils._show_versions import show_versions
../[...]/lib/python3.8/site-packages/sklearn/utils/_show_versions.py:12: in <module>
    from ._openmp_helpers import _openmp_parallelism_enabled
E   ImportError: dlopen: cannot load any more object with static TLS

After some googling, it seems that this is a problem plaguing many users (especially with PyTorch/TensorFlow). I'm admittedly out of my depth here, but the following explanation seemed pretty good:

As long as PyTorch has a dependency on libgomp.so with static TLS, there is literally nothing we can do if some of our users decide to import a bunch of third-party libraries that have dynamic TLS, without importing libgomp. They'll gobble up all of the DTV space and libgomp will fail. Note that we exacerbate the problem by depending on libraries ourselves which have dynamic TLS, so that the ceiling is lower, but if the user imports enough libraries they will hit this problem, no matter how much or little TLS we use.

This has some unfortunate side effects: for example, changing the import order between libraries can make the error appear or go away. Accidents like that lead to the proliferation of unreliable advice (it works or fails more or less at random), e.g. to uninstall a conda package and reinstall it from pip.
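To make the import-order sensitivity concrete, here is a minimal sketch of the commonly suggested workaround of loading libgomp before anything else. It is only an illustration, not a fix endorsed in this thread: `preload_libgomp` is a hypothetical helper name, and `ctypes.util.find_library("gomp")` may well resolve a system libgomp rather than the one shipped in the conda environment.

```python
import ctypes
import ctypes.util


def preload_libgomp():
    """Try to load libgomp early; return the library handle, or None on failure.

    Assumption: loading libgomp first lets it claim its static TLS block
    before other libraries use up the available space.
    """
    name = ctypes.util.find_library("gomp")
    if name is None:
        # libgomp could not be located on this system.
        return None
    try:
        return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # The loader refused the library (possibly for the very reason above).
        return None


# Call this at the very top of the entry point, before importing
# sklearn, torch, tensorflow, etc.
handle = preload_libgomp()
print("libgomp preloaded:", handle is not None)
```

This only moves the problem around: it depends on running before every other import, which is exactly the fragility described above.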

There's apparently been a glibc fix for this since 2015 (glibc 2.22). Unfortunately, not even moving to CentOS 7 (#1436) would help with that, since CentOS 7 ships glibc 2.17. So, coming back to the original quote above, I wanted to ask:

Is it possible for conda-forge to consistently enforce dynamic TLS in libgomp?

Not sure if that's possible or even a good idea, but I wanted to raise this issue so that conda-forge users don't run into such cryptic problems, if there's a way to avoid them.
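For anyone wanting to check whether a given machine already has the glibc 2.22 fix mentioned above, a rough sketch follows. The 2.22 threshold is taken from this comment, not verified independently, and `has_tls_fix` is a hypothetical helper name.

```python
import platform
import re


def parse_version(text):
    """Extract (major, minor) from a version string such as '2.17'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def has_tls_fix(glibc_version, minimum=(2, 22)):
    """Return True if the glibc version is at least `minimum`.

    Assumption: (2, 22) is the first release carrying the fix.
    """
    return parse_version(glibc_version) >= minimum


lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print(f"glibc {version}: fix present = {has_tls_fix(version)}")
else:
    print("Not running against glibc (or version could not be detected).")
```

Note that this reports the glibc of the machine running the code, which is what matters here since the limit is enforced by the dynamic loader at runtime.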

h-vetinari commented 2 years ago

Now cvxpy on aarch64 is failing broadly due to this. I'm disabling the aarch64 tests there. Comments or input welcome.

h-vetinari commented 2 years ago

Actually, I had already raised an issue for this at the time... There's also a bit of discussion in https://github.com/conda-forge/cvxopt-feedstock/issues/55, including a handy reference from Isuru.