AcademySoftwareFoundation / OpenColorIO

A color management framework for visual effects and animation.
https://opencolorio.org
BSD 3-Clause "New" or "Revised" License
1.74k stars 431 forks source link

Linux PyPI wheels are incompatible with VFX platform due to ABI issue #1923

Closed doug-walker closed 5 months ago

doug-walker commented 6 months ago

The current Linux wheels for PyPI are based on manylinux2014 and are built on CentOS 7, which means they use the old libstdc++ ABI. Therefore they don't conform to the VFX Reference Platform for CY2023 and CY2024.

It turns out that even if a library does not use C++ std types in its API, it still may cause crashes. In the case of OCIO, this is because of the way std::regex is implemented in the standard library. (OCIO uses std::regex in several internal functions.)

This may cause crashes when the OCIO Python bindings are used in applications built with the new libstdc++ ABI to comply with the current VFX Platform, or in Python on recent versions of Linux, where the new ABI is now the default.

We should update the wheels to include one of the more recent manylinux options.

remia commented 6 months ago

Thanks for the report @doug-walker, do you have an example of Linux distribution / version we could use to reproduce the issue (in Docker) or any recent version of the majors distro will do?

remia commented 5 months ago

I went ahead and opened a PR #1933 to add generation of manylinux_2_28 Python wheels (alongside manylinux2014), this can be tested from test PyPI with version 2.4.0dev0 https://test.pypi.org/project/opencolorio/. I'm not exactly sure how to confirm it is indeed using the new API, but it seems that this new wheel doesn't contain symbol related to the old string implementation (std::__cow_string::__cow_string*) which is probably a good sign.

I tried to reproduce the issue using some of the latest Linux distro available as Docker images but couldn't find any problem yet, I assume pip wouldn't allow the download of incompatible binary wheels anyway? I guess the main problem occurs when trying to use the dynamic library inside the wheel to link against code using a different ABI. Or when trying to import multiple packages interacting into a Python script with incompatible ABIs? To @doug-walker point about std::regex and the linked SO question, I guess some non trivial things may occur when trying to mix old and new ABI within a single process.

Tagging @JeanChristopheMorinPerso in case you have more clues, is this something that has been discussed in the ci wg?

JeanChristopheMorinPerso commented 5 months ago

Using manylinux_2_28 seems like the solution here. We could also force -D_GLIBCXX_USE_CXX11_ABI=1 in the CXXFLAGS on top of using manylinux_2_28 just to be extra explicit. Though, I don't think it's necessary. Using manylinux_2_28 should be enough.

I assume pip wouldn't allow the download of incompatible binary wheels anyway?

Yep, installers (or pip) will only download wheels that is compatible the the installed glibc. Note that it's only looking at glibc's version and not if the wheel is compiled with the new CXX11 ABI or not. Pip only looks at the wheel name to determine that it should install, and the CXX ABI isn't encoded in the filename.

I'm not exactly sure how to confirm it is indeed using the new API

You can check if std::__cxx11 symbols are defined/present in the compiled libraries. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html.

I guess the main problem occurs when trying to use the dynamic library inside the wheel to link against code using a different ABI.

This would worry me since users shouldn't link against the library in the wheels (libOpenColorIO.so).

Or when trying to import multiple packages interacting into a Python script with incompatible ABIs?

Probably this yes. I'm guessing here though since I don't deal with this kind of deal very often. What I know is that we shouldn't mix both ABIs.

is this something that has been discussed in the ci wg?

Not that I'm aware of, at least not this GitHub issue. The vfxplatform and wg-ci are somewhat related but are different.

remia commented 5 months ago

You can check if std::__cxx11 symbols are defined/present in the compiled libraries. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html.

Yes that's what I thought, but the current manylinux2014 wheel already has lots of std::cxx11 symbols beside the std::cow_string::__cow_string* ones, I'm not exactly sure why that is the case. At least the latter cow symbols disappeared now so I bet we are good!

Thanks for chiming in JC, looks like we are on the good track :)

doug-walker commented 5 months ago

Thank you Remi and Jean-Christophe! Once these are merged, I will try to have the person who reported the crash test again with the new binding.

doug-walker commented 5 months ago

We verified that Remi's new wheels fix the crash that was reported. Thank you to Shunji Yokozawa and Wayne Arnold for their help!