CadQuery / ocp-build-system

A system to build Python wheel PyPI packages for OCP.
Apache License 2.0
8 stars 11 forks source link

Fixing 7.7.0 manylinux build #10

Closed fpq473 closed 1 year ago

fpq473 commented 1 year ago

Following on from #7, it looks like the Linux wheel built, but it's broken. The only fix I can think of is to bump the manylinux requirement from manylinux_2_31 to manylinux_2_35 (fedora 36+, ubuntu 22.04+, etc.). I'm inclined to say this isn't too restrictive and we should just go ahead. @jmwright Thoughts?

Underlying issue (I think)

Roughly, I think the issue is as follows. With 7.6 we were bundling the VTK 9.1 manylinux wheel, which required manylinux_2_31, to avoid using VTK from conda, which required manylinux_2_35. (Back then, manylinux_2_35 wasn't even defined. It is now.) This replacement no longer works with 7.7, and so we are forced to use VTK from conda, which forces us to use manylinux_2_35.

Not that I really understand this, but in case it's useful to others, one observed difference between manylinux and conda VTK is as follows:

manylinux

nm -CD vtkmodules/libvtkCommonCore-9.2.so | grep GetObjectDescription
000000000065b550 T vtkObjectBase::GetObjectDescription() const
0000000000659320 T vtkObject::GetObjectDescription() const

conda

nm -CD lib/libvtkCommonCore-9.2.so | grep GetObjectDescription
00000000005bc3f0 T vtkObjectBase::GetObjectDescription[abi:cxx11]() const
00000000005ba570 T vtkObject::GetObjectDescription[abi:cxx11]() const
                                                   ^^^^^^^^^

This difference is perhaps causing the error:

ImportError: /usr/share/miniconda3/envs/cadquerytest/lib/python3.10/site-packages/cadquery_ocp.libs/libTKIVtk-d72fb669.so.7.7.0:
    undefined symbol: _ZNK9vtkObject20GetObjectDescriptionB5cxx11Ev

The undefined symbol demangles to vtkObject::GetObjectDescription[abi:cxx11]() const. So it appears that opencascade (from conda) is expecting an "abi:cxx11" VTK, which the manylinux VTK isn't. This conflict is also consistent with various discussions of whether manylinux wheels should be compiled with _GLIBCXX_USE_CXX11_ABI=1 (example).

So the conclusion is that there is no easy way to keep manylinux_2_31 compat. We can revisit if VTK manylinux wheels are made to be "abi:cxx11".

jmwright commented 1 year ago

I'm inclined to say this isn't too restrictive and we should just go ahead. @jmwright Thoughts?

Well, that's a tough one. My laptop (Ubuntu 20.04) will stop being able to install CadQuery via pip when it requires OCP 7.7.0. To buy us some time, we could release the 7.7.0 alpha, but keep CadQuery on the 7.6.3 release. I need to do a CadQuery release this month, and so I'm thinking about leaving the OCP version pinned to 7.6.3. It would be nice to push CadQuery to 7.7.0 because it fixes a few issues, but it also introduces at least one regression (https://github.com/CadQuery/cadquery/issues/1229). A drawback is that people could be running 7.7.0-alpha in conda and 7.6.3 in pip and there will be some differences in modelling results (in certain situations).

Roughly, I think the issue is as follows. With 7.6 we were bundling the VTK 9.1 manylinux wheel, which required manylinux_2_31, to avoid using VTK from conda, which required manylinux_2_35. This replacement no longer works with 7.7, and so we are forced to use VTK from conda, which forces us to use manylinux_2_35.

So, the solution is to build our own VTK wheels on Ubuntu 20.04 or earlier?

fpq473 commented 1 year ago

Yes, and not only that, it would have need to set _GLIBCXX_USE_CXX11_ABI=1 in order to use opencascade from conda. I believe currently:

and assuming it's possible, we need to build with an older compiler and _GLIBCXX_USE_CXX11_ABI=1. I'm also not confident at all that this is sufficient; other dependencies may need be recompiled similarly.

It appears (but definitive source not found) that manylinux_2_24 wheels tend to have _GLIBCXX_USE_CXX11_ABI=1. So our problem might be solved if the VTK folks upgrade their wheels from manylinux_2_17 (ubuntu 13.10+) to manylinux_2_24 (ubuntu 16.10+).

jmwright commented 1 year ago

I have been able to get a working local VTK build per the instructions here. There is a site-packages directory that gets built which matches my installed Python version: vtk_build/dev/vtk/build/lib/python3.8/site-packages/

There is also this section of the OCCT documentation about building VTK for OCCT.

So, my initial thought is to create a vtk-build-system repo which builds VTK from scratch the way we need it. I would need help getting that packaged into a cadquery-vtk wheel the way it's done in this repo though. As you note, there may also be complications with other dependencies that we don't know about yet.

fpq473 commented 1 year ago

Sounds like a plan. We should be able to package it simply by installing it as part of the build workflow: https://github.com/CadQuery/ocp-build-system/blob/5f812e89332dbad0c5e2eed8489887c0752582c3/.github/workflows/build-wheel.yml#L73

In case it helps you, I found some mentions of manylinux in the VTK project. It's possible that's how their builds work, and you may want to leverage that.

fpq473 commented 1 year ago

@jmwright [Moving the discussion back here] I took your VTK artifacts and built manylinux_2_31 wheels for 7.7.0 alpha. OCP imports properly in all 3.x; I ran pytest in 3.10. Action

One issue is that the VTK wheels produced here are almost 400MB, while those from PyPI are under 100MB. This makes the cadquery-ocp wheel close to 500MB instead of under 200MB.

jmwright commented 1 year ago

Looks like the build defaults to Debug, so I added a cmake option that should set it to Release. We will see how much of a difference that makes in the size of the resulting wheel.

https://github.com/CadQuery/ocp-build-system/actions/runs/4029788896

fpq473 commented 1 year ago

Thanks, the linux wheels are ~141 MB now. Feel free to try them out.

https://github.com/CadQuery/ocp-build-system/actions/runs/4030163165

Should we just build for the other platforms and release to PyPI?

jmwright commented 1 year ago

I downloaded the Python 3.8 wheel and tried to install it into a Python 3.8 venv, but got an error.

$ pip install ~/Downloads/cadquery_ocp-7.7.0a0-cp38-cp38-manylinux_2_31_x86_64.whl
ERROR: cadquery_ocp-7.7.0a0-cp38-cp38-manylinux_2_31_x86_64.whl is not a supported wheel on this platform.

I tried pip3 as well and got the same error.

I seem to have the minimum glibc version:

$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31

Am I doing something wrong when trying to install it locally?

fpq473 commented 1 year ago

You may have to upgrade pip first -- older pips don't understand certain manylinux tags.

jmwright commented 1 year ago

Ah yes, I should have thought of that. pip has to be upgraded oftentimes to install cadquery-ocp from PyPI.

I was able to build an environment on my Ubuntu 20.04 machine, install the appropriate wheel from this CI, install all the other CadQuery deps, and run all the CadQuery tests. I got no failures.

I would agree that it's time to build the 7.7.0a0 packages for all platforms so we can get them published to PyPI.

jmwright commented 1 year ago

We should probably check to make sure our MacOS wheels don't have a similar versioning problem. We're building on macos-latest and I think it's possible to use a macos-11 runner. I'll try to borrow a Mac tomorrow to make sure our wheel will install on it before we publish the wheels.

jmwright commented 1 year ago

Ah, it looks like the other runners are trying to pull the Linux VTK wheels in.

fpq473 commented 1 year ago

Fixed, run here: https://github.com/CadQuery/ocp-build-system/actions/runs/4035304342

jmwright commented 1 year ago

I tried the wheel on MacOS 10.14.6, and even with upgrading pip, the wheel refuses to install. The wheel is labelled 10.15, so it must be too new. I'm going to try changing the runner to macos-11 to see if that fixes the problem.

fpq473 commented 1 year ago

If macos-11 install works, I think you can just upload to PyPI.

Then this requirement in cadquery can change to something like cadquery-ocp>=7.7.0a0,<7.8.

    reqs = [
        "cadquery-ocp>=7.6,<7.7",
        "ezdxf",
        "multimethod>=1.7,<2.0",

https://github.com/CadQuery/cadquery/blob/60fdbea0e0c85946b44ef31fdf7c7f62d0bf781d/setup.py#L29

jmwright commented 1 year ago

I think people will still have to use the --pre option though if we set the version requirement up that way, and CQ should run with either 7.6.3 or 7.7.0a0 right now. I would suggest that we make it cadquery-ocp>=7.6,<7.8 and allow people to use the --pre option if they want Python 3.11 support or they need the features/fixes in 7.7.0a0. Thoughts?

fpq473 commented 1 year ago

I'll test after 7.7.0a0 hits PyPI and let you know what I find.

But are you saying that github master is compatible with both 7.6 and 7.7?

What about the forthcoming 2.2.0 release (hooray, btw)? Just 7.6, or both 7.6 and 7.7?

jmwright commented 1 year ago

I need to test it, but I think the current master and the 2.2.0 release coming in a few days should work with either 7.6 or 7.7.

jmwright commented 1 year ago

So even when I run the wheel build on MacOS 11, the wheel inside the ZIP file still has a filename that includes 10_15. If I manually change that part of the wheel name to be 10_11 the wheel installs fine on 10.14.6.

I'm also wondering if we should change windows-latest to windows-2019. I would want to test that on a Windows 10/11 machine, but I think that should work, and might even solve the problem one user had with the CQ on Windows 7 (#8).

lorenzncode commented 1 year ago

Nice work.

Just 7.6, or both 7.6 and 7.7?

I'd suggest to be consistent with conda - require OCP 7.7: https://github.com/CadQuery/cadquery/blob/60fdbea0e0c85946b44ef31fdf7c7f62d0bf781d/conda/meta.yaml#L18 Not all tests will pass anymore with 7.6.

jmwright commented 1 year ago

Thanks for the info @lorenzncode . I wonder if that means we'll need to keep the --pre option on the pip install instructions once CQ 2.2.0 is released, or if we only need that when the top level package is a pre-release.

@fpq473 I think this warning might explain the 10_15 in the wheel file name. I'll try changing the runner to macos-12 to see if I can get rid of that warning. I'll change the Windows runner to windows-2019 in the process.

jmwright commented 1 year ago

@fpq473 I finally stopped messing around with the MacOS version. Totally not a Mac guy, and I minunderstood how old the 10.15 version was. It looks like we're limited by the conda packages which were built against 10.15, which was released in 2018. Unfortunately, that will keep me from testing on the Mac that I was borrowing, but I think it should be good for the majority of users.

I also tested the Python 3.8 Windows wheel and it worked fine. I'll create a release tomorrow morning.

jmwright commented 1 year ago

The release is available, but users will need to use pip install --pre cadquery-ocp or pip install cadquery-ocp==7.7.0a0 to install it since it's a pre-release.

https://pypi.org/project/cadquery-ocp/7.7.0a0/

Thanks for all the help @fpq473 . I think we made some good improvements to the wheel builds. I'm going to close this issue and turn my attention to the main CadQuery release now.