aleaxit / gmpy

General Multi-Precision arithmetic for Python 2.6+/3+ (GMP, MPIR, MPFR, MPC)
https://gmpy2.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0

release? #146

Open videlec opened 7 years ago

videlec commented 7 years ago

Is there any plan for alpha, beta, stable release?

Vincent K. has already prepared all the integration in pplpy and SageMath (see ticket 22927 and ticket 22928). We are just waiting for an official release to move on.

isuruf commented 6 years ago

For the wheels, I'd prefer building shared libraries, but at the moment, OSX wheels with delocate don't support single extension module (with no pure python) packages. (See https://github.com/matthew-brett/delocate/issues/22)

jdemeyer commented 6 years ago

Because with wheels:

- the C API is disabled
- having proper linking in the cythonization (with libgmp) is a nightmare

Any constructive solution would of course be better, but that could be postponed to the next release.

You are of course right. This all makes me think of a similar packaging discussion that we had for cypari2: https://github.com/defeo/cypari2/issues/19

jdemeyer commented 6 years ago

Personal rant: in my opinion, wheels are broken by design and shouldn't be used for these use cases. The real problem is that people expect pip install gmpy2 to work without a compiler and without having the GMP library installed. I think that binary packaging should be left to actual distros (linux distros, Conda, Cygwin) and Python shouldn't try to badly replicate binary distribution.

jdemeyer commented 6 years ago

For my own Python packages, I solve this problem by not shipping wheels at all.

casevh commented 6 years ago

My opinion is that pip is broken - it should prefer source installation by default and only use wheels if requested (or if installation from source fails). But pip is broken in other ways, too.

I may revert to the distribution method used in 2.0.x. I distributed source and Windows wheels. I'd prefer not to distribute wheels at all but compiling on Windows is just too hard and the python.org versions are just too common.

I've also thought of distributing gmpy2 exclusively as source and creating a separate PyPi project called win-gmpy2. Installing the Windows wheels would then just be "pip install win-gmpy2".

pkittenis commented 6 years ago

If I may contribute a few things - I appreciate this is closed, but the information may be useful regardless.

TL;DR: for binary wheels to work, a few things must be done:

I'd be happy to contribute to some/all of these, as long as you are clear on what the end result should be and why these changes are necessary.

Another option is to forgo wheels entirely and build with conda. This is increasingly used by data scientists as it gives you an entire new toolchain (compiler/libraries etc.) on all platforms and simplifies the whole building/bundling thing. It does not work for PyPi-distributed binary wheels though - you have to use conda. There are already conda packages for gmpy dependencies that simplify things further.

Both conda and PyPi wheels could be done, but that obviously duplicates effort. It really depends on whether you want binary packages available on PyPi to be installed by pip - the standard Python installation method - or not.

jdemeyer commented 6 years ago

python modules consisting only of a single extension and no python package do not work

Why? There is no fundamental difference between an extension module and a plain Python module.

casevh commented 6 years ago

If I may contribute a few things - I appreciate this is closed, but the information may be useful regardless.

Hi!

What wheels aim to solve is the problem of distributing binary native code extensions on PyPi. That necessitates a standard way of building and shipping both the native code Python extension and all its dependencies. As any Linux programmer can appreciate, this is a minefield due to how many different versions of GCC/glibc there are. The manylinux1 platform specifically targets CentOS 5 to maintain compatibility with the largest proportion of Linux systems (not all). To the end user, from-source installations that require development toolchains, dependencies, header files, et al. are a huge pain in the rear, if they even work. The user is then required to do this on every machine they want to install on, which is even worse. Binary wheels, on the other hand, install in seconds and work out of the box.

This is an admirable goal.

Building binary wheels for use with the standard CPython interpreter distributed by python.org requires that the extensions are built with the same compiler as the interpreter on non-Linux platforms. This makes building Windows wheels a huge PITA.

It is true that compiling arbitrary C code requires the same compiler and C runtime. But it is not a requirement enforced by CPython. If you avoid the (unfortunately) undocumented differences in the C runtime versions, you can use a different compiler version. I've been very careful in gmpy2 to use the appropriate memory management calls (regardless of the platform, any memory allocated in GMP/MPFR/MPC must be de-allocated by GMP/MPFR/MPC). But I do have access to the older Microsoft compilers and do make my Windows builds with the matching compiler. I do agree that compiling on Windows is a PITA.

From experience (most of my projects involve native code extensions, some wrappers for native libraries like gmpy), python modules consisting only of a single extension and no python package do not work - this is what gmpy is currently doing.

From what I've been able to find in my research, this requirement (a Python stub loading an extension) was added to support importing an extension from a zip file. But it breaks backwards compatibility - the shared object file is now in a different location, which makes upgrading from a pure distutils installation to a setuptools installation fail. There may be other reasons, but that is the only one I found. gmpy/gmpy2 have always been a single shared library.

Most projects bundle a Python package with an internally used extension so that Python packaging tools - designed around Python packages - will work correctly. This is pretty obvious if you think about it, but it catches people off guard quite a lot.

I think this is a flaw with the packaging tools.

TL;DR: for binary wheels to work, a few things must be done:

There is a bigger challenge with gmpy2 v2.1. There is now a C-API that allows other C or Cython extensions to interact with gmpy2 at a very low level. To avoid memory management issues, gmpy2 and the extensions must use the same shared library. I can't see how that is compatible with binary wheels. The C-API needs to be disabled in a binary wheel (done automatically by making a statically linked version of gmpy2). So we have a scenario where binary wheels and from-source compilations offer different capabilities. Combined with pip's default behavior of using a binary wheel, most installations will not support the C API.

I do not know how to resolve this dilemma.

Gmpy needs to be a Python package with the extension as part of it, e.g. gmpy2 is the package and gmpy2._gmpy2 the extension. gmpy2 can then import from gmpy2._gmpy2 whatever it wants to make available as a top-level import. This is doable but will take some time to refactor/rename.

Dependencies need to be built as part of the wheel build, either by setuptools itself (it can build external dependencies) or via an external script, with static or dynamic linking plus bundling via delocate/auditwheel.

Again, Windows is the bigger PITA here, but there are several boilerplate AppVeyor configs for Python extensions; see this for an example.

I have the compilers required to build binary wheels for Windows.

The hacky stuff in setup.py to dynamically figure out library paths is... hacky, and only works for from-source installations. Extending setup.py in that way is therefore not encouraged, as it breaks wheels.

I am not proud of the hacks.
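
For illustration only - this is not gmpy2's actual setup.py - a sketch of the kind of plain setuptools declaration that keeps from-source builds and wheels on the same path; the source file name and version below are assumptions:

```python
# setup.py - minimal sketch, not gmpy2's real build script.
# The extension and its C dependencies are declared explicitly; any extra
# include/library paths come from the build environment (CFLAGS, LDFLAGS,
# setup.cfg) instead of being auto-discovered inside setup.py.
from setuptools import setup, Extension

gmpy2_ext = Extension(
    "gmpy2",                           # single-extension layout, as today
    sources=["src/gmpy2.c"],           # illustrative path
    libraries=["mpc", "mpfr", "gmp"],  # link against the installed C libraries
)

setup(
    name="gmpy2",
    version="2.1.0a1",                 # version is an assumption
    ext_modules=[gmpy2_ext],
)
```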

I'd be happy to contribute to some/all of these, as long as you are clear on what the end result should be and why these changes are necessary.

See below.

Another option is to forgo wheels entirely and build with conda. This is increasingly used by data scientists as it gives you an entire new toolchain (compiler/libraries etc.) on all platforms and simplifies the whole building/bundling thing. It does not work for PyPi-distributed binary wheels though - you have to use conda. There are already conda packages for gmpy dependencies that simplify things further.

Both conda and PyPi wheels could be done, but that obviously duplicates effort. It really depends on whether you want binary packages available on PyPi to be installed by pip - the standard Python installation method - or not.

Like I mentioned earlier, I can't think of a solution that will work for all use cases. I've thought of several options:

The Python ecosystem has migrated to a pip-based world. It is better for most projects. And I accept that gmpy2 will need to adapt to a pip/setuptools world. And I also need to realize that I have little free time available.

Here's my proposal:

The only preferences that I have on the scope of the changes are that:

I am open to other suggestions. I don't have a good answer. I'd like to continue adding new features to gmpy2 but managing releases has become too difficult for me.

I've created a new issue #176 to track these discussions.

isuruf commented 6 years ago

The C-API needs to be disabled in a binary wheel (done automatically by making a statically linked version of gmpy2).

FYI, binary wheels can be made with dynamic linking if, as @pkittenis says, gmpy2.so is changed to gmpy2/_gmpy2.so and a gmpy2/__init__.py is added.
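
To make that concrete, a minimal sketch of what the suggested layout could look like (the re-export below is illustrative; the real gmpy2/__init__.py would decide exactly which names to expose):

```python
# gmpy2/__init__.py - minimal sketch of the package-plus-extension layout.
# The compiled module is installed as gmpy2/_gmpy2.so (or .pyd on Windows)
# and the pure-Python package re-exports its contents, so "import gmpy2"
# keeps working unchanged for users.
from gmpy2._gmpy2 import *  # noqa: F401,F403
```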

casevh commented 6 years ago

The C-API needs to be disabled in a binary wheel (done automatically by making a statically linked version of gmpy2).

FYI, binary wheels can be made with dynamic linking if, as @pkittenis says, gmpy2.so is changed to gmpy2/_gmpy2.so and a gmpy2/__init__.py is added.

I understand that a dynamically linked wheel is possible, but how do we ensure that gmpy2 and some arbitrary other extension that uses the C API are both using the same shared library? With the same memory manager functions (since they can be changed in GMP)? And what happens if gmpy2 and the other extension provide different dynamic libraries?

It may be just one of those "it usually works" situations, but I don't know how to guarantee it. If we consistently link to the CentOS 5 versions, it may work.

isuruf commented 6 years ago

I understand that a dynamically linked wheel is possible, but how do we ensure that gmpy2 and some arbitrary other extension that uses the C API are both using the same shared library?

The best way to avoid conflicts would be for those using the C API to avoid linking to the gmp, mpfr and mpc libraries themselves, and to load gmpy2 by doing import gmpy2 before loading their extension.
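
A minimal sketch of that import order from a consumer's point of view (the package and extension names here are hypothetical):

```python
# __init__.py of a hypothetical package that uses the gmpy2 C API.
# Per the suggestion above, the consumer extension is not linked against
# gmp/mpfr/mpc itself; importing gmpy2 first ensures gmpy2 (and the
# libraries it was built with) is loaded before the extension needs it.
import gmpy2  # must come before the extension import below

from ._myext import mpz_from_native  # hypothetical C/Cython extension
```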

casevh commented 6 years ago

@isuruf Thanks for your comments.

I've given it more thought and dynamically linked wheels would probably work in some|many|most situations. But I just can't convince myself that we won't encounter issues in some scenarios.

I'll be okay with any solution that works easily for most people.

jdemeyer commented 6 years ago

For me personally, the conclusion is that wheels are broken and shouldn't be used. Build from source works great, conda works great, distro packaging works great. But I think that there is simply no clean solution to do this with wheels.

@pkittenis and @isuruf Do you have any pointers to documentation or examples (preferably by the PyPA or other reputable sources) on how to deal with this?

isuruf commented 6 years ago

@pkittenis and @isuruf Do you have any pointers to documentation or examples (preferably by the PyPA or other reputable sources) on how to deal with this?

On how to deal with what?

jdemeyer commented 6 years ago

On how to deal with what?

Wheels containing Python modules which dynamically link against libraries which are not contained in the wheel.

isuruf commented 6 years ago

I don't have any pointers to documentation. Do you have an example scenario about using wheels that I can help clarify?

isuruf commented 6 years ago

Maybe have a look at https://github.com/pypa/manylinux/blob/master/pep-513.rst

jdemeyer commented 6 years ago

That PEP is Linux only. What about other OSes?

jdemeyer commented 6 years ago

Do you have an example scenario about using wheels that I can help clarify?

As I said: a wheel which contains a Python module that depends on a dynamic library. For this project, for example, a wheel which depends on libgmp.

jdemeyer commented 6 years ago

And ideally, it should work in all cases where a build-from-source would work too.

isuruf commented 6 years ago

Have a look at "auditwheel" linked in the PEP above. ("delocate" is the equivalent for OSX wheels.) For Windows, tools are a work in progress.

jdemeyer commented 6 years ago

OK, and is there a "minimal working example", i.e. an example project showing how to use these tools properly?

isuruf commented 6 years ago

I've got one setup for gmpy2 here, https://github.com/isuruf/gmpy2-wheels which is based on https://github.com/matthew-brett/multibuild

jdemeyer commented 6 years ago

Have a look at "auditwheel" linked in the PEP above.

If I understand it correctly (it is very possible that I don't), this copies the dynamic library into the wheel and then forces the use of that specific library. So it doesn't deal properly with libraries already installed on the system.

isuruf commented 6 years ago

Yes, reasons are outlined at https://www.python.org/dev/peps/pep-0513/#bundled-wheels-on-linux

jdemeyer commented 6 years ago

Yes, reasons are outlined at https://www.python.org/dev/peps/pep-0513/#bundled-wheels-on-linux

So it's broken by design and not a solution...

pkittenis commented 6 years ago

| dynamically link against libraries which are not contained in the wheel.

Here is a PyPA manylinux1 demo repository that shows how this works for Linux - it links against cblas. It uses auditwheel as mentioned previously. The OSX equivalent is delocate. Both tools embed the shared libraries used into the wheel and ensure they take precedence over system libraries by making their directories show up first in the embedded RPATH.

Any system libraries that may or may not be installed are not relevant - they are not used. For Windows, static linking is the norm because of Windows semantics around dynamic loading.

Other libraries using gmpy2 can use the C-API with binary wheels as long as those wheels also use the same versions of the embedded shared libraries. I have used this in other projects with no problems. A simple documentation statement saying as much should suffice. Worrying about C-API-using developers doing the wrong thing for their project is not really a job for gmpy2 IMO - as long as the correct approach is clearly documented.

Can we please focus on what the approach should be moving forward rather than criticizing the wheel design - it's not changing either way. Its purpose is not to replace system package managers, it is to allow distributing binary packages on PyPi. If you want a cross platform package manager for all your packages, use conda. If you want system packages, make system packages.

In short once again - binary wheels with external dynamically linked dependencies bundle those dependencies in the wheel. System libraries of the same dependency are not used. Other libraries linking against that dependency at the C level should bundle the same version of the external dependency in their own binary wheels. From-source builds have to build gmpy2 and the external library against the same external dependencies, like any other dynamically linked library.

This is required for those libraries to build binary wheels regardless of what gmpy2 does. It is not possible to, for example, link against gmpy2's embedded libraries when building another gmpy2-using library.

Why? There is no fundamental difference between an extension module and a plain Python module.

To the developer perhaps not. To the packaging tools there is. This is more of a limitation of the packaging tools, but nonetheless that is the end result. You do not have to like it :)

As long as standard setuptools functionality is used, this all works fine for from-source builds (which can be forced by adding --no-binary to pip) just as it does for binary wheels. By that I mean telling setuptools what libraries to link against, etc. If you want to require that from-source builds always build against system libraries (and error out if one is not provided), that's all that is needed.

This is nothing specific to setuptools or binary wheels; the same goes for any other shared library. You do not expect a CentOS 7 RPM to work on CentOS 6. Binary wheels save you the effort of building and maintaining a separate set of X many different system packages for all the platforms you want to support, and specifically for Windows and OSX there is no better built-in tool, as those platforms lack native package managers.

It is true that compiling arbitrary C code requires the same compiler and C runtime. But it is not a requirement enforced by CPython. If you avoid the (unfortunately) undocumented differences in the C runtime versions, you can use a different compiler version.

Yes, I was making a blanket statement for simplicity's sake. It's not just memory allocation, BTW; there are differences in how long ints are implemented between GCC and Visual Studio, for example, that lead to weird errors. It is possible, but it may or may not lead to hard-to-track errors down the line, which makes it not worth the effort.

jdemeyer commented 6 years ago

| dynamically link against libraries which are not contained in the wheel.

Any system libraries that may or may not be installed are not relevant - they are not used.

I am a bit lost here... these two statements seem to contradict each other. Is it possible or not to link against system libraries with a wheel?

pkittenis commented 6 years ago

As was mentioned earlier - it is a normal build process, building against and linking to whatever library is available on the system.

That particular library then gets embedded in the wheel. When a user installs the wheel, only the library that was embedded at build time is used, not any system libraries the user may have installed. See the demo repository.

pkittenis commented 6 years ago

I have the compilers required to build binary wheels for Windows.

I am sure you do, but unless you want to be solely responsible for building and publishing wheels manually on each and every release, the whole process needs to be automated as part of an AppVeyor config - the only free CI supporting Windows. That's where the mostly boilerplate AppVeyor config and wrapper scripts come in.

This includes building all dependencies as well, so automated steps to do that would be very useful (If instructions include 'Open Visual Studio', that's not automated).

jdemeyer commented 6 years ago

I've got one setup for gmpy2 here, https://github.com/isuruf/gmpy2-wheels which is based on https://github.com/matthew-brett/multibuild

Can I find the actual resulting wheels somewhere? Are they the ones which are posted on https://pypi.python.org/pypi/gmpy2/2.1.0a1

casevh commented 6 years ago

I used https://github.com/isuruf/gmpy2-wheels for https://pypi.python.org/pypi/gmpy2/2.1.0a1.

The Windows wheels are ones I build locally.


isuruf commented 6 years ago

Those wheels were built by statically linking libgmp, etc. I can also make wheels that dynamically link libgmp, if you want.