ariovistus / pyd

Interoperability between Python and D
MIT License

Distributing PyD-produced binaries #54

Open lomereiter opened 8 years ago

lomereiter commented 8 years ago

Hello,

I figured this issue tracker is a good place to spark this discussion.

Let's imagine that someone fiddles enough with PyD to create something useful. How is the produced library to be distributed? The options are seemingly as follows:

  1. Kindly ask users to install a D compiler
  2. Use rpath and bundle libphobos2.so in the package, which makes it rather large
  3. Link statically against a custom build of Phobos compiled with -fPIC
  4. Publish libphobos2-{dmd,ldc,gdc}.so packages on Anaconda/PyPI

As far as I understand, though, options 2 and 3 are fine only as long as the user doesn't load two such packages simultaneously, because each would bring its own copy of the runtime, GC included.

Option 1 is rather outrageous these days, now that even numpy and scipy finally have manylinux1 wheels so that one doesn't need to compile them. That's not the biggest issue, though: the nightmare will really begin when different packages start popping up with different D frontend version requirements.

Option 4 looks the most reasonable. With regard to PyPI it feels like somewhat of an abuse, though I suppose we're going to see more and more such unintended usage in the future. There is a small catch, though: the manylinux1 PEP (PEP 513) puts requirements on the system library versions, namely, the binary must be compatible with the rather old CentOS 5.11, which is still in use on many clusters, while druntime currently relies on qsort_r, which only appeared in glibc 2.8. That said, I've got a patch to handle this (the required versions can be checked by running readelf -V libphobos2.so | grep 'Name:').
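The readelf check mentioned above could be automated roughly as follows. This is only a sketch: it assumes the `readelf -V` output has already been captured as text, it takes GLIBC_2.5 as the ceiling that the manylinux1 PEP allows, and the function names and the sample text are made up for illustration (the sample is not real output from any libphobos2.so build).

```python
import re

# Newest glibc symbol version permitted by the manylinux1 policy (PEP 513).
MANYLINUX1_MAX_GLIBC = (2, 5)

# Matches the "Name: GLIBC_x.y[.z]" entries printed by `readelf -V`.
_GLIBC_NAME = re.compile(r"Name:\s+GLIBC_(\d+(?:\.\d+)*)")


def glibc_versions(readelf_output):
    """Return the sorted, de-duplicated glibc versions named in the output."""
    found = {
        tuple(int(part) for part in match.group(1).split("."))
        for match in _GLIBC_NAME.finditer(readelf_output)
    }
    return sorted(found)


def too_new_for_manylinux1(readelf_output):
    """Return the glibc versions that exceed the manylinux1 ceiling."""
    return [v for v in glibc_versions(readelf_output) if v > MANYLINUX1_MAX_GLIBC]


# Hypothetical sample in the shape of `readelf -V` output.
SAMPLE = """\
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 4
  0x0020:   Name: GLIBC_2.8  Flags: none  Version: 3
  0x0030:   Name: GLIBC_2.3.4  Flags: none  Version: 2
"""

print(too_new_for_manylinux1(SAMPLE))  # [(2, 8)]
```

An empty list would mean that no glibc symbol version newer than 2.5 is referenced; it says nothing about the other libraries the policy restricts.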

Any thoughts?

ariovistus commented 8 years ago

Ugh, I don't want to think about this now. dmd is the only compiler that is reliably capable of compiling pyd right now (well, as of 6-8 months ago, when I last checked), and distributing dmd's phobos might run into legal issues. Or it might not, idk.

But suppose we lived in a perfect world: what do we want when we have just written the python extension dawesomeness? We want to be able to upload it as bdist to pypi so that others can

pip install dawesomeness

and it just works, on windows, mac, as many flavors of linux as is reasonable, without needing to set off a local build. We want others to be able to use dawesomeness alongside dslurpiness, a pyd extension someone else wrote and uploaded.

So back in the real world, when you have dawesomeness and dslurpiness binary distributions, the first questions are: which compiler did you compile each with, and if it was the same one, which version? If the compilers are different, things may work out, but I have not tried that and don't know. If the compilers are the same but the versions are different, then there might be problems. As I understand it, the compiler version is tightly coupled to the standard library version. So if dawesomeness depends on libphobos2.so.0.68.2 and dslurpiness depends on libphobos2.so.0.72.0, you can't use both libraries in the same python program due to symbol conflicts, package conflicts, etc.
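To make the version question concrete, here is a rough sketch of how one might spot two extensions that ask for different Phobos builds. It assumes the dynamic section of each extension has been dumped with `readelf -d` and captured as text; the function names are hypothetical and the sample lines are illustrative, reusing the library names from the paragraph above rather than anything taken from a real build.

```python
import re
from collections import defaultdict

# Matches libphobos2 entries among the NEEDED lines printed by `readelf -d`.
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[(libphobos2\.so[^\]]*)\]")


def phobos_needed(readelf_dynamic_output):
    """Return the libphobos2 names an extension lists as NEEDED."""
    return _NEEDED.findall(readelf_dynamic_output)


def find_conflicts(needed_by_extension):
    """Return {library name: [extensions]} if more than one name is required."""
    users = defaultdict(list)
    for extension, names in needed_by_extension.items():
        for name in names:
            users[name].append(extension)
    # A single shared name means every extension agrees; report nothing.
    return dict(users) if len(users) > 1 else {}


# Hypothetical dumps for the two imaginary extensions.
DAWESOMENESS = """\
 0x0000000000000001 (NEEDED)             Shared library: [libphobos2.so.0.68.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""
DSLURPINESS = """\
 0x0000000000000001 (NEEDED)             Shared library: [libphobos2.so.0.72.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""

conflicts = find_conflicts({
    "dawesomeness": phobos_needed(DAWESOMENESS),
    "dslurpiness": phobos_needed(DSLURPINESS),
})
print(conflicts)
```

This only reports that the two extensions name different libraries; whether loading both actually breaks anything is exactly the open question.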

If it were possible to compile with dmd v2.72.0 and target runtime 2.68.2, then it might be reasonable to specify a standard runtime level that pyd wheels should target. Then option 4 becomes a bit more viable, assuming the glibc compatibility constraint is met.

Next question: what happens when dawesomeness depends on libphobos2.so.0.69.1 and the user has libphobos.so installed on their system from somewhere other than pypi? What happens if the version number is the same? What happens if it is different?

Next question: what happens when dawesomeness is installed into a virtualenv?

lomereiter commented 8 years ago

Regarding your symbol conflict concerns: this issue is easy to solve at least on Linux/OS X with RPATH/@loader_path, whereby shared library dependencies can be encoded inside an executable/library as absolute/relative paths so that package managers are able to guarantee the correct versions will be loaded. That's how Nix and Guix work. In fact, Conda also follows this approach (docs).
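As a rough illustration of the relative-path variant, a build script could compute the rpath flag per platform along these lines. The helper name and the directory layout are hypothetical, the flags are written in the form a gcc/clang driver accepts, and how they would be passed through PyD's build or a D compiler is not covered here. `$ORIGIN` is the Linux counterpart of `@loader_path`.

```python
import sys


def rpath_link_args(relative_dir, platform=sys.platform):
    """Build a gcc/clang-style linker argument for a loader-relative rpath.

    relative_dir is resolved at load time relative to the directory that
    contains the extension module itself, not the current directory.
    """
    if platform == "darwin":
        origin = "@loader_path"
    elif platform.startswith("linux"):
        origin = "$ORIGIN"
    else:
        raise ValueError("no rpath equivalent sketched for " + platform)
    return ["-Wl,-rpath," + origin + "/" + relative_dir]


# Hypothetical layout: the runtime package sits next to the extension package.
print(rpath_link_args("../phobos_runtime/lib", "linux"))
print(rpath_link_args("../phobos_runtime/lib", "darwin"))
```

If such an argument goes through a shell, `$ORIGIN` has to be quoted so that it reaches the linker literally.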

Licensing concerns could be avoided by forcing developers to build code with LDC, which keeps up with DMD's innovations quite well.

The real issue is avoiding two copies of the runtime being loaded at the same time. And for that I find your idea of targeting a specific runtime quite reasonable. If druntime and phobos are separate, and the druntime version is handled by the package manager, it's entirely possible to use different versions of phobos. HACKING.md in druntime also suggests that it should be possible with a stabilized ABI and name mangling scheme.
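On Linux, one way to see whether a process has ended up with more than one shared copy is to look at its memory map. The sketch below assumes the usual `/proc/<pid>/maps` line layout and only looks for files whose name starts with `libphobos2`; the function names and the sample text are made up for illustration. A statically linked copy would not show up this way.

```python
import os


def loaded_phobos_copies(maps_text):
    """Return the distinct libphobos2 files mapped according to maps_text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libphobos2"):
                paths.add(path)
    return sorted(paths)


def current_process_copies():
    """Linux only: inspect the interpreter this code is running in."""
    with open("/proc/self/maps") as handle:
        return loaded_phobos_copies(handle.read())


# Hypothetical excerpt in the shape of /proc/<pid>/maps.
SAMPLE_MAPS = """\
7f0000000000-7f0000200000 r-xp 00000000 08:01 100 /env/lib/a/libphobos2.so.0.68.2
7f0000200000-7f0000300000 rw-p 00200000 08:01 100 /env/lib/a/libphobos2.so.0.68.2
7f0001000000-7f0001200000 r-xp 00000000 08:01 200 /env/lib/b/libphobos2.so.0.72.0
7f0002000000-7f0002100000 rw-p 00000000 00:00 0
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
"""

copies = loaded_phobos_copies(SAMPLE_MAPS)
print(len(copies))  # 2
```

More than one entry in the result means that more than one shared copy of Phobos is mapped into the process.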

ariovistus commented 8 years ago

Handy. I would prefer ldc be the standard but for some dumb blockers (#43).

pyd extensions are still going to depend on phobos, and packages depending on different versions of phobos are still going to be unusable side by side, unless symbols of dependencies of dependencies get namespaced. No idea if that's how things work or not.

Failing that, we could work towards making pyd less dependent on phobos. Pyd uses a lot of metaprogramming apis, and it also provides conversion support for a bunch of phobos types. The former would have to be pulled into pyd, the latter might have to be dropped. Not liking the sound of any of this.

ariovistus commented 8 years ago

Some homework for me:

ariovistus commented 8 years ago

Another question that comes to mind: how do things get distributed in d-land? Dub didn't do binary distributions last I checked, so I would assume you're left to your distro's package manager for *nixen and nothing or chocolatey on windows.

lomereiter commented 8 years ago

Removing phobos from dependencies doesn't make any sense. It's definitely possible to use two different copies of the same symbols simultaneously, even if it requires some fiddling with linking flags. I had the original FFTW and the MKL version of it (inside numpy/scipy) working at the same time; with static linking it required -Bsymbolic on my side, while with dynamic linking hopefully just setting rpaths would be enough.

I guess binary distributions in d-land don't exist a) for the same reason this discussion is taking place; b) because dmd is fast enough, and most projects are relatively small; c) because libraries are often full of templates anyway.