conda-forge / rdkit-feedstock

A conda-smithy repository for rdkit.
BSD 3-Clause "New" or "Revised" License

Improve packaging; split python bits from lib, add run-export; make `-dev` count #154

Closed h-vetinari closed 2 months ago

h-vetinari commented 6 months ago

Currently all the C++ libraries are duplicated between various python versions, because rdkit is a monolithic build per python version.

A better setup would be to have librdkit as the C++ bits, with a proper run-export (that we can also add to the pinning), and upon which rdkit can then depend.

Also, the current rdkit-dev is useless, as it brings in exactly the same content as rdkit itself (i.e. it's a content-less package that just depends on rdkit).
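The run-export being proposed could look roughly like the following in the recipe's meta.yaml. This is only a sketch: the `librdkit` name and the pin granularity (`max_pin="x.x"`) are assumptions for illustration, not something decided in this thread.

```yaml
outputs:
  - name: librdkit        # hypothetical name for the C++-only package
    build:
      run_exports:
        # packages that build against librdkit automatically get a
        # compatible runtime pin (the granularity here is a guess)
        - {{ pin_subpackage("librdkit", max_pin="x.x") }}
```

With such a run-export in place, downstream feedstocks only need `librdkit` in their host requirements; the runtime pin is injected for them, which is also what makes it sensible to add to the global pinning.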

greglandrum commented 6 months ago

This sounds reasonable to me, and if it's possible to be better citizens in the conda-forge ecosystem, it would be great to do so. @jaimergp @pstjohn @mcs07 : I don't know how to do what @h-vetinari is requesting. If any of you do know how to do this and can take it on, please let me know. Otherwise I will try to make a block of time to figure it out.

@h-vetinari : if this is something you can easily do, a PR would be welcome. :-)

mcs07 commented 5 months ago

Sounds reasonable to me too, but I'm very out of the loop on current best practices here. I guess the libboost/libboost-python split in the boost feedstock might be a useful example to follow. However, correct me if I'm wrong, but my understanding is there's no RDKit cmake configuration to just compile the python bindings by pointing to an existing C++-only RDKit installation? So to achieve this, the conda build for each python might still have to do the full RDKit compile then some hacks to only install the python-specific parts (maybe just a post-install cleanup of lib/libRDKit*.so).
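The "post-install cleanup" idea above could be as small as a glob delete in the python output's install script. A self-contained toy illustration (the prefix layout, python version, and library names are made up for the demo; a real recipe would operate on `$PREFIX` after the CMake install):

```shell
# Simulate an install prefix containing both C++ libs and python bits
PREFIX="./demo_prefix"
mkdir -p "$PREFIX/lib/python3.11/site-packages/rdkit"
touch "$PREFIX/lib/libRDKitGraphMol.so" "$PREFIX/lib/libRDKitDescriptors.so"
touch "$PREFIX/lib/python3.11/site-packages/rdkit/__init__.py"

# The python-only output would drop the C++ libraries after the full build:
rm -f "$PREFIX"/lib/libRDKit*.so

# Only the python3.11 directory remains under lib/
ls "$PREFIX/lib"
```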

My recollection with rdkit-dev is that it does in fact differ on windows (or at least it used to). I can't remember if it was intentionally added for all platforms for consistency or this happened accidentally. Regardless, is there any harm in preserving it now for backwards compatibility? Looks like the boost feedstock similarly has a bunch of packages like this due to past renaming.

greglandrum commented 5 months ago

However, correct me if I'm wrong, but my understanding is there's no RDKit cmake configuration to just compile the python bindings by pointing to an existing C++-only RDKit installation? So to achieve this, the conda build for each python might still have to do the full RDKit compile then some hacks to only install the python-specific parts (maybe just a post-install cleanup of lib/libRDKit*.so).

Yeah, the only way to get the Python build is to do the C++ build as well. But I believe there's some way in the conda-forge infrastructure to construct multiple install artifacts from a single build, so I don't think there should be wasted work.

jaimergp commented 5 months ago

You can create multiple outputs from a single "build script", let's say, so you can do "Run CMake on this repo" followed by "this output gets the .so files" and "this other output gets the .py files".

However, I don't know if we can have a single job build all the python versions. Usually we have one Python version per job. So we would have to build the .so files in all builds, but only package the librdkit bits in one of them.

h-vetinari commented 5 months ago

However, I don't know if we can have a single job build all the python versions.

Yes, that's very possible; it's how boost/arrow/grpc etc. do it. As long as not all outputs contain a python dependency (and in particular the "global" build stage, i.e. what comes before `outputs:` and where build.sh / bld.bat is run, does not), the CI jobs will be collapsed into one per platform, and only the python-dependent outputs get built multiple times (once per python version).

However, that presupposes that we can build the two things (C++ lib & python bindings) separately. Otherwise things get messy.
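The layout being described might look roughly like this meta.yaml skeleton (a sketch only: the output names, `files:` globs, and `exact=True` pin are illustrative assumptions):

```yaml
# Global build stage: runs once per platform, compiles everything.
# Crucially, no python dependency here, so CI jobs collapse.
requirements:
  build:
    - {{ compiler("cxx") }}
    - cmake

outputs:
  - name: librdkit          # C++ libraries only; packaged once per platform
    files:
      - lib/libRDKit*.so
  - name: rdkit             # python bindings; rebuilt once per python version
    requirements:
      host:
        - python
        - {{ pin_subpackage("librdkit", exact=True) }}
      run:
        - python
```

Because only the `rdkit` output lists python, conda-smithy renders one job per platform and iterates that single output over the python matrix, rather than redoing the whole C++ build for each python version.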

h-vetinari commented 5 months ago

Yeah, the only way to get the Python build is to do the C++ build as well.

It's possible that these are the current assumptions of the upstream build scripts, but it's usually not such a big deal to patch this out and point to an existing C++ lib (built from the same tarball, so we don't have to worry about consistency).
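The "point to an existing C++ lib" approach could be sketched as a small standalone CMake project for the bindings, assuming the librdkit package ships a CMake package config. The package/target names below are guesses for illustration and would need to match whatever upstream (or the patch) actually exports:

```cmake
# Hypothetical: build only the python bindings against a pre-installed
# librdkit from the same tarball, instead of rebuilding the C++ tree.
cmake_minimum_required(VERSION 3.18)
project(rdkit_python_bindings CXX)

find_package(RDKit REQUIRED)   # assumed to come from the librdkit package
find_package(Python REQUIRED COMPONENTS Interpreter Development)
find_package(Boost REQUIRED COMPONENTS python)

# One of many binding modules; links against the installed C++ libraries
add_library(rdBase MODULE rdBase.cpp)
target_link_libraries(rdBase PRIVATE
    RDKit::GraphMol Boost::python Python::Module)
```

Since the C++ lib and the bindings come from the same source tarball, the usual ABI-consistency worries about building against a separately installed library don't apply here.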