Open jdblischak opened 7 months ago
TileDB-Vector-Search is also being updated to not always copy the shared objects into the wheel
https://github.com/TileDB-Inc/TileDB-Vector-Search/pull/361 https://github.com/TileDB-Inc/tiledb-vector-search-feedstock/pull/35
Now that tiledb-vcf-feedstock was updated to 0.32.0 (https://github.com/TileDB-Inc/tiledb-vcf-feedstock/pull/120), which was the first release to use the new scikit-build-core setup, the tiledbvcf-py conda binaries have ballooned in size because they now vendor libtiledb, libtiledbvcf, and htslib.
As a concrete example, linux-64/tiledbvcf-py-0.31.1-py39h1dd0e15_0.conda is 2.0 MB and linux-64/tiledbvcf-py-0.32.0-py39h59b0bc9_0.conda is 9.4 MB.
Not only does this duplication increase the size of our cloud Docker images, but it will complicate future libtiledb updates. If we release libtiledb 2.23.1, all the other conda binaries will automatically use the new libtiledb 2.23.1, but presumably tiledbvcf-py will continue to use its vendored libtiledb 2.23.0.
That is problematic. We need to do our best to ensure that a single libtiledb is used and loaded. This simplifies passing structures back and forth in Python (e.g. creating a tiledb.Config and using it as a parameter to tiledbvcf.ReadConfig(tiledb_config)).
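As a minimal sketch of how one might check whether more than one libtiledb ends up loaded in the same Python process, the snippet below scans the process memory map. It assumes Linux (it reads /proc/self/maps), and the function name loaded_libtiledb_paths is made up for illustration; it is not part of tiledb or tiledbvcf.

```python
import re
from pathlib import Path

# Matches libtiledb.so, libtiledb.so.2.23 and libtiledb.dylib,
# but not libtiledbvcf.so or the Python extension module.
_LIBTILEDB = re.compile(r"libtiledb(\.so(\.[\d.]+)?|\.dylib)$")


def loaded_libtiledb_paths(maps_text):
    """Return the distinct libtiledb paths found in /proc/<pid>/maps text.

    Hypothetical helper for illustration only.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            pathname = fields[5].strip()
            if _LIBTILEDB.search(pathname):
                paths.add(pathname)
    return sorted(paths)


if __name__ == "__main__":
    # Import the packages of interest (e.g. tiledb and tiledbvcf) before
    # this point so that their shared libraries are already loaded.
    maps = Path("/proc/self/maps")  # Linux only
    if maps.exists():
        for path in loaded_libtiledb_paths(maps.read_text()):
            print(path)
```

If this prints more than one path after importing both packages, two separate copies of libtiledb are mapped into the process.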
I suspect this is the cause of the user reported error in https://forum.tiledb.com/t/tiledbvcf-installation-error-on-macos/710
When tiledbvcf-py is built, it builds against whatever libgoogle-cloud version the upstream tiledb is currently pinned to. It then vendors this libtiledb.dylib inside the Python package. Then, when the upstream tiledb conda binary migrates to a new version of libgoogle-cloud, that new version gets installed into the conda env, but the bundled libtiledb.dylib still expects the libgoogle-cloud symbols from whatever previous version it happened to be built against.
Hence the short-term solution is to downgrade libgoogle-cloud until you find the version that your copy of tiledbvcf-py was built against (Azure doesn't keep old build logs, so trial and error is the only option).
Long-term we need to stop copying the shared objects into the Python package, like we've already done for TileDB-Vector-Search (https://github.com/TileDB-Inc/TileDB-Vector-Search/pull/361) and TileDB-Py (https://github.com/TileDB-Inc/TileDB-Py/pull/1988). Maybe @dudoslav can help with this
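To see whether an installed package still carries vendored copies, a rough sketch like the following could list them. The library name prefixes (libtiledb, libtiledbvcf, libhts) are assumptions based on the libraries named in this issue, and find_vendored_libs and package_dir are hypothetical helpers, not part of any TileDB package.

```python
import importlib.util
from pathlib import Path

# Assumed file name prefixes for the libraries discussed in this issue.
VENDORED_PREFIXES = ("libtiledb.", "libtiledbvcf.", "libhts.")


def find_vendored_libs(pkg_dir):
    """List shared libraries with the assumed prefixes under pkg_dir.

    Python extension modules (names containing ".cpython-") belong in the
    package and are skipped.
    """
    found = []
    for path in Path(pkg_dir).rglob("*"):
        name = path.name
        if not path.is_file() or ".cpython-" in name:
            continue
        is_shared = ".so" in name or name.endswith(".dylib")
        if is_shared and name.startswith(VENDORED_PREFIXES):
            found.append(path)
    return sorted(found)


def package_dir(name):
    """Locate an installed top-level package directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    pkg = package_dir("tiledbvcf")
    if pkg is not None:
        for lib in find_vendored_libs(pkg):
            print(lib)
```

An empty listing would suggest the package relies on the shared libraries provided by the conda environment rather than on bundled copies.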
There are various situations where we want to be able to build tiledbvcf-py against an existing external libtiledbvcf.so.
This is the same situation that we previously addressed for tiledbsoma-py in https://github.com/single-cell-data/TileDB-SOMA/pull/1937 and https://github.com/single-cell-data/TileDB-SOMA/pull/2221. Unfortunately tiledbsoma-py uses setup.py, so I can't directly apply the previous solution to the scikit-build-core setup we are now using for tiledbvcf-py.
I think two things need to happen:
RUNPATH (so that libtiledbvcf.cpython-3XX-x86_64-linux-gnu.so can still find the external libtiledbvcf.so at runtime)
Here is a reprex to demonstrate the current shared object copying behavior:
xref: #701, #702
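Regarding the RUNPATH point above, one way to check what a built extension module actually records is to parse the output of readelf -d. This is only a sketch: it assumes Linux with binutils' readelf on the PATH, and parse_runpath and read_runpath are hypothetical helper names.

```python
import re
import subprocess

# readelf -d prints dynamic entries such as:
#  0x000000000000001d (RUNPATH)  Library runpath: [$ORIGIN/../lib]
_RUNPATH = re.compile(r"\((RUNPATH|RPATH)\)\s+Library (?:runpath|rpath): \[(.*)\]")


def parse_runpath(readelf_output):
    """Extract RUNPATH/RPATH entries from `readelf -d` output text."""
    entries = {}
    for line in readelf_output.splitlines():
        match = _RUNPATH.search(line)
        if match:
            # Multiple search directories are separated by colons.
            entries[match.group(1)] = match.group(2).split(":")
    return entries


def read_runpath(shared_object):
    """Run readelf on a shared object and return its RUNPATH/RPATH entries."""
    result = subprocess.run(
        ["readelf", "-d", str(shared_object)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_runpath(result.stdout)
```

Calling read_runpath on the installed libtiledbvcf.cpython-3XX-x86_64-linux-gnu.so would then show whether the directory containing the external libtiledbvcf.so is on its search path.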