TileDB-Inc / TileDB-VCF

Efficient variant-call data storage and retrieval library using the TileDB storage library.
https://tiledb-inc.github.io/TileDB-VCF/
MIT License

Nightly build for macOS #579

Closed. jdblischak closed 11 months ago

jdblischak commented 11 months ago

The nightly macOS build fails right at the end of the installation of the Python package (link to failed build on my fork). I don't understand why it is searching PyPI for a locally built package, and I wasn't able to find any obvious solution when searching the error message. Has anyone observed this error before?

Processing tiledbvcf-0.26.1.dev2-py3.9-macosx-10.9-x86_64.egg
creating /Users/runner/micromamba/envs/py4vcf/lib/python3.9/site-packages/tiledbvcf-0.26.1.dev2-py3.9-macosx-10.9-x86_64.egg
Extracting tiledbvcf-0.26.1.dev2-py3.9-macosx-10.9-x86_64.egg to /Users/runner/micromamba/envs/py4vcf/lib/python3.9/site-packages
Adding tiledbvcf 0.26.1.dev2 to easy-install.pth file
Installed /Users/runner/micromamba/envs/py4vcf/lib/python3.9/site-packages/tiledbvcf-0.26.1.dev2-py3.9-macosx-10.9-x86_64.egg
Processing dependencies for tiledbvcf==0.26.1.dev2
Searching for tiledbvcf==0.26.1.dev2
Reading https://pypi.org/simple/tiledbvcf/
Couldn't find index page for 'tiledbvcf' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
No local packages or working download links found for tiledbvcf==0.26.1.dev2
error: Could not find suitable distribution for Requirement.parse('tiledbvcf==0.26.1.dev2')

jdblischak commented 11 months ago

Waiting on PR #580

jdblischak commented 11 months ago

This PR is ready for review

[  2%] Building CXX object tiledb/CMakeFiles/TILEDB_CORE_OBJECTS.dir/sm/array_schema/array_schema.cc.o
In file included from /Users/runner/work/TileDB-VCF/TileDB-VCF/TileDB/tiledb/sm/array_schema/array_schema.cc:58:
/Users/runner/work/TileDB-VCF/TileDB-VCF/TileDB/tiledb/../tiledb/type/apply_with_type.h:43:43: error: 'T' does not refer to a value
concept TileDBFundamental = std::integral<T> || std::floating_point<T>;
                                          ^
/Users/runner/work/TileDB-VCF/TileDB-VCF/TileDB/tiledb/../tiledb/type/apply_with_type.h:42:17: note: declared here
template <class T>
                ^
/Users/runner/work/TileDB-VCF/TileDB-VCF/TileDB/tiledb/../tiledb/type/apply_with_type.h:43:34: error: no member named 'integral' in namespace 'std'; did you mean 'internal'?
concept TileDBFundamental = std::integral<T> || std::floating_point<T>;
                            ~~~~~^~~~~~~~
                                 internal
/Applications/Xcode_13.2.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.1.sdk/usr/include/c++/v1/ios:960:1: note: 'internal' declared here
internal(ios_base& __str)
^

jdblischak commented 11 months ago

Actually, I think the failure to build dev libtiledb on macOS was expected. Starting with commit 9d79edd, the macOS Azure job on TileDB began failing with the same errors:

[  8%] Building CXX object tiledb/CMakeFiles/TILEDB_CORE_OBJECTS.dir/sm/array_schema/array_schema.cc.o
In file included from /Users/runner/work/1/s/tiledb/sm/array_schema/array_schema.cc:58:
/Users/runner/work/1/s/tiledb/../tiledb/type/apply_with_type.h:43:43: error: 'T' does not refer to a value
concept TileDBFundamental = std::integral<T> || std::floating_point<T>;
/Users/runner/work/1/s/tiledb/../tiledb/type/apply_with_type.h:43:34: error: no member named 'integral' in namespace 'std'; did you mean 'internal'?
concept TileDBFundamental = std::integral<T> || std::floating_point<T>;
                            ~~~~~^~~~~~~~
                                 internal
/Applications/Xcode_13.2.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.1.sdk/usr/include/c++/v1/ios:960:1: note: 'internal' declared here
internal(ios_base& __str)
^

jdblischak commented 11 months ago

Update: see https://github.com/TileDB-Inc/TileDB-VCF/pull/579#issuecomment-1745698982 for the failing test.

Now the only failure is from test_dask.py when built against dev libtiledb

tests/test_dask.py ....                                                  [  8%]
Fatal Python error: Aborted

However, the Python tests built against release libtiledb didn't fail, yet it looks like the dask tests were aborted early there too:

tests/test_dask.py ....                                                  [  8%]
tests/test_tiledbvcf.py ........s............s.......................    [100%]

jdblischak commented 11 months ago

Actually, the dask tests fail to complete for all the builds in the matrix. For some reason, only the dev libtiledb build on macOS reports this as a failure.

Update: never mind. The 8% is cumulative progress across all the tests, not progress within test_dask.py. This means that the 4 dask tests pass even on the failed build, but afterwards the threads can't be shut down properly.

tests/test_dask.py ....                                                  [  8%]
tests/test_tiledbvcf.py ........s............s.......................    [100%]
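The percentage reading can be checked directly from the two progress lines. The following is a minimal sketch, assuming that pytest prints one result character per test and rounds the cumulative percentage down; the helper name `cumulative_percentages` is made up for illustration.

```python
import re

# Progress lines copied from the pytest output above.
LOG = """\
tests/test_dask.py ....                                                  [  8%]
tests/test_tiledbvcf.py ........s............s.......................    [100%]
"""

# File name, then one result character per test ("." passed, "s" skipped, ...),
# then the percentage pytest reported at the end of that file.
LINE = re.compile(r"^(\S+\.py)\s+([.sFExX]+)\s+\[\s*(\d+)%\]$")


def cumulative_percentages(log):
    """Return (file, tests_in_file, computed_percent, reported_percent) rows."""
    rows = []
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match.group(1), len(match.group(2)), int(match.group(3))))
    total = sum(count for _, count, _ in rows)
    done = 0
    result = []
    for name, count, reported in rows:
        done += count
        # Assumption: the percentage is tests finished so far over all
        # collected tests, rounded down.
        result.append((name, count, done * 100 // total, reported))
    return result


for name, count, computed, reported in cumulative_percentages(LOG):
    print(f"{name}: {count} tests, computed {computed}%, reported {reported}%")
```

Under that assumption the computed values agree with the reported ones: the 4 dask tests are a small share of the whole run, so 8% marks the end of test_dask.py rather than an early abort.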

So I'm going to simply uninstall dask to skip these tests. They aren't related to the interface with dev libtiledb anyway.
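Uninstalling a package only skips its tests if they are guarded by an import check. How tests/test_dask.py does this is not shown in this thread, so the following is only a sketch of the general mechanism using the standard library; `module_available` and `DaskTests` are hypothetical names.

```python
import importlib.util
import unittest


def module_available(name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


# Hypothetical stand-in for the real dask tests: the whole class is skipped,
# not failed, when dask is missing from the environment.
@unittest.skipUnless(module_available("dask"), "dask is not installed")
class DaskTests(unittest.TestCase):
    def test_dask_imports(self):
        import dask

        self.assertTrue(hasattr(dask, "__version__"))
```

With a guard like this, removing dask from the environment turns the dask tests into skips and leaves the rest of the suite untouched.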

jdblischak commented 11 months ago

The problem is this test, which uses an S3 URI, so there must be an issue with linking the AWS SDK for dev libtiledb on macOS:

https://github.com/TileDB-Inc/TileDB-VCF/blob/1e4efd86dd171b21f8f054b67b06bd6f9198c3e5/apis/python/tests/test_tiledbvcf.py#L907-L908

ihnorton commented 11 months ago

@jdblischak is the AWS SDK supposed to be coming from conda here? It is installed in the conda environment, but is then being built separately:

2023-10-03T19:56:33.3359280Z -- Could NOT find AWSSDK (missing: AWSSDK_DIR)
2023-10-03T19:56:33.3456940Z -- Could NOT find AWSSDK
2023-10-03T19:56:33.3557690Z -- Adding AWSSDK as an external project
2023-10-03T20:24:28.6770090Z -- Found TileDB: /Users/runner/work/TileDB-VCF/TileDB-VCF/install/lib/libtiledb.dylib

jdblischak commented 11 months ago

is AWS SDK supposed to be coming from conda here, because it is installed in the conda environment, but is then being built separately:

@ihnorton No. The conda env is only used for the Python build

It's possible that the conda-installed AWS SDK libs are interfering with the Python tests (since libtiledbvcf and libtiledb were built outside of the conda env). But I doubt that is the problem because the only failure is with dev libtiledb on macOS.

jdblischak commented 11 months ago

In another branch, I switched from conda binaries to PyPI wheels. This works on macOS, but not on Ubuntu (for some reason libarrow_python.so is not found).

jdblischak commented 11 months ago

Another update from a CI run on my fork. When I install the Python dependencies from PyPI with pip, all the tests pass on macOS, even when using dev libtiledb. So maybe the conda binaries are interfering in some way (still strange though that there was no problem with the release libtiledb build).

https://github.com/jdblischak/TileDB-VCF/actions/runs/6408933363

But I can't migrate to PyPI wheels until the linking on Ubuntu is fixed, unless we want to use conda for the Ubuntu nightly and PyPI wheels for the macOS nightly. I think the main problem is that libarrow_python.so is not found:

$ ldd /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/tiledbvcf/libtiledbvcf.cpython-310-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffe99979000)
    libtiledbvcf.so => /home/runner/work/TileDB-VCF/TileDB-VCF/install/lib/libtiledbvcf.so (0x00007fba27800000)
    libarrow_python.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fba27400000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fba27adc000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fba27000000)
    libhts.so.1.15.1 => /home/runner/work/TileDB-VCF/TileDB-VCF/install/lib/libhts.so.1.15.1 (0x00007fba276e3000)
    libtiledb.so.2.18 => /home/runner/work/TileDB-VCF/TileDB-VCF/install/lib/libtiledb.so.2.18 (0x00007fba25800000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fba27b55000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fba27319000)
    libdeflate.so.0 => /lib/x86_64-linux-gnu/libdeflate.so.0 (0x00007fba27ab6000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fba276b8000)
    libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007fba276a5000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fba27689000)
    liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007fba27669000)
    libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007fba2724a000)
    libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007fba26f5c000)
    libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007fba25200000)

$ python -c 'import tiledbvcf; print(tiledbvcf.version)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/tiledbvcf/__init__.py", line 31, in <module>
    from .dataset import ReadConfig, TileDBVCFDataset, Dataset, config_logging
  File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/tiledbvcf/dataset.py", line 8, in <module>
    from . import libtiledbvcf
ImportError: /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/tiledbvcf/libtiledbvcf.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5arrow5fieldENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10shared_ptrINS_8DataTypeEEbS6_IKNS_16KeyValueMetadataEE
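Unresolved libraries like the one above could be flagged automatically in CI. The following is a minimal sketch that scans `ldd` output for entries marked "not found"; the function name `unresolved_libraries` is made up, and the sample is a shortened copy of the output above.

```python
def unresolved_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved entries look like "name => /path (address)";
        # unresolved ones look like "name => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Shortened copy of the ldd output above.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffe99979000)
    libtiledbvcf.so => /home/runner/work/TileDB-VCF/TileDB-VCF/install/lib/libtiledbvcf.so (0x00007fba27800000)
    libarrow_python.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fba27400000)
"""

print(unresolved_libraries(SAMPLE))  # ['libarrow_python.so']
```

A CI step could run `ldd` on the built extension module and fail early if this list is non-empty, instead of waiting for the ImportError at test time.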

jdblischak commented 11 months ago

I decided to be pragmatic for now. I install the Python dependencies with conda on Ubuntu and with PyPI wheels on macOS. I'll continue to investigate the issues I reported above

Note that the failed Azure linux build has nothing to do with this PR. The error appears to be related to mamba during setup, but I didn't investigate further

jdblischak commented 11 months ago

I've attempted to fix the mamba error in the Azure build. The error message is ImportError: /usr/share/miniconda/lib/python3.11/site-packages/libmambapy/../../../libmamba.so.2: undefined symbol: solver_ruleinfo2str, version SOLV_1.0, which according to the troubleshooting guide means that the conda-forge and defaults channels were mixed. However, after I updated all the packages in the base env to be installed from conda-forge, now conda can't even manage to install mamba in the first place.
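One way to check whether an environment mixes channels is to group its packages by channel. The following is a sketch that assumes the JSON printed by `conda list --json` is a list of records with `name` and `channel` fields; the sample data and the function name `packages_by_channel` are hypothetical and do not come from the failing build.

```python
import json
from collections import defaultdict


def packages_by_channel(conda_list_json):
    """Group package names by the channel each one was installed from."""
    groups = defaultdict(list)
    for pkg in json.loads(conda_list_json):
        groups[pkg.get("channel", "unknown")].append(pkg["name"])
    return dict(groups)


# Hypothetical sample in the assumed shape of `conda list --json` output.
SAMPLE = json.dumps(
    [
        {"name": "libmambapy", "channel": "conda-forge"},
        {"name": "libmamba", "channel": "conda-forge"},
        {"name": "libsolv", "channel": "pkgs/main"},
    ]
)

groups = packages_by_channel(SAMPLE)
for channel, names in sorted(groups.items()):
    print(f"{channel}: {', '.join(names)}")
if len(groups) > 1:
    print("More than one channel is in use in this environment")
```

In a real environment the input would come from running `conda list -n base --json` and passing its standard output to the function.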

Longer term, I recommend we migrate to micromamba, but I'd prefer that be done in a follow-up PR.