TileDB-Inc / TileDB-VCF

Efficient variant-call data storage and retrieval library using the TileDB storage library.
https://tiledb-inc.github.io/TileDB-VCF/
MIT License

tiledb-vcf-java jar doesn't include native libraries #655

Closed lynnjo closed 5 months ago

lynnjo commented 5 months ago

Hello - we have a project that uses tiledb-vcf to store our VCF data files. To use the Java API from within our project, I have compiled the TileDB-VCF and TileDB-Java jar files and included them in our build.gradle file. This works fine when we run on a Linux box.

I now want to compile and run on a Mac (Intel chip, not ARM).

When I try to access the VCFReader, I get the error:

java.lang.UnsatisfiedLinkError: 'java.lang.String io.tiledb.libvcfnative.LibVCFNative.tiledb_vcf_version()'

Using "jar -tf" to look at the tiledb-vcf-java-0.28.0.jar created on the Mac, I see it does not include the native libraries shown below. These ARE included when we compile version 0.25 on a Linux machine.

lib/
lib/libspdlog.a
lib/libtiledb.so.2.16
lib/libtiledbvcfjni.so
lib/libhts.so.1.15.1
lib/libtiledbvcf.so
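The same check can be scripted. A minimal sketch in plain Java (no TileDB APIs involved) that opens a jar with java.util.jar.JarFile and prints the entries whose names look like native libraries; the extension list is an assumption meant to cover Linux, macOS and Windows naming, including versioned names such as libtiledb.so.2.16:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public final class NativeJarEntries {
    /** True for file names that look like native libraries (assumed extension list). */
    static boolean looksNative(String entryName) {
        String file = entryName.substring(entryName.lastIndexOf('/') + 1);
        return file.endsWith(".dylib")
                || file.endsWith(".dll")
                || file.endsWith(".a")
                || file.endsWith(".so")
                || file.contains(".so."); // versioned, e.g. libhts.so.1.15.1
    }

    /** Lists jar entries that look like native libraries, similar to filtering `jar -tf` output. */
    static List<String> list(String jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                if (!entry.isDirectory() && looksNative(entry.getName())) {
                    found.add(entry.getName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to the jar, e.g. build/libs/tiledb-vcf-java-0.28.0.jar
        for (String name : list(args[0])) {
            System.out.println(name);
        }
    }
}
```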

The commands I am using on the Mac (Intel chip, NOT M1) are:

(pre-run: set JAVA version to 11)
git clone https://github.com/TileDB-Inc/TileDB-VCF.git
cd TileDB-VCF/apis/java
./gradlew --debug shadowjar

I have also compiled and loaded the jar file for TileDB-Java using these commands:

git clone https://github.com/TileDB-Inc/TileDB-Java.git
cd TileDB-Java
./gradlew assemble

The jar file contains the files below:

lib/
lib/libtiledb.dylib
lib/libtiledbjni.dylib

Has something changed in later versions that excludes the native libraries when compiling TileDB-VCF/apis/java into a jar file? Or perhaps there is a step or parameter I am missing?

Thanks - Lynn

lynnjo commented 5 months ago

Update to the issue above: I've attached the output from running the following command (from the ../TileDB-VCF/apis/java folder):

./gradlew --info shadowjar

I note the build can't find files it expects in a "resources" folder. Should files have been copied to that folder earlier, or is creating these files part of the make command?

From the file:

> Task :processResources NO-SOURCE
file or directory '/Users/lcj34/git/TileDB-VCF/apis/java/src/main/resources', not found
file or directory '/Users/lcj34/git/TileDB-VCF/dist/lib', not found
file or directory '/Users/lcj34/git/TileDB-VCF/apis/java/build/install/lib', not found
Skipping task ':processResources' as it has no source files and no previous output files.
:processResources (Thread[Execution worker for ':',5,main]) completed. Took 0.0 secs.
:classes (Thread[Execution worker for ':',5,main]) started.

gradlewInfo_shadowjarRun_out.txt

awenocur commented 5 months ago

Thank you for the detailed feedback about making TileDB-VCF work with Java on macOS x86_64.

This is the first time we've encountered feedback about such a configuration in the field, so it's great to know how you're using it. We will look into this matter.

lynnjo commented 5 months ago

Additional information that I hope will help in debugging this: my colleague successfully built the TileDB-VCF Java libraries on Linux from the TileDB-VCF/apis/java code at version 0.25.2.

When I pulled the code, it was at a later version, 0.28.0.

Here are differences we see in the output between Linux and Mac (Intel).

When I run "./gradlew assemble" I get warnings when it is running the Task :makeJNITask. I have attached the output from this command. While it claims the build is successful, running "./gradlew test" results in failures on all tests with the error message:

java.lang.NoClassDefFoundError: Could not initialize class io.tiledb.libvcfnative.LibVCFNative
    at io.tiledb.libvcfnative.VCFReader.<init>(VCFReader.java:71)
    at io.tiledb.libvcfnative.VCFReaderTest.getVFCReader(VCFReaderTest.java:82)
        ...
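This NoClassDefFoundError is usually a follow-on error rather than the root cause. Under the JVM's class initialization rules, if a class's static initializer throws, the first access surfaces that error (an Error such as UnsatisfiedLinkError propagates unwrapped), and every later access reports NoClassDefFoundError ("Could not initialize class ..."). The sketch below reproduces that with a hypothetical NativeBinding class; that LibVCFNative loads its native libraries in a static initializer is an assumption, not something confirmed in this thread. In practice, the output of the first failing test is the one that names the real problem.

```java
public final class InitFailureDemo {
    /** Hypothetical stand-in for a JNI binding class whose static initializer loads native code. */
    static final class NativeBinding {
        static {
            simulateMissingLibrary();
        }

        private static void simulateMissingLibrary() {
            // Stands in for System.loadLibrary(...) failing on a machine without the library.
            throw new UnsatisfiedLinkError("simulated: native library not found");
        }

        static String version() {
            return "never reached";
        }
    }

    /** Calls NativeBinding.version() and returns the simple name of whatever it throws. */
    static String attempt() {
        try {
            return NativeBinding.version();
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // UnsatisfiedLinkError: the real cause, reported once
        System.out.println(attempt()); // NoClassDefFoundError: the class is now marked as failed
    }
}
```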

Comparing the files created after a Linux build vs. a Mac (Intel chip) build, I see these differences:

folder ~/TileDB-VCF/apis/java/build/libs
  Linux: 3 files: tiledb-vcf-java-0.25.2-javadoc.jar, tiledb-vcf-java-0.25.2-sources.jar, tiledb-vcf-java-0.25.2.jar
  Mac: the same 3 files (but version 0.28.0)

folder ~/TileDB-VCF/apis/java/build/resources/main/lib/
  Linux: 5 files: libhts.so.1.15.1, libspdlog.a, libtiledb.so.2.16, libtiledbvcf.so, libtiledbvcfjni.so
  Mac: 2 files: libtiledbvcf.dylib, libtiledbvcfjni.dylib

folder ~/TileDB-VCF/dist/lib
  Linux: 4 files: libhts.so.1.15.1, libspdlog.a, libtiledb.so.2.16, libtiledbvcf.so
  Mac: 1 file: libtiledbvcf.dylib

I included ~/TileDB-VCF/dist/lib because the processResources task in build.gradle copies from there.

Because Linux and Mac have different library requirements, I don't know whether what shows up on the Mac is correct or whether something is missing.
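For reference, the JVM maps a library name to a platform-specific file name with System.mapLibraryName, which is why the Linux build ships .so files and the Mac build .dylib files. A small sketch that prints that mapping on the current machine; the three library names come from the file names in this thread, and whether the bindings load them under exactly these names is an assumption. On macOS, HotSpot typically reports os.arch as x86_64 on Intel and aarch64 on Apple silicon, which is worth checking against the architecture of any bundled dylib.

```java
public final class PlatformLibNames {
    /** The file name the JVM expects for a name passed to System.loadLibrary on this platform. */
    static String fileNameFor(String libName) {
        return System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        System.out.println("os.name=" + System.getProperty("os.name")
                + ", os.arch=" + System.getProperty("os.arch"));
        for (String lib : new String[] {"tiledb", "tiledbvcf", "tiledbvcfjni"}) {
            // e.g. libtiledbvcf.so on Linux, libtiledbvcf.dylib on macOS
            System.out.println(lib + " -> " + fileNameFor(lib));
        }
    }
}
```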

My gradle.properties file has these entries:

FORCE_EXTERNAL_HTSLIB=ON
FORCE_EXTERNAL_TILEDB=OFF
DOWNLOAD_TILEDB_PREBUILT=ON
CMAKE_BUILD_TYPE=Release

assembleOutput_fromMAC_intel.txt

Thanks for your time - Lynn

lynnjo commented 5 months ago

Still having no luck here. I did get "./gradlew assemble" to work without error, but running "./gradlew test" continues to throw an exception that it can't find the necessary libraries.

Question: Has anyone on your development team successfully compiled and tested the Java jar on a Mac, or do you not expect to support this platform?

I've attached the ./gradlew test output (gradleTest_output.txt), which was run after a successful execution of ./gradlew assemble.

lynnjo commented 5 months ago

UPDATE - I finally got this to work on an Intel Mac!!

What I ended up doing was updating the two jar files to include the *.dylib files one level above lib/, via these commands:

NOTE: in our repository, we have copied the jar files to phg_v2/repo. So those are the jar files I am updating with these commands.

  1. git clone TileDB-VCF
  2. cd apis/java
  3. use brew to install various Mac tools that were missing, e.g. automake
  4. run ./gradlew assemble
  5. cd to the TileDB-VCF/dist folder (the parent folder of lib/libtiledbvcf.dylib)
  6. run: jar uf /Users/lcj34/git/phg_v2/repo/tiledb-vcf-java-0.28.0.jar -C lib libtiledbvcf.dylib

Then:

  1. git clone TileDB-Java
  2. run ./gradlew assemble
  3. cd to ~/git/TileDB-VCF/libtiledbvcf/build/externals/install
  4. run: jar uf /Users/lcj34/git/phg_v2/repo/tiledb-java-0.19.6-SNAPSHOT.jar -C lib libtiledb.dylib
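The "jar uf <jar> -C lib <file>" steps above add the file at the root of the jar rather than under lib/. A minimal sketch of the same update done from Java with the JDK's zip file system provider (jdk.zipfs); the class and method names are hypothetical, and this only reproduces the packaging step, not a fix for the loading problem:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;

public final class JarAddAtRoot {
    /** Copies file into the root of an existing jar, like `jar uf <jar> -C <dir> <file>`. */
    static void addAtRoot(Path jar, Path file) throws IOException {
        // The zip file system provider opens the existing jar for in-place modification.
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of())) {
            Path target = zipfs.getPath("/" + file.getFileName());
            Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the updated jar back to disk
    }

    public static void main(String[] args) throws IOException {
        // args[0] = jar to update, args[1] = native library to add at its root
        addAtRoot(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```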

If you run "jar tf ..." on the jar, you will see that the .dylib files now appear in two places: in their original lib/ location and one folder above that, at the root of the jar.

Despite the duplication, when I now run our JUnit test that calls the TileDB VCFReader from the Java API, the library is found and our tests pass.

Sorry for all the postings; I will tackle ARM Macs next week.

lynnjo commented 5 months ago

Closing as I found a fix (see final comment above)

lynnjo commented 5 months ago

Closing this was premature: it turns out the steps above did not fix the problem, and I cannot reproduce the working setup.

Hoping someone from the tiledb team has insights that can help.

gspowley commented 5 months ago

Hi @lynnjo, so far the Java/Spark users we work with are using Linux, so we have not focused on building and testing a TileDB-VCF jar file on macOS.

It sounds like the issue is related to building the libtiledbvcf shared library and packaging it in the jar file. We do run Java/Spark tests on macOS in CI, so the CI YAML file may be helpful.

lynnjo commented 5 months ago

@gspowley Thanks for the response. What the YAML file shows me is that when I run the "./gradlew assemble" command on my Mac, the libtiledb.dylib file does not get written to the ~/TileDB-VCF/dist/lib folder. I tested this by running this line from the YAML file:

./dist/bin/tiledbvcf version

If I copy libtiledb.dylib from ~/TileDB-Java/build/install/lib/libtiledb.dylib to ~/TileDB-VCF/dist/lib/libtiledb.dylib, the command "./dist/bin/tiledbvcf version" works (this is line 55 in the YAML file you linked above).

It seems to come down to how the native libraries are packaged. The code doesn't seem to be able to find the native libraries in the jars, even though they are there.
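For context, a common pattern for shipping JNI libraries inside a jar is to look up the library as a classpath resource, copy it to a temporary directory, and load it by absolute path with System.load, because System.loadLibrary by default searches java.library.path and cannot read a file that is still inside a jar. The sketch below is a generic illustration of that pattern, not TileDB-Java's or TileDB-VCF's actual loader; the lib/-then-root lookup order and the class name are assumptions. mapLibraryName does not produce versioned names such as libtiledb.so.2.16, and on macOS, loading libtiledb.dylib first may still not satisfy libtiledbvcf.dylib if its install name or rpath points elsewhere.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public final class JarNativeLoader {
    /** Classpath locations to try, in order: under lib/ first, then the jar root (assumed layout). */
    static List<String> candidatePaths(String libName) {
        String fileName = System.mapLibraryName(libName); // e.g. libtiledbvcf.dylib on macOS
        return List.of("lib/" + fileName, fileName);
    }

    /** Copies the first matching classpath resource into dir; returns null if none is found. */
    static Path extract(ClassLoader loader, String libName, Path dir) throws IOException {
        for (String candidate : candidatePaths(libName)) {
            URL url = loader.getResource(candidate);
            if (url == null) {
                continue;
            }
            Path target = dir.resolve(System.mapLibraryName(libName));
            try (InputStream in = url.openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
        return null;
    }

    /** Extracts every library into one temp directory and loads them in the given order. */
    static void loadAll(ClassLoader loader, List<String> libNamesDependenciesFirst) throws IOException {
        Path dir = Files.createTempDirectory("native-libs"); // not cleaned up in this sketch
        for (String name : libNamesDependenciesFirst) {
            Path lib = extract(loader, name, dir);
            if (lib == null) {
                throw new UnsatisfiedLinkError("not found on classpath: " + candidatePaths(name));
            }
            System.load(lib.toAbsolutePath().toString()); // System.load needs an absolute path
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical order: the core library before the libraries that link against it.
        loadAll(JarNativeLoader.class.getClassLoader(), List.of("tiledb", "tiledbvcf", "tiledbvcfjni"));
    }
}
```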

If I put both libtiledb.dylib and libtiledbvcf.dylib into my Java folder where other native libraries live, everything works. This is fine for testing the libraries but is not a solution for a customer build. I need to determine how to make them visible from the shadow jars.
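Dropping the dylibs into that folder likely works because System.loadLibrary, by default, searches the directories listed in the java.library.path system property. A small sketch that reports which of the libraries named in this thread are visible there; the class is hypothetical and the library names are taken from this thread. Dependencies of a dylib (for example libtiledbvcf.dylib needing libtiledb.dylib) are resolved by the macOS dynamic loader, not via java.library.path, so a hit here does not guarantee the load succeeds.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public final class LibraryPathCheck {
    /** Returns the first directory entry in searchPath that contains libName's platform file. */
    static Optional<Path> find(String searchPath, String libName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path");
        for (String lib : new String[] {"tiledb", "tiledbvcf", "tiledbvcfjni"}) {
            System.out.println(lib + " -> " + find(searchPath, lib).map(Path::toString).orElse("not found"));
        }
    }
}
```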

Finally, I would note that you now have a TileDB project whose users run both Linux and Mac, so perhaps this priority could be raised? :-) We really like TileDB for the efficiency of processing VCF data and hope to continue using it.

lynnjo commented 5 months ago

One final comment: I do have it working now on the Mac. It turns out some of the difficulty was IntelliJ's Gradle integration not updating correctly to the new jar files. (Clicking the Gradle icon after changing the build.gradle.kts file did not correctly pull in these files.)

Nothing special needed. TileDB-VCF:

lynnjo commented 5 months ago

Closed