Open zbendhiba opened 4 months ago
I wrote some tests but am currently blocked on a native issue related to JNI usage:
Caused by: java.lang.UnsatisfiedLinkError: ai.djl.huggingface.tokenizers.jni.TokenizersLibrary.encode(JLjava/lang/String;Z)J [symbol: Java_ai_djl_huggingface_tokenizers_jni_TokenizersLibrary_encode or Java_ai_djl_huggingface_tokenizers_jni_TokenizersLibrary_encode__JLjava_lang_String_2Z]
at org.graalvm.nativeimage.builder/com.oracle.svm.core.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:152)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:54)
at ai.djl.huggingface.tokenizers.jni.TokenizersLibrary.encode(Native Method)
at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.encode(HuggingFaceTokenizer.java:213)
at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.encode(HuggingFaceTokenizer.java:224)
at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.tokenize(HuggingFaceTokenizer.java:183)
at dev.langchain4j.model.embedding.OnnxBertBiEncoder.embed(OnnxBertBiEncoder.java:59)
at dev.langchain4j.model.embedding.AbstractInProcessEmbeddingModel.embedAll(AbstractInProcessEmbeddingModel.java:50)
at dev.langchain4j.model.embedding.EmbeddingModel.embed(EmbeddingModel.java:34)
at org.apache.camel.component.langchain4j.embeddings.LangChain4jEmbeddingsProducer.process(LangChain4jEmbeddingsProducer.java:41)
Seems there were some similar(ish) problems in quarkiverse-langchain4j. They actually have embeddings native tests disabled.
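For anyone reading the trace above: the two symbol names in the `UnsatisfiedLinkError` are what the standard JNI name-mangling rules produce for `TokenizersLibrary.encode` with descriptor `(JLjava/lang/String;Z)J`. The runtime looks for the short name first, then the overload-qualified long name, and the error means neither was resolvable in the native image. Below is a minimal sketch of those mangling rules; the `JniNames` helper class is hypothetical, written only for illustration, and is not part of DJL, GraalVM, or Camel.

```java
// Hypothetical helper illustrating JNI symbol-name mangling (not a library API).
final class JniNames {
    private JniNames() {}

    // Apply the JNI escape rules to a class name, method name, or argument signature.
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == '/') {
                sb.append('_');            // package separator
            } else if (c == '_') {
                sb.append("_1");           // literal underscore
            } else if (c == ';') {
                sb.append("_2");           // end of an object type in a signature
            } else if (c == '[') {
                sb.append("_3");           // array marker in a signature
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);              // ASCII alphanumerics pass through unchanged
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other characters as _0xxxx
            }
        }
        return sb.toString();
    }

    // Short form: Java_<mangled class>_<mangled method>
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    // Long form (used for overloaded natives): short name + "__" + mangled argument signature.
    static String longName(String className, String methodName, String descriptor) {
        String args = descriptor.substring(descriptor.indexOf('(') + 1, descriptor.indexOf(')'));
        return shortName(className, methodName) + "__" + mangle(args);
    }
}
```

Calling `JniNames.longName("ai.djl.huggingface.tokenizers.jni.TokenizersLibrary", "encode", "(JLjava/lang/String;Z)J")` reproduces the second symbol from the error message, which suggests the lookup itself is well-formed and the native tokenizers library is simply not linked or loaded in the native executable.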
@jamesnetherton Do you know from where this Tokenizer is pulled? Is it coming directly from our Camel component?
Do you know from where this Tokenizer is pulled
Basically whenever you use any langchain4j-embeddings-*.
I got a bit further by using quarkus-langchain4j-parsers-base. But I am now stuck with a runtime segfault, similar to what the Quarkus LangChain4j folks also encounter.
Starting the stack walk in a possible caller:
A SP 0x000000016fcf9900 IP 0x000000010051c6b4 size=240 ai.onnxruntime.OrtSession.run(Native Method)
A SP 0x000000016fcf99f0 IP 0x000000010051ad08 size=224 ai.onnxruntime.OrtSession.run(OrtSession.java:395)
i SP 0x000000016fcf9ad0 IP 0x00000001010549e8 size=128 ai.onnxruntime.OrtSession.run(OrtSession.java:242)
i SP 0x000000016fcf9ad0 IP 0x00000001010549e8 size=128 ai.onnxruntime.OrtSession.run(OrtSession.java:210)
A SP 0x000000016fcf9ad0 IP 0x00000001010549e8 size=128 dev.langchain4j.model.embedding.OnnxBertBiEncoder.encode(OnnxBertBiEncoder.java:115)
A SP 0x000000016fcf9b50 IP 0x0000000101053e0c size=112 dev.langchain4j.model.embedding.OnnxBertBiEncoder.embed(OnnxBertBiEncoder.java:64)
A SP 0x000000016fcf9bc0 IP 0x0000000101050540 size=112 dev.langchain4j.model.embedding.AbstractInProcessEmbeddingModel.embedAll(AbstractInProcessEmbeddingModel.java:50)
A SP 0x000000016fcf9c30 IP 0x0000000101052f7c size=96 dev.langchain4j.model.embedding.EmbeddingModel.embed(EmbeddingModel.java:34)
A SP 0x000000016fcf9c90 IP 0x000000010204d0f4 size=80 org.apache.camel.component.langchain4j.embeddings.LangChain4jEmbeddingsProducer.process(LangChain4jEmbeddingsProducer.java:41)
Describe the feature here
Improve the LangChain4j embeddings extension to provide integration tests and native support.