deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0
4.16k stars, 661 forks

Add intfloat/multilingual-e5-large-instruct to model zoo #3486

Closed david-sitsky closed 2 weeks ago

david-sitsky commented 1 month ago

Description

@frankfliu asked me to create an issue to track this request, which I originally reported on Slack: https://deepjavalibrary.slack.com/archives/C01AURG857U/p1727308663498229.

Can we please add intfloat/multilingual-e5-large-instruct to the model zoo? It is considered the best E5 model in terms of MTEB scoring and would be highly valuable. Thank you.

frankfliu commented 1 month ago

@xyang16

david-sitsky commented 1 month ago

@xyang16 - any idea when intfloat/multilingual-e5-large-instruct will be added to the model zoo? I am pretty keen to try this out.

xyang16 commented 1 month ago

@david-sitsky I just uploaded this model to model zoo.

david-sitsky commented 1 month ago

@xyang16 - many thanks. I just tried accessing it with DJL 0.30.0, using either OnnxRuntime or PyTorch, but it still seems to fail. Any ideas?

Caused by: java.lang.IllegalArgumentException: Invalid djl URL: djl://ai.djl.huggingface.onnxruntime/intfloat/multilingual-e5-large-instruct
    at ai.djl.repository.RepositoryFactoryImpl$DjlRepositoryFactory.newInstance(RepositoryFactoryImpl.java:262)
    at ai.djl.repository.RepositoryFactoryImpl.newInstance(RepositoryFactoryImpl.java:64)
    at ai.djl.repository.Repository.newInstance(Repository.java:90)
    at ai.djl.repository.zoo.DefaultModelZoo.parseLocation(DefaultModelZoo.java:66)
    at ai.djl.repository.zoo.DefaultModelZoo.<init>(DefaultModelZoo.java:47)
    at ai.djl.repository.zoo.Criteria$Builder.optModelUrls(Criteria.java:577)

and

Caused by: java.lang.IllegalArgumentException: Invalid djl URL: djl://ai.djl.huggingface.pytorch/intfloat/multilingual-e5-large-instruct
    at ai.djl.repository.RepositoryFactoryImpl$DjlRepositoryFactory.newInstance(RepositoryFactoryImpl.java:262)
    at ai.djl.repository.RepositoryFactoryImpl.newInstance(RepositoryFactoryImpl.java:64)
    at ai.djl.repository.Repository.newInstance(Repository.java:90)
    at ai.djl.repository.zoo.DefaultModelZoo.parseLocation(DefaultModelZoo.java:66)
    at ai.djl.repository.zoo.DefaultModelZoo.<init>(DefaultModelZoo.java:47)

david-sitsky commented 1 month ago

My apologies, I forgot to delete the ~/.djl.ai/cache directory.
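
For readers who want to clear the cache programmatically rather than by hand, here is a minimal sketch using only the Java standard library. It assumes the cache lives at ~/.djl.ai/cache as mentioned in this thread; the class name `DjlCacheCleaner` and its helper are hypothetical and not part of DJL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of DJL.
final class DjlCacheCleaner {

    /** Recursively deletes dir and returns the number of entries removed. */
    static int deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0; // nothing cached yet
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the cache location mentioned in this thread.
        Path defaultCache = Paths.get(System.getProperty("user.home"), ".djl.ai", "cache");
        Path target = args.length > 0 ? Paths.get(args[0]) : defaultCache;
        System.out.println("Removed " + deleteRecursively(target) + " entries from " + target);
    }
}
```

Passing a narrower path as the first argument (for example, the directory of a single model) avoids re-downloading every other cached model.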

david-sitsky commented 1 month ago

@xyang16 - despite zapping the cache dir, I still see the same errors. Also when I download https://mlrepo.djl.ai/model/nlp/text_embedding/ai/djl/huggingface/pytorch/models.json.gz, it seems to be missing intfloat/multilingual-e5-large-instruct.

Am I missing something or has this not been done yet?
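
The manual check described above (download models.json.gz and look for the model id) can be sketched with the Java standard library alone. This is an illustration, not a DJL API: the class `ZooIndexCheck` is hypothetical, and it assumes a plain substring search over the decompressed JSON is enough, without relying on the file's actual schema.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of DJL.
final class ZooIndexCheck {

    /** Returns true if the gzip-compressed stream, decoded as UTF-8, contains modelId. */
    static boolean indexMentions(InputStream gzipped, String modelId) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
            String json = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // JSON may escape '/' as "\/", so look for both spellings.
            return json.contains(modelId) || json.contains(modelId.replace("/", "\\/"));
        }
    }

    public static void main(String[] args) {
        // URL and model id are the ones quoted in this thread.
        String url = "https://mlrepo.djl.ai/model/nlp/text_embedding/ai/djl/huggingface/pytorch/models.json.gz";
        String modelId = "intfloat/multilingual-e5-large-instruct";
        try (InputStream in = URI.create(url).toURL().openStream()) {
            System.out.println(modelId + " listed: " + indexMentions(in, modelId));
        } catch (IOException e) {
            System.out.println("Could not fetch index: " + e.getMessage());
        }
    }
}
```

Because this is a substring match, searching for a shorter id such as intfloat/multilingual-e5-large would also match the -instruct entry, so the full id should be used.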

xyang16 commented 1 month ago

@david-sitsky Sorry. I have uploaded again. Please try again.

david-sitsky commented 1 month ago

@xyang16 - many thanks for that. It works for PyTorch but not for OnnxRuntime. The HF repo has the ONNX model too - can that be fixed? This works for existing E5 models.

Caused by: java.lang.IllegalArgumentException: Invalid djl URL: djl://ai.djl.huggingface.onnxruntime/intfloat/multilingual-e5-large-instruct
    at ai.djl.repository.RepositoryFactoryImpl$DjlRepositoryFactory.newInstance(RepositoryFactoryImpl.java:262)
    at ai.djl.repository.RepositoryFactoryImpl.newInstance(RepositoryFactoryImpl.java:64)
    at ai.djl.repository.Repository.newInstance(Repository.java:90)
    at ai.djl.repository.zoo.DefaultModelZoo.parseLocation(DefaultModelZoo.java:66)
    at ai.djl.repository.zoo.DefaultModelZoo.<init>(DefaultModelZoo.java:47)

david-sitsky commented 3 weeks ago

@xyang16, @frankfliu - can we please also upload the associated .onnx model files, as was done for all the other E5 models? While this works for PyTorch, it fails with OnnxRuntime. Many thanks in advance.

Caused by: java.lang.IllegalArgumentException: Invalid djl URL: djl://ai.djl.huggingface.onnxruntime/intfloat/multilingual-e5-large-instruct
    at ai.djl.repository.RepositoryFactoryImpl$DjlRepositoryFactory.newInstance(RepositoryFactoryImpl.java:262)
    at ai.djl.repository.RepositoryFactoryImpl.newInstance(RepositoryFactoryImpl.java:64)
    at ai.djl.repository.Repository.newInstance(Repository.java:90)
    at ai.djl.repository.zoo.DefaultModelZoo.parseLocation(DefaultModelZoo.java:66)
    at ai.djl.repository.zoo.DefaultModelZoo.<init>(DefaultModelZoo.java:47)
    at ai.djl.repository.zoo.Criteria$Builder.optModelUrls(Criteria.java:577)

xyang16 commented 3 weeks ago

@david-sitsky I have uploaded the onnx model to model zoo.

david-sitsky commented 3 weeks ago

Many thanks! Sadly, it seems the ONNX file for this model was generated with ai.onnx.ml opset 5, which ONNX Runtime does not officially support yet. Sigh.. not your problem though. I've asked the author if he can update it appropriately.

failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const std::string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 4.

david-sitsky commented 3 weeks ago

@xyang16 , @frankfliu - I think the Onnx model files are wrong in the zoo. This is what is in my cache:

sits@diannao9:~/.djl.ai/cache/repo/model/nlp/text_embedding/ai/djl/huggingface/onnxruntime/intfloat/multilingual-e5-large-instruct/0.0.1/multilingual-e5-large-instruct$ ls -lh
total 1.1G
-rw-rw-r-- 1 sits sits  727 Oct 31 15:16 config.json
-rw-rw-r-- 1 sits sits 406K Oct 31 15:16 model.onnx
-rw-rw-r-- 1 sits sits 1.1G Oct 31 15:16 model.onnx.data
-rw-rw-r-- 1 sits sits 1.2K Oct 31 15:16 ort_config.json
-rw-rw-r-- 1 sits sits 4.9M Oct 31 15:16 sentencepiece.bpe.model
-rw-rw-r-- 1 sits sits  173 Oct 31 15:16 serving.properties
-rw-rw-r-- 1 sits sits  964 Oct 31 15:16 special_tokens_map.json
-rw-rw-r-- 1 sits sits 1.2K Oct 31 15:16 tokenizer_config.json
-rw-rw-r-- 1 sits sits  17M Oct 31 15:16 tokenizer.json

However, if you look at the ONNX files from HF, they are very different in size (and filename): https://huggingface.co/intfloat/multilingual-e5-large-instruct/tree/main/onnx. For example, model.onnx_data is 2.24 GB in size and has a different filename from your model.onnx.data file.
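
To make this kind of comparison repeatable, the file names and exact byte sizes in a cached model directory can be dumped with a few lines of standard-library Java and compared against the HF listing by eye. This is only a sketch; `ModelDirListing` is a hypothetical name, and the directory to inspect is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of DJL.
final class ModelDirListing {

    /** Maps each regular file directly inside dir to its size in bytes, sorted by name. */
    static Map<String, Long> fileSizes(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no path is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        fileSizes(dir).forEach((name, bytes) ->
                System.out.printf("%-28s %,d bytes%n", name, bytes));
    }
}
```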

I thought the DJL zoo would be using these ONNX files directly, but if not, is some conversion process involved? If so, that is incorrectly creating ONNX files with "Onnx ML opset 5" as per my previous message with the Onnx error.

Can this please be fixed? Thanks.

xyang16 commented 3 weeks ago

@david-sitsky The ONNX model in the zoo is produced by a conversion step; it is not taken directly from the ONNX files on HF.

I updated the model zoo yesterday. Please delete the cache and retry, and let me know if it works.

david-sitsky commented 2 weeks ago

Thanks, it is working now!