Anush008 / fastembed-rs

Library for generating vector embeddings, reranking in Rust
https://docs.rs/fastembed
Apache License 2.0

add Multilingual E5 models #48

Closed by kounoike 6 months ago

kounoike commented 6 months ago

This PR adds the intfloat/multilingual-e5-base and intfloat/multilingual-e5-small models.

These models have no "token_type_ids" input, so I introduced logic that checks for it.
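For readers following along, the idea behind such a check can be sketched roughly as below. This is a minimal illustration and not the code from this PR: `EncodedBatch`, `needs_token_type_ids`, and `build_inputs` are hypothetical names, and the model's declared input names are assumed to be available as a plain list of strings (in practice they would come from the loaded ONNX session).

```rust
use std::collections::HashMap;

/// Tokenizer output for one batch, flattened to i64.
/// Hypothetical struct for illustration; not a fastembed-rs type.
pub struct EncodedBatch {
    pub input_ids: Vec<i64>,
    pub attention_mask: Vec<i64>,
    pub token_type_ids: Vec<i64>,
}

/// Returns true when the model declares a `token_type_ids` input.
/// `model_input_names` is assumed to list the input names of the model graph.
pub fn needs_token_type_ids(model_input_names: &[&str]) -> bool {
    model_input_names.iter().any(|name| *name == "token_type_ids")
}

/// Builds the name -> data map to feed the model, adding `token_type_ids`
/// only if the model actually declares that input.
pub fn build_inputs(
    batch: &EncodedBatch,
    model_input_names: &[&str],
) -> HashMap<&'static str, Vec<i64>> {
    let mut inputs = HashMap::new();
    inputs.insert("input_ids", batch.input_ids.clone());
    inputs.insert("attention_mask", batch.attention_mask.clone());
    if needs_token_type_ids(model_input_names) {
        inputs.insert("token_type_ids", batch.token_type_ids.clone());
    }
    inputs
}
```

With a model that declares only `input_ids` and `attention_mask`, `build_inputs` returns two entries; with a model that also declares `token_type_ids`, it returns three.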


Curiously, the intfloat/multilingual-e5-large ONNX model is only 546 kB (small is 470 MB, base is 1.11 GB), and it can't run inference. So I commented out the large model definition.

Anush008 commented 6 months ago

Hi Kounoike, thanks for the contribution. Looks great overall. Could you please add the models to the README and consider https://github.com/Anush008/fastembed-rs/pull/48#discussion_r1555553214?

kounoike commented 6 months ago

I've applied the change suggested in the comment (AI review is awesome!) and also updated the README.

kounoike commented 6 months ago

And thank you for the quick review!

Anush008 commented 6 months ago

Awesome!!

Barney241 commented 6 months ago

@kounoike @Anush008 I have a working fork for e5_large. I will resolve the conflicts with this PR and can create a PR to support it.

Anush008 commented 6 months ago

@Barney241 Awesome. Thank you.