enjalot opened 1 year ago
I would recommend taking a look at https://www.sbert.net/ . To the best of my knowledge the OpenAI models are not outstanding for embeddings (https://huggingface.co/spaces/mteb/leaderboard), but their API is convenient to use, at least for us.
If it helps, I have successfully used: sentence-transformers/all-mpnet-base-v2
as an alternative to the OpenAI text-embedding-ada-002
Hello, I am able to extract the embeddings from the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/the/model"
# Use from_pretrained so the trained weights are loaded
# (from_config would give randomly initialised weights),
# and request the hidden states explicitly
model = AutoModelForCausalLM.from_pretrained(checkpoint, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = tokenizer.encode(
    "Stability AI democratised AI by open sourcing large models",
    return_tensors="pt",
)
outputs = model(inputs)
hidden_states = outputs.hidden_states  # tuple with one tensor per layer
Now hidden_states holds the output of every layer; you can use the output of the last layer (hidden_states[-1]).
Since I am a newbie to Hugging Face, there might be better ways to do this. Please share if you find something better.
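One common refinement over taking raw per-token hidden states is to mean-pool the last layer over tokens, weighted by the attention mask so padding is ignored (this is roughly what sentence-transformers does internally). A sketch, assuming the checkpoint loads with AutoModel (the checkpoint name here is just an example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "sentence-transformers/all-mpnet-base-v2"  # any encoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer(
    "Stability AI democratised AI by open sourcing large models",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state, masking out padding tokens
mask = inputs["attention_mask"].unsqueeze(-1)           # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
embedding = summed / mask.sum(dim=1)                    # (batch, hidden)
print(embedding.shape)
```

This yields one fixed-size vector per input, which is usually what you want for similarity search.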
Are these models multilingual?
If you are looking for another convenient API, you might consider embaas. They offer a structure similar to OpenAI's, and you can use the top models from the MTEB leaderboard. They have some multilingual models as well, and they integrate with langchain or offer an easy-to-use Python client.
Is it possible to get embeddings from the model for my input text?
I.e., could I replace the GPT-3 embedding calls to OpenAI with some Python code and this model?