Stability-AI / StableLM

StableLM: Stability AI Language Models

Embeddings with StableLM? #20

Open enjalot opened 1 year ago

enjalot commented 1 year ago

Is it possible to get embeddings from the model for my input text?

I.e. could I replace GPT-3 calls to OpenAI with some Python code and this model?

sirwalt commented 1 year ago

I would recommend taking a look at https://www.sbert.net/ . To the best of my knowledge the OpenAI models are not outstanding at all for embeddings (https://huggingface.co/spaces/mteb/leaderboard), but their API is convenient to use - at least for us.

lingster commented 1 year ago

If it helps, I have successfully used sentence-transformers/all-mpnet-base-v2 as an alternative to OpenAI's text-embedding-ada-002.
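A minimal sketch of what that can look like with the sentence-transformers package (the example sentences are just illustrative):

```python
from sentence_transformers import SentenceTransformer

# Load the pretrained sentence-embedding model mentioned above
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# encode() returns one fixed-size embedding vector per input string
sentences = [
    "Is it possible to get embeddings from the model?",
    "Can I compute text embeddings locally?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768) for all-mpnet-base-v2
```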

sandyflute commented 1 year ago

Hello, I am able to extract the embeddings from the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/the/model"

# Load the pretrained weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = tokenizer.encode(
    "Stability AI democratised AI by open sourcing large models",
    return_tensors="pt",
)
# Ask the model to return the hidden states of every layer
outputs = model(inputs, output_hidden_states=True)
hidden_states = outputs.hidden_states
```

Now hidden_states has the output of all the layers; you can use the output of the last layer.
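One common way (not specific to StableLM) to turn the last layer's token vectors into a single embedding is mean pooling over the token positions, weighted by the attention mask. A rough sketch along those lines, reusing the model and tokenizer from the snippet above (the pooling choice is just an assumption, not something the model prescribes):

```python
import torch

# Tokenize; the tokenizer also returns an attention mask
encoded = tokenizer(
    "Stability AI democratised AI by open sourcing large models",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**encoded, output_hidden_states=True)

last_hidden = outputs.hidden_states[-1]         # (1, seq_len, hidden_size)
mask = encoded["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)

# Average the token vectors, ignoring any masked positions
embedding = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)                          # (1, hidden_size)
```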

Since I am a newbie to huggingface, there might be better ways to do this. Please share if you find something better.

wajihullahbaig commented 1 year ago

Are these models multilingual?

juliuslipp commented 1 year ago

If you are looking for another convenient API, you might consider embaas. They offer a structure similar to OpenAI's, and you can use the top models from the MTEB leaderboard. They have some multilingual models as well, and they integrate with langchain or can be used through an easy-to-use Python client.