abdeladim-s / pyllamacpp

Python bindings for llama.cpp
https://abdeladim-s.github.io/pyllamacpp/
MIT License

Embeddings #19

Open ParisNeo opened 1 year ago

ParisNeo commented 1 year ago

Hi there. I am upgrading my bindings for the Lord of LLMs tool, and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to access the latent space of the model, i.e. input a text and get back the encoder output in latent space?

Best regards

abdeladim-s commented 1 year ago

Hi,

I am a little bit confused; maybe you are mixing terms! Do you want to get the embeddings of the text, or do you want the latent space of the encoder output?

ParisNeo commented 1 year ago

The latent space of the encoder output.

abdeladim-s commented 1 year ago

AFAIK, the encoder output is not exposed by the llama.cpp API header file. But let me know if you have an idea.

But why would you use llama.cpp for this task? Why not just use the original LLaMA Python code in this case? You can get the output of any transformer block you want.

ParisNeo commented 1 year ago

I was thinking of building an animation that shows the model's encoder output moving inside a 2D or 3D projection of the latent space, against a background of the text-chunk data distribution of the document used by the chat_with_document personality. This would let me see how the model explores ideas while generating its outputs, compared to the reference texts. So I wanted to use the model that is loaded at that instant rather than reload another PyTorch model that is not exactly the same and is not quantized; that eats memory. I figured it would be better to have everything done by the same model.
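For illustration, a rough sketch of that projection idea, assuming each text chunk has already been mapped to an embedding vector by the model (random vectors stand in for real embeddings here, since no embedding API existed in the bindings yet):

```python
# Rough sketch: project high-dimensional embeddings into 2D with PCA and plot
# the model's current output against the document's chunk distribution.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

n_embd = 4096                                    # e.g. LLaMA 7B embedding size
chunk_embeddings = np.random.randn(200, n_embd)  # stand-in: document chunk embeddings
output_embedding = np.random.randn(1, n_embd)    # stand-in: current model output embedding

pca = PCA(n_components=2).fit(chunk_embeddings)  # fit the projection on the reference chunks
background = pca.transform(chunk_embeddings)
current = pca.transform(output_embedding)

plt.scatter(background[:, 0], background[:, 1], alpha=0.3, label="document chunks")
plt.scatter(current[:, 0], current[:, 1], color="red", label="model output")
plt.legend()
plt.show()
```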

Maybe I'll ask llama.cpp for this feature.

Thank you anyway.

ParisNeo commented 1 year ago

I thought about it, and maybe just exposing the embed function of llama.cpp would already be useful for me.

abdeladim-s commented 1 year ago

> I was thinking of building an animation that shows the model's encoder output moving inside a 2D or 3D projection of the latent space, against a background of the text-chunk data distribution of the document used by the chat_with_document personality. This would let me see how the model explores ideas while generating its outputs, compared to the reference texts. So I wanted to use the model that is loaded at that instant rather than reload another PyTorch model that is not exactly the same and is not quantized; that eats memory. I figured it would be better to have everything done by the same model.
>
> Maybe I'll ask llama.cpp for this feature.
>
> Thank you anyway.

Yeah, I understand, nice idea. You can ask llama.cpp, and if you get a solution I'll be more than happy to integrate it into the bindings.

abdeladim-s commented 1 year ago

> I thought about it, and maybe just exposing the embed function of llama.cpp would already be useful for me.

I think this is already exposed. You are talking about llama_get_embeddings, aren't you?

ParisNeo commented 1 year ago

I need to give it text and have it return the embeddings for that input text. Can you expose that in the model?

abdeladim-s commented 1 year ago

OK, I will try to expose it in the model class. But I have just noticed that this method does not take text as input, just the context. Take a look. So it is not exactly what you are looking for.
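For illustration, a sketch of what a text-to-embeddings helper has to do internally: llama_get_embeddings in llama.cpp only reads back the result of the last eval on a context, so the tokenize/eval steps must happen first. The wrapper names below mirror the C API and are hypothetical, not pyllamacpp's actual Python API:

```python
# Hypothetical sketch; wrapper names mirror llama.cpp's C API
# (llama_tokenize / llama_eval / llama_get_embeddings) and the context is
# assumed to have been created with the embedding flag enabled.
def prompt_embeddings(ctx, text, n_threads=4):
    tokens = llama_tokenize(ctx, text, add_bos=True)        # hypothetical wrapper
    llama_eval(ctx, tokens, n_past=0, n_threads=n_threads)  # run the model once
    return llama_get_embeddings(ctx)                        # n_embd floats for the prompt
```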

ParisNeo commented 1 year ago

In the llama-cpp-python bindings, they have an embed function on their model: https://abetlen.github.io/llama-cpp-python/

The ctransformers bindings also have an embed method: https://github.com/marella/ctransformers

I think they use llama.cpp in the background.
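For reference, the llama-cpp-python version looks roughly like this (the model path is a placeholder, and as I understand their API the context has to be created with embedding=True):

```python
from llama_cpp import Llama

# Placeholder model path; embedding=True makes the context compute embeddings.
llm = Llama(model_path="./models/ggml-model-q4_0.bin", embedding=True)
vector = llm.embed("Hello, world!")
print(len(vector))  # one float per embedding dimension
```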

abdeladim-s commented 1 year ago

Yeah, if you want it just like what they did, then I can add it; they are using the same function under the hood. The problem I was thinking about is that it is overkill to eval the whole transformer to get just the output of the first block.

If you are using the generate function on a prompt, then eval will already have been called and the embedding vector updated, so you can read it without rerunning eval again.

Anyway, I think I will add the two functions: one to get the last embeddings, and one to create them from a string as input?
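Roughly like this (both method names are hypothetical at this point in the thread; the string-based one is what later lands as get_prompt_embeddings):

```python
# Hypothetical method names, not the final pyllamacpp API:
last_emb = model.get_embeddings()                  # read back the last eval's vector, no re-eval
text_emb = model.get_text_embeddings("some text")  # tokenize + eval + read back
```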

ParisNeo commented 1 year ago

Excellent!

abdeladim-s commented 1 year ago

@ParisNeo, here you go.

get_prompt_embeddings is what you are looking for.
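A minimal usage sketch (the model path is a placeholder, and the return value is assumed to be the prompt's embedding vector of n_embd floats):

```python
from pyllamacpp.model import Model

# Placeholder model path; get_prompt_embeddings returns the embeddings
# computed for the given prompt text.
model = Model(model_path="./models/ggml-model-q4_0.bin")
embeddings = model.get_prompt_embeddings("Hello, world!")
print(len(embeddings))
```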

If you don't have any other requests, I will push a new version to PyPI?

ParisNeo commented 1 year ago

Thanks. No requests for now. I'll update my bindings as soon as you push it to PyPI.

Thanks a lot.

abdeladim-s commented 1 year ago

You are welcome. The new version has been pushed to PyPI.