Open ParisNeo opened 1 year ago
Hi,
I am a little confused; maybe you are mixing up terms. Do you want to get the embeddings of the text, or the latent space of the encoder output?
The latent space of the encoder output.
I don't think the encoder output is exposed in the llama.cpp API header, AFAIK. But let me know if you have an idea.
But why would you use llama.cpp for this task? Why not just use the original LLaMA Python code, where you can get the output of any transformer block you want?
I was thinking of building an animation that shows the model's encoder output moving inside a 2D or 3D projection of the latent space, against a background showing the distribution of the text chunks of the document used by the chat_with_document personality. This would let me see how the model explores ideas while generating its output, compared to the reference texts. So I wanted to use the model already loaded at that instant, rather than reloading another PyTorch model that is not exactly the same and is not quantized; that wastes memory. I figured it would be better to have everything done by the same model.
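The projection part of that animation can be prototyped without any model at all. Below is a minimal sketch of reducing high-dimensional embedding vectors to 2D with a plain power-iteration PCA; the input vectors are placeholders for real encoder outputs, and a real pipeline would use numpy or scikit-learn instead of pure Python:

```python
import math
import random

def pca_2d(vectors):
    """Project high-dimensional vectors onto their top two principal
    components using plain power iteration. Illustrative sketch only."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]

    def top_component(rows):
        random.seed(0)                      # deterministic start vector
        w = [random.random() for _ in range(d)]
        for _ in range(100):
            # one power-iteration step: w <- X^T (X w), then normalize
            xw = [sum(r[j] * w[j] for j in range(d)) for r in rows]
            w = [sum(rows[i][j] * xw[i] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            w = [x / norm for x in w]
        return w

    w1 = top_component(centered)
    # deflate: remove the first component, then find the second
    deflated = []
    for r in centered:
        proj = sum(r[j] * w1[j] for j in range(d))
        deflated.append([r[j] - proj * w1[j] for j in range(d)])
    w2 = top_component(deflated)
    return [(sum(r[j] * w1[j] for j in range(d)),
             sum(r[j] * w2[j] for j in range(d))) for r in centered]
```

Feeding this the sequence of embeddings captured during generation gives the 2D trajectory to animate over the document-chunk background.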
Maybe I'll ask the llama.cpp project for this feature.
Thank you anyway.
I thought about it, and maybe just exposing the embed function of llama.cpp would already be useful for me.
Yeah, I understand. Nice idea.
You can ask llama.cpp, and if you get any solution I'll be more than happy to integrate it in the bindings.
I think this is already exposed. You are talking about llama_get_embedding, aren't you?
I need to give it text and it gives me the embeddings for the input text. Can you expose that in the model?
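Once a text-to-embedding function is exposed, comparing a generated passage against the reference chunks reduces to a similarity measure over those vectors. A minimal sketch of that comparison (the vectors below are placeholders, not real model embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# placeholder vectors standing in for real model embeddings
doc_chunk = [0.2, 0.7, 0.1]
generated = [0.25, 0.65, 0.05]
score = cosine_similarity(doc_chunk, generated)  # close to 1.0 -> similar
```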
Ok, I will try to expose it in the model class. But I have just noticed that this method does not take text as input, just the context. Take a look. So it is not the same as what you are looking for.
In the llama-cpp-python binding, they have an embed function in their model: https://abetlen.github.io/llama-cpp-python/
The ctransformers binding also has an embed method: https://github.com/marella/ctransformers
I think they use llama.cpp in the background.
Yeah, if you want it just like they did, then I can add it; they are using the same function under the hood. The problem I was thinking about is that it is overkill to eval the whole transformer just to get the first block.
If you are using the generate function on a prompt, eval will already have been called, so the embedding vector will be updated without rerunning eval again.
Anyway, I think I will add the two functions: one to get the last embedding, and one to create embeddings from a string as input?
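A rough sketch of how those two functions could fit together; every name here, plus the tokenizer and eval, is a stand-in for illustration, not the actual pyllamacpp API:

```python
class ModelSketch:
    """Hypothetical shape of the two proposed functions: one reads the
    embedding left by the last eval, the other evals a string first."""

    def __init__(self):
        self._last_embedding = None

    def _eval(self, tokens):
        # stand-in for llama_eval: stores a fake deterministic vector
        self._last_embedding = [float(sum(tokens) % 7), float(len(tokens))]

    def get_embedding(self):
        # reads the embedding from the most recent eval (None if no eval yet)
        return self._last_embedding

    def get_prompt_embeddings(self, text):
        # tokenize + eval the string, then read the resulting embedding
        tokens = [ord(c) for c in text]   # stand-in tokenizer
        self._eval(tokens)
        return self.get_embedding()
```

The point of the split is exactly the side effect described above: after generate has run, get_embedding is free, while get_prompt_embeddings pays for one eval of the prompt.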
Excellent!
@ParisNeo, here you go: get_prompt_embeddings is what you are looking for.
If you don't have any other requests, I will push a new version to PyPI?
Thanks. No requests for now. I'll update my binding as soon as you push it to PyPI.
Thanks a lot.
You are welcome. The new version has been pushed to PyPI.
Hi there. I am upgrading my bindings for the Lord of LLMs tool, and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to get access to the latent space of the model, i.e. input a text and get the encoder output in latent space?
Best regards