SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

[llava] How to clear ImagePaths? #643

dchatel opened this issue 7 months ago

dchatel commented 7 months ago

How do I clear the embedded images from the context? Apparently, calling executor.ImagePaths.Clear() has no effect on the private variable _embeds.
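
For illustration, the scenario looks roughly like this (a sketch following the LlavaInteractiveModeExecute example; the model paths are placeholders):

    // Sketch of the scenario, following the LlavaInteractiveModeExecute example.
    // "<model path>" and "<mmproj path>" are placeholders.
    using LLama;
    using LLama.Common;

    var parameters = new ModelParams("<model path>") { ContextSize = 4096 };
    using var model = LLamaWeights.LoadFromFile(parameters);
    using var clipModel = LLavaWeights.LoadFromFile("<mmproj path>");
    using var context = model.CreateContext(parameters);
    var executor = new InteractiveExecutor(context, clipModel);

    executor.ImagePaths.Add("first.jpg");
    // ... run inference with a prompt that refers to the image ...

    // Trying to drop the image before the next prompt:
    executor.ImagePaths.Clear();   // clears the public list, but the image
                                   // embeddings created internally remain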

AsakusaRinne commented 7 months ago

FYI @SignalRT

SignalRT commented 7 months ago

@dchatel, to my knowledge there is no Llava API to reset images from the context (I could be wrong).

Right now the only option I can think of is to create a new context/executor when the user prompts with new images. Here is the modified example:

https://github.com/SignalRT/LLamaSharp/blob/LlavaResetIContext/LLama.Examples/Examples/LlavaInteractiveModeExecute.cs
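
In essence the workaround is to rebuild the context and the executor whenever new images come in, roughly like this (a sketch; `model`, `clipModel` and `parameters` are assumed to be loaded as in the example, and the linked branch is the authoritative version):

    // Workaround sketch: when the user supplies new images, throw away the old
    // context/executor and start from a clean one. Assumes `model` (LLamaWeights),
    // `clipModel` (LLavaWeights) and `parameters` (ModelParams) are already loaded
    // as in the example.
    context.Dispose();                                  // releases the old KV cache, image embeds included
    context = model.CreateContext(parameters);          // fresh context
    executor = new InteractiveExecutor(context, clipModel);
    executor.ImagePaths.Add("new_image.jpg");           // register only the new image(s)

The obvious drawback is that the conversation state stored in the old context is lost together with the images.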

@AsakusaRinne, if this is OK with you I can make a PR with this change.

AsakusaRinne commented 7 months ago

@SignalRT According to the code, the LLavaExecutor clears the image embeds when it has finished generating the output. Is it possible to fix this issue based on that? (Please correct me if I'm wrong.)

SignalRT commented 7 months ago

I will review if there is a better way to manage this.

dchatel commented 7 months ago

> I will review if there is a better way to manage this.

Maybe a good approach would be to clear the image embeds when ImagePaths is cleared? Or to reflect the ImagePaths state in the image embeds?
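
For example, something along these lines inside the executor (a hypothetical sketch; ClearImages is an invented name, and the private members stand for the executor's internal image state):

    // Hypothetical sketch: a single method on the LLava executor that resets all
    // image-related state, so the public ImagePaths list and the private embed
    // state cannot get out of sync. (ClearImages is an invented name.)
    public void ClearImages()
    {
        ImagePaths.Clear();            // the public list the caller sees
        _imageEmbedHandles.Clear();    // the already-created llava image embeddings
        _EmbedImagePosition = -1;      // forget where the embed was going to be injected
    }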

SignalRT commented 7 months ago

ImagePaths is currently cleared automatically at the end of InferInternal:

                    _EmbedImagePosition = -1;
                    _imageEmbedHandles.Clear();
                    ImagePaths.Clear();

I'm trying some options, aiming for "continuity" when adding or changing images during a conversation.

I have something that works much better, but I need to solve a few cases before making a PR.

IntptrMax commented 7 months ago

Could llama_kv_cache_seq_rm help? When embedding the image/prompt we can get n_past; could that be recorded as the position? (I could be wrong.)

    // Removes all tokens that belong to the specified sequence
    // and have positions in [p0, p1)
    // seq_id < 0 : match any sequence
    // p0 < 0     : [0, p1]
    // p1 < 0     : [p0, inf)
    LLAMA_API bool llama_kv_cache_seq_rm(
            struct llama_context * ctx,
                    llama_seq_id   seq_id,
                       llama_pos   p0,
                       llama_pos   p1);
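
The idea would be roughly the following (illustrative variable names; how the native function is reached from C# depends on the LLamaSharp version):

    // Sketch of the idea (illustrative names, not the actual LLamaSharp API).
    int nPast = 0;                     // the executor's past-token counter
    // ... evaluate the text prompt up to the image, advancing nPast ...

    int imageStart = nPast;            // record where the image embedding will land
    // ... evaluate the image embedding, which advances nPast further ...
    int imageEnd = nPast;              // one past the last image position

    // To forget the image later, remove exactly that range from the KV cache of
    // sequence 0 using the native call quoted above:
    //     llama_kv_cache_seq_rm(ctx, /*seq_id*/ 0, /*p0*/ imageStart, /*p1*/ imageEnd);
    // (which binding exposes this from C# depends on the LLamaSharp version)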

SignalRT commented 7 months ago

@IntptrMax, that's what I'm working on. I'm also introducing the ability to add images in the middle of a conversation, which is not possible with the current master code. I expect to have the complete solution this week.
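
From the caller's side that would look something like this (illustrative only; the final API shape depends on the PR, and `inferenceParams` is assumed to be configured as in the example):

    // Illustrative only: adding a second image later in the same conversation,
    // without rebuilding the context. The prompt format follows the llava example.
    executor.ImagePaths.Add("first.jpg");
    await foreach (var token in executor.InferAsync("USER: <image>\nWhat is in this picture?\nASSISTANT:", inferenceParams))
        Console.Write(token);

    // Later in the same conversation:
    executor.ImagePaths.Add("second.jpg");
    await foreach (var token in executor.InferAsync("USER: <image>\nAnd what about this one?\nASSISTANT:", inferenceParams))
        Console.Write(token);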

SignalRT commented 6 months ago

@dchatel, @IntptrMax: in PR #664 I introduce the ability to clear images and to change images during a conversation.