Support for Multimodal models

dranger003 / llama.cpp-dotnet

Minimal C# bindings for llama.cpp + .NET core library with API host/client.

MIT License


Support for Multimodal models #15

Closed: AshD closed this issue 5 months ago

AshD commented 5 months ago

I do like the simplicity of this project's bindings to llama.cpp.

Are there plans to add multimodal model support, such as LLaVA and Phi-3-vision? I can test the bindings for these.

Thanks, Ash

dranger003 commented 5 months ago

@AshD Thanks. That is only possible if llama.cpp supports it, and unfortunately it currently doesn't. I find that Phi-3-vision is actually really good; I hope they add support eventually.

AshD commented 5 months ago

Agreed. Someone opened an issue in llama.cpp for Phi-3-vision: https://github.com/ggerganov/llama.cpp/issues/7444

Would it be possible to get support for LLaVA models? I think llama.cpp already supports those.

dranger003 commented 5 months ago

I don't think I currently have time for that kind of work, but if they add support for Phi-3-vision, I would be quite interested in updating this codebase to support it.

AshD commented 5 months ago

Thanks @dranger003. I like the fact that this project is a thin wrapper around llama.cpp. I am closing this issue now and will open a new one once Phi-3-vision support is added to llama.cpp.

dranger003 commented 5 months ago

@AshD I created a quick API host in Python along with a C# client, if you want to check it out while we're waiting for llama.cpp to support the model.

https://gist.github.com/dranger003/daff444ebf04951d4279b5b2dee71ab4

AshD commented 5 months ago

Thanks @dranger003, looks good :-)