Closed by whisper-bye 2 weeks ago
Unfortunately, `llama.cpp` currently only supports fast inference for generative models and does not yet support embedding models, so VisRAG-Ret cannot use `llama.cpp`. As for VisRAG-Gen, the models we used in the paper (MiniCPM-V-2, MiniCPM-Llama3-V-2_5, and MiniCPM-V-2_6) are also not supported by `llama.cpp`. However, we're pleased to share that the newly released MiniCPM3-4B does support `llama.cpp`, and we welcome you to try it out!