Open ThiloteE opened 3 months ago
If implemented, let users choose the backend and hardware (CPU vs. GPU; GPU1, GPU2, GPU3, ...) in preferences.
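A minimal sketch of how such a preference could be modeled, assuming it is stored as a plain string such as `CPU` or `GPU:1`. The class name `ComputeDevicePreference`, the string format, and the CPU fallback are all hypothetical; nothing here is an existing JabRef or langchain4j API.

```java
import java.util.Locale;

/** Hypothetical preference value: which device embedding models should run on. */
final class ComputeDevicePreference {
    /** Hypothetical backend kinds; not an existing JabRef type. */
    enum Backend { CPU, GPU }

    private final Backend backend;
    private final int gpuIndex; // only meaningful when backend == GPU

    private ComputeDevicePreference(Backend backend, int gpuIndex) {
        this.backend = backend;
        this.gpuIndex = gpuIndex;
    }

    static ComputeDevicePreference cpu() {
        return new ComputeDevicePreference(Backend.CPU, -1);
    }

    static ComputeDevicePreference gpu(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("GPU index must be >= 0");
        }
        return new ComputeDevicePreference(Backend.GPU, index);
    }

    /** Parses the assumed storage format: "CPU" or "GPU:<index>". */
    static ComputeDevicePreference parse(String stored) {
        String s = stored.trim().toUpperCase(Locale.ROOT);
        if (s.equals("CPU")) {
            return cpu();
        }
        if (s.startsWith("GPU:")) {
            // NumberFormatException is an IllegalArgumentException, so bad input fails loudly
            return gpu(Integer.parseInt(s.substring(4)));
        }
        throw new IllegalArgumentException("Unknown device preference: " + stored);
    }

    /** Writes the value back in the same assumed format. */
    String serialize() {
        return backend == Backend.CPU ? "CPU" : "GPU:" + gpuIndex;
    }

    /** Falls back to CPU when the selected GPU does not exist on this machine. */
    ComputeDevicePreference resolve(int availableGpus) {
        if (backend == Backend.GPU && gpuIndex >= availableGpus) {
            return cpu();
        }
        return this;
    }
}
```

The fallback in `resolve` is one possible design choice: a preference file copied to a machine with fewer GPUs would then still work instead of failing at startup.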
Currently in langchain4j, in-process embedding models (meaning they run locally on a computer) run only on the CPU. There is an open issue about running embedding models on the GPU, but it's not resolved.
In order to implement this, we have these choices:
langchain4j: simpler to develop, better from an architectural point of view.
langchain4j: good.

It's a very good idea, we should look into it, but probably a bit later, when we finally release AI chat and, maybe, add summarization.
I'll mark the issue as low-priority, but it's only low priority for this context: week 1 and the first release.
Actually, no, I'll remove low-priority, and won't assign a milestone.
I'll collect it in the final "anything else" milestone, "final polishing" 😅
GPU support (for embedding models) with llama.cpp:
GPU support with Deep Java Library: https://docs.djl.ai/engines/onnxruntime/onnxruntime-engine/index.html#install-gpu-package. Unfortunately, they also use Microsoft's ONNX, which seems to be very slow. I assume models would need to be compatible with ONNX too, which is a limitation, because not many models are uploaded to Hugging Face in the ONNX file format!
At least, one can paint everything blue in the CPU utilization
One solution for providing GPU acceleration for LLMs (NOT necessarily embedding models!) is to provide proper support for the OpenAI API. See issue https://github.com/JabRef/jabref/issues/11872. If users run external applications like llama.cpp, GPT4All, LM Studio, Ollama, Jan, KoboldCpp etc., which already support GPU acceleration, there is no need to add and maintain this feature in JabRef. It would still be nice to have GPU acceleration for embedding models though. Maybe do it like KoboldCpp and only provide a Vulkan backend, which is much, much smaller than a CUDA backend (~1.5 GB in PyTorch; 200 - 500 MB in llama.cpp).
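To illustrate the OpenAI API route, here is a rough sketch of building a chat request against a locally running, OpenAI-compatible server using only the JDK's `java.net.http`. The `/chat/completions` path and the `model`/`messages` body follow the OpenAI chat completions format; the base URL, port, and model name in the usage note are assumptions that depend on which external application is used, and the class `OpenAiCompatibleRequest` is hypothetical, not JabRef code.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper: builds a chat request for an OpenAI-compatible server. */
final class OpenAiCompatibleRequest {

    /** Escapes a string for embedding in a JSON string literal. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** JSON body with a single user message, in the chat completions format. */
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + escapeJson(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\""
                + escapeJson(userMessage) + "\"}]}";
    }

    /** baseUrl is assumed to already end in "/v1", e.g. a local server's API root. */
    static HttpRequest chatRequest(String baseUrl, String model, String userMessage) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/chat/completions"))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }
}
```

Something like `OpenAiCompatibleRequest.chatRequest("http://localhost:8080/v1", "local-model", "Summarize this abstract")` could then be sent with `HttpClient.newHttpClient().send(...)`. Whether the work happens on the GPU is decided entirely by the external server, which is the point of this approach.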
Historical "what the fuck" is available at https://github.com/JabRef/jabref/pull/11430#issuecomment-2209278098
Advantages:
Disadvantages: