ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Make -DLLAMA_HIP_UMA a dynamic setting. #7145

sebastian-philipp opened this issue 2 months ago (Open)

sebastian-philipp commented 2 months ago

Feature Description


Ollama ships a precompiled build of llama.cpp. Asking end users to recompile Ollama and llama.cpp just to enable integrated GPUs is problematic. I would like LLAMA_HIP_UMA to be a runtime setting that can be enabled regardless of the compile-time flags.

I think there are currently three ways to get iGPUs working in Ollama:

  1. recompile llama.cpp with LLAMA_HIP_UMA enabled
  2. use something like https://github.com/segurac/force-host-alloction-APU to force the use of hipHostMalloc, circumventing the built-in feature (see the sketch below)
  3. increase the dedicated video memory in the BIOS

See also https://github.com/ollama/ollama/issues/2637
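
For illustration, a minimal sketch of how such an interposer could work (assumed mechanism in the spirit of force-host-alloction-APU, not its actual source): a small shared library loaded via LD_PRELOAD overrides hipMalloc so that every allocation made by the prebuilt binary lands in pinned host memory that the iGPU can address.

```cpp
// hip_uma_shim.cpp - hypothetical LD_PRELOAD interposer (illustrative only).
// Build: hipcc -shared -fPIC hip_uma_shim.cpp -o libhipumashim.so
// Run:   LD_PRELOAD=./libhipumashim.so <prebuilt ollama / llama.cpp binary>
#include <hip/hip_runtime.h>

// The preloaded definition of hipMalloc wins at symbol resolution, so device
// allocations are redirected into pinned host memory shared with the iGPU.
extern "C" hipError_t hipMalloc(void ** ptr, size_t size) {
    return hipHostMalloc(ptr, size, hipHostMallocDefault);
}
```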

Possible Implementation

Check an environment variable at runtime, e.g. std::getenv("LLAMA_HIP_UMA"), instead of the compile-time flag?
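
A minimal sketch of what that could look like in the HIP allocation path (illustrative only; llama_hip_malloc and llama_hip_uma_enabled are hypothetical names, and the real allocation code lives in ggml and may differ): the environment variable is read once, and when it is set the buffer is backed by managed memory instead of dedicated VRAM.

```cpp
// Illustrative sketch only, not the actual ggml/llama.cpp code.
#include <cstdlib>              // std::getenv
#include <cstring>              // std::strcmp
#include <hip/hip_runtime.h>

// Read LLAMA_HIP_UMA once; any value other than "0" enables UMA allocations.
static bool llama_hip_uma_enabled(void) {
    static const bool enabled = [] {
        const char * v = std::getenv("LLAMA_HIP_UMA");
        return v != nullptr && std::strcmp(v, "0") != 0;
    }();
    return enabled;
}

// Hypothetical allocation helper: with UMA enabled, back the buffer with
// managed memory in system RAM (usable by an iGPU without dedicated VRAM);
// otherwise allocate device memory as before.
static hipError_t llama_hip_malloc(void ** ptr, size_t size, int device) {
    if (llama_hip_uma_enabled()) {
        hipError_t err = hipMallocManaged(ptr, size);
        if (err == hipSuccess) {
            // Coarse-grained pages generally perform better for iGPU access.
            err = hipMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
        }
        return err;
    }
    return hipMalloc(ptr, size);
}
```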

Djip007 commented 2 weeks ago

In llamafile (https://github.com/Mozilla-Ocho/llamafile/pull/473) I made it a fallback for when there is not enough VRAM (i.e. when hipMalloc fails).
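
For reference, a sketch of that fallback idea (assumed shape, not the exact llamafile change; llama_hip_malloc_with_fallback is a hypothetical name): try a normal device allocation first and only switch to managed/UMA memory when the device reports it is out of memory.

```cpp
// Illustrative sketch of the VRAM-exhaustion fallback, not the llamafile code.
#include <hip/hip_runtime.h>

static hipError_t llama_hip_malloc_with_fallback(void ** ptr, size_t size, int device) {
    hipError_t err = hipMalloc(ptr, size);
    if (err == hipErrorOutOfMemory) {
        (void) hipGetLastError();   // clear the sticky error before retrying
        // Not enough dedicated VRAM: retry with managed memory in system RAM.
        err = hipMallocManaged(ptr, size);
        if (err == hipSuccess) {
            err = hipMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
        }
    }
    return err;
}
```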