ggerganov / llama.cpp


Bug: LLAMAFILE = 0 in `llama_print_system_info` even though compiled with `-DGGML_LLAMAFILE=ON` and the flag is visibly set during compilation #8656

Closed. tc-wolf closed this issue 1 month ago.

tc-wolf commented 1 month ago

What happened?

In `llama_print_system_info` in llama.cpp, the output is LLAMAFILE = 0, even though the feature is clearly enabled (the configure step prints `Using llamafile`).

$ cmake -B build -DGGML_LLAMAFILE=ON
-- Accelerate framework found
-- Metal framework found
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
CMake Warning at ggml/src/CMakeLists.txt:151 (message):
  OpenMP not found

-- BLAS found, Libraries: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk/System/Library/Frameworks/Accelerate.framework
-- BLAS found, Includes:
-- Using llamafile
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- Configuring done (0.7s)
-- Generating done (1.6s)
-- Build files have been written to: <path redacted>/llama.cpp/build

$ cmake --build build --config Release -j 12 --target llama-server llama-cli
$ ./build/bin/llama-server
INFO [                    main] build info | tid="0x1f3be4c00" timestamp=1721763495 build=3448 commit="b841d074"
INFO [                    main] system info | tid="0x1f3be4c00" timestamp=1721763495 n_threads=8 n_threads_batch=-1 total_threads=12 system_info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | "

When building with -DGGML_LLAMAFILE=ON (the default), the GGML_USE_LLAMAFILE C #define is set in ggml/src/CMakeLists.txt, but llama.cpp (which contains `llama_print_system_info`) is compiled without it, so the macro is never defined in that translation unit.
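For illustration, here is a minimal, self-contained sketch (not the actual llama.cpp source) of how a compile-time feature flag like this typically ends up in the system-info string. Because the #ifdef is evaluated per translation unit, the "0" branch is taken whenever this file is built without the define, regardless of how the ggml sources were built:

```cpp
// Sketch only: mimics the pattern, not the real llama.cpp implementation.
#include <cstdio>
#include <string>

static std::string system_info() {
    std::string s;
#ifdef GGML_USE_LLAMAFILE
    s += "LLAMAFILE = 1 | ";  // only reached if *this* file is compiled with the define
#else
    s += "LLAMAFILE = 0 | ";  // what the log above shows, even though ggml has llamafile enabled
#endif
    return s;
}

int main() {
    std::printf("%s\n", system_info().c_str());
    return 0;
}
```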

This could perhaps be fixed by duplicating the add_compile_definitions(GGML_USE_LLAMAFILE) logic from ggml/src/CMakeLists.txt in the top-level CMakeLists.txt? I'm not sure what the best practice is for CMake.
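A rough, untested sketch of what that duplication might look like in the top-level CMakeLists.txt (option name assumed to match the existing GGML_LLAMAFILE option):

```cmake
# Hypothetical sketch, not a tested patch: mirror the definition so that
# llama.cpp's own translation units also see GGML_USE_LLAMAFILE.
if (GGML_LLAMAFILE)
    add_compile_definitions(GGML_USE_LLAMAFILE)
endif()
```

An alternative that avoids the duplication might be to attach the definition to the ggml target with PUBLIC visibility (target_compile_definitions(ggml PUBLIC GGML_USE_LLAMAFILE)) so consumers inherit it, or to have llama_print_system_info ask ggml at runtime instead of relying on a macro; I'm not sure which approach the maintainers would prefer.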

Low priority, since the flag is still set and used by the ggml code, but it's definitely surprising/incorrect in the debug info.

Name and Version

./build/bin/llama-cli --version
version: 3448 (b841d074)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0

What operating system are you seeing the problem on?

Tested on aarch64 Linux and macOS.

tc-wolf commented 1 month ago

A workaround is to compile with make instead of cmake, but downstream projects like llama-cpp-python use cmake as part of their build system.
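For anyone building by hand, the make-based workaround would look roughly like this (assuming llamafile stays enabled by default in the Makefile, as it appears to be):

```sh
# Workaround sketch: the Makefile build applies the define to every
# translation unit, so the system info should report LLAMAFILE = 1.
make clean
make -j 12 llama-cli llama-server
./llama-cli --version
```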