Closed by robbiemu 3 days ago
This was my bad. I went and looked at the llama.cpp llama.h declaration for that function, and my understanding was lacking: even when the output weights are stored as fp16 (or anything narrower), the forward pass still computes the logits in an fp32 context, so the final logits really are fp32 and the function is properly bound.
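For reference, a minimal sketch of the binding pattern in question, assuming the library loads as libllama.dylib (the exact path and the logits_as_array helper are illustrative, not part of either library):

```python
import ctypes
import numpy as np

# llama.h declares: LLAMA_API float * llama_get_logits(struct llama_context * ctx);
# so the binding's restype is a pointer to c_float, regardless of the weight dtype.
lib = ctypes.CDLL("libllama.dylib")  # assumed install path on macOS
lib.llama_get_logits.argtypes = [ctypes.c_void_p]
lib.llama_get_logits.restype = ctypes.POINTER(ctypes.c_float)

def logits_as_array(ctx, n_tokens: int, n_vocab: int) -> np.ndarray:
    """Wrap the fp32 logits buffer as a numpy view, without copying."""
    ptr = lib.llama_get_logits(ctx)
    return np.ctypeslib.as_array(ptr, shape=(n_tokens, n_vocab))
```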
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama-cpp-python to do.

With logits_all = True, the scores shape should be (supplied_ctx, n_vocab), and the dtype should reflect the model's actual output dtype.
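A minimal sketch of the access pattern this refers to, using llama-cpp-python's high-level API (the model path and prompt are placeholders):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model-bf16.gguf",  # placeholder path
    n_ctx=8192,
    logits_all=True,  # keep logits for every position, not just the last token
)
llm.eval(llm.tokenize(b"some prompt text"))

print(llm.scores.shape, llm.scores.dtype)  # (n_ctx, n_vocab), always float32 today
```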
Current Behavior
llama_cpp.llama_get_logits() returns an array of ctypes.c_float, which is always fp32. Almost no GGUF model you can find actually keeps its output weights in fp32: unquantized models are typically fp16 or bf16, k-type quantized models with the "L" (large) suffix keep 8-bit output weights, and the rest (k_m, k_s, etc.) may be smaller still. It is understandable that np.ctypeslib.as_array() can't meaningfully (or at least easily) go below fp16, but it should at least go that far.
For my model, with an 8192-token context and a 256000-entry vocabulary, that works out to roughly 4 GB of wasted space per chunk: 8192 × 256000 × 4 bytes ≈ 8.4 GB at fp32, versus ≈ 4.2 GB at fp16.
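Since llama.cpp hands back an fp32 buffer either way, the only way to reclaim that space on the Python side is an explicit downcast copy before the chunk is stored; a sketch (the helper name and usage are illustrative):

```python
import numpy as np

def shrink_logits(scores: np.ndarray) -> np.ndarray:
    """Downcast fp32 logits to fp16 before storing a chunk.

    Halves memory per chunk at the cost of one copy and some precision;
    fp16 overflows past |x| > 65504, but logits stay far below that.
    """
    return scores.astype(np.float16)

# for the shapes in this issue: (8192, 256000) fp32 ≈ 8.4 GB -> fp16 ≈ 4.2 GB
small = shrink_logits(np.random.rand(4, 8).astype(np.float32))
print(small.dtype)  # float16
```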
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ system_profiler SPHardwareDataType
...
Model Name: MacBook Pro
Model Identifier: Mac15,9
Model Number: Z1CM0013LLL/A
Chip: Apple M3 Max
Total Number of Cores: 16 (12 performance and 4 efficiency)
Memory: 48 GB
OS Loader Version: 11881.41.5
...
$ uname -a
Darwin xiao-mbp 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:05:23 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T6031 arm64

$ python3 --version
Python 3.12.0
I have llama.cpp installed from Homebrew, so there was no compile step:

$ llama-server --version
version: 3912 (edc26566)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin24.0.0
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
[llama_gguf_optmize v0.5.2] 17:49:28 - DEBUG - Logits shape (8192, 256000) dtype float32

That output was produced with the bf16 (uncompressed) GGUF of a model with this config: