mahesh557 opened 2 months ago
It worked after removing the `cdll_args` from the `return` statement in llama_cpp.py:

```python
if _lib_path.exists():
    try:
        return ctypes.CDLL(str(_lib_path))
        # return ctypes.CDLL(str(_lib_path), **cdll_args)
```
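For reference, a less destructive form of the same workaround is to keep `cdll_args` and only drop it when loading fails. This is a hypothetical sketch, not the shipped llama_cpp.py code:

```python
import ctypes
import pathlib

def _load_with_fallback(_lib_path: pathlib.Path, cdll_args: dict) -> ctypes.CDLL:
    """Hypothetical loader: try the wrapper's own arguments, then retry bare."""
    try:
        # Preferred path: honor whatever winmode/flags the wrapper computed.
        return ctypes.CDLL(str(_lib_path), **cdll_args)
    except OSError:
        # Some Windows/CUDA setups only load without the extra arguments.
        return ctypes.CDLL(str(_lib_path))
```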
However, I still see that the CPU is being used for compute rather than the GTX 1060 graphics card. I am invoking the model as below:

```python
llama_model = Llama(model_path=model_path, n_gpu_layers=50)
output = llama_model(question, max_tokens=5000)
```
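As an aside: if it is unclear whether the layers actually reached the GPU, the load log itself is the most direct evidence. A minimal sketch of the same call, assuming the same `model_path` and `question` variables:

```python
from llama_cpp import Llama

# verbose=True (the default) prints the llm_load_tensors lines shown in the
# log below, which state where each layer was placed; n_gpu_layers=-1 asks
# for every layer to be offloaded instead of capping at 50.
llama_model = Llama(model_path=model_path, n_gpu_layers=-1, verbose=True)
output = llama_model(question, max_tokens=5000)
```

The log from the original `n_gpu_layers=50` run is below.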
```
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 2.51 B
llm_load_print_meta: model size = 1.39 GiB (4.75 BPW)
llm_load_print_meta: general.name = gemma-2b-it
llm_load_print_meta: BOS token = 2 '<bos>'
llm_load_print_meta: EOS token = 1 '<eos>'
llm_load_print_meta: UNK token = 3 '<unk>'
llm_load_print_meta: PAD token = 0 '<pad>'
llm_load_print_meta: LF token = 227 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.06 MiB
llm_load_tensors: offloading 18 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 19/19 layers to GPU
llm_load_tensors: CPU buffer size = 1420.21 MiB
............................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
WARNING: failed to allocate 9.00 MB of pinned memory: CUDA driver version is insufficient for CUDA runtime version
llama_kv_cache_init: CPU KV buffer size = 9.00 MiB
llama_new_context_with_model: KV self size = 9.00 MiB, K (f16): 4.50 MiB, V (f16): 4.50 MiB
WARNING: failed to allocate 6.01 MB of pinned memory: CUDA driver version is insufficient for CUDA runtime version
llama_new_context_with_model: CPU input buffer size = 6.01 MiB
WARNING: failed to allocate 504.25 MB of pinned memory: CUDA driver version is insufficient for CUDA runtime version
llama_new_context_with_model: CUDA_Host compute buffer size = 504.25 MiB
llama_new_context_with_model: graph splits (measure): 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
Model metadata: {'general.name': 'gemma-2b-it', 'general.architecture': 'gemma', 'gemma.context_length': '8192', 'gemma.block_count': '18', 'gemma.attention.head_count_kv': '1', 'gemma.embedding_length': '2048', 'gemma.feed_forward_length': '16384', 'gemma.attention.head_count': '8', 'gemma.attention.key_length': '256', 'gemma.attention.value_length': '256', 'gemma.attention.layer_norm_rms_epsilon': '0.000001', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '2', 'general.file_type': '15', 'tokenizer.ggml.eos_token_id': '1', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.unknown_token_id': '3'}
Using fallback chat format: None
```
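Note what the warnings in this log are saying: "CUDA driver version is insufficient for CUDA runtime version" means the installed NVIDIA driver is older than the CUDA runtime the library was built against, so pinned (page-locked) buffers fall back to plain host memory. A quick way to compare the two versions, assuming `nvidia-smi` is on PATH:

```python
import subprocess

# The header of nvidia-smi's output shows the installed driver version and
# the highest CUDA runtime version that driver supports. If that is lower
# than the CUDA version llama-cpp-python was built against, the pinned-memory
# warnings above are expected and the driver needs updating.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```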
I have the same issue; removing the arguments as suggested by @mahesh557 did not help me.
I had the same issue, but it was not a bug in my case; I simply had not yet installed the CUDA toolkit.

First, uninstall llama-cpp-python and install the CUDA toolkit from https://developer.nvidia.com/cuda-toolkit. After restarting the command prompt, you should find `CUDA_PATH` set:

```
echo %CUDA_PATH%
```
If `CUDA_PATH` isn't registered correctly with `os.add_dll_directory()`, `CDLL()` may refuse to load the dependencies of llama.dll; registering it manually is sketched below.
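A minimal sketch of the manual registration, assuming a CUDA 12.x toolkit; the default path here is an example and must match your installation:

```python
import os

# Must run before llama_cpp is imported, because CDLL() resolves the DLL's
# dependencies (the CUDA runtime libraries) at import time.
cuda_path = os.environ.get(
    "CUDA_PATH",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2",  # assumed default
)
os.add_dll_directory(os.path.join(cuda_path, "bin"))

import llama_cpp  # noqa: E402 -- deliberately imported after registration
```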
Then, install the binary that matches your CUDA and Python versions from the releases page: https://github.com/abetlen/llama-cpp-python/releases

For example, for a Python 3.11 and CUDA 12.2 environment, install v0.2.69:

```
pip install https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.69-cu122/llama_cpp_python-0.2.69-cp311-cp311-win_amd64.whl
```
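To confirm the CUDA wheel actually loaded, a quick check from Python; `llama_supports_gpu_offload` is exposed by recent llama-cpp-python builds, so treat its availability in your exact version as an assumption:

```python
import llama_cpp

print(llama_cpp.__version__)  # expect 0.2.69 for the wheel above
# Assumed to exist in this version's bindings; returns True when the loaded
# library was built with GPU (CUDA) offload support.
print(llama_cpp.llama_supports_gpu_offload())
```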
Hi,

I am running llama-cpp-python on a Surface Book 2 with an i7 and an NVIDIA GeForce GTX 1060. I installed VC++ and CUDA drivers 12.4, and I am running Python 3.11.3. I compiled llama using the below command on a MinGW bash console. It ran successfully and yielded llama.dll. However, when I try to load it, it throws an error.

I tried fixing it as per other suggestions by modifying llama_cpp.py where the error was thrown (as below), but it didn't work. I set the environment variables as well, and it still didn't work. Can you please help me fix this?