AtaraxiaSjel / nur

Personal NUR repository
MIT License

koboldcpp having issue with -lcuda path #12

Open DarkReaperBoy opened 3 months ago

DarkReaperBoy commented 3 months ago

hello there, here is the output when installing:

/nix/store/j2y057vz3i19yh4zjsan1s3q256q15rd-binutils-2.41/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [Makefile:575: koboldcpp_cublas] Error 1
error: builder for '/nix/store/ilylx30p0i9yc7q52pd3yicikzbn3m21-koboldcpp-libs-1.61.2.drv' failed with exit code 2
error: 1 dependencies of derivation '/nix/store/il77gqs8iqf6rv61bcakazr1y03qjmc6-koboldcpp-1.61.2.drv' failed to build
error: 1 dependencies of derivation '/nix/store/f3794zyy605g7vwrf2fyqpfp7ynx566n-koboldcpp-1.61.2_fish-completions.drv' failed to build
error: 1 dependencies of derivation '/nix/store/4p5nyifj4r124d1lc6z5z0wn5aj33zij-man-paths.drv' failed to build
error: 1 dependencies of derivation '/nix/store/sbhjqs88si7csnpdckrj7karsnz0mqyi-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/wav4p1brs2wr47270j0drrl6ay4wik2r-nixos-system-nixos-24.05pre604424.d8fe5e6c92d0.drv' failed to build

checked https://nixos.wiki/wiki/CUDA; i had to do the cuda-fhs.nix way to make the binary work

if you need help, the maintainers offer it at https://discord.gg/kYSbJAhsgF and https://discord.com/channels/849937185893384223/849937402050379787/1223647072280907828 (all assuming it's not a me issue)

note: i already have

  # cuda support
  nixpkgs.config.cudaSupport = true;

enabled
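
(for context, the cuda-fhs.nix way from that wiki page is basically an FHS environment along these lines; this is a rough sketch from memory, not the exact file, and the package list is just the usual suspects:)

{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
(pkgs.buildFHSUserEnv {
  name = "cuda-env";
  # driver + toolkit libs so prebuilt binaries can find libcuda where they expect it
  targetPkgs = p: with p; [ cudatoolkit linuxPackages.nvidia_x11 libGLU libGL zlib ];
  profile = ''
    export CUDA_PATH=${pkgs.cudatoolkit}
    export LD_LIBRARY_PATH=${pkgs.linuxPackages.nvidia_x11}/lib
  '';
  runScript = "bash";
}).env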

AtaraxiaSjel commented 3 months ago

Hello! Thank you for letting me know.

I don't always test all the apps in this repository, and since they are updated automatically, sometimes they break. Ideally I should figure out how to build the packages with Hydra and test them automatically.

In the meantime I will try to look into this issue, but I don't have a machine that supports cuda.

DarkReaperBoy commented 3 months ago

hello again, tysm, that would be great :3 so, in this readme: https://github.com/LostRuins/koboldcpp?tab=readme-ov-file#considerations there is a note saying "Since v1.55, lcuda paths on Linux are hardcoded and may require manual changes to the makefile if you do not use koboldcpp.sh for the compilation." maybe that is the reason it broke in the first place, but since i am a newbie to nix, i hadn't tried it before in nix, i just installed the nur package

henk717 commented 3 months ago

@AtaraxiaSjel KoboldAI dev here (although not the main Koboldcpp dev). The issue we are having is that the -lcuda location needs to be hardcoded, because that's the only thing the makefile supports.

It expects the lib in one of these locations:

For our own build script, since it's a relative path, we also have LLAMA_ADD_CONDA_PATHS=1, which then adds the following two relative paths:

So ./koboldcpp.sh rebuild will be able to build correctly, and that's the method we support best, but it does pull in an entire micromamba environment. For your nix package, mounting or copying the cuda file to one of the expected locations is probably going to work best.

AtaraxiaSjel commented 3 months ago

First off, thanks for showing me what the issue is! I don't have much experience with cuda on NixOS or in general...

I have added the path to the cuda stubs in the makefile and applied the addOpenGLRunpath hook to the libraries being built. Now the compilation completes successfully, but I don't have a machine to test with cuda.
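
Roughly, the change amounts to something like this. This is only a sketch of the idea, not the actual derivation; the Makefile path being substituted and the attribute values are illustrative:

{ lib, stdenv, addOpenGLRunpath, cudaPackages, ... }:
stdenv.mkDerivation {
  pname = "koboldcpp-libs";
  # ...
  # setup hook that can add /run/opengl-driver/lib to a library's runpath
  nativeBuildInputs = [ addOpenGLRunpath ];
  buildInputs = [ cudaPackages.cuda_cudart cudaPackages.libcublas ];
  # hypothetical patch: point whatever -L path the Makefile hardcodes for
  # -lcuda at the cudart stubs instead
  postPatch = ''
    substituteInPlace Makefile \
      --replace "-L/usr/local/cuda/lib64" "-L${lib.getLib cudaPackages.cuda_cudart}/lib/stubs"
  '';
  # run the hook over the built libraries
  postFixup = ''
    for libfile in $out/lib/*.so; do
      addOpenGLRunpath "$libfile"
    done
  '';
}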

If you can, please check whether it works now. I have published the koboldcpp branch; we can continue the discussion here or in PR #13.

You can check it with something like this:

nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --help

and you do need this to be enabled:

nixpkgs.config.cudaSupport = true;
DarkReaperBoy commented 3 months ago

hello again, just woke up, apologies for the late response. updated my system to make sure and ran:

results: yep, it works. it would be great if this were merged into the main repo.

nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas --model kukulemon-7B-Q4_K_M-imat.gguf --gpulayers 33                                        (base) 
do you want to allow configuration setting 'extra-substituters' to be set to 'https://ataraxiadev-foss.cachix.org' (y/N)? 
do you want to permanently mark this value as untrusted (y/N)? 
warning: ignoring untrusted flake configuration setting 'extra-substituters'.
Pass '--accept-flake-config' to trust it
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'ataraxiadev-foss.cachix.org-1:ws/jmPRUF5R8TkirnV1b525lP9F/uTBsz2KraV61058=' (y/N)? 
do you want to permanently mark this value as untrusted (y/N)? 
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
***
Welcome to KoboldCpp - Version 1.61.2
Warning: CuBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: /nix/store/yf09xvniwggyr80nd2ipvjd50f0llzc2-koboldcpp-1.61.2/lib/koboldcpp_default.so
==========
Namespace(model='kukulemon-7B-Q4_K_M-imat.gguf', model_param='kukulemon-7B-Q4_K_M-imat.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=5, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=5, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False)
==========
Loading model: /home/nako/Desktop/models/kukulemon-7B-Q4_K_M-imat.gguf 
[Threads: 5, BlasThreads: 5, SmartContext: False, ContextShift: True]

The reported GGUF Arch is: llama

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/nako/Desktop/models/kukulemon-7B-Q4_K_M-imat.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attm      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = unknown, may not work
llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 4.07 GiB (4.83 BPW) 
llm_load_print_meta: general.name     = D:\Ferramentas\gguf-quantizations\models
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.13 MiB
llm_load_tensors:        CPU buffer size =  4165.37 MiB
................................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx      = 2128
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   266.00 MiB
llama_new_context_with_model: KV self size  =  266.00 MiB, K (f16):  133.00 MiB, V (f16):  133.00 MiB
llama_new_context_with_model:        CPU  output buffer size =    62.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   169.16 MiB
llama_new_context_with_model: graph splits: 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
^C⏎                                                                                                
DarkReaperBoy commented 3 months ago

off-topic: just to make sure, i don't know whether having these will affect the result or not

  programs.nix-ld.libraries = with pkgs; [

    # Add any missing dynamic libraries for unpackaged programs
    # here, NOT in environment.systemPackages
    libz
    fuse  
    icu
    procps
    util-linux
    libepoxy.dev
    cudatoolkit linuxPackages.nvidia_x11
    xorg.libXdmcp xorg.libXtst xorg.libXi xorg.libXmu xorg.libXv xorg.libXrandr
    xorg.libX11 xorg.libxcb
    zlib
    ncurses5
  ];

  ### add to shell to make cuda work for binaries
  environment.variables = {
    CUDA_PATH = "${pkgs.cudatoolkit}";
    EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
    EXTRA_CCFLAGS = "-I/usr/include";
  };

i wanted to make cuda global and not single-shell only, per https://nixos.wiki/wiki/CUDA

AtaraxiaSjel commented 3 months ago

@DarkReaperBoy Excellent! The fix is merged into the master branch.

off-topic: just to make sure, i don't know whether having these will affect the result or not

Honestly, I'm not sure either. But the addOpenGLRunpath hook should work without the settings you provided, with only

nixpkgs.config.cudaSupport = true;

so, all good i think.

DarkReaperBoy commented 3 months ago

hai, installed in configuration.nix, works now, ty again :pray:

WARNING: failed to allocate 266.00 MB of pinned memory: CUDA driver is a stub library
llama_kv_cache_init:        CPU KV buffer size =   266.00 MiB
llama_new_context_with_model: KV self size  =  266.00 MiB, K (f16):  133.00 MiB, V (f16):  133.00 MiB
WARNING: failed to allocate 62.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model:        CPU  output buffer size =    62.50 MiB
WARNING: failed to allocate 169.16 MB of pinned memory: CUDA driver is a stub library

i wonder if this is concerning, but everything works fine now

AtaraxiaSjel commented 3 months ago

Hm, maybe it doesn't work after all. Can you check whether the model is offloading to the gpu?

DarkReaperBoy commented 3 months ago

@AtaraxiaSjel hai again, yes, i tried and saw it's not offloading at all (checked in nvtop) and it's slow. the binary version + cuda-fhs.nix from https://nixos.wiki/wiki/CUDA works butter smooth. anyway, i was learning home-manager stuff since i just installed nix, so apologies for not properly testing it. ;-;

AtaraxiaSjel commented 3 months ago

@DarkReaperBoy No worries! I'd check it out myself, but I only have a machine with an amd gpu (rocm). I will try to fix the issue this week nonetheless.

DarkReaperBoy commented 3 months ago

i'll keep an eye on my notifications, take your time!

DarkReaperBoy commented 2 months ago

hmm, i would just like to make kobo noice

https://github.com/AtaraxiaSjel/nur/assets/110972562/29da6d11-a86e-4e5a-8784-32a96bdfc1b7

henk717 commented 2 months ago

Does that mean it's working now?

DarkReaperBoy commented 2 months ago

Does that mean it's working now?

nope. sucks to be an nvidia user ig. binaries + https://nixos.wiki/wiki/CUDA still work, so i am not left out.

AtaraxiaSjel commented 2 months ago

Hi! Sorry for the delayed response!

As explained here (https://github.com/NixOS/nixpkgs/issues/217780), we need to apply the autoAddDriverRunpath hook (formerly autoAddOpenGLRunpathHook) so that CUDA libraries are loaded at runtime from the /run/opengl-driver/lib directory.

In the previous version of the derivation I had already applied this hook. However, it did not solve the issue: koboldcpp was still unable to load these libraries, resulting in the error "CUDA driver is a stub library."

I investigated the problem further and noticed that in the koboldcpp-libs derivation, koboldcpp_cublas.so contained /run/opengl-driver/lib in its RPATH. However, after copying this library to the koboldcpp derivation, /run/opengl-driver/lib disappeared from the RPATH.

Upon closer inspection, I found that stdenv.mkDerivation uses the patchElf hook in the fixupPhase of the koboldcpp derivation to shrink the RPATHs of ELF executables. To prevent this from happening, I added dontPatchELF = true; to the koboldcpp derivation. Now, koboldcpp_cublas.so retains /run/opengl-driver/lib in its RPATH.
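
In derivation terms, the change is essentially just this (a sketch; all other attributes omitted):

stdenv.mkDerivation {
  pname = "koboldcpp";
  # ...
  # skip the fixupPhase RPATH shrinking so the copied koboldcpp_cublas.so
  # keeps /run/opengl-driver/lib in its runpath
  dontPatchELF = true;
}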

Before the change:

> patchelf --print-rpath result/lib/koboldcpp_cublas.so

/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib:/nix/store/lig5zg0ls4a64f2364cfdfwp3k19nhqy-libcublas-12.2.5.6-lib/lib:/nix/store/35pq4hr29c3sl79lgfwgsvd9nwzyp4am-glibc-2.39-5/lib:/nix/store/f1ii69v7p27z1r5zybmlbld3bdzm6a5f-gcc-13.2.0-lib/lib

After the change:

> patchelf --print-rpath result/lib/koboldcpp_cublas.so

/run/opengl-driver/lib:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/mj0z77zqa6kkrm8k54d0qhwscvizyacj-cuda_cudart-12.2.140-lib/lib:/nix/store/lig5zg0ls4a64f2364cfdfwp3k19nhqy-libcublas-12.2.5.6-lib/lib:/nix/store/35pq4hr29c3sl79lgfwgsvd9nwzyp4am-glibc-2.39-5/lib:/nix/store/f1ii69v7p27z1r5zybmlbld3bdzm6a5f-gcc-13.2.0-lib/lib

I hope this fixes the issue. The koboldcpp branch has been updated, and I will merge it into main once you confirm that it works.

DarkReaperBoy commented 2 months ago

works flawlessly in a quick test, it's fine for a pr to the main branch, ty. :pray:

AtaraxiaSjel commented 2 months ago

@DarkReaperBoy done! Thanks for testing 👍🏻

DarkReaperBoy commented 2 months ago

@DarkReaperBoy done! Thanks for testing 👍🏻

hai, i am really sowy. tonight when i was using it, i saw that it is slow, then realized (see the attached screenshot) that cuda isn't even detected. i really apologize for not properly testing again. i think last time i tested it did have cuda though, which is weird...

ik it's my fault but ty anyways.

AtaraxiaSjel commented 2 months ago

@DarkReaperBoy No worries! Anyway, I want to solve this issue if we can. Can you provide the command you use to load your model? And show the lib directory from the derivation? The path is on the third line of your screenshot.

DarkReaperBoy commented 2 months ago

well how it runs is:

Welcome to KoboldCpp - Version 1.61.2
Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
==========
Namespace(model='llama-3-lewdplay-8b-evo.Q4_K_M.gguf', model_param='llama-3-lewdplay-8b-evo.Q4_K_M.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=12, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=12, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False)
==========
Loading model: /home/nako/Desktop/models/llama-3-lewdplay-8b-evo.Q4_K_M.gguf 
[Threads: 12, BlasThreads: 12, SmartContext: False, ContextShift: True]

The reported GGUF Arch is: llama

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
ggml_init_cublas: no CUDA devices found, CUDA will be disabled
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /home/nako/Desktop/models/llama-3-lewdplay-8b-evo.Q4_K_M.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attm      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = unknown, may not work
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW) 
llm_load_print_meta: general.name     = Llama-3-LewdPlay-8B-evo
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.13 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4685.30 MiB
........................................................................................
Automatic RoPE Scaling: Using model internal value.
llama_new_context_with_model: n_ctx      = 2128
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
WARNING: failed to allocate 266.00 MB of pinned memory: CUDA driver is a stub library
llama_kv_cache_init:        CPU KV buffer size =   266.00 MiB
llama_new_context_with_model: KV self size  =  266.00 MiB, K (f16):  133.00 MiB, V (f16):  133.00 MiB
WARNING: failed to allocate 250.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model:        CPU  output buffer size =   250.50 MiB
WARNING: failed to allocate 258.50 MB of pinned memory: CUDA driver is a stub library
llama_new_context_with_model:  CUDA_Host compute buffer size =   258.50 MiB
llama_new_context_with_model: graph splits: 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001

now that i think about it, i have a suspicion about this:

  ### add to shell to make cuda work for binaries
  environment.variables = {
    CUDA_PATH = "${pkgs.cudatoolkit}";
    EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
    EXTRA_CCFLAGS = "-I/usr/include";
    DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY = "false";
    DOTNET_SYSTEM_GLOBALIZATION_INVARIANT = "1";
    NIXPKGS_ALLOW_UNFREE = "1";
  };

it's part of my config; i'll comment it out to see what happens

DarkReaperBoy commented 2 months ago

no, commenting out

CUDA_PATH = "${pkgs.cudatoolkit}";
EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";

didn't fix the repo version and broke the binary version as well. also, i have no idea about:

And show the lib directory from the derivation? The path is on the third line of your screenshot.

AtaraxiaSjel commented 2 months ago

@DarkReaperBoy Can you provide the command you use to start koboldcpp? Something like

koboldcpp --contextsize 8192 --usecublas normal mmq --gpulayers 99 --model models/dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf

What do these commands print for you?

ls -lah /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib

and

patchelf --print-rpath /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
DarkReaperBoy commented 2 months ago

@AtaraxiaSjel hello, i use

koboldcpp --usecublas --threads 12 --model llama-3-lewdplay-8b-evo.Q4_K_M.gguf --gpulayers 33

as for the second question, it gives:

nako@nixos ~/D/models> ls -lah /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib                      (base) 
total 47M
dr-xr-xr-x 2 root root 4.0K Jan  1  1970 .
dr-xr-xr-x 5 root root 4.0K Jan  1  1970 ..
-r--r--r-- 1 root root 1.6M Jan  1  1970 kcpp_docs.embd
-r--r--r-- 1 root root 738K Jan  1  1970 klite.embd
-r-xr-xr-x 1 root root  29M Jan  1  1970 koboldcpp_cublas.so
-r-xr-xr-x 1 root root 5.0M Jan  1  1970 koboldcpp_default.so
-r-xr-xr-x 1 root root 5.2M Jan  1  1970 koboldcpp_failsafe.so
-r-xr-xr-x 1 root root 5.2M Jan  1  1970 koboldcpp_noavx2.so
-r--r--r-- 1 root root 398K Jan  1  1970 rwkv_vocab.embd
-r--r--r-- 1 root root 794K Jan  1  1970 rwkv_world_vocab.embd

patchelf wasn't installed, so i did:

nix-shell -p patchelf

and the output is:

nako@nixos ~/D/models> patchelf --print-rpath /nix/store/fb5456a6znbzh1fs0p8r7schqg155zqm-koboldcpp-1.61.2/lib/koboldcpp_cublas.so
/nix/store/ibsml62bca7zlx80cfwf4mjpqzgm14lc-cuda_cudart-12.2.140-lib/lib/stubs:/nix/store/ibsml62bca7zlx80cfwf4mjpqzgm14lc-cuda_cudart-12.2.140-lib/lib:/nix/store/8lc4iisqw0lajd8lbjwdbiywrlzkg8hb-libcublas-12.2.5.6-lib/lib:/nix/store/1rm6sr6ixxzipv5358x0cmaw8rs84g2j-glibc-2.38-44/lib:/nix/store/agp6lqznayysqvqkx4k1ggr8n1rsyi8c-gcc-13.2.0-lib/lib
AtaraxiaSjel commented 2 months ago

@DarkReaperBoy It seems that you may have an older version of my nur repo. You're using koboldcpp v1.61.2, but after I fixed the problem, the repo has been updated to v1.63, and now it's at v1.64.

As a result, koboldcpp_cublas.so doesn't have the correct path to the CUDA libraries in its RPATH. How are you using this repo? Maybe you forgot to switch the branch back to main from koboldcpp? Try updating the repo, or run koboldcpp like this:

nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas <other flags>

If this doesn't help, try building koboldcpp:

nix build github:AtaraxiaSjel/nur/koboldcpp#koboldcpp

and print the rpath like this:

patchelf --print-rpath result/lib/koboldcpp_cublas.so

And sorry to bother you, I don't have an nvidia gpu, so I can't really test or debug this myself :)

DarkReaperBoy commented 2 months ago

hello again. it's me who didn't test properly and didn't take the time, so i sincerely apologize. i really am glad to help you. so... i am using nixos-unstable and really find it weird that

sudo nixos-rebuild switch --upgrade

hasn't updated anything in a while. maybe it's the wrong command and i should run it along with

sudo nix-channel --update && sudo nix-collect-garbage -d

either way, i did both; i will figure it out someday. now to the issue: the reason i brought this up is that maybe it's related? so, running the first command gives:

nako@nixos ~/D/models> nix run github:AtaraxiaSjel/nur/koboldcpp#koboldcpp -- --usecublas --threads 12 --model Meta-Llama-3-8B-Instruct.Q4_K_S.gguf --gpulayers 33
do you want to allow configuration setting 'extra-substituters' to be set to 'https://ataraxiadev-foss.cachix.org' (y/N)? 
do you want to permanently mark this value as untrusted (y/N)? 
warning: ignoring untrusted flake configuration setting 'extra-substituters'.
Pass '--accept-flake-config' to trust it
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'ataraxiadev-foss.cachix.org-1:ws/jmPRUF5R8TkirnV1b525lP9F/uTBsz2KraV61058=' (y/N)? 
do you want to permanently mark this value as untrusted (y/N)? 
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
***
Welcome to KoboldCpp - Version 1.63
Warning: CuBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: /nix/store/g700v6k453k05abckmhy8lbzs0vj6bih-koboldcpp-1.63/lib/koboldcpp_default.so
==========
Namespace(model='Meta-Llama-3-8B-Instruct.Q4_K_S.gguf', model_param='Meta-Llama-3-8B-Instruct.Q4_K_S.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=12, usecublas=[], usevulkan=None, useclblast=None, noblas=False, gpulayers=33, tensor_split=None, contextsize=2048, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=12, lora=None, smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, sdconfig=None, mmproj='', password=None, ignoremissing=False, chatcompletionsadapter='')
==========
Loading model: /home/nako/Desktop/models/Meta-Llama-3-8B-Instruct.Q4_K_S.gguf 
[Threads: 12, BlasThreads: 12, SmartContext: False, ContextShift: True]

The reported GGUF Arch is: llama

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/nako/Desktop/models/Meta-Llama-3-8B-Instruct.Q4_K_S.gguf (version GGUF V3 (latest))
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = unknown, may not work
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.36 GiB (4.67 BPW) 
llm_load_print_meta: general.name     = .
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.17 MiB
llm_load_tensors:        CPU buffer size =  4467.80 MiB
.......................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:500000.0).
llama_new_context_with_model: n_ctx      = 2144
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   268.00 MiB
llama_new_context_with_model: KV self size  =  268.00 MiB, K (f16):  134.00 MiB, V (f16):  134.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
Load Text Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001

and strangely it is not even version v1.64. as for

How are you using this repo? Maybe you forgot to switch the branch back to main from koboldcpp?

so i added nur to my flake like this:

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
    home-manager.url = "github:nix-community/home-manager";
    home-manager.inputs.nixpkgs.follows = "nixpkgs";
    nix-flatpak.url = "github:gmodena/nix-flatpak";
    nur.url = "github:nix-community/NUR";
  };

  outputs = { self, nixpkgs, nix-flatpak, home-manager, nur, ... }@inputs:
    let
      system = "x86_64-linux";
      username = "nako";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      nixosConfigurations.nixos = nixpkgs.lib.nixosSystem {
        specialArgs = { inherit inputs; };
        modules = [
          nur.nixosModules.nur
          home-manager.nixosModules.home-manager
          nix-flatpak.nixosModules.nix-flatpak
          ./configuration.nix
        ];
      };
    };
}

then i added kobo to "environment.systemPackages" (the main place to install stuff as system applications) with "config.nur.repos.ataraxiasjel.koboldcpp" (i would gladly share my configuration.nix if needed). that is how i use it.
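
roughly, the relevant bits look like this (a sketch of what i described above, not my full configuration.nix):

{ config, pkgs, ... }: {
  # cuda support, as suggested earlier in the thread
  nixpkgs.config.cudaSupport = true;
  # koboldcpp pulled in via the NUR module
  environment.systemPackages = [
    config.nur.repos.ataraxiasjel.koboldcpp
  ];
}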

as for the suggested

nix build github:AtaraxiaSjel/nur/koboldcpp#koboldcpp

it did nothing. and lastly:

nako@nixos ~/D/models> patchelf --print-rpath result/lib/koboldcpp_cublas.so                                                                        (base) 
patchelf: getting info about 'result/lib/koboldcpp_cublas.so': No such file or directory

let me share my config just in case (changed to txt because github doesn't allow nix files).

configuration.txt

flake.txt