cspenn opened this issue 11 months ago
Everything seems fine, although you're probably not using Metal (did you build with LLAMA_METAL=1?). But Accelerate is pretty fast too.
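For reference, a rebuild that enables Metal might look like the following. This is a sketch assuming a koboldcpp checkout whose Makefile honors the LLAMA_METAL=1 flag mentioned above; check your tree's README for the exact build variable it expects.

```shell
# From inside the koboldcpp source directory (hypothetical layout):
make clean            # discard any previous non-Metal build
make LLAMA_METAL=1    # rebuild with Metal GPU acceleration enabled
```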
Startup time depends on how quickly the model weights can be transferred into RAM or VRAM. My guess is that in earlier versions the weights were already cached in memory and could therefore be loaded very quickly. Did you measure the speed the second time you opened KoboldCpp? It should load substantially faster on the second open. Perhaps the previous time you tried, you had just downloaded the model.
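The cold-versus-warm effect described above is easy to see in isolation. The sketch below (illustrative only; the file stands in for model weights, and the timings depend entirely on your machine) reads the same file twice: the first pass may go to disk, while the second is typically served from the OS page cache.

```python
import os
import time

def timed_read(path, chunk=1 << 20):
    """Read a file sequentially and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start

# Throwaway 16 MB file standing in for model weights.
path = "demo_weights.bin"
with open(path, "wb") as f:
    f.write(os.urandom(16 * 1024 * 1024))

cold = timed_read(path)   # may hit disk (unless the write left it cached)
warm = timed_read(path)   # almost certainly served from the page cache
print(f"first read:  {cold:.3f}s")
print(f"second read: {warm:.3f}s")
os.remove(path)
```

A multi-gigabyte GGUF file shows the same pattern, just at a much larger scale, which is why a model that was recently loaded (or just downloaded) starts so much faster.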
I did build with Metal, yes. I'll repull just to be sure. I have a script I use to swap models in and out, so the weights don't stay in RAM very long.
Huh. What changed in 1.54? It's back to its super fast load times: Nete 13B, which was not in memory, loaded in seconds, and Mixtral, which used to take close to 5 minutes to load under 1.53, was ready to go in 25 seconds.
That's good, but I don't think anything has changed. As I mentioned, I believe your issue with 1.53 was due to some other bottleneck in the way the weights are stored/loaded on your system. I think if you test 1.53 again after loading the model in 1.54, it will also load just as quickly.
Since 1.52, KoboldCpp seems to take substantially longer to start up - on the order of 10x the previous startup time.
macOS Sonoma, currently on KoboldCpp 1.53. Here's what it shows at startup:
Here's the launch command, issued through a zsh script:
if [[ -n $selected_model ]]; then
    python3 koboldcpp.py "$selected_model" 6969 --gpulayers 128 --contextsize "$context_size"
fi
How would I troubleshoot what's changed and why it takes so long to start up now?
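One way to separate a KoboldCpp regression from a filesystem-cache effect is to time a cold start and a warm start of the same version. A rough recipe (macOS; `purge` flushes the disk cache and needs sudo, and the model path and context size here are illustrative placeholders):

```shell
# Cold start: flush the filesystem cache first, then time the load.
sudo purge
time python3 koboldcpp.py "$selected_model" 6969 --gpulayers 128 --contextsize 4096

# Warm start: quit and immediately relaunch without purging.
time python3 koboldcpp.py "$selected_model" 6969 --gpulayers 128 --contextsize 4096
```

If 1.52 and 1.53 show similar cold-start times but very different warm-start times, the difference is almost certainly caching rather than a change in KoboldCpp itself.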