LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Is old GGML format dropped? #715

Closed · beebopkim closed 5 months ago

beebopkim commented 5 months ago

I tried to run an old GGML model from https://huggingface.co/StarFox7/Llama-2-ko-7B-chat-gguf and found that it failed.

(kdev_env) koboldcpp_dev % python koboldcpp.py --noblas --gpulayers 999 --blasbatchsize 256 --contextsize 16384 --model /Volumes/cuttingedge/large_language_models/models_ggml_converted/StarFox7_Llama-2-ko-7B-chat-gguf/Llama-2-ko-7B-chat-gguf-q5_1.bin
***
Welcome to KoboldCpp - Version 1.59.1
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp_default.so
==========
Namespace(model='/Volumes/cuttingedge/large_language_models/models_ggml_converted/StarFox7_Llama-2-ko-7B-chat-gguf/Llama-2-ko-7B-chat-gguf-q5_1.bin', model_param='/Volumes/cuttingedge/large_language_models/models_ggml_converted/StarFox7_Llama-2-ko-7B-chat-gguf/Llama-2-ko-7B-chat-gguf-q5_1.bin', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=4, blasthreads=4, highpriority=False, contextsize=16384, blasbatchsize=256, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=True, useclblast=None, usecublas=None, usevulkan=None, gpulayers=999, tensor_split=None, onready='', benchmark=None, multiuser=0, remotetunnel=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False)
==========
Loading model: /Volumes/cuttingedge/large_language_models/models_ggml_converted/StarFox7_Llama-2-ko-7B-chat-gguf/Llama-2-ko-7B-chat-gguf-q5_1.bin 
[Threads: 4, BlasThreads: 4, SmartContext: False, ContextShift: True]
gguf_init_from_file: GGUFv1 is deprecated. please update if possible.
GGML_ASSERT: ggml.c:19922: info->n_dims <= GGML_MAX_DIMS
zsh: abort      python koboldcpp.py --noblas --gpulayers 999 --blasbatchsize 256 --contextsiz
(kdev_env) koboldcpp_dev %

I also found that the build from tag v1.56 loads this model in the old GGML format.

So I have a question: has the old GGML format been dropped?
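
(For reference: the gguf_init_from_file line in the log suggests this particular file is actually an early GGUF file, GGUFv1, rather than a raw legacy GGML one. A quick way to check which on-disk format a .bin file uses is to read its 4-byte magic number. The Python sketch below is not part of koboldcpp; the magic constants are taken from the public llama.cpp/ggml sources.)

import struct

# Magic numbers from the llama.cpp/ggml sources; model files store them
# as a little-endian uint32 at offset 0.
LEGACY_MAGICS = {
    0x67676D6C: "legacy GGML (unversioned)",
    0x67676D66: "legacy GGML (ggmf, versioned)",
    0x67676A74: "legacy GGML (ggjt, mmap-able)",
}
GGUF_MAGIC = 0x46554747  # the bytes b"GGUF"

def sniff(path):
    # Report which model file format `path` appears to use.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic == GGUF_MAGIC:
            # In GGUF, the format version follows the magic as a uint32.
            (version,) = struct.unpack("<I", f.read(4))
            return f"GGUF v{version}"
        return LEGACY_MAGICS.get(magic, f"unknown magic 0x{magic:08x}")

print(sniff("Llama-2-ko-7B-chat-gguf-q5_1.bin"))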

virt-god commented 5 months ago

Try disabling ContextShift. --noshift
ContextShift is only for gguf
SmartContext should work for ggml
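
(Applied to the command from the log, that suggestion would look something like the line below; --smartcontext is the flag that enables SmartContext, per the option list in the Namespace dump above.)

python koboldcpp.py --noblas --gpulayers 999 --blasbatchsize 256 --contextsize 16384 --noshift --smartcontext --model /Volumes/cuttingedge/large_language_models/models_ggml_converted/StarFox7_Llama-2-ko-7B-chat-gguf/Llama-2-ko-7B-chat-gguf-q5_1.bin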

beebopkim commented 5 months ago

Try disabling ContextShift. --noshift

ContextShift is only for gguf

SmartContext should work for ggml

I added --noshift and tried again, but it failed with the same error message.

LostRuins commented 5 months ago

No, support is not dropped, but I can confirm it has been broken since 1.56. I will try to fix it if possible.

LostRuins commented 5 months ago

Found the bug. Fixed in my experimental branch.

beebopkim commented 5 months ago

I checked out the concedo_experimental branch and confirmed that the old GGML format is now working. Thanks for your fast response!
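
(For anyone wanting to test a fix before a release, the steps are roughly the following, assuming an existing clone of the repo; the exact build command varies by platform.)

git fetch origin
git checkout concedo_experimental
make    # rebuilds koboldcpp_default.so on macOS/Linux
python koboldcpp.py --noblas --model <path-to-old-ggml-model.bin>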

LostRuins commented 5 months ago

The latest version is released, which should have this fixed.

beebopkim commented 5 months ago

I confirmed that b67a906 (tags/v1.60) works with old GGML models very well. Thanks!