Closed shimaowo closed 1 year ago
What do you mean mismatch? Are you saying that a model quantized on the official repo does not work on mine?
Yes, in some cases. If you take the tip of the ggml repo, build it, run the gpt-j folder's convert-h5-to-ggml.py, and then run gpt-j-quantize (that repo builds a separate quantize executable per model family), the resulting model works with ggml's own gpt-j inference executable but crashes koboldcpp as detailed above.
I assumed from the error that there is a source mismatch between this repo's embedded ggml files and the original repo, but I haven't looked into it very far.
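One quick way to test the mismatch hypothesis is to look at the 4-byte magic at the start of the model file, since the different ggml container formats each begin with a different magic. The sketch below is an assumption on my part: the magic values are taken from the public ggml/llama.cpp sources and the exact set recognized may differ by commit.

```python
import struct

# Known ggml container magics (assumption: values as found in the
# public ggml/llama.cpp sources; the recognized set varies by commit).
GGML_MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-able)",
}

def identify_magic(path):
    """Return a human-readable name for the file's 4-byte magic, or None."""
    with open(path, "rb") as f:
        raw = f.read(4)
    if len(raw) < 4:
        return None  # file too short to contain a magic
    magic, = struct.unpack("<I", raw)
    return GGML_MAGICS.get(magic)
```

If a file that crashes one loader and works in another reports an older or unversioned magic, that would point at a container-format mismatch rather than a broken quantization.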
ggml is fairly problematic in general with this sort of hidden incompatibility (it ships multiple convert-h5-to-ggml.py scripts that do different things, for example, and it and llamacpp regularly handle the same inputs differently). So this may be less a bug than a heads-up that this path doesn't work, and that this sort of problem will likely keep cropping up.
This may also be because I'm running the tip of ggml against the latest (non-CUDA) release exe of koboldcpp, so it's possible there are unreleased updates that affect things. I'll try a source build, but probably not for a day or two.
Please try again with the latest version of my repo
Awesome, a quick test with 1.23 looks like it works. I'll try a couple of other models later, but they all had the same issue, so I expect this fixed it.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
A gpt-j model converted and quantized with ggml should load properly in koboldcpp.
Current Behavior
koboldcpp crashes on startup with the following output:
Environment and Context
Windows 10, RTX 3080, 64 GB RAM
Other info
It seems like there may be some kind of mismatch between this repo's embedded ggml files and the actual ggml repo. To get a gpt-j model to convert and quantize properly, I had to use the tools from the actual ggml repo, under the relevant example folder.
It's worth noting that the versions in the llamacpp repo don't support these models either, as not all of the ggml formats have made it over there yet.
koboldcpp has worked correctly on other models I have converted to q5_1. It failed on two gpt-j models, at which point I stopped trying. The quantized models themselves also work with the gpt-j example application from ggml.
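Since the same files load in ggml's gpt-j example but crash koboldcpp, the two loaders may simply disagree on the header layout. A minimal sketch for comparing what each side sees, assuming the field order written by ggml's examples/gpt-j convert script (magic, n_vocab, n_ctx, n_embd, n_head, n_layer, n_rot, ftype; this order is an assumption and may differ by commit):

```python
import struct

# Assumed header field order for gpt-j ggml files, per the
# examples/gpt-j convert script in the ggml repo; may vary by commit.
FIELDS = ["magic", "n_vocab", "n_ctx", "n_embd",
          "n_head", "n_layer", "n_rot", "ftype"]

def read_gptj_header(path):
    """Read the leading int32 fields of a (presumed) gpt-j ggml file.

    If one loader reads a different field list than the writer emitted,
    the values here will look nonsensical (huge n_vocab, negative
    n_layer, etc.), which is a quick way to spot a layout mismatch.
    """
    with open(path, "rb") as f:
        raw = f.read(4 * len(FIELDS))
    values = struct.unpack("<" + "i" * len(FIELDS), raw)
    return dict(zip(FIELDS, values))
```

Running this over a model that works and one that crashes, and eyeballing the two dicts side by side, should show whether the crash is a header disagreement or something deeper in the tensor data.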