TheBloke opened this issue 1 year ago
gpt_neox_model_load: ggml ctx size = 17592186043162.29 MB
This looks like a signed/unsigned integer overflow in the ctx size calculation.
Change int to size_t in these lines:
const int n_embd = hparams.n_embd;
const int n_layer = hparams.n_layer;
const int n_ctx = hparams.n_ctx;
const int n_vocab = hparams.n_vocab;
to
const size_t n_embd = hparams.n_embd;
const size_t n_layer = hparams.n_layer;
const size_t n_ctx = hparams.n_ctx;
const size_t n_vocab = hparams.n_vocab;
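For anyone curious where the absurd 17592186043162.29 MB figure comes from, here is a minimal sketch of the overflow. It is not the actual ggml code (gpt_neox_model_load sums per-tensor sizes), but the mechanism is the same:

#include <cstdio>
#include <cstddef>

int main() {
    // Hyperparameters from the model-load log above.
    const int n_embd  = 4096;
    const int n_ctx   = 16384;
    const int n_layer = 32;

    // All operands are int, so the product is evaluated in 32-bit
    // signed arithmetic: 4096 * 16384 * 32 = 2^31, which overflows
    // INT_MAX (undefined behaviour; in practice it wraps negative).
    const int overflowed = n_embd * n_ctx * n_layer;

    // Converting the negative result to size_t wraps it to just
    // under 2^64, hence a ctx size in the petabyte range.
    const size_t bad = (size_t) overflowed;

    // Promoting the first operand to size_t keeps the whole
    // product in 64-bit arithmetic.
    const size_t good = (size_t) n_embd * n_ctx * n_layer;

    printf("bad  = %zu\n", bad);   // 18446744071562067968
    printf("good = %zu\n", good);  // 2147483648
}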
With the change applied, q8_0 quantization works:
./main -m litterature-7b-q8_0.bin
main: seed = 1685837187
gpt_neox_model_load: loading model from 'litterature-7b-q8_0.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx = 16384
gpt_neox_model_load: n_embd = 4096
gpt_neox_model_load: n_head = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot = 128
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype = 2007
gpt_neox_model_load: qntvr = 2
gpt_neox_model_load: ggml ctx size = 25384.91 MB
gpt_neox_model_load: memory_size = 8192.00 MB, n_mem = 524288
gpt_neox_model_load: ................................................ done
gpt_neox_model_load: model size = 6953.16 MB / num tensors = 388
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: number of tokens in prompt = 1
main: token[0] = 3726, They
They-the-heavens! I've been sitting here, and he's never come back!^C
Thanks so much! I will test and close this shortly.
Hey guys
Today I was doing quants of a new GPTNeoX model called Literature-7B-16384
I tried making GGMLs through the usual two-step process: convert to GGML, then quantise.
Both steps completed fine, but the resulting models can't be used.
Trying to use the fp32:
Trying an fp16 conversion instead is even more spectacular:
And then trying a quantised version made from either fp32 or fp16 gives the same errors as with the fp32:
I tried various -n values with both files, but that made no difference. I assume it's because some support needs to be added for the unusually large context size? I have previously tested GPTNeoX models with 4k and 8k context and they seemed to work.
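As it turns out, that guess lines up with the size_t fix suggested above. A rough sanity check (a sketch, assuming each KV tensor holds n_layer * n_ctx * n_embd elements, which matches the n_mem = 524288 line in the working log) shows that 16k context is exactly where a 32-bit signed product first overflows:

#include <cstdio>
#include <cstdint>
#include <climits>

int main() {
    const int64_t n_embd = 4096, n_layer = 32;
    // Element count per KV tensor at the context sizes tested.
    for (int64_t n_ctx : {4096, 8192, 16384}) {
        const int64_t elements = n_layer * n_ctx * n_embd;
        printf("n_ctx = %5lld -> %10lld elements (%s INT_MAX)\n",
               (long long) n_ctx, (long long) elements,
               elements > INT_MAX ? "over" : "fits in");
    }
}

4k and 8k land just inside INT_MAX (2^29 and 2^30 elements), while 16k hits exactly 2^31, so this model would be the first to trip the signed overflow.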
I don't know if this is a bug or a feature request, but I thought I'd let you guys know. Let me know if you'd like me to upload the fp16, fp32 or q4_0 GGMLs anywhere for inspection.
Thanks in advance!