Llama2 quantized q5_1 - Githubissues

I am getting this error:

llama.cpp: loading model from /Documents/Proj/delta/llama-2-7b-chat/ggml-model-q5_1.bin
error loading model: unrecognized tensor type 14

llama_init_from_file: failed to load model
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Documents/Proj/delta/llama-2-7b-chat/ggml-model-q5_1.bin] {
  code: 'GenericFailure'
}

My index.js:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./llama-2-7b-chat/ggml-model-q5_1.bin");
const llama = new LLM(LLamaCpp);
const config = {
    modelPath: model,
    enableLogging: false,
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0
};

const run = async () => {
    await llama.load(config);

    await llama.createCompletion({
        prompt: "My favorite movie is",
        nThreads: 4,
        nTokPredict: 1024,
        topK: 40,
        topP: 0.1,
        temp: 0.3,
        repeatPenalty: 1,
      }, (response) => {
        process.stdout.write(response.token)
      })
  }

  run();

It worked before I quantized, but I am hoping quantization makes it faster because it is so slow right now (I'm assuming this would have fixed the speed).

Atome-FE / llama-node

Llama2 quantized q5_1 #108