Atome-FE / llama-node

Believe in AI democratization. llama for nodejs, backed by llama-rs, llama.cpp and rwkv.cpp, works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

Error: Failed to convert napi value Function into rust type `f64` #67

ralyodio opened this issue 1 year ago (Open)

ralyodio commented 1 year ago

Getting this error: Error: Failed to convert napi value Function into rust type `f64`

import { LLM } from "llama-node";
//import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
import os from 'os';

export default class AI {
    constructor(chat, msg) {
        this.chat = chat;
        this.msg = msg;
        this.model = path.resolve(os.homedir(), 'models', path.basename(this.chat.model));
        this.llama = new LLM(LLamaCpp);
        this.template = this.msg.body;
        this.prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

${this.template}

### Response:`;

        this.cppConfig = {
            enableLogging: true,
            nCtx: 1024,
            nParts: -1,
            seed: 0,
            f16Kv: false,
            logitsAll: false,
            vocabOnly: false,
            useMlock: false,
            embedding: false,
            useMmap: true,
        };

        this.cppParams = {
            prompt: this.prompt,
            nThreads: 4,
            nTokPredict: 2048,
            topK: 40,
            topP: 0.1,
            // NOTE: this.model is a path string here, so .temp and .repeat are
            // undefined and the numeric fallbacks (0.2 and 1) are what get passed.
            temp: this.model.temp || 0.2,
            repeatPenalty: this.model.repeat || 1,
        };

        this.getAIResponse = this.getAIResponse.bind(this);
    }

    async getAIResponse() {
        console.log('calling ai: ', this.chat);
        try {
            await this.llama.load({ path: this.model, ...this.cppConfig });
            await this.llama.createCompletion(this.cppParams, (response) => {
                process.stdout.write(JSON.stringify({ prompt: this.prompt, response: response.token }));
                process.stdout.write(response.token);

                return {
                    prompt: this.prompt,
                    response: response.token
                };
            });
        } catch (err) {
            console.error(err);
        }
    }
}

Model is: WizardLM-7B-uncensored.ggml.q5_1.bin
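For reference, a stripped-down sketch of the same flow, using only the llama-node calls and parameters already shown in the class above (the model path and prompt are placeholders), which may help isolate the call that fails:

// Minimal repro sketch using the same llama-node 0.x API calls as the class above.
// The model path and prompt are placeholders.
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
import os from "os";

const model = path.resolve(os.homedir(), "models", "WizardLM-7B-uncensored.ggml.q5_1.bin");
const llama = new LLM(LLamaCpp);

await llama.load({
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
});

// All sampling parameters below are plain numbers; the "NumberExpected" error
// suggests something non-numeric is reaching a field the native layer reads as f64.
await llama.createCompletion(
    {
        prompt: "### Instruction:\n\nSay hello.\n\n### Response:",
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);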

ralyodio commented 1 year ago

here is the stack trace:

llama.cpp: loading model from /home/ettinger/models/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
error loading model: this format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1305)
llama_init_from_file: failed to load model
[2023-05-16T10:47:19Z INFO  llama_node_cpp::context] AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Error: Failed to convert napi value Function into rust type `f64`
    at file:///home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/llm/llama-cpp.js:73:39
    at new Promise (<anonymous>)
    at LLamaCpp.<anonymous> (file:///home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/llm/llama-cpp.js:71:14)
    at Generator.next (<anonymous>)
    at file:///home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
    at new Promise (<anonymous>)
    at __async (file:///home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
    at LLamaCpp.createCompletion (file:///home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/llm/llama-cpp.js:67:12)
    at LLM.<anonymous> (/home/ettinger/src/descriptive.chat/descriptive-web/node_modules/llama-node/dist/index.cjs:56:23)
    at Generator.next (<anonymous>) {
  code: 'NumberExpected'

ralyodio commented 1 year ago

I upgraded to the latest version and this started happening. It worked on 0.1.2. The official example no longer works either.
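As a stopgap while the regression is investigated, the package can be pinned to the last version reported to work (0.1.2, as mentioned above); assuming npm is the package manager in use:

npm install llama-node@0.1.2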

hlhr202 commented 1 year ago

It looks like you are using a GGJT v1 model, and the llama.cpp backend has dropped support for it. https://github.com/ggerganov/llama.cpp/pull/1305/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR921 https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L932

ralyodio commented 1 year ago

ok, so how do i find models on huggingface.co that are actually supported?

do you know what the reasoning is for dropping support? I don't see many q8 models available.

hlhr202 commented 1 year ago

> ok, so how do i find models on huggingface.co that are actually supported?
>
> do you know what the reasoning is for dropping support? I don't see many q8 models available.

@ralyodio It's hard to identify the real file-format version of a model; this is a huge trap in the ggml ecosystem. q4_1, q5_1, etc. are just quantization formats and are not used to label the version of a model file. For llama.cpp, the most recent models are mostly q5_1. The reasons for dropping the old format are performance improvements, CUDA support, mmap loading, etc.
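For that identification problem, here is a rough sketch that reads the container magic and version from the first eight bytes of a model file. The magic constants are the ones used by llama.cpp's pre-GGUF loaders, and the little-endian layout is an assumption about how the file was written on common hardware; this is not part of the llama-node API.

// Rough sketch: inspect a ggml model file's container format/version.
// Magic values from llama.cpp's pre-GGUF loaders; little-endian layout assumed.
import { open } from "fs/promises";

const MAGIC_GGML = 0x67676d6c; // 'ggml' - unversioned, no longer supported
const MAGIC_GGMF = 0x67676d66; // 'ggmf' - versioned, no longer supported
const MAGIC_GGJT = 0x67676a74; // 'ggjt' - versioned; v1 was dropped in llama.cpp PR #1305

async function inspectModel(file) {
    const fh = await open(file, "r");
    try {
        const buf = Buffer.alloc(8);
        await fh.read(buf, 0, 8, 0);
        const magic = buf.readUInt32LE(0);
        const version = buf.readUInt32LE(4);
        if (magic === MAGIC_GGJT) {
            console.log(`ggjt v${version}`);
        } else if (magic === MAGIC_GGMF) {
            console.log(`ggmf v${version}`);
        } else if (magic === MAGIC_GGML) {
            console.log("ggml (unversioned)");
        } else {
            console.log(`unknown magic 0x${magic.toString(16)}`);
        }
    } finally {
        await fh.close();
    }
}

await inspectModel(process.argv[2]);

Running it against a model file (node inspect-model.mjs ~/models/ggml-model-q4_1.bin) shows whether the file is a ggjt v1 container like the one rejected in the log above, independent of the q4_1/q5_1 quantization suffix in the filename.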