Atome-FE / llama-node

Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp and rwkv.cpp; works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
https://llama-node.vercel.app/
Apache License 2.0

cannot use RWKV models #121

Open rozek opened 1 year ago

rozek commented 1 year ago

I just tried to use the current version of "llama-node" with the "rwkv.cpp" backend and failed.

The link in the docs where I should be able to download RWKV models leads nowhere.

Since I could not find pre-quantized models anywhere, I followed the instructions found in the rwkv.cpp repo to download, convert and quantize the 1.5B and 0.1B models - I even uploaded them to HuggingFace.
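
Roughly these steps, using the scripts from the current rwkv.cpp checkout (file names as on my disk; exact arguments as I remember them from the README):

python python/convert_pytorch_to_ggml.py RWKV-5-World-0.1B-v1-20230803-ctx4096.pth RWKV-5-World-0.1B-v1-20230803-ctx4096.bin FP16
python python/quantize.py RWKV-5-World-0.1B-v1-20230803-ctx4096.bin RWKV-5-World-0.1B-v1-20230803-ctx4096-Q4_1.bin Q4_1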

Then I copied the example found in your docs, added the path to my quantized model, changed the template, and tried to run the result.
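
For reference, my RWKV.mjs was essentially the documented example, reconstructed here from memory with my paths and a shortened prompt:

import path from "path";
import { LLM } from "llama-node";
import { RwkvCpp } from "llama-node/dist/llm/rwkv-cpp.js";

const rwkv = new LLM(RwkvCpp);

const template = "Who are you?"; // the changed template
const prompt = `Q: ${template}\n\nA:`; // shortened - the docs embed the template in a longer conversation prompt

const run = async () => {
  // load the quantized RWKV model plus the tokenizer file that ships with llama-node
  await rwkv.load({
    modelPath: "/Users/andreas/rozek/AI/RWKV/RWKV-5-World-0.1B-v1-20230803-ctx4096-Q4_1.bin",
    tokenizerPath: path.resolve(process.cwd(), "20B_tokenizer.json"),
    nThreads: 4,
    enableLogging: true,
  });

  // stream generated tokens to stdout
  await rwkv.createCompletion(
    { prompt, maxPredictLength: 2048, topP: 0.1, temp: 0.1 },
    (response) => process.stdout.write(response.token)
  );
};

run();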

Unfortunately, I got nothing but an error message:

llama.cpp: loading model from /Users/andreas/rozek/AI/RWKV/RWKV-5-World-0.1B-v1-20230803-ctx4096-Q4_1.bin
error loading model: unknown (magic, version) combination: 67676d66, 00000065; is this really a GGML file?
llama_init_from_file: failed to load model
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Users/andreas/rozek/AI/RWKV/RWKV-5-World-0.1B-v1-20230803-ctx4096-Q4_1.bin] {
  code: 'GenericFailure'
}

Node.js v18.17.0

Do you have any idea what could be wrong?

rozek commented 1 year ago

I just learned that RWKV-5 models are not yet supported by rwkv.cpp.

So I tried RWKV-4 instead - took the .pth model and converted it to .bin following the docs. Unfortunately, the result is the same:

llama.cpp: loading model from /Users/andreas/rozek/AI/RWKV/RWKV-4-World-0.1B-v1-20230520-ctx4096.bin
error loading model: unknown (magic, version) combination: 67676d66, 00000065; is this really a GGML file?
llama_init_from_file: failed to load model
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Users/andreas/rozek/AI/RWKV/RWKV-4-World-0.1B-v1-20230520-ctx4096.bin] {
  code: 'GenericFailure'
}

Node.js v18.17.0

Using the same model directly from rwkv.cpp with python python/generate_completions.py /rwkv/RWKV-4-World-0.1B-v1-20230520-ctx4096.bin works, however.

yorkzero831 commented 1 year ago

Hi there, could you please check your ggml version? It may not work if you are using the recent ggml version.

rozek commented 1 year ago

How do I check my GGML version? I'm using the current version of rwkv.cpp.

rozek commented 1 year ago

I just found a section in the rwkv.cpp README.md which says:

⚠️ Python API was restructured on 2023-09-20, you may need to change paths/package names in your code when updating rwkv.cpp.

Might this be the reason for the misbehaviour?

rozek commented 1 year ago

FYI: I just used the version of rwkv.cpp from September 20th (before they restructured the Python API) and tried again - with the same results.

Which means: no, the API restructuring is not the reason why the RWKV model fails to load.

rozek commented 1 year ago

FYI: going back to the latest commit (of rwkv.cpp) before "update ggml" fails because the resulting code cannot be compiled.

Thus, actually testing whether "llama-node" works with RWKV means going back to commit "update ggml" (8db73b1) and manually reverting any changes related to GGML.

Damn...

Not being a C++ developer, I have to give up here - I'll mention this problem in rwkv.cpp as well (see issue 144); let's see who will be able to fix it.

saharNooby commented 1 year ago

Hi!

The module rwkv-cpp in llama-node explicitly points to a specific version of rwkv.cpp: rwkv.cpp @ 363dfb1. In turn, this version of rwkv.cpp explicitly points to a specific version of ggml: ggml @ 00b49ec. For all of this to work, I highly recommend not using the newest/arbitrary versions of the packages and sticking to the ones that are explicitly referenced -- that way, everything should be compatible with each other.
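
If you build from source, cloning recursively should give you exactly these pinned revisions (assuming the repos are vendored as git submodules, which the pinning above suggests):

git clone --recursive https://github.com/Atome-FE/llama-node
# or, inside an existing checkout:
git submodule update --init --recursive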

saharNooby commented 1 year ago

If it helps debugging: for some reason, llama.cpp loads the RWKV file, not rwkv.cpp:

llama.cpp: loading model from /Users/andreas/rozek/AI/RWKV/RWKV-4-World-0.1B-v1-20230520-ctx4096.bin
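
Decoding the (magic, version) pair from the error above: 0x67676d66 is ASCII "ggmf", the magic that rwkv.cpp writes, and 0x65 is 101 - so this is a version-101 rwkv.cpp file, which llama.cpp naturally cannot read. A few lines of Node are enough to dump the header yourself (a sketch; the rwkv.cpp header starts with two little-endian int32 fields):

// dump-header.mjs - print the magic and file version of a model file
import { openSync, readSync } from "node:fs";

const fd = openSync(process.argv[2], "r");
const buf = Buffer.alloc(8);
readSync(fd, buf, 0, 8, 0);

console.log("magic  : 0x" + buf.readUInt32LE(0).toString(16)); // rwkv.cpp: 0x67676d66 ("ggmf")
console.log("version: " + buf.readUInt32LE(4));                // rwkv.cpp: 100 or 101

Run it as: node dump-header.mjs /path/to/model.bin
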
rozek commented 1 year ago

That was quick - thank you very much.

Unfortunately, I cannot get rwkv.cpp @ 363dfb1 to compile.

Unless I manage to find out why, I may have to wait for RWKV-5 support.

Nevertheless, thank you very much for your effort!

rozek commented 1 year ago

FYI: I managed to compile rwkv.cpp again - my mistake was to only git reset --hard rwkv.cpp itself, but not the included ggml repo...
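
For anyone stumbling over the same thing, something like this resets both repos at once (a sketch; 363dfb1 is the commit pinned by llama-node):

git reset --hard 363dfb1
git submodule update --init --recursive   # also resets the included ggml to its pinned commit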

Now I'm trying to use it - a first attempt with the current version of llama-node failed with the same error message as before.

Let's see what the detail "llama.cpp: loading model" means.

rozek commented 1 year ago

OK, I think I have to give up - now RWKV crashes with:

Unsupported file version 101
/Users/runner/work/llama-node/llama-node/packages/rwkv-cpp/rwkv-sys/rwkv.cpp/rwkv.cpp:195: version == RWKV_FILE_VERSION
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                        ' was expected to have ID '50254' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                       ' was expected to have ID '50255' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                      ' was expected to have ID '50256' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                     ' was expected to have ID '50257' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                    ' was expected to have ID '50258' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                   ' was expected to have ID '50259' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                  ' was expected to have ID '50260' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                 ' was expected to have ID '50261' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '                ' was expected to have ID '50262' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '               ' was expected to have ID '50263' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '              ' was expected to have ID '50264' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '             ' was expected to have ID '50265' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '            ' was expected to have ID '50266' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '           ' was expected to have ID '50267' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '          ' was expected to have ID '50268' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '         ' was expected to have ID '50269' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '        ' was expected to have ID '50270' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '       ' was expected to have ID '50271' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '      ' was expected to have ID '50272' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '     ' was expected to have ID '50273' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '    ' was expected to have ID '50274' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '   ' was expected to have ID '50275' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - WARN - tokenizers::tokenizer::serialization] - Warning: Token '  ' was expected to have ID '50276' but was given ID 'None'
[Sat, 11 Nov 2023 13:57:54 +0000 - INFO - rwkv_node_cpp::context] - AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
zsh: segmentation fault  node RWKV.mjs

I installed llama-node using

npm install llama-node
npm install @llama-node/rwkv-cpp

which seems to be wrong anyway, as the RWKV inference example refers to a file (20B_tokenizer.json) that is only found within the node_modules/llama-node folder and should not have to be referenced there.
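
A possible workaround (hypothetical, untested): locate the file inside the package, copy it next to the script, and point tokenizerPath at the copy:

find node_modules/llama-node -name 20B_tokenizer.json
cp <path printed above> .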

yorkzero831 commented 1 year ago

@rozek I think this is because your RWKV model was quantized with the wrong version of rwkv.cpp; you could have a last try at quantizing the model file with rwkv.cpp @ 363dfb1.
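
i.e. check out rwkv.cpp @ 363dfb1, rebuild, and re-run conversion/quantization with that revision's scripts - if I remember correctly, they still lived under rwkv/ rather than python/ back then:

python rwkv/convert_pytorch_to_ggml.py RWKV-4-World-0.1B-v1-20230520-ctx4096.pth RWKV-4-World-0.1B-v1-20230520-ctx4096.bin FP16
python rwkv/quantize.py RWKV-4-World-0.1B-v1-20230520-ctx4096.bin RWKV-4-World-0.1B-v1-20230520-ctx4096-Q4_1.bin Q4_1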

yorkzero831 commented 1 year ago

FYI: only rwkv-4-raven has been tested.

rozek commented 1 year ago

Well, in the meantime I had already used rwkv.cpp @ 363dfb1 with ggml @ 00b49ec, as mentioned above.

But the result was as described before.

yorkzero831 commented 1 year ago

@rozek I used https://drive.google.com/file/d/1JyUmwZ9npQJDdYaAd4XKBfTNwUQpQXn9/view?usp=sharing before; it should work well.

yorkzero831 commented 1 year ago

and it was tested with "llama-node": "^0.1.6" - lol, maybe too old.