Gadersd / llama2-burn

Llama2 LLM ported to Rust burn

Getting Buffer size error #2

Closed: vsndev3 closed this issue 1 year ago

vsndev3 commented 1 year ago

Hi, when I try to run the conversion I get the error below at the end of the process (I have 64 GB RAM and 320 GB swap):

$ cargo run --release --bin convert /x/llama-py/params llama2-7b-chat
...
/ai/llama2-burn/llama-py/params/layer31/feedforward/w2/weight.npy
/ai/llama2-burn/llama-py/params/layer31/feedforward/w3/weight.npy
/ai/llama2-burn/llama-py/params/layer31/ffn_norm/weight.npy
/ai/llama2-burn/llama-py/params/layer31/ffn_norm/eps.npy
/ai/llama2-burn/llama-py/params/n_ctx.npy
/ai/llama2-burn/llama-py/params/theta.npy
/ai/llama2-burn/llama-py/params/multiple_of.npy
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_buffer
      note: label = `Buffer Src`
    Buffer size 524288000 is greater than the maximum buffer size (268435456)

', /home/bx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Am I missing anything here?
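
For context: 524,288,000 bytes is exactly the size of the Llama-2-7B token embedding table (32000 vocabulary × 4096 hidden size × 4-byte f32), and 268,435,456 bytes (256 MiB) is wgpu's default `Limits::max_buffer_size`, so any backend that uploads that tensor to the GPU under default limits fails this validation. A minimal sketch of the arithmetic, assuming the standard 7B shapes (they are not printed in the log above):

    // Sketch only: why a single tensor upload can exceed wgpu's default limit.
    // The 32000 x 4096 shape is the usual Llama-2-7B embedding table and is an
    // assumption here, not something read from the conversion log.
    fn main() {
        let embedding_bytes: u64 = 32_000 * 4_096 * 4; // f32 elements -> 524_288_000 bytes
        let default_max = wgpu::Limits::default().max_buffer_size; // 268_435_456 in wgpu 0.17
        assert!(embedding_bytes > default_max);
        println!("need {embedding_bytes} bytes, default max is {default_max}");
    }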

Gadersd commented 1 year ago

I looked and noticed that I accidentally used the GPU during the conversion process. I'll switch it to the CPU later today, and it should work then.

Gadersd commented 1 year ago

I modified the conversion and testing to use the CPU so it should work now.
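
For anyone following along, "use the CPU" in burn 0.8 terms roughly means pointing the backend type at the ndarray backend instead of the wgpu backend. A minimal sketch, not the repo's actual code (the `Backend` alias name is an assumption):

    // Illustrative only: selecting burn's CPU (ndarray) backend for conversion
    // so that wgpu's Device::create_buffer is never called.
    // use burn_wgpu::{AutoGraphicsApi, WgpuBackend}; // GPU path that hits the buffer limit
    use burn_ndarray::NdArrayBackend; // pure-CPU backend, no wgpu buffers

    type Backend = NdArrayBackend<f32>; // alias name is hypothetical

    fn main() {
        // Tensors built as Tensor::<Backend, D> now live in host memory.
    }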

vsndev3 commented 1 year ago

Thanks for the quick turnaround; unfortunately, I am still hitting the same issue.

/ai/llama2-burn$ git log -1
commit c5127866f5faa70a5d91c1a434c28c44f21136a7 (HEAD -> main, origin/main, origin/HEAD)
Author: Gadersd <gadersd@gmail.com>
Date:   Sun Jul 30 10:15:13 2023 -0400

    Use CPU for conversion and test

/ai/llama2-burn$ python3 llama-py/test.py /modeldisk/storage/models/llama-2-7b-chat/ /modeldisk/storage/models/llama-2-7b-chat/tokenizer.model 
#words: 32000 BOS ID: 1 EOS ID: 2 PAD ID: -1
#words: 32000 BOS ID: 1 EOS ID: 2 PAD ID: -1
Loaded model
Sample is 29896 1
Sample is 29900 0
Sample is 29900 0
Sample is 29995 %
Sample is 1854 sure
Sample is 393 that
Sample is 306 I
Sample is 626 am
Sample is 451 not
Sample is 263 a
Sampled output: Hello, I am 100% sure that I am not a

/ai/llama2-burn$ python3 llama-py/dump_model.py /modeldisk/storage/models/llama-2-7b-chat/ /modeldisk/storage/models/llama-2-7b-chat/tokenizer.model 
#words: 32000 BOS ID: 1 EOS ID: 2 PAD ID: -1
Loaded model
Dumping model...
Dump saved in params folder.

/ai/llama2-burn$ cargo run --release --bin convert params llama2-7b-chat
   Compiling libc v0.2.147
   Compiling cfg-if v1.0.0

...
warning: `llama` (bin "convert") generated 4 warnings (run `cargo fix --bin "convert"` to apply 3 suggestions)
    Finished release [optimized] target(s) in 40.04s
warning: the following packages contain code that will be rejected by a future version of Rust: nom v1.2.4, nom v3.2.1
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
     Running `target/release/convert params llama2-7b-chat`
params/n_layer.npy
params/layer0/attention/wq/weight.npy
params/layer0/attention/wk/weight.npy
params/layer0/attention/wv/weight.npy

...
params/layer31/ffn_norm/eps.npy
params/n_ctx.npy
params/theta.npy
params/multiple_of.npy
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_buffer
      note: label = `Buffer Src`
    Buffer size 524288000 is greater than the maximum buffer size (268435456)

', /home/bx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::ops::function::Fn::call
   3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_create_buffer
   4: <T as wgpu::context::DynContext>::device_create_buffer
   5: <wgpu::Device as wgpu::util::device::DeviceExt>::create_buffer_init
   6: burn_wgpu::context::base::Context::create_buffer_with_data_options
   7: burn_wgpu::ops::base::from_data
   8: burn_tensor::tensor::api::float::<impl burn_tensor::tensor::api::base::Tensor<B,_>>::from_floats
   9: llama::model::load_tensor
  10: llama::model::load_llama_dump
  11: convert::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Perhaps a general log message about the devices and the buffer size might help?

Gadersd commented 1 year ago

I looked into it: the loaded tensors were being created on the GPU and then transferred to the CPU. I've modified the loading functions to load the tensors directly onto the CPU. I don't have time right now to test it on a cloud instance, but it should work.
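
In other words (a sketch under the assumption that loading goes through burn 0.8's `Tensor::from_floats`, as the backtrace above shows): a tensor built on a CPU backend never allocates a wgpu buffer, whereas one built on the wgpu backend allocates the buffer before any later move to the CPU can happen.

    // Hypothetical illustration of the change described above; the function name
    // and values are made up, only Tensor::from_floats appears in the backtrace.
    use burn_ndarray::NdArrayBackend;
    use burn_tensor::Tensor;

    fn load_on_cpu() -> Tensor<NdArrayBackend<f32>, 1> {
        // from_floats builds the tensor on the backend's default device; for
        // NdArrayBackend that is host memory, so Device::create_buffer never runs.
        Tensor::from_floats([1.0, 2.0, 3.0])
    }

    fn main() {
        let t = load_on_cpu();
        println!("{:?}", t.dims());
    }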

iwarp commented 1 year ago

I've just cloned the repo to give it a try. With your changes (CPU only) I get the error below and no conversion takes place; switching the wgpu device to best available returns the buffer-size error above.

Running `target\debug\convert.exe C:\git\llama2-burn\llama-py\params chat`

thread 'main' panicked at 'No CPU device found, adapters [], other adapters []', C:\Users\iwarp\.cargo\registry\src\index.crates.io-6f17d22bba15001f\burn-wgpu-0.8.0\src\context\base.rs:292:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\convert.exe C:\git\llama2-burn\llama-py\params chat` (exit code: 101)
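
That panic comes from burn-wgpu 0.8's device selection: `WgpuDevice::Cpu` presumably looks for a wgpu adapter whose device type is `Cpu`, and on most Windows machines no software adapter is exposed, so the list is empty. A quick, hypothetical diagnostic using wgpu 0.17 directly (not repo code) to see which adapters exist:

    // Lists every adapter wgpu can see, so you can tell whether a Cpu-type
    // (software rasterizer) adapter is available at all on this machine.
    fn main() {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::default());
        for adapter in instance.enumerate_adapters(wgpu::Backends::all()) {
            let info = adapter.get_info();
            println!("{:?} | {:?} | {}", info.backend, info.device_type, info.name);
        }
    }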

Gadersd commented 1 year ago

It looks like the wgpu backend has some bugs. I suppose I'll have to switch it back to torch.
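
For reference, a rough sketch of what the torch-backend route looks like in burn 0.8 (types from burn-tch; the alias is illustrative, not the repo's code):

    // Illustrative only: the tch (libtorch) backend keeps conversion in ordinary
    // host memory on the CPU, so there is no wgpu buffer-size limit to hit.
    use burn_tch::{TchBackend, TchDevice};

    type Backend = TchBackend<f32>; // alias name is hypothetical

    fn main() {
        let _device = TchDevice::Cpu;
    }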

vsndev3 commented 1 year ago

It is working with the torch backend update.