Gadersd / llama2-burn

Llama2 LLM ported to Rust burn
MIT License

How can I run this with wgpu? #3

Open majian4work opened 1 year ago

majian4work commented 1 year ago

I want to test this project on my laptop with Intel Iris Xe Graphics. How can I achieve that? My system memory is 16 GB.

Gadersd commented 1 year ago

burn-wgpu currently doesn't use the full device memory available so llama2 can't run with it just yet but I am working on a solution. Hopefully within the next few days I'll have it working with wgpu.

majian4work commented 1 year ago

Thank you for your effort. It may be necessary to implement quantization for clients with less than 16GB of memory.
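To illustrate the idea behind that suggestion (this is a sketch of mine, not code from llama2-burn, and the helper names are hypothetical): symmetric 8-bit quantization stores each weight tensor as `i8` values plus a single `f32` scale, cutting weight memory to roughly a quarter of `f32` storage.

```rust
// Hypothetical sketch of symmetric 8-bit quantization, not part of llama2-burn.
// Each f32 weight maps to an i8 plus one shared per-tensor scale factor.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [1.0f32, -0.5, 0.25, 0.0];
    let (q, scale) = quantize_i8(&w);
    let back = dequantize_i8(&q, scale);
    // Round-trip error is bounded by about scale / 2 per element.
    for (a, b) in w.iter().zip(back.iter()) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}, scale = {}", q, scale);
}
```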

smallstepman commented 1 year ago

> burn-wgpu currently doesn't use the full device memory available

Could you please explain what exactly the current limitation is, and, if you know, whether there are plans to solve it in burn-wgpu or wgpu? Is there anything I could do to help?

majian4work commented 1 year ago

I tried some modifications:

    type GraphicsApi = AutoGraphicsApi;
    type Backend = WgpuBackend<GraphicsApi, Elem, i32>;
    let device = WgpuDevice::default();

found some problems: 1) the default `K::repeat` implementation requires the base dimension to be 1 before repeating; after a quick fix for 1), I got another error: 2)

In Device::create_bind_group
    Buffer binding 0 range 524288000 exceeds `max_*_buffer_binding_size` limit 134217728
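For context on the numbers in that error (my own arithmetic, not output from burn): wgpu's default `max_storage_buffer_binding_size` is 128 MiB (134217728 bytes), and the rejected binding of 524288000 bytes is exactly 500 MiB, which as `f32` elements is 131,072,000 values, plausibly a 32000 x 4096 llama2 embedding table. Raising the limit when requesting the device, or chunking buffers as the later fix did, is the usual way around this.

```rust
// Checking the numbers from the error message (plain arithmetic, not wgpu code).
fn main() {
    let default_limit: u64 = 128 * 1024 * 1024; // wgpu's default max_*_buffer_binding_size
    let requested: u64 = 524_288_000;           // the binding size from the error
    assert_eq!(default_limit, 134_217_728);
    assert_eq!(requested, 500 * 1024 * 1024);   // i.e. exactly 500 MiB
    // As f32 elements: 131,072,000 values, e.g. a 32000 x 4096 matrix.
    assert_eq!(requested / 4, 32_000 * 4_096);
    println!("binding exceeds default limit: {}", requested > default_limit);
}
```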

majian4work commented 1 year ago

By the way, I only loaded one transformer block layer because there wasn't enough memory available.

Gadersd commented 12 months ago

burn-wgpu has been updated to utilize the full GPU memory so it should now work as long as your GPU has enough memory.

smallstepman commented 11 months ago

@Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

majian4work commented 11 months ago

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I attempted to modify the code directly, but I am unsure if it is correct. I just want to test whether or not it will run on my laptop, without caring about the result.

hlhr202 commented 4 months ago

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I have the same problem. I'm using stas/tiny-random-llama-2. This is probably caused by `RotaryEncodingConfig::init` when it repeats `freq_cis`; the shape of `freq_cis` is [256, 2, 2].

burn's JIT backend has this repeat function:

pub(crate) fn repeat<R: Runtime, E: JitElement, const D1: usize>(
    input: JitTensor<R, E, D1>,
    dim: usize,
    times: usize,
) -> JitTensor<R, E, D1> {
    let mut shape = input.shape.clone();
    if shape.dims[dim] != 1 {
        panic!("Can only repeat dimension with dim=1");
    }

@Gadersd could you suggest any fix here? thx
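One possible direction (a sketch of mine, not the actual burn implementation): drop the `shape.dims[dim] != 1` restriction by tiling each slice along `dim` contiguously `times` times, which coincides with the existing behaviour when the dim has size 1. Shown here on a plain row-major `Vec<f32>` with an explicit shape rather than on `JitTensor`; the real fix would need the equivalent indexing inside the JIT kernel.

```rust
// Sketch of a repeat that works for any dimension size, not just size 1.
// Tile-style semantics: the whole chunk along `dim` is copied `times` times.
fn repeat_any(data: &[f32], shape: &[usize], dim: usize, times: usize) -> (Vec<f32>, Vec<usize>) {
    // Independent slices before `dim`, and elements per slice from `dim` onward.
    let outer: usize = shape[..dim].iter().product();
    let block: usize = shape[dim..].iter().product();
    let mut out = Vec::with_capacity(data.len() * times);
    for o in 0..outer {
        let slice = &data[o * block..(o + 1) * block];
        for _ in 0..times {
            out.extend_from_slice(slice); // tile the chunk along `dim`
        }
    }
    let mut new_shape = shape.to_vec();
    new_shape[dim] *= times;
    (out, new_shape)
}

fn main() {
    // A [2, 2] tensor repeated twice along dim 0 becomes [4, 2].
    let (out, shape) = repeat_any(&[1.0, 2.0, 3.0, 4.0], &[2, 2], 0, 2);
    assert_eq!(shape, vec![4, 2]);
    assert_eq!(out, vec![1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]);
    // When the dim has size 1, this matches the current dim=1 behaviour.
    let (out1, shape1) = repeat_any(&[1.0, 2.0], &[1, 2], 0, 3);
    assert_eq!(shape1, vec![3, 2]);
    assert_eq!(out1, vec![1.0, 2.0, 1.0, 2.0, 1.0, 2.0]);
    println!("ok");
}
```

Note that for dims of size greater than 1 there are two possible semantics, tile (`[a, b] -> [a, b, a, b]`) and per-element repeat (`[a, b] -> [a, a, b, b]`); they only agree when the dim has size 1, so a real patch would need to match whatever the caller in `RotaryEncoding` expects.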