Gadersd / llama2-burn

Llama2 LLM ported to Rust burn

Weird behaviour when sampling text #6

Open · sonng opened this issue 1 year ago

sonng commented 1 year ago

Hi,

Context:

I followed through Steps 1 and 2, and everything seemed to work fine apart from the sampling step, where I get a bunch of random completions.

This is the output for cargo run --bin convert ../Llama-2-7b/params/ llama2-7b

../Llama-2-7b/params//layer31/ffn_norm/eps.npy
../Llama-2-7b/params//n_ctx.npy
../Llama-2-7b/params//theta.npy
../Llama-2-7b/params//multiple_of.npy
../Llama-2-7b/params//tok_embeddings/weight.npy
../Llama-2-7b/params//norm/weight.npy
../Llama-2-7b/params//norm/eps.npy
../Llama-2-7b/params//output/weight.npy
Successfully converted ../Llama-2-7b/params/ to llama2-7b
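
As a sanity check on the converted weights, I also peeked at the .npy headers to confirm the dtypes and shapes look sane. This is a throwaway sketch of mine in plain Rust (not code from this repo); it assumes the NPY v1.x format and reuses one of the paths from the dump above:

```rust
use std::fs::File;
use std::io::Read;

/// Read and return the plain-text header of an .npy file (NPY format v1.x).
/// Minimal sketch for eyeballing the dtype/shape of dumped weights; not a full parser.
fn npy_header(path: &str) -> std::io::Result<String> {
    let mut f = File::open(path)?;
    let mut magic = [0u8; 8]; // b"\x93NUMPY" + major version byte + minor version byte
    f.read_exact(&mut magic)?;
    assert_eq!(&magic[..6], b"\x93NUMPY", "not an .npy file");
    let mut len = [0u8; 2]; // v1.x stores the header length as a little-endian u16
    f.read_exact(&mut len)?;
    let mut header = vec![0u8; u16::from_le_bytes(len) as usize];
    f.read_exact(&mut header)?;
    Ok(String::from_utf8_lossy(&header).trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    let hdr = npy_header("../Llama-2-7b/params/tok_embeddings/weight.npy")?;
    // Expect something like {'descr': '<f4', 'fortran_order': False, 'shape': (32000, 4096), }
    println!("{hdr}");
    Ok(())
}
```

For the 7B model the token embedding header should report shape (32000, 4096), matching the n_vocab and n_state values in the config printed further down.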

This is the tail of the output for cargo run --bin test ../Llama-2-7b/tokenizer.model ../Llama-2-7b/params/

2
0
 years
 old
 and
 I
 am
 a
 student
 at
Prompt: Hello, I am
Sample: 20 years old and I am a student at
Combined: Hello, I am 20 years old and I am a student at

However, when I run the sampling binary (cargo run --release --bin sample llama2-7b ../Llama-2-7b/tokenizer.model "Hello, I am " 10 gpu), it outputs this instead.

     Running `target/release/sample llama2-7b ../Llama-2-7b/tokenizer.model 'Hello, I am ' 10 gpu`
Llama config: LlamaConfig { n_vocab: 32000, n_ctx: 2048, n_state: 4096, multiple_of: 256, ffn_dim_multiplier: None, n_head: 32, n_kv_head: 32, n_layer: 32, norm_eps: 9.999999747378752e-6 }
2
<0x0A>
 Unterscheidung
<s>
<0x0A>
MW
<0x0A>
MW
 Cal
</s>
Prompt: Hello, I am
Output: Hello, I am 2<0x0A> Unterscheidung<s><0x0A>MW<0x0A>MW Cal</s>

I'm not sure where it went wrong, since the previous steps ran with no trouble.
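
For reference, my understanding is that the test run above does deterministic greedy decoding, so the sampler is the first place I'd look. A minimal sketch of greedy argmax decoding (my own illustration, not this repo's code), assuming a flat f32 logits slice over the vocabulary:

```rust
/// Greedy decoding: pick the token id with the largest logit.
/// Illustration only; assumes one f32 logit per vocabulary entry.
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
        .expect("logits must be non-empty")
}

fn main() {
    // Tiny fake vocabulary of 4 tokens.
    let logits = [0.1_f32, 2.5, -0.3, 1.7];
    println!("next token id: {}", argmax(&logits)); // prints 1
}
```

If temperature sampling is used instead, I'd expect different but still coherent continuations; incoherent output full of `<0x0A>` and stray `<s>`/`</s>` markers seems more like a weight-loading or tokenization problem.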

Gadersd commented 1 year ago

The test binary uses the CPU. There might be an issue with running the model on the GPU. Try the sampling binary on the CPU instead and see if the result is different.
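
If the CPU run produces the same garbage, a next step would be to dump the logits of the first forward pass on both backends and diff them. Something along these lines (a sketch to show the idea, not code from the repo; the captured values are hypothetical):

```rust
/// Report the index and size of the largest elementwise difference
/// between two logit vectors, e.g. one captured per backend.
fn max_abs_diff(a: &[f32], b: &[f32]) -> (usize, f32) {
    assert_eq!(a.len(), b.len(), "logit vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .enumerate()
        .fold((0usize, 0.0f32), |best, (i, d)| if d > best.1 { (i, d) } else { best })
}

fn main() {
    // Hypothetical logits captured from cpu and gpu runs of the same prompt.
    let cpu = [0.10_f32, 2.50, -0.30];
    let gpu = [0.10_f32, 2.49, -0.30];
    let (idx, diff) = max_abs_diff(&cpu, &gpu);
    println!("largest divergence at token {idx}: {diff}");
}
```

A large divergence would point at the backend; identical (bad) logits on both would point at the weights or the tokenizer instead.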

sonng commented 1 year ago

Hmm, I tried with the CPU and got a very similar result; the great thing is that it's consistent.

 Running `target/release/sample llama2-7b ../Llama-2-7b/tokenizer.model 'Hello, I am ' 10 cpu`
Llama config: LlamaConfig { n_vocab: 32000, n_ctx: 2048, n_state: 4096, multiple_of: 256, ffn_dim_multiplier: None, n_head: 32, n_kv_head: 32, n_layer: 32, norm_eps: 9.999999747378752e-6 }
2
<0x0A>
 Unterscheidung
<s>
<0x0A>
MW
<0x0A>
MW
 Cal
</s>
Prompt: Hello, I am
Output: Hello, I am 2<0x0A> Unterscheidung<s><0x0A>MW<0x0A>MW Cal</s>

Gadersd commented 1 year ago

I just tested it on a Linux machine and got the desired output. There must be an issue with tch/torch on Macs.

Tommy-ASD commented 9 months ago

Just tested on an Ubuntu desktop using the CPU and got the exact same result as sonng.