Open sonng opened 1 year ago
The test binary uses the CPU, so there might be an issue with running it on the GPU. Try the sampling binary on the CPU instead and see if the result is different.
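For reference, the device switch in the sampler presumably looks something like the sketch below. This is only an assumption about how the `cpu`/`gpu` CLI argument maps onto a `tch::Device` (the helper name `pick_device` is made up for illustration), not the repo's actual code:

```rust
use tch::Device;

/// Hypothetical helper: map the sampler's "cpu"/"gpu" argument to a tch Device.
fn pick_device(arg: &str) -> Device {
    match arg {
        "cpu" => Device::Cpu,
        // On Linux this resolves to Cuda(0) when CUDA is present, otherwise Cpu.
        // On macOS the GPU path would instead go through Metal (Device::Mps in
        // newer tch releases), which is the backend under suspicion here.
        "gpu" => Device::cuda_if_available(),
        other => panic!("unknown device argument: {other}"),
    }
}

fn main() {
    println!("resolved device: {:?}", pick_device("gpu"));
}
```

Running the same prompt once with `cpu` and once with `gpu` and diffing the outputs should tell us whether the weights or the backend are at fault.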
Hmm, I tried with the CPU and got a very similar result; the one good thing is that it's consistent.
Running `target/release/sample llama2-7b ../Llama-2-7b/tokenizer.model 'Hello, I am ' 10 cpu`
```
Llama config: LlamaConfig { n_vocab: 32000, n_ctx: 2048, n_state: 4096, multiple_of: 256, ffn_dim_multiplier: None, n_head: 32, n_kv_head: 32, n_layer: 32, norm_eps: 9.999999747378752e-6 }
2
<0x0A>
Unterscheidung
<s>
<0x0A>
MW
<0x0A>
MW
Cal
</s>
Prompt: Hello, I am
Output: Hello, I am 2<0x0A> Unterscheidung<s><0x0A>MW<0x0A>MW Cal</s>
```
I just tested it on a Linux machine and got the desired output. There must be an issue with tch/torch on macOS.
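One quick way to check whether the backend (rather than the converted weights) is to blame is a raw tch sanity test: run the same op on the CPU and on the GPU device and compare the results. This is a generic sketch, not code from this repo:

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    // The same matmul on CPU and on the GPU device should agree to within
    // float tolerance; a large difference points at the torch backend
    // (e.g. MPS on macOS) rather than at the converted weights.
    let a = Tensor::randn(&[64, 64], (Kind::Float, Device::Cpu));
    let b = Tensor::randn(&[64, 64], (Kind::Float, Device::Cpu));
    let cpu = a.matmul(&b);

    let dev = Device::cuda_if_available(); // swap in the Metal/MPS device on macOS
    let gpu = a.to_device(dev).matmul(&b.to_device(dev)).to_device(Device::Cpu);

    let max_diff = (cpu - gpu).abs().max().double_value(&[]);
    println!("max abs difference: {max_diff}");
}
```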
Just tested on an Ubuntu desktop using CPU and got the exact same result as sonng
Hi,
Context:
I worked through Step 1 and Step 2. Everything seemed to work fine apart from Sampling Text, where I get a bunch of random completions.
This is the output for `cargo run --bin convert ../Llama-2-7b/params/ llama2-7b`:
This is the tail of the output for `cargo run --bin test ../Llama-2-7b/tokenizer.model ../Llama-2-7b/params/`:
However, when it comes to sampling (`cargo run --release --bin sample llama2-7b ../Llama-2-7b/tokenizer.model "Hello, I am " 10 gpu`), it outputs this instead. I'm not sure where it went wrong, since the previous steps ran without trouble.