Closed: hansejo closed this issue 7 months ago.
If you can, please post the main.xxxx.log file from the failing run.
@staviq Sorry I forgot. Here is a failed run:
Thank you.
I can't see anything obviously wrong. Can you check whether it's reproducible if you use the exact seed this happened with (add -s 1697866302 to the main arguments)?
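For example, rerunning with that seed pinned would look like this (a sketch; the model path is a placeholder for the GGUF file from the failing run):
# -s fixes the RNG seed, -m selects the model, -p supplies the prompt
$ ./main -m models/your-model.gguf -s 1697866302 -p "Explain what Linux is."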
I can reproduce the same issue on many different seeds.
user
Explain what Linux is.
assistant
S. S. S. S. S. S. S. S. S. S. S. S. S. S.S.S.SS.SSS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.S
S.S.S.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.S.S.S.SS.SS.SS.SS.SS.SS.SS.SS.SS.SS.S.S.S.SS.SS.SS.SS.SS.
user
Explain what Linux is.
assistant
EDCV, I, I, I, I, I, I, I, I, I, I, I, I, Io, İ, İ, İ, İ, Í, İ, I, İ, İ, İ, IC, İ, İ [end of text]
user
Explain what Linux is.
assistant
toga;s, the 1930's of this generation; that year he was born, he has been born again. The first time he was was born, and so on... The 2048's of this generation, we will call him Mark Galski. He was born in 1956.
I am the worst at what I say about me, but i guess that is true about me too
"I was born in 1978." [end of text]
user
Explain how Linux can win in the desktop space Apple and Microsoft invest more money into their desktop systems.
assistant
############################################################################################################################################################################################################################################
system
You are a helpful assistant
user
Explain what linux is
assistant
############################################################################################################################################################################################################################
P.S. I am switching between the two models mentioned in the OP, so please let me know if you'd want me to stick with one. I am using an RX 580 8GB, and all the above work fine on CPU, and seem to work via OpenCL. So I am narrowing this down to AMD's HIP.
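For what it's worth, the CPU-versus-GPU part of that comparison can be made with a single GPU-enabled build by varying the layer offload count (a sketch; the model path is a placeholder):
# -ngl 0 keeps every layer on the CPU, so output should be correct
$ ./main -m models/your-model.gguf -ngl 0 -p "Explain what Linux is."
# a large -ngl offloads all layers to the GPU, where the garbage appears
$ ./main -m models/your-model.gguf -ngl 99 -p "Explain what Linux is."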
Yeah, I couldn't think of any reason why this would happen, and this was one of my guesses. I'm gonna label this as AMD specific.
I have the same problem with dual 7900 XTX:
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 160.00 MB
llama_new_context_with_model: kv self size = 160.00 MB
llama_build_graph: non-view tensors processed: 1844/1844
llama_new_context_with_model: compute buffer total size = 151.63 MB
llama_new_context_with_model: VRAM scratch buffer: 145.00 MB
llama_new_context_with_model: total VRAM used: 39703.71 MB (model: 39398.70 MB, context: 305.00 MB)
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = 100, n_keep = 0
tell me a long story순#########################################################################################
I am having this issue with an AMD 6650M (gfx 10.3.0) as of 2024-01-19, with ROCm 6.0 on Linux (Pop!_OS 22.04).
Nearly all models produce extensive garbage output consisting of either # or \n characters. I see this with Phi, Mistral, and Llama2-chat.
A fairly simple prompt may produce hundreds or more newline (\n) responses and can fail due to length.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Expected Behavior
Running models with special tokens (e.g. ChatML) with GPU offload via HIPBLAS should produce output similar to running on pure CPU.
Current Behavior
Instead, running with -ngl 35 or -ngl 32 causes the model to fill the context with hashes ("#").
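A minimal reproduction is along these lines (the prompt here is a placeholder; -ngl controls how many layers are offloaded to the GPU):
# hashes appear with -ngl 35 and -ngl 32; -ngl 0 (pure CPU) behaves normally
$ ./main -m openhermes-2-mistral-7b.Q5_K_M.gguf -ngl 35 -p "Explain what Linux is."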
Environment and Context
$ lscpu
$ uname -a
Artix Linux (Arch-based):
$ pacman -Qi rocm-hip-sdk
Failure Information (for bugs)
Building with AMD HIPBLAS, enabling GPU offload (-ngl 32 and -ngl 35 tested), and using models with special tokenizers causes the failure logged below.
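For reference, a HIPBLAS build of llama.cpp at the time was typically produced along these lines (a sketch; the ROCm clang path is an assumption and varies by distro):
# LLAMA_HIPBLAS=1 selected the HIP backend in the Makefile of this era
$ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ make LLAMA_HIPBLAS=1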
Models I've tested that this affects:
Failure Logs
Example running openhermes-2-mistral-7b.Q5_K_M.gguf; it happens with dolphin 2.1 as well:
output:
Example environment info: