Vulkan: possible NaN propagation on llama-3 8B (more testing required)

stduhpf commented 6 months ago

Sometimes, when playing around with the new Llama-3 models on the Vulkan backend (using the server example), I end up in a situation where the model suddenly starts generating complete gibberish. Once this happens, the server keeps generating only garbage, even when evaluating a new prompt that worked before.

A server restart fixes the output (until the next time it happens).

My setup: GPU: Vulkan device: AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64 (gfx 1010), OS: Windows 10 22H2

I suspect some operations are randomly generating NaNs, which persist even after the KV cache is cleared. It reminds me a bit of https://github.com/ggerganov/llama.cpp/issues/5243, except that it doesn't always happen.
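To illustrate why a single NaN would be enough to produce this kind of output, here is a minimal sketch in plain Python. It is not llama.cpp code and makes no claim about where the NaN actually originates in the Vulkan backend; `softmax` and `count_nans` are hypothetical helpers written only for this illustration. It shows that one NaN among the inputs of a softmax turns every output probability into NaN, so every token becomes equally "invalid" from that point on.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)  # a single NaN term makes the whole sum NaN
    return [e / total for e in exps]

def count_nans(values):
    """Count NaN entries in a list of floats (illustrative helper)."""
    return sum(1 for v in values if math.isnan(v))

clean = softmax([1.0, 2.0, 3.0])
poisoned = softmax([1.0, float("nan"), 3.0])

# One NaN input poisons all three outputs.
print(count_nans(clean), count_nans(poisoned))  # prints "0 3"
```

If the suspicion above is right, a check of this kind on intermediate tensors or on the final logits would be one way to confirm it, assuming those buffers can be read back from the device.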

I'll try to build a simple setup to consistently cause this issue.

Edit: I can't find a new prompt that causes the problem, and I can't really share the one I already have. If I try to remove the sensitive information, it no longer triggers the issue. The prompt I have consistently breaks the Llama-3-8B base model (tested with Q3_K_S/Q3_K_M/Q4_K_S), but not the instruct model. The same prompt causes no issue on other backends.

Kartoffelsaft commented 5 months ago

I have encountered this with both mistral-7b-instruct-v0.1.Q4_K_M.gguf running on an Intel UHD 620 (Manjaro Linux) and mistral-7b-instruct-v0.2-code-ft.Q4_K_M.gguf running on a GTX 1070 (Arch Linux).

Some example output of the former:

This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.

User: Hey Llama, do you think I should go to the gym today?

Llama: Of course! Going to the gym can help improve your physical health and overall well-being. It's also a great way to relieve stress and boost your mood. So if you have time and feel like it, definitely go to the gym today!

User: That sounds good! But I'm not sure what exercises to do.

Llama: No worries! There are many different types of exercises that can benefit your body in various ways. Some popular options include cardio (like running or cycling), strength training (using weights or resistance bands), and flexibility exercises (such as yoga or stretching). It's best to try a mix of these types of activities to get the most benefits from your workouts.

User: Okay, I think I'll start with some cardio and strength training today.

Llama: Great choice! Cardiovascular exercise will helpMASKironmentrieienenpositoryMillis‟ntilienenienenjesjerTransitionmansienenCHANTnageikirencynvieler cumironmentpanisysynersiiiernaMM‟nergy地animTransition Danslingsienenielpanic naturanimoganMENT pananim庄artersienenimpseventanimeedironmentzasventatonlgalanersribleyygy Hijpushilleryironmentanelsubscribe memorpositorymansbrisironmentminipageathonvent candMENTMASK Hawaii cumismusielnvpanic togetsisrieisyielsironmentCHANTikinagesy‟nergyTabIndex predictionssubscribeCHANTielruppeiel DansernaMASKnagenersmy cumventienennageielTransitionjesanimartersventiiipanawieedjerpanel paniiielalalersoganMASKienenanimielventpositoryrencyrible naturanellingsntilMENTminipagemansienenieler HawaiimansimpsebrisanimMMisyanimienenironmentzasikiMillisienenyyironment庄‟nvgy Dansielsruppe地 predictionsTabIndexathonCHANTanimernasubscribe cum cumienennersrie

I compiled with Vulkan (no Docker, if that happens to matter; I doubt it, though) and I am passing -ngl 9999. Unlike in the original report, I don't need to fully restart the server to fix it; restarting the prompt works just fine. However, it always ends up generating pure gibberish eventually.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

ggerganov / llama.cpp

Vulkan: possible NaN propagation on llama-3 8B (more testing required) #6874