I've no experience with fine-tuning, just a few observations: 1) The issue is not a segfault (illegal memory access); it's an assert (a planned hard abort on an unsupported condition). 2) The problem comes from an ADD operation with an unsupported input, in this case a Q6_K tensor as src0. The ADD operation supports only fp16 and fp32 as src0; usually there is an internal conversion step that ensures such operations are always presented with tensor types they support.
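To make the distinction concrete, here is a minimal, self-contained C sketch of this kind of type check. The names are hypothetical and this is not the actual ggml source; it only illustrates why a quantized src0 (e.g. Q6_K) hits a deliberate assert instead of crashing:

```c
/*
 * Illustrative sketch only, NOT the real ggml code: shows an op that
 * hard-aborts (assert) on an unsupported src0 type instead of segfaulting.
 */
#include <assert.h>
#include <stdio.h>

typedef enum { TYPE_F32, TYPE_F16, TYPE_Q5_K, TYPE_Q6_K } tensor_type;

struct tensor {
    tensor_type type;
    /* data, shape, ... omitted */
};

/* ADD only handles float inputs for src0; quantized tensors such as Q6_K
 * would have to be converted to F32/F16 before reaching this point. */
static void add_forward(const struct tensor *src0, const struct tensor *src1)
{
    assert(src0->type == TYPE_F32 || src0->type == TYPE_F16); /* planned hard abort */
    (void)src1;
    printf("add dispatched with float src0\n");
}

int main(void)
{
    struct tensor float_tensor = { TYPE_F32 };
    struct tensor quantized_weight = { TYPE_Q6_K };

    add_forward(&float_tensor, &float_tensor);      /* fine: src0 is F32 */
    add_forward(&quantized_weight, &float_tensor);  /* aborts: Q6_K src0 unsupported */
    return 0;
}
```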
Unrelated, but I wonder why some attn_v weights are Q5_K and some Q6_K?
I ran into this assert as well when trying to offload to the GPU while fine-tuning (on a T4 on Google Colab). I'm guessing this is supposed to work, since the -ngl flag exists after all.
I am also facing the same issue on an Nvidia A10G card on AWS G5. @r78v10a07, have you found any workaround for it?
This issue was closed because it has been inactive for 14 days since being marked as stale.
Expected Behavior
I'm trying to fine-tune a model on an AWS g5 machine with an Nvidia A10 card, but I'm getting a segfault.
Current Behavior
The same files are processed on macOS (M1) and it works fine.
Environment and Context
Linux ip-10-8-10-56 6.2.0-1011-aws #11~22.04.1-Ubuntu SMP Mon Aug 21 16:27:59 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Failure Information (for bugs)
Steps to Reproduce
Model: https://huggingface.co/TheBloke/sqlcoder-7B-GGUF/blob/main/sqlcoder-7b.Q5_K_M.gguf
Train data: schema.sql.zip
Command line:
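As an illustration only, a llama.cpp finetune run with GPU offload is typically invoked along the lines below; the paths are placeholders and flag names can differ between llama.cpp versions, so treat this as a sketch rather than the reporter's exact command.

```sh
# Illustrative only; adjust paths and flags for your llama.cpp version.
./finetune \
  --model-base sqlcoder-7b.Q5_K_M.gguf \
  --train-data schema.sql \
  --lora-out lora-sqlcoder-7b.gguf \
  --threads 8 \
  -ngl 35
```

Per the comments above, it is the GPU offload path (-ngl) that runs into the Q6_K ADD assert.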
Failure Logs
Log