ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: GGML_ASSERT: ggml.c:12793: ne2 == ne02 zsh: abort ./finetune --model-base --train-data ./Llama3-8B-Chinese-Chat-fintune/111.tx #7877

Open CodeBobobo opened 3 weeks ago

CodeBobobo commented 3 weeks ago

While fine-tuning Llama 3 with llama.cpp on my Mac, I ran into this error. I'm a beginner and don't know what caused it; I hope someone more experienced can help me.

The model used is Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf. The dataset is in the following format:

From fairest creatures we desire increase, That thereby beauty's rose might never die, But as the riper should by time decease, His tender heir might bear his memory: But thou contracted to thine own bright eyes, Feed'st thy light's flame with self-substantial fuel, Making a famine where abundance lies, Thy self thy foe, to thy sweet self too cruel: Thou that art now the world's fresh ornament, And only herald to the gaudy spring, Within thine own bud buriest thy content, And tender churl mak'st waste in niggarding: Pity the world, or else this glutton be, To eat the world's due, by the grave and thee.

When forty winters shall besiege thy brow, And dig deep trenches in thy beauty's field, Thy youth's proud livery so gazed on now, Will be a tattered weed of small worth held:
Then being asked, where all thy beauty lies, Where all the treasure of thy lusty days; To say within thine own deep sunken eyes, Were an all-eating shame, and thriftless praise. How much more praise deserved thy beauty's use, If thou couldst answer 'This fair child of mine Shall sum my count, and make my old excuse' Proving his beauty by succession thine. This were to be new made when thou art old, And see thy blood warm when thou feel'st it cold.

The fine-tuning command is:

(llama3_env) myhome@MacBook-Pro llama.cpp % ./finetune \
    --model-base ./Llama3-8B-Chinese-Chat-GGUF-8bit/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf \
    --train-data ./Llama3-8B-Chinese-Chat-fintune/111.txt \
    --threads 1 \
    --sample-start "<s>" \
    --ctx 128
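For context on how the data is segmented: judging from the log further down, the finetune example's tokenize_file() splits the training text into samples at each occurrence of the --sample-start marker, tokenizes each sample, and cuts off anything longer than --ctx tokens (hence the warning that samples of max length 171 are truncated to 128). The sketch below is only a rough illustration of that splitting step, not the actual llama.cpp code; split_samples and the file path are hypothetical, and the real tokenize_file handles the empty-marker case and tokenization in more detail.

// sample_split_sketch.cpp -- illustration only, not tokenize_file() from llama.cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split raw training text on a sample-start marker such as "<s>".
static std::vector<std::string> split_samples(const std::string & text,
                                              const std::string & marker) {
    std::vector<std::string> samples;
    if (marker.empty()) {                  // simplification: no marker -> whole file is one sample
        samples.push_back(text);
        return samples;
    }
    size_t pos = 0;
    while (pos < text.size()) {
        const size_t next = text.find(marker, pos);
        if (next == std::string::npos) {   // last chunk after the final marker
            samples.push_back(text.substr(pos));
            break;
        }
        if (next > pos) {                  // skip empty chunks between adjacent markers
            samples.push_back(text.substr(pos, next - pos));
        }
        pos = next + marker.size();        // continue past the marker itself
    }
    return samples;
}

int main() {
    std::ifstream f("111.txt");            // stands in for the --train-data file
    std::stringstream ss;
    ss << f.rdbuf();
    const auto samples = split_samples(ss.str(), "<s>");
    std::cout << "found " << samples.size() << " samples\n";
}

In this sketch a file with no markers becomes a single long sample; treat it only as a mental model for how --sample-start and --ctx interact.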

Name and Version

version: 3124 (fd5ea0f8) built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.2.0

What operating system are you seeing the problem on?

Mac

Relevant log output

(llama3_env) myhome@MacBook-Pro llama.cpp % ./finetune \             
    --model-base ./Llama3-8B-Chinese-Chat-GGUF-8bit/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf \
    --train-data ./Llama3-8B-Chinese-Chat-fintune/111.txt \
    --threads 1 \
    --sample-start "<s>" \
    --ctx 128

main: seed: 1718105709
main: model base = './Llama3-8B-Chinese-Chat-GGUF-8bit/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf'
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from ./Llama3-8B-Chinese-Chat-GGUF-8bit/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = outputs
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                          llama.block_count u32              = 32
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 7
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,128256]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 128009
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% set system_message = 'You are a he...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q8_0:  226 tensors
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:                                             
llm_load_vocab: ************************************        
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!        
llm_load_vocab: CONSIDER REGENERATING THE MODEL             
llm_load_vocab: ************************************        
llm_load_vocab:                                             
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.8000 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 7.95 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = outputs
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: PAD token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors:        CPU buffer size =  8137.64 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    64.00 MiB
llama_new_context_with_model: KV self size  =   64.00 MiB, K (f16):   32.00 MiB, V (f16):   32.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
main: init model
print_params: n_vocab               : 128256
print_params: n_ctx                 : 128
print_params: n_embd                : 4096
print_params: n_ff                  : 14336
print_params: n_head                : 32
print_params: n_head_kv             : 8
print_params: n_layer               : 32
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 500000.000000
print_params: rope_freq_scale       : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq             : 4
print_lora_params: n_rank_wk             : 4
print_lora_params: n_rank_wv             : 4
print_lora_params: n_rank_wo             : 4
print_lora_params: n_rank_ffn_norm       : 1
print_lora_params: n_rank_ffn_gate       : 4
print_lora_params: n_rank_ffn_down       : 4
print_lora_params: n_rank_ffn_up         : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm           : 1
print_lora_params: n_rank_output         : 4
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: lora_size = 94956320 bytes (90.6 MB)
main: opt_size  = 141731824 bytes (135.2 MB)
main: opt iter 0
main: input_size = 525340704 bytes (501.0 MB)
main: compute_size = 17265869408 bytes (16466.0 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data from ./Llama3-8B-Chinese-Chat-fintune/111.txt
main: sample-start: <s>
main: include-sample-start: false
tokenize_file: warning: found 5 samples (max length 171) that exceed context length of 128. samples will be cut off.
tokenize_file: total number of samples: 5
main: number of training tokens: 812
main: number of unique tokens: 388
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 513060 bytes (0.5 MB)
train_opt_callback: iter=     0 sample=1/5 sched=0.000000 loss=0.000000 |->
train_opt_callback: reshuffle samples. completed epochs: 1
GGML_ASSERT: ggml.c:12793: ne2 == ne02
zsh: abort      ./finetune --model-base  --train-data ./Llama3-8B-Chinese-Chat-fintune/111.tx
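For readers trying to decode the assert: in ggml.c, op implementations expand tensor shapes into locals named ne00..ne03 for the first source tensor and ne0..ne3 for the destination, so "ne2 == ne02" checks that dimension 2 of the destination matches dimension 2 of the source; GGML_ASSERT aborts the process when the check fails, which is what zsh then reports. The snippet below is a minimal stand-alone sketch of that kind of shape check, not the real code at ggml.c:12793; tensor_demo, forward_op, GGML_ASSERT_DEMO, and the shapes are made up for illustration.

// assert_shape_sketch.cpp -- illustration only, not ggml.c
#include <cstdint>
#include <cstdio>
#include <cstdlib>

#define GGML_ASSERT_DEMO(x) \
    do { \
        if (!(x)) { \
            std::fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            std::abort(); \
        } \
    } while (0)

struct tensor_demo {                         // stand-in for ggml_tensor's shape array ne[]
    int64_t ne[4];
};

// hypothetical op body showing the kind of shape check that aborts
static void forward_op(const tensor_demo & src0, const tensor_demo & dst) {
    const int64_t ne02 = src0.ne[2];         // dimension 2 of the source tensor
    const int64_t ne2  = dst.ne[2];          // dimension 2 of the destination tensor
    GGML_ASSERT_DEMO(ne2 == ne02);           // fires when the two shapes disagree
}

int main() {
    tensor_demo src0 = {{128, 32,  8, 1}};   // made-up shapes for illustration
    tensor_demo dst  = {{128, 32, 32, 1}};   // dimension 2 mismatches (8 vs 32)
    forward_op(src0, dst);                   // aborts, mirroring the log above
    return 0;
}

A mismatch like this generally means the training graph was built with tensor shapes one of the ops does not expect for this model; the sketch only shows what the assertion itself is comparing, not why the shapes diverge here.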
afonsoguerra commented 1 week ago

What did you do to fix this? I'm getting the same error with Llama 3 8B (8-bit).