abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Llama 3 8b Instruct giving garbage response #1577

Closed gformcreation closed 1 week ago

gformcreation commented 2 weeks ago

Expected Behavior

The model should have provided a normal response.

Current Behavior

The model is providing just random garbage values.

Environment and Context

python --version
Python 3.11.9

nvidia-smi
Sat Jul 6 12:18:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                     TCC   |   00000001:00:00.0 Off |                  Off |
| N/A   48C    P0             26W /   70W |   11436MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     10128      C   ...rograms\Python\Python311\python.exe      11416MiB |
+-----------------------------------------------------------------------------------------+

pip show llama-cpp-python
Name: llama_cpp_python
Version: 0.2.81
Summary: Python bindings for the llama.cpp library
Home-page:
Author:
Author-email: Andrei Betlen <abetlen@gmail.com>
License: MIT
Location: C:\Users\user123\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: diskcache, jinja2, numpy, typing-extensions
Required-by: FlashRank


llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\Users\user123\Downloads\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = models llama_model_loader: - kv 2: llama.block_count u32 = 32 llama_model_loader: - kv 3: llama.context_length u32 = 8192 llama_model_loader: - kv 4: llama.embedding_length u32 = 4096 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.attention.head_count u32 = 32 llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: general.file_type u32 = 17 llama_model_loader: - kv 11: llama.vocab_size u32 = 128256 llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001 llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ... 
llama_model_loader: - kv 21: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q5_K: 193 tensors llama_model_loader: - type q6_K: 33 tensors llm_load_vocab: special tokens cache size = 256 llm_load_vocab: token to piece cache size = 0.8000 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: n_ctx_train = 8192 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 8192 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 8B llm_load_print_meta: model ftype = Q5_K - Medium llm_load_print_meta: model params = 8.03 B llm_load_print_meta: model size = 5.33 GiB (5.70 BPW) llm_load_print_meta: general.name = models llm_load_print_meta: BOS token 
= 128000 '<|begin_of_text|>' llm_load_print_meta: EOS token = 128001 '<|end_of_text|>' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '<|eot_id|>' llm_load_print_meta: max token length = 256 llm_load_tensors: ggml ctx size = 0.27 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: CPU buffer size = 344.44 MiB llm_load_tensors: CUDA0 buffer size = 5115.49 MiB ......................................................................................... llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32 llama_new_context_with_model: n_ctx = 2048 llama_new_context_with_model: n_batch = 32 llama_new_context_with_model: n_ubatch = 32 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 10000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB llama_new_context_with_model: CUDA_Host output buffer size = 0.49 MiB llama_new_context_with_model: CUDA0 compute buffer size = 16.16 MiB llama_new_context_with_model: CUDA_Host compute buffer size = 0.75 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 2 AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | Model metadata: {'general.name': 'models', 'general.architecture': 'llama', 'llama.block_count': '32', 'llama.context_length': '8192', 'tokenizer.ggml.eos_token_id': '128001', 'general.file_type': '17', 'llama.attention.head_count_kv': '8', 'llama.embedding_length': '4096', 
'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'llama.rope.freq_base': '500000.000000', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.vocab_size': '128256', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.model': 'gpt2', 'tokenizer.ggml.pre': 'llama-bpe', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.chat_template': "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"} Available chat formats from metadata: chat_template.default Guessed chat format: llama-3
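
The `tokenizer.chat_template` value in the metadata above is a Jinja template. As a rough illustration of what it renders, here is a minimal pure-Python sketch that mirrors it step by step. The function name `render_llama3_prompt` is hypothetical (it is not part of llama-cpp-python or LangChain), and the special-token strings are copied from the log above.

```python
BOS_TOKEN = "<|begin_of_text|>"  # BOS token 128000 in the log above


def render_llama3_prompt(messages, add_generation_prompt=True, bos_token=BOS_TOKEN):
    """Mirror the chat template from the GGUF metadata (hypothetical helper)."""
    parts = []
    for index, message in enumerate(messages):
        # Header with the role, a blank line, the trimmed content, then <|eot_id|>.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip() + "<|eot_id|>"
        )
        # Only the first message is prefixed with the BOS token.
        if index == 0:
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = render_llama3_prompt(
    [
        {"role": "system", "content": "Answer from the given context only."},
        {"role": "user", "content": "What is RNFL Clock hours?"},
    ]
)
print(example)
```

Passing `bos_token=""` yields the same string without the BOS prefix, which may be useful when the runtime adds BOS during tokenization.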


Code :-

from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    # Raw string so the backslashes in the Windows path are not parsed as escapes.
    model_path=r"C:\Users\user123\Downloads\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf",
    callback_manager=callback_manager,
    n_ctx=2048,
    max_tokens=4096,
    n_gpu_layers=-1,
    verbose=True,
    top_p=0.95,
    min_p=0.05,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    repeat_penalty=1.1,
    top_k=40,
    f16_kv=True,
    stop=["\\"],
    temperature=0,
    seed=2000,
)

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information, based on the pieces of information, if applicable. Be succinct. End the response with tag which is must.

Responses should be properly formatted and structured so that it can be easily read.
Helpful and the most accurate response from the context and relevant to question :-
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Template variables are filled in by the retrieval chain at query time.
prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
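
One detail in the configuration above that may be worth a sanity check: `n_ctx=2048` is smaller than `max_tokens=4096`, and the `llama_print_timings` output further down in this report shows a 1776-token prompt. The sketch below only does the arithmetic; the two helper names are hypothetical, and how the runtime behaves when the budget is exceeded is not covered here.

```python
def remaining_generation_budget(n_ctx: int, n_prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt occupies part of the context."""
    return max(0, n_ctx - n_prompt_tokens)


def fits_in_context(n_ctx: int, n_prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested completion length fits in the context."""
    return n_prompt_tokens + max_tokens <= n_ctx


# n_ctx and max_tokens come from the constructor call in this report;
# the prompt token count comes from the llama_print_timings output.
N_CTX, MAX_TOKENS, PROMPT_TOKENS = 2048, 4096, 1776

print("fits:", fits_in_context(N_CTX, PROMPT_TOKENS, MAX_TOKENS))
print("tokens left for the answer:", remaining_generation_budget(N_CTX, PROMPT_TOKENS))
```

With these numbers the requested completion length cannot fit, so either a larger `n_ctx` or a smaller `max_tokens` would make the settings consistent with each other.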


> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain... Prompt after formatting:  <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following pieces of information to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Instructions for Use Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000 CIRRUS HD-OCT B.8 ONH and RNFL Images 2660021178037 Rev C 465 / 488B.7.2.1 Ganglion Cell Data Type Variablen Mean (SD) 95% CI Min Max Ganglion Cell Thickness Average Thickness (µm) 826 80.46 (6.36) [80.02, 80.89] 61.00 100.00 Temporal Superior (µm) 826 79.60 (6.25) [79.17, 80.03] 60.00 100.00 Superior (µm) 826 80.98 (6.73) [80.52, 81.44] 59.00 102.00 Nasal Superior (µm) 826 82.16 (7.12) [81.68, 82.65] 58.00 100.00 Nasal Inferior (µm) 826 80.49 (7.13) [80.00, 80.97] 59.00 99.00 Inferior (µm) 826 78.84 (6.83) [78.37, 79.30] 58.00 100.00 Temporal Inferior (µm) 826 80.71 (6.36) [80.28, 81.15] 61.00 103.00 Table 93: Summary of variables, Macular Cube 200x200 B.8ONH and RNFL Images NOTE! These parameters are adjusted for age and optic disc area. Measurement Parameters Analyses RNFL Thickness • RNFL Summary Parameters: –Average RNFL Thickness –Temporal Average RNFL Thickness –Superior Average RNFL Thickness –Nasal Average RNFL Thickness –Inferior Average RNFL Thickness –Inter-eye symmetry (RNFL) –Clock hour parameters –TSNIT Profiles –Neuro-Retinal Rim –RNFL Thickness –RNFL Deviation Map• ONH/RNFL OU • Panomap • Single Eye Summary • Wellness Report ONH Features • Rim Area (mm2) • Average Cup-to-Disc Ratio • Vertical Cup-to-Disc Ratio • Cup Volume (mm3)• ONH/RNFL OU • Panomap • Single Eye Summary Table 94: RNFL thickness and ONH Parameters and Analyses with RDB Data Also see 2Analyze ONH/RNFL OU [ } 276]

Instructions for Use 8 Analyzing Exam Data and Creating Reports CIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis 2660021178037 Rev C 279 / 488# Symbol Explanation Rotation Tool Changes the angle of the ONH spoke. 8.2.3.3.3 Analyzing ONH and RNFL The ONH and RNFL OU Analysis uses two kinds of thickness measurements: •RNFL grid When you move the RNFL grid, the thickness maps, deviation maps, and ONH calculations update automatically. •Super-pixels A total of 50 x 50 (2500) super-pixels are analyzed (optic disc excluded). RNFL Thickness Maps report thickness by showing blue or green for thinner areas and yellow or red for thicker areas (the optic disc appears solid blue). Deviation Maps compare the reference range for healthy eyes for the patient's age and show yellow and red areas that are thinner than 95% and 99% of the (age and optic disc size-adjusted) population, respectively. Interpretation Considerations For some patients, deviation maps can show decrease due to reasons other than pathology. such as: •The patient has strongly myopic or hyperopic eyes •The patient has split-bundle anatomy •The patient has a tilted RNFL bundle pattern If the patient's temporal RNFL that is very thin or absent, the maps might show thickened RNFL. CIRRUS™ HD-OCT compares the RNFL thickness and symmetry of the scan(s) with the reference range for healthy eyes for the patient's age.

Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000 Instructions for Use CIRRUS HD-OCT B.8 ONH and RNFL Images 466 / 488 2660021178037 Rev CB.8.1 ONH Parameters The data originally collected for the database was analyzed to create the reference ranges for the following ONH parameters: •Rim Area (mm2) •Average Cup to Disc Ratio (square root of cup area over disc area) •Vertical Cup to Disc Ratio (cup height over disc height) •Cup Volume (mm3) B.8.2 RNFL Parameters This study determined the reference range for healthy eyes for the following the retinal nerve fiber layer (RNFL) parameters in healthy subjects ages 18 to 88: •Average RNFL Thickness •Temporal Average RNFL Thickness •Superior Average RNFL Thickness •Nasal Average RNFL Thickness •Inferior Average RNFL Thickness •Inter-eye symmetry (RNFL) •Clock hour parameters •TSNIT Profiles •Neuro-Retinal Rim •RNFL Thickness •RNFL Deviation Map B.8.3 ONH and RNFL Data Type Variablen Mean (SD) 95% CI Min Max ONH Rim Area (mm2) 854 1.30 (0.23) [1.28, 1.31] 0.74 2.21 Average Cup/Disc Ratio 854 0.46 (0.17) [0.45, 0.47] 0.06 0.74 Vertical C/D Ratio 854 0.44 (0.16) [0.43, 0.45] 0.05 0.75 Cup Volume (mm3) 854 0.13 (0.12) [0.12, 0.14] 0.00 0.70 RNFL Thickness Average RNFL Thickness (µm) 854 93.19 (9.28) [92.57, 93.81] 69.00 126.00 Temporal (µm) 854 64.04 (11.53) [63.26, 64.81] 40.00 120.00 Superior (µm) 854 115.05 (15.92) [113.98, 116.12] 72.00 162.00 Nasal (µm) 854 72.35 (11.54) [71.58, 73.13] 41.00 115.00 Inferior (µm) 854 121.37 (16.08) [120.29, 122.45] 72.00 188.00 Clock hour 1 (µm) 854 105.56 (23.11) [104.01, 107.11] 51.00 185.00 Clock hour 2 (µm) 854 88.60 (18.10) [87.39, 89.82] 43.00 160.00 Clock hour 3 (µm) 854 60.73 (10.53) [60.02, 61.44] 38.00 103.00 Clock hour 4 (µm) 854 67.78 (13.79) [66.85, 68.70] 37.00 129.00 Question: What is RNFL Clock hours in ONH and RNFL OU Analysis?

Answer the question and provide additional helpful information, based on the pieces of information, if applicable. Be succinct. End the response with tag which is must.

Responses should be properly formatted and structured so that it can be easily read. Helpful and the most accurate response from the context and relevant to question :- <|eot_id|> <|start_header_id|>assistant<|end_header_id|>  Llama.generate: prefix-match hit • 2   }

  }  B }  }  }  }  }  }

}  F   }  }  }  }  }  }  }  }  }  }  }  [C

 }

  }  } •   }   }  } 2.  }  }  }  }  }  }  }  }  }  }  }  }  }   }  }  }  }  }  }  }  }  }  }  }  }  }  }  } } •     }        }   }  }  }

  }  }  } }

  }  }   } .  }  .  •  } &  }  } .  } • • • }  } •   } } •   }  } •

llama_print_timings: load time = 124.22 ms
llama_print_timings: sample time = 206.91 ms / 224 runs ( 0.92 ms per token, 1082.61 tokens per second)
llama_print_timings: prompt eval time = 19358.81 ms / 1776 tokens ( 10.90 ms per token, 91.74 tokens per second)
llama_print_timings: eval time = 7789.25 ms / 224 runs ( 34.77 ms per token, 28.76 tokens per second)
llama_print_timings: total time = 27901.02 ms / 2000 tokens

> Finished chain.

> Finished chain. {'query': 'What is RNFL Clock hours in ONH and RNFL OU Analysis?', 'result': '•\n2\n\xa0 }\n\n\xa0 }\xa0 B\n }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\n\n\n\n\n }\xa0 F\n\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 [C\n\n\xa0}\n\n\xa0 }\xa0 }\xa0•\n\xa0 }\n\xa0 }\xa0 }\xa02.\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0 }\xa0}\xa0•\n\xa0\xa0\xa0 }\xa0\xa0\xa0\xa0\xa0\xa0\xa0 }\xa0\xa0 }\xa0 }\xa0 }\n\n\xa0 }\xa0 }\xa0 }\xa0}\n\n\xa0 }\xa0 }\xa0\xa0 }\xa0.\xa0 }\xa0\xa0.\n\xa0•\xa0 }\xa0&\xa0 }\xa0 }\xa0.\xa0 }\xa0•\n•\n•\n }\xa0 }\xa0•\n\xa0 }\xa0}\xa0•\n\xa0 }\xa0 }\xa0•\n\n', 'source_documents': [Document(page_content='Instructions for Use Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000\nCIRRUS HD-OCT B.8 ONH and RNFL Images\n2660021178037 Rev C 465 / 488B.7.2.1 Ganglion Cell Data\nType\nVariablen Mean (SD) 95% CI Min Max\nGanglion Cell Thickness\nAverage Thickness (µm) 826 80.46 (6.36) [80.02, 80.89] 61.00 100.00\nTemporal Superior (µm) 826 79.60 (6.25) [79.17, 80.03] 60.00 100.00\nSuperior (µm) 826 80.98 (6.73) [80.52, 81.44] 59.00 102.00\nNasal Superior (µm) 826 82.16 (7.12) [81.68, 82.65] 58.00 100.00\nNasal Inferior (µm) 826 80.49 (7.13) [80.00, 80.97] 59.00 99.00\nInferior (µm) 826 78.84 (6.83) [78.37, 79.30] 58.00 100.00\nTemporal Inferior (µm) 826 80.71 (6.36) [80.28, 81.15] 61.00 103.00\nTable\xa093: Summary of variables, Macular Cube 200x200\nB.8ONH and RNFL Images\nNOTE!\xa0These parameters are adjusted for age and optic disc\narea.\nMeasurement Parameters Analyses\nRNFL Thickness • RNFL Summary Parameters:\n–Average RNFL Thickness\n–Temporal Average RNFL\nThickness\n–Superior Average RNFL\nThickness\n–Nasal Average RNFL\nThickness\n–Inferior Average RNFL\nThickness\n–Inter-eye symmetry (RNFL)\n–Clock hour parameters\n–TSNIT Profiles\n–Neuro-Retinal Rim\n–RNFL Thickness\n–RNFL 
Deviation Map• ONH/RNFL OU\n• Panomap\n• Single Eye Summary\n• Wellness Report\nONH Features • Rim Area (mm2)\n• Average Cup-to-Disc Ratio\n• Vertical Cup-to-Disc Ratio\n• Cup Volume (mm3)• ONH/RNFL OU\n• Panomap\n• Single Eye Summary\nTable\xa094: RNFL thickness and ONH Parameters and Analyses with RDB Data\nAlso see\n2Analyze ONH/RNFL OU [ }\xa0276]', metadata={'source': 'C:\Users\user123\Downloads\content\2660021178037_c_cirrus_6000_11.7_ifu_enus.pdf', 'page': 464, '_id': '077141c62da74d258f7a468285030b8f', '_collection_name': 'document_embeddings', 'relevance_score': 0.9270478}), Document(page_content="Instructions for Use 8 Analyzing Exam Data and Creating Reports\nCIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis\n2660021178037 Rev C 279 / 488# Symbol Explanation\nRotation Tool\n Changes the angle of the ONH spoke.\n8.2.3.3.3 Analyzing ONH and RNFL\nThe ONH and RNFL OU Analysis uses two kinds of thickness\nmeasurements:\n•RNFL grid\nWhen you move the RNFL grid, the thickness maps, deviation\nmaps, and ONH calculations update automatically.\n•Super-pixels\nA total of 50 x 50 (2500) super-pixels are analyzed (optic disc\nexcluded).\nRNFL Thickness Maps report thickness by showing blue or green\nfor thinner areas and yellow or red for thicker areas (the optic disc\nappears solid blue).\nDeviation Maps compare the reference range for healthy eyes for\nthe patient's age and show yellow and red areas that are thinner\nthan 95% and 99% of the (age and optic disc size-adjusted)\npopulation, respectively.\nInterpretation Considerations\nFor some patients, deviation maps can show decrease due to\nreasons other than pathology. 
such as:\n•The patient has strongly myopic or hyperopic eyes\n•The patient has split-bundle anatomy\n•The patient has a tilted RNFL bundle pattern\nIf the patient's temporal RNFL that is very thin or absent, the maps\nmight show thickened RNFL.\nCIRRUS™ HD-OCT compares the RNFL thickness and symmetry of\nthe scan(s) with the reference range for healthy eyes for the\npatient's age.", metadata={'source': 'C:\Users\user123\Downloads\content\2660021178037_c_cirrus_6000_11.7_ifu_enus.pdf', 'page': 278, '_id': '253d8f9d11694dec87f1b6374ee4baa8', '_collection_name': 'document_embeddings', 'relevance_score': 0.82976836}), Document(page_content='Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000 Instructions for Use\nCIRRUS HD-OCT B.8 ONH and RNFL Images\n466 / 488 2660021178037 Rev CB.8.1 ONH Parameters\nThe data originally collected for the database was analyzed to\ncreate the reference ranges for the following ONH parameters:\n•Rim Area (mm2)\n•Average Cup to Disc Ratio (square root of cup area over disc\narea)\n•Vertical Cup to Disc Ratio (cup height over disc height)\n•Cup Volume (mm3)\nB.8.2 RNFL Parameters\nThis study determined the reference range for healthy eyes for the\nfollowing the retinal nerve fiber layer (RNFL) parameters in healthy\nsubjects ages 18 to 88:\n•Average RNFL Thickness\n•Temporal Average RNFL Thickness\n•Superior Average RNFL Thickness\n•Nasal Average RNFL Thickness\n•Inferior Average RNFL Thickness\n•Inter-eye symmetry (RNFL)\n•Clock hour parameters\n•TSNIT Profiles\n•Neuro-Retinal Rim\n•RNFL Thickness\n•RNFL Deviation Map\nB.8.3 ONH and RNFL Data\nType\nVariablen Mean (SD) 95% CI Min Max\nONH\nRim Area (mm2) 854 1.30 (0.23) [1.28, 1.31] 0.74 2.21\nAverage Cup/Disc Ratio 854 0.46 (0.17) [0.45, 0.47] 0.06 0.74\nVertical C/D Ratio 854 0.44 (0.16) [0.43, 0.45] 0.05 0.75\nCup Volume (mm3) 854 0.13 (0.12) [0.12, 0.14] 0.00 0.70\nRNFL Thickness\nAverage RNFL Thickness (µm) 854 93.19 (9.28) [92.57, 93.81] 69.00 126.00\nTemporal (µm) 
854 64.04 (11.53) [63.26, 64.81] 40.00 120.00\nSuperior (µm) 854 115.05 (15.92) [113.98, 116.12] 72.00 162.00\nNasal (µm) 854 72.35 (11.54) [71.58, 73.13] 41.00 115.00\nInferior (µm) 854 121.37 (16.08) [120.29, 122.45] 72.00 188.00\nClock hour 1 (µm) 854 105.56 (23.11) [104.01, 107.11] 51.00 185.00\nClock hour 2 (µm) 854 88.60 (18.10) [87.39, 89.82] 43.00 160.00\nClock hour 3 (µm) 854 60.73 (10.53) [60.02, 61.44] 38.00 103.00\nClock hour 4 (µm) 854 67.78 (13.79) [66.85, 68.70] 37.00 129.00', metadata={'source': 'C:\Users\user123\Downloads\content\xyz.pdf', 'page': 465, '_id': '0c53f682055f48a7a373a0f5543ba7df', '_collection_name': 'document_embeddings', 'relevance_score': 0.75868905})]}

Quantized model :- https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf

i486 commented 1 week ago

I obtained the following using your values. Try removing <|begin_of_text|>, as it should be handled internally by llama. Additionally, f16_kv=True appears to have been deprecated since version 0.2.24.

RNFL Clock Hours in ONH and RNFL OU Analysis:

In the CIRRUS HD-OCT system, RNFL Clock hours refer to a specific analysis of the retinal nerve fiber layer (RNFL) thickness measurements. This analysis is part of the ONH and RNFL OU Analysis.

The RNFL Clock hours analysis provides information on the thickness of the RNFL at different clock hour positions around the optic disc. The clock hour positions are:

  • Clock hour 1: Superior nasal
  • Clock hour 2: Superior temporal
  • Clock hour 3: Inferior temporal
  • Clock hour 4: Inferior nasal

This analysis helps to assess the symmetry and thickness of the RNFL, which can be useful in diagnosing and monitoring various retinal diseases.

Additional Information:

The RNFL Clock hours analysis is based on a reference database of healthy eyes, which provides a range of normal values for each clock hour position. The system compares the patient's measurements to these reference values to provide an assessment of their RNFL thickness and symmetry.

Tag: #CIRRUSHD-OCT
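
The suggestion in this comment to drop `<|begin_of_text|>` from the template can be sketched as a small helper. This assumes, as the comment suggests, that the runtime adds the BOS token on its own; whether it does depends on the library version and settings, so treat that as an assumption. `strip_leading_bos` is a hypothetical name, not a library function.

```python
BOS_TOKEN = "<|begin_of_text|>"


def strip_leading_bos(prompt: str, bos_token: str = BOS_TOKEN) -> str:
    """Remove a literal BOS marker from the start of a prompt, if present."""
    stripped = prompt.lstrip()
    if stripped.startswith(bos_token):
        # Drop the marker (and the whitespace before it) to avoid a doubled BOS.
        return stripped[len(bos_token):]
    # No BOS marker at the start: return the prompt unchanged.
    return prompt


template = "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nBe succinct."
print(strip_leading_bos(template))
```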

i486 commented 1 week ago

Something interesting happened. The model I used to generate the response in my previous post was Meta-Llama-3-8B-Instruct-old-GGUF (Q4_K_M), which has an issue with BPE tokens and displays the following warning:

llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************

I replaced that model with the updated version, Meta-Llama-3-8B-Instruct-GGUF (Q4_K_M), which has the BPE tokenizers fixed. And here's the response generated with the exact same code:

RNFL Clock Hours are a type of measurement parameter in ONH and RNFL OU Analysis, which refers to the thickness measurements taken at specific clock hour positions around the optic disc.

Additional helpful information:
* The CIRRUS HD-OCT system uses 12 clock hours (0-11) to measure RNFL thickness.
* Each clock hour position corresponds to a specific sector of the optic disc.
* The RNFL Clock Hours measurement is used to assess the symmetry and thickness of the retinal nerve fiber layer.

Tag: #ONHandRNFLAnalysis

I'm not exactly an ophthalmologist, but it seems like the response quality has degraded, right?

gformcreation commented 1 week ago

Hi @i486, while I was checking this issue, I built the llama.cpp server and ran it with the exact same model, and I didn't notice any of these issues there.

gformcreation commented 1 week ago

I'm not exactly an ophthalmologist, but it seems like the response quality has degraded, right?

Well, looking at the responses, the first one is more accurate than the second one. In the second one, the additional details should be inside the main answer.

gformcreation commented 1 week ago

Hi @i486, I tried again. The prompt is below.

<|start_header_id|>system<|end_header_id|> Use the following pieces of information to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer. Context: Instructions for Use 8 Analyzing Exam Data and Creating Reports CIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis 2660021178037 Rev C 279 / 488# Symbol Explanation Rotation Tool Changes the angle of the ONH spoke. 8.2.3.3.3 Analyzing ONH and RNFL The ONH and RNFL OU Analysis uses two kinds of thickness measurements: •RNFL grid When you move the RNFL grid, the thickness maps, deviation maps, and ONH calculations update automatically. •Super-pixels A total of 50 x 50 (2500) super-pixels are analyzed (optic disc excluded). RNFL Thickness Maps report thickness by showing blue or green for thinner areas and yellow or red for thicker areas (the optic disc appears solid blue). Deviation Maps compare the reference range for healthy eyes for the patient's age and show yellow and red areas that are thinner than 95% and 99% of the (age and optic disc size-adjusted) population, respectively. Interpretation Considerations For some patients, deviation maps can show decrease due to reasons other than pathology. such as: •The patient has strongly myopic or hyperopic eyes •The patient has split-bundle anatomy •The patient has a tilted RNFL bundle pattern If the patient's temporal RNFL that is very thin or absent, the maps might show thickened RNFL. CIRRUS™ HD-OCT compares the RNFL thickness and symmetry of the scan(s) with the reference range for healthy eyes for the patient's age. 
Instructions for Use 8 Analyzing Exam Data and Creating Reports CIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis 2660021178037 Rev C 275 / 488Shape Indication Quadrants (Superior, Nasal, Temporal, Inferior) Clock Hours Table 63: Shape Key for the RNFL reference database for healthy eyes Analysis Interpretation Thickness Map Two examples of thickness maps; interpreted as: •blue and green = thinner areas •yellow and red = thicker areas •solid blue circle = optic disc Deviation from Reference Map Two examples that show: Red and yellow areas shows where this scan has areas that are thinner than the reference range for healthy eyes. Thinner regions do not necessary indicate pathological loss of RNFL. Red and yellow areas can also appear for: •Strongly myopic or hyperopic eyes (which may have a different distribution of measured RNFL thickness values) •Split-bundle anatomy •A very tilted RNFL bundle Quadrant Average (more detailed comparison) Right Eye •Superior quadrant average is 86 µm and Borderline Thin •Nasal quadrant average is 45 µm and Borderline Thin •Inferior quadrant average is 81 µm and Thin •Temporal quadrant average is 79 µm and Healthy Left Eye •Superior quadrant average is 77 µm and Thin •Temporal quadrant average is 70 µm and Healthy •Inferior quadrant average is 50 µm and Thin •Nasal quadrant average is 45 µm and Borderline Thin Clock Hour Average (most detailed comparison)Shows the measurement for each clock hour and indicates whether the measurement is within 90% of reference limits of healthy eyes (green), within 5% of reference limits of healthy eyes (yellow) or within 1% of reference limits of healthy eyes (red). 8 Analyzing Exam Data and Creating Reports Instructions for Use CIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis 278 / 488 2660021178037 Rev C# Symbol Explanation 3 Reference Range Comparison Displays measurements with color-coded comparison to the reference range for the patient's age. 
4 Quadrant Averages Shows the RNFL thickness for each eye in four quadrants (Superior, Nasal, Temporal, Inferior) RNFL Thickness Chart Displays thickness profiles. •TSNIT 5 RNFL Clock Hours Shows the measurement for each clock hour and indicates whether the measurement is within 90% of the reference limit range of healthy eyes (green), within 5% of the reference limit range of healthy eyes (yellow) or within 1% of the reference limit range of healthy eyes (red). 6 RNFL thickness map. 7 Vertical B-scan: slice through cube top to bottom relative to LSO fundus image. 8 Horizontal B-Scan: slice through cube side to side relative to LSO fundus image. RNFL circle scan extracted along 3.46 mm diameter Calculation Circle 9 Angle Indicator Shows the angle of the ONH spoke. Question: What is RNFL Clock hours in ONH and RNFL OU Analysis? Answer the question and provide additional helpful information, based on the pieces of information, if applicable. Be succinct. End the response with tag which is must. Responses should be properly formatted and structured so that it can be easily read. Helpful and the most accurate response from the context and relevant to question :- <|eot_id|> <|start_header_id|>assistant<|end_header_id|>

Response :-
8 "you\n8 Post Post Post 8 C- up a 8 HD-9 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 8 HD 0.0 symbol post\nC\n Symbol\n C\n C 8 HDT Post Post Post Post S Post Post Post Post Post the Post Post: Post Post-Post Post and Post Post ON 8 8 Post Post Post Post Post\n C\nC\n Post Post Post on the Post ON\n\n C/ Post On "C\n C\n C\n C\n C\n Rot \n C\n The post ON the ON the post the ON the ON the post the post on the ON post post the ON ON HD Post ON post post post ON post post post the post post the post to the post ON the post the post ON the post the post the ON up the post 8 HD Post ON the post the ON the On the ON the ON the post on the ON S ON C\nC\n\n Symbol or the ON C\nC ON Post On Post Post Post Post On Post On Post ON C on the ON C On the ON C On the ON C ON C ON C On the ON Post

Now the params:

llm = LlamaCpp(
    model_path="/kaggle/input/llama3-8b-q4/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
    verbose=True,
    stop=["\\"],
    temperature=0,
    seed=2000,
)

There are two things to notice here:

1) When I lower the number of tokens in the input, I get a normal response. When I increase it, the output turns into garbage again.
2) I checked the same thing with llama.cpp using default params plus just -c 4096, and it worked absolutely great.
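
Since the same model behaved well under llama.cpp with default parameters, one thing that may be worth comparing is the context settings printed in the load log. The log in the original report shows `freq_base_train = 500000.0` for the model but `freq_base = 10000.0` for the created context. The sketch below just extracts and compares those two values from log text; whether this difference actually explains the garbage output is an assumption, not something confirmed in this thread, and `find_freq_bases` is a hypothetical helper.

```python
import re

# Two lines copied from the load log in the original report.
LOG_EXCERPT = """\
llm_load_print_meta: freq_base_train = 500000.0
llama_new_context_with_model: freq_base = 10000.0
"""


def find_freq_bases(log_text):
    """Return (trained, runtime) RoPE frequency bases found in a llama.cpp log."""
    trained = re.search(r"freq_base_train\s*=\s*([0-9.]+)", log_text)
    runtime = re.search(
        r"llama_new_context_with_model:\s*freq_base\s*=\s*([0-9.]+)", log_text
    )
    return (
        float(trained.group(1)) if trained else None,
        float(runtime.group(1)) if runtime else None,
    )


trained, runtime = find_freq_bases(LOG_EXCERPT)
if trained is not None and runtime is not None and trained != runtime:
    print(f"RoPE freq_base mismatch: trained {trained}, runtime {runtime}")
```

If the values differ, checking which wrapper parameter sets the runtime value would be a reasonable next step.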

i486 commented 1 week ago

Hi, @gformcreation

I just realized (sorry for not catching it earlier) that your code doesn't have a user prompt, which is likely what causes the gibberish responses. Give this one a try and see if you get a normal response.

from llama_cpp import Llama

llm = Llama(
    # model_path=r"G:\Models\gemma-2-9b-it-IQ3_XXS\gemma-2-9b-it-IQ3_XXS.gguf",
    # model_path=r"G:\Models\gemma-2-9b-it-Q3_K_M\gemma-2-9b-it-Q3_K_M.gguf",
    # model_path=r"G:\Models\gemma-2-9b-it-IQ4_XS\gemma-2-9b-it-IQ4_XS.gguf",
    model_path=r"G:\Models\Meta-Llama-3-8B-Instruct-Q4_K_M\Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    # model_path=r"G:\Models\",
    # model_path=r"G:\Models\",
    n_ctx=4096,
    n_threads=6,
    n_gpu_layers=-1,
    # use_mmap=False,
    # use_mlock=False,
    # offload_kqv=True,
    # offload_kqv=False,
    # flash_attn=True,
    # flash_attn=False,
    # n_batch=32,
    # rope_freq_base=10000,
    verbose=True,
    )

prompt = """
<|start_header_id|>system<|end_header_id|>

Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Instructions for Use Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000
CIRRUS HD-OCT B.8 ONH and RNFL Images
2660021178037 Rev C 465 / 488B.7.2.1 Ganglion Cell Data
Type
Variablen Mean (SD) 95% CI Min Max
Ganglion Cell Thickness
Average Thickness (µm) 826 80.46 (6.36) [80.02, 80.89] 61.00 100.00
Temporal Superior (µm) 826 79.60 (6.25) [79.17, 80.03] 60.00 100.00
Superior (µm) 826 80.98 (6.73) [80.52, 81.44] 59.00 102.00
Nasal Superior (µm) 826 82.16 (7.12) [81.68, 82.65] 58.00 100.00
Nasal Inferior (µm) 826 80.49 (7.13) [80.00, 80.97] 59.00 99.00
Inferior (µm) 826 78.84 (6.83) [78.37, 79.30] 58.00 100.00
Temporal Inferior (µm) 826 80.71 (6.36) [80.28, 81.15] 61.00 103.00
Table 93: Summary of variables, Macular Cube 200x200
B.8ONH and RNFL Images
NOTE! These parameters are adjusted for age and optic disc
area.
Measurement Parameters Analyses
RNFL Thickness • RNFL Summary Parameters:
–Average RNFL Thickness
–Temporal Average RNFL
Thickness
–Superior Average RNFL
Thickness
–Nasal Average RNFL
Thickness
–Inferior Average RNFL
Thickness
–Inter-eye symmetry (RNFL)
–Clock hour parameters
–TSNIT Profiles
–Neuro-Retinal Rim
–RNFL Thickness
–RNFL Deviation Map• ONH/RNFL OU
• Panomap
• Single Eye Summary
• Wellness Report
ONH Features • Rim Area (mm2)
• Average Cup-to-Disc Ratio
• Vertical Cup-to-Disc Ratio
• Cup Volume (mm3)• ONH/RNFL OU
• Panomap
• Single Eye Summary
Table 94: RNFL thickness and ONH Parameters and Analyses with RDB Data
Also see
2Analyze ONH/RNFL OU [ } 276]

Instructions for Use 8 Analyzing Exam Data and Creating Reports
CIRRUS HD-OCT 8.2 Posterior Segment Scan Analysis
2660021178037 Rev C 279 / 488# Symbol Explanation
Rotation Tool
Changes the angle of the ONH spoke.
8.2.3.3.3 Analyzing ONH and RNFL
The ONH and RNFL OU Analysis uses two kinds of thickness
measurements:
•RNFL grid
When you move the RNFL grid, the thickness maps, deviation
maps, and ONH calculations update automatically.
•Super-pixels
A total of 50 x 50 (2500) super-pixels are analyzed (optic disc
excluded).
RNFL Thickness Maps report thickness by showing blue or green
for thinner areas and yellow or red for thicker areas (the optic disc
appears solid blue).
Deviation Maps compare the reference range for healthy eyes for
the patient's age and show yellow and red areas that are thinner
than 95% and 99% of the (age and optic disc size-adjusted)
population, respectively.
Interpretation Considerations
For some patients, deviation maps can show decrease due to
reasons other than pathology. such as:
•The patient has strongly myopic or hyperopic eyes
•The patient has split-bundle anatomy
•The patient has a tilted RNFL bundle pattern
If the patient's temporal RNFL that is very thin or absent, the maps
might show thickened RNFL.
CIRRUS™ HD-OCT compares the RNFL thickness and symmetry of
the scan(s) with the reference range for healthy eyes for the
patient's age.

Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000 Instructions for Use
CIRRUS HD-OCT B.8 ONH and RNFL Images
466 / 488 2660021178037 Rev CB.8.1 ONH Parameters
The data originally collected for the database was analyzed to
create the reference ranges for the following ONH parameters:
•Rim Area (mm2)
•Average Cup to Disc Ratio (square root of cup area over disc
area)
•Vertical Cup to Disc Ratio (cup height over disc height)
•Cup Volume (mm3)
B.8.2 RNFL Parameters
This study determined the reference range for healthy eyes for the
following the retinal nerve fiber layer (RNFL) parameters in healthy
subjects ages 18 to 88:
•Average RNFL Thickness
•Temporal Average RNFL Thickness
•Superior Average RNFL Thickness
•Nasal Average RNFL Thickness
•Inferior Average RNFL Thickness
•Inter-eye symmetry (RNFL)
•Clock hour parameters
•TSNIT Profiles
•Neuro-Retinal Rim
•RNFL Thickness
•RNFL Deviation Map
B.8.3 ONH and RNFL Data
Type
Variablen Mean (SD) 95% CI Min Max
ONH
Rim Area (mm2) 854 1.30 (0.23) [1.28, 1.31] 0.74 2.21
Average Cup/Disc Ratio 854 0.46 (0.17) [0.45, 0.47] 0.06 0.74
Vertical C/D Ratio 854 0.44 (0.16) [0.43, 0.45] 0.05 0.75
Cup Volume (mm3) 854 0.13 (0.12) [0.12, 0.14] 0.00 0.70
RNFL Thickness
Average RNFL Thickness (µm) 854 93.19 (9.28) [92.57, 93.81] 69.00 126.00
Temporal (µm) 854 64.04 (11.53) [63.26, 64.81] 40.00 120.00
Superior (µm) 854 115.05 (15.92) [113.98, 116.12] 72.00 162.00
Nasal (µm) 854 72.35 (11.54) [71.58, 73.13] 41.00 115.00
Inferior (µm) 854 121.37 (16.08) [120.29, 122.45] 72.00 188.00
Clock hour 1 (µm) 854 105.56 (23.11) [104.01, 107.11] 51.00 185.00
Clock hour 2 (µm) 854 88.60 (18.10) [87.39, 89.82] 43.00 160.00
Clock hour 3 (µm) 854 60.73 (10.53) [60.02, 61.44] 38.00 103.00
Clock hour 4 (µm) 854 67.78 (13.79) [66.85, 68.70] 37.00 129.00"<|eot_id|><|start_header_id|>user<|end_header_id|>

What is RNFL Clock hours in ONH and RNFL OU Analysis?

Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.
End the response with tag which is must.

Responses should be properly formatted and structured so that it can be easily read.
Helpful and the most accurate response from the context and relevant to question :- <|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
# print("Full Prompt:")
# print(prompt)
for chunk in llm.create_completion(
    prompt,
    max_tokens=4096,
    # top_k=40,
    # top_p=0.95,
    # min_p=0.05,
    # frequency_penalty=0.0,
    # presence_penalty=0.0,
    # repeat_penalty=1.1,
    temperature=0,
    # seed=2000,
    stream=True,
    echo=True,
):
    # Print each streamed text fragment as soon as it arrives.
    print(chunk["choices"][0]["text"], end="", flush=True)

Relevant parts:

<|start_header_id|>system<|end_header_id|>
(new line here, important)
Use the following pieces of information to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Instructions for Use Attachment B Reference Database 2 for Healthy Eyes Study–CIRRUS 6000 ...... Clock hour 4 (µm) 854 67.78 (13.79) [66.85, 68.70] 37.00 129.00"<|eot_id|><|start_header_id|>user<|end_header_id|>
(new line here, important)
What is RNFL Clock hours in ONH and RNFL OU Analysis?

Answer the question and provide additional helpful information, based on the pieces of information, if applicable. Be succinct. End the response with tag which is must.

Helpful and the most accurate response from the context and relevant to question :- <|eot_id|><|start_header_id|>assistant<|end_header_id|>
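To avoid getting the headers and blank lines wrong by hand, the layout above can be wrapped in a small helper. This is only a sketch: `build_llama3_prompt` is a hypothetical name, not part of llama-cpp-python, and it assumes the BOS token is added at tokenization time (the code above also leaves `<|begin_of_text|>` out of the prompt string).

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt (hypothetical helper)."""

    def block(role: str, content: str) -> str:
        # Role header, a blank line, the content, then the end-of-turn token.
        return (
            f"<|start_header_id|>{role}<|end_header_id|>\n\n"
            f"{content.strip()}<|eot_id|>"
        )

    return (
        block("system", system)
        + block("user", user)
        # Open the assistant turn and leave it unterminated so the model
        # continues from here.
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_llama3_prompt(
    system=(
        "Use the following pieces of information to answer the user's question.\n\n"
        "Context: ..."
    ),
    user="What is RNFL Clock hours in ONH and RNFL OU Analysis?",
)
print(prompt)
```

The resulting string can be passed to `llm.create_completion(prompt, ...)` in place of the hand-written one. The context goes in the system block and the question in its own user block, which is the part the original prompt was missing.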

gformcreation commented 1 week ago

Thanks a lot @i486, this solved my issue. This was a great learning experience.
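For anyone landing here later: another way to sidestep hand-written special tokens is to pass role-tagged messages to `Llama.create_chat_completion`, which applies a chat template for you. A minimal sketch, assuming the GGUF file carries a chat template (or that `chat_format="llama-3"` is passed when constructing `Llama`); the `ask` wrapper is a hypothetical name used only for illustration.

```python
from typing import Dict, List


def to_chat_messages(system: str, user: str) -> List[Dict[str, str]]:
    """Split the instructions/context and the question into separate roles."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def ask(llm, system: str, user: str, max_tokens: int = 512) -> str:
    """Send one system + user turn and return the assistant's text.

    `llm` is expected to be a loaded llama_cpp.Llama instance.
    """
    out = llm.create_chat_completion(
        messages=to_chat_messages(system, user),
        temperature=0,
        max_tokens=max_tokens,
    )
    return out["choices"][0]["message"]["content"]


# Usage with a loaded model (not executed here):
# answer = ask(llm, "Use the following pieces of information ...", "What is ...?")
```

With this approach the template handles the headers, blank lines, and end-of-turn tokens, so the only thing to get right is which text goes into the system message and which into the user message.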