Closed: janjoy closed this issue 8 months ago.
@janjoy It seems that your vector database is not populated. I tried the same query from my end and it is working here (please see the attached snippet).
I would recommend trying the following steps:
Step 1: git pull to update your repo
Step 2: try running the following from the KG_RAG folder
python -m kg_rag.test.test_vectordb
Step 3: If Step 2 returns 'vectorDB is correctly populated and is good to go!', everything is fine with regard to the vector database.
Otherwise, you need to run the setup script. This link will walk you through the steps for that.
Feel free to let me know how this goes. In case you hit a wall, we can try to figure that out :)
https://github.com/BaranziniLab/KG_RAG/assets/42702311/a2362869-9c85-4264-9324-a82e874c6b51
Thanks @karthiksoman for replying.
python -m kg_rag.test.test_vectordb
did not run successfully, so I had to rerun the setup script. I believe the way I have changed the config.yaml file is creating issues. Could you please confirm whether this is correct? What changes should I make?
# KG-RAG hyperparameters
CONTEXT_VOLUME : 150
QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD : 75
QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY : 0.5
SENTENCE_EMBEDDING_MODEL_FOR_NODE_RETRIEVAL : 'sentence-transformers/all-MiniLM-L6-v2'
SENTENCE_EMBEDDING_MODEL_FOR_CONTEXT_RETRIEVAL : 'pritamdeka/S-PubMedBert-MS-MARCO'
# VectorDB hyperparameters
VECTOR_DB_DISEASE_ENTITY_PATH : 'data/disease_with_relation_to_genes.pickle'
VECTOR_DB_PATH : 'data/vectorDB/disease_nodes_db'
VECTOR_DB_CHUNK_SIZE : 650
VECTOR_DB_CHUNK_OVERLAP : 200
VECTOR_DB_BATCH_SIZE : 200
VECTOR_DB_SENTENCE_EMBEDDING_MODEL : 'sentence-transformers/all-MiniLM-L6-v2'
# Path for context file from SPOKE KG
NODE_CONTEXT_PATH : 'data/context_of_disease_which_has_relation_to_genes.csv'
# Just note that, this assumes your GPT config file is in the $HOME path, if not, change it accordingly
# Also, GPT '.env' file should contain values for API_KEY, and optionally API_VERSION and RESOURCE_ENDPOINT. We are not including those parameters in this yaml file
GPT_CONFIG_FILE : '$HOME/.gpt_config.env'
# Can be 'azure' or 'open_ai'.
GPT_API_TYPE : 'azure'
# Llama model name (Refer Hugging face to get the correct name for the model version you would like to use, also make sure you have the right permission to use the model)
LLAMA_MODEL_NAME : 'meta-llama/Llama-2-13b-chat-hf'
LLAMA_MODEL_BRANCH : 'main'
# Path for caching LLM model files (When the model gets downloaded from hugging face, it will be saved in this path)
LLM_CACHE_DIR : 'data/llm_data/llm_models/huggingface'
LLM_TEMPERATURE : 0
# Path to save results
SAVE_RESULTS_PATH : 'data/analysis_results'
# File paths for test questions
DRUG_REPURPOSING_PATH : 'data/drug_repurposing_questions_v2.csv'
MCQ_PATH : 'data/test_questions_two_hop_mcq_from_monarch_and_robokop.csv'
TRUE_FALSE_PATH : 'data/test_questions_one_hop_true_false_v2.csv'
ONE_HOP_GRAPH_TRAVERSAL : 'data/one_hop_graph_traversal_questions_v2.csv'
TWO_HOP_GRAPH_TRAVERSAL : 'data/two_hop_graph_traversal_questions.csv'
# SPOKE-API params
BASE_URI : 'https://spoke.rbvi.ucsf.edu'
cutoff_Compound_max_phase : 3
cutoff_Protein_source : ['SwissProt']
cutoff_DaG_diseases_sources : ['knowledge', 'experiments']
cutoff_DaG_textmining : 3
cutoff_CtD_phase : 3
cutoff_PiP_confidence : 0.7
cutoff_ACTeG_level : ['Low', 'Medium', 'High']
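As an aside on the GPT_CONFIG_FILE entry in the config above: it points at a '$HOME/.gpt_config.env' file holding API_KEY and, optionally, API_VERSION and RESOURCE_ENDPOINT. A minimal sketch of how such a plain KEY=VALUE env file could be read follows; this is illustrative only (the repo may load it differently, e.g. via python-dotenv), and the `load_gpt_env` name is hypothetical:

```python
import os

def load_gpt_env(path):
    """Parse a simple KEY=VALUE .env file such as .gpt_config.env (sketch)."""
    values = {}
    # expandvars resolves the $HOME prefix used in config.yaml
    with open(os.path.expandvars(path)) as fh:
        for line in fh:
            line = line.strip()
            # skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values
```

A file containing `API_KEY='abc123'` would then yield `{'API_KEY': 'abc123'}`.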
I changed the paths as the files were found here:
@janjoy I noticed a couple of things here:
Try the following:
python -m kg_rag.run_setup
Then check the path data/vectorDB/disease_nodes_db and see whether the folder is empty or not. It should contain files as shown below:
Let me know how it goes!
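For a quick way to perform that folder check from Python, something along these lines would work (a sketch: `check_vectordb_folder` is a hypothetical helper, and the path is the one from the config.yaml shared earlier):

```python
from pathlib import Path

def check_vectordb_folder(path="data/vectorDB/disease_nodes_db"):
    """Report whether the persisted vector DB folder exists and has files (sketch)."""
    db_dir = Path(path)
    if not db_dir.is_dir():
        return "missing"     # run the setup script first
    # a populated store should hold at least one file (e.g. the .bin files)
    return "populated" if any(db_dir.iterdir()) else "empty"

print(check_vectordb_folder())
```

"empty" or "missing" would mean the setup script still needs to be (re)run.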
Thank you @karthiksoman for your suggestions; this helped create the bin files in data/vectorDB/disease_nodes_db.
Currently I am getting an error in Step 5 (LLM prompting).
python -m kg_rag.rag_based_generation.GPT.text_generation -i True -g "gpt-35-turbo"
Enter your question : what gene is associated with hypochondrogenesis?
Press enter for Step 1 - Disease entity extraction using GPT-3.5-Turbo
Processing ...
Extracted entity from the prompt = 'hypochondrogenesis'
Press enter for Step 2 - Match extracted Disease entity to SPOKE nodes
Finding vector similarity ...
Matched entities from SPOKE = 'hypochondrogenesis'
Press enter for Step 3 - Context extraction from SPOKE
Extracted Context is :
Disease hypochondrogenesis isa Disease osteochondrodysplasia and Provenance of this association is Disease Ontology. Disease hypochondrogenesis isa Disease monogenic disease and Provenance of this association is Disease Ontology. Disease hypochondrogenesis associates Gene COL2A1 and Provenance of this association is DISEASES. hypochondrogenesis has a Disease Ontology identifier of DOID:0080044 and Provenance of this association is Disease Ontology.
Press enter for Step 4 - Context pruning
Pruned Context is :
Disease hypochondrogenesis associates Gene COL2A1 and Provenance of this association is DISEASES.
Press enter for Step 5 - LLM prompting
Prompting gpt-35-turbo
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 186, in fetch_GPT_response
response = openai.ChatCompletion.create(
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 155, in create
response, _, api_key = requestor.request(
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 299, in request
resp, got_stream = self._interpret_response(result, stream)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 710, in _interpret_response
self._interpret_response_line(
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 775, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: Invalid URL (POST /v1/engines/gpt-35-turbo/chat/completions)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in <module>
main()
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 51, in main
interactive(question, vectorstore, node_context_df, embedding_function_for_context_retrieval, CHAT_MODEL_ID)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 365, in interactive
output = get_GPT_response(enriched_prompt, system_prompts["KG_RAG_BASED_TEXT_GENERATION"], llm_type, llm_type, temperature=config_data["LLM_TEMPERATURE"])
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 655, in __call__
return self._cached_call(args, kwargs)[0]
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 598, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 856, in call
output = self.func(*args, **kwargs)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 206, in get_GPT_response
return fetch_GPT_response(instruction, system_prompt, chat_model_id, chat_deployment_id, temperature)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ffa356d13c0 state=finished raised InvalidRequestError>]
@janjoy I see this as a problem with the OpenAI API call. Could you please verify your OpenAI API key in the .gpt_config.env file (or substitute one that you know works) and then retry the same?
@janjoy FYI: please refer to this thread, where a similar issue was discussed and fixed.
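For background on the InvalidRequestError above: with the legacy openai 0.x SDK, a request path like /v1/engines/gpt-35-turbo/chat/completions usually means the client hit the plain OpenAI endpoint with an Azure-style name ('gpt-35-turbo' is the Azure deployment spelling; the OpenAI endpoint expects 'gpt-3.5-turbo'), or that the Azure fields (API_VERSION, RESOURCE_ENDPOINT) were missing from .gpt_config.env. A rough sketch of the difference, where `build_chat_kwargs` is a hypothetical helper and KG_RAG's actual wiring in utility.py may differ:

```python
def build_chat_kwargs(api_type, chat_model_id, chat_deployment_id=None):
    """Return the kwargs a ChatCompletion-style call needs per API type (sketch)."""
    if api_type == "azure":
        # Azure routes requests by deployment name and also needs
        # api_base/api_version set on the client from the .env file
        return {"model": chat_model_id, "deployment_id": chat_deployment_id}
    # Plain OpenAI routes by model name only; translate the Azure-style
    # 'gpt-35-turbo' spelling to the OpenAI-style 'gpt-3.5-turbo'
    return {"model": chat_model_id.replace("gpt-35", "gpt-3.5")}
```

So with GPT_API_TYPE set to 'azure' in config.yaml, the deployment id and Azure endpoint details must be present; with 'open_ai', the model name itself has to match what the OpenAI endpoint expects.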
@karthiksoman Thank you so much for the solution. This worked. I had tried changing the API keys before but was getting stuck.
@janjoy awesome! since this is resolved, I am closing this issue!
Hello,
I am encountering this context retrieval error while running KG RAG. What could be the possible solution? Below are two examples:
Example 1:
(kg_rag) jjoy@jjoy:~/sulab_projects/KG_RAG$ python -m kg_rag.rag_based_generation.GPT.text_generation -g "gpt-4"
Enter your question : what gene is associated with hypochondrogenesis?
Retrieving context from SPOKE graph...
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in <module>
main()
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 44, in main
context = retrieve_context(question, vectorstore, embedding_function_for_context_retrieval, node_context_df, CONTEXT_VOLUME, QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD, QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 254, in retrieve_context
node_hits.append(node_search_result[0][0].page_content)
IndexError: list index out of range
Example 2:
(kg_rag) jjoy@jjoy:~/sulab_projects/KG_RAG$ python -m kg_rag.rag_based_generation.GPT.text_generation -i True -g "gpt-4"
Enter your question : Are there any genes that are commonly shared between parkinsons disease and rem sleep disorder?
Press enter for Step 1 - Disease entity extraction using GPT-3.5-Turbo
Processing ...
Extracted entity from the prompt = 'Parkinson's disease, REM sleep disorder'
Press enter for Step 2 - Match extracted Disease entity to SPOKE nodes
Finding vector similarity ...
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in <module>
main()
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 51, in main
interactive(question, vectorstore, node_context_df, embedding_function_for_context_retrieval, CHAT_MODEL_ID)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 313, in interactive
node_hits.append(node_search_result[0][0].page_content)
IndexError: list index out of range
Thanks for your help!
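For what it's worth, both tracebacks fail on the same line (node_hits.append(node_search_result[0][0].page_content)), which indexes the similarity-search result without checking that it is non-empty, so an entity with no sufficiently similar match in the vector store raises IndexError. A hedged sketch of a defensive version of that access (`first_hit_content` is a hypothetical helper; the real retrieve_context in utility.py may handle this differently):

```python
def first_hit_content(node_search_result):
    """Return page_content of the top similarity hit, or None if there were no hits.

    node_search_result is assumed to be a list of (Document, score) pairs,
    as returned by similarity_search_with_score-style vector store APIs.
    """
    if not node_search_result:
        return None  # no match above the similarity threshold
    doc, _score = node_search_result[0]
    return getattr(doc, "page_content", None)
```

The caller could then skip entities for which this returns None instead of crashing mid-retrieval.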