[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
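As a rough illustration of that last point (not the repository's own code; the graph, entities, and scores below are made-up placeholders), the retrieval idea can be sketched with networkx: entities found in a query seed a Personalized PageRank walk over the knowledge graph, and the resulting node scores are used to rank passages.

```python
import networkx as nx

# Toy knowledge graph built from hypothetical (subject, object) facts.
kg = nx.Graph()
kg.add_edges_from([
    ("Stanford", "Alfred Moore"),
    ("Alfred Moore", "neuroscience"),
    ("neuroscience", "hippocampus"),
])

# Entities extracted from the query seed ("personalize") the random walk.
query_entities = {"Alfred Moore": 1.0}
scores = nx.pagerank(kg, alpha=0.85, personalization=query_entities)

# Nodes (and, in HippoRAG, the passages that mention them) are ranked by score.
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```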
When I try to index the sample data following the steps under "Indexing (offline)", errors occur at the very beginning of the process (I have already installed the libraries listed in requirements.txt).
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/openie_with_retrieval_option_parallel.py", line 5, in <module>
    from langchain_community.chat_models import ChatOllama, ChatLlamaCpp
ImportError: cannot import name 'ChatLlamaCpp' from 'langchain_community.chat_models' (/home/ai/example/.venv/lib/python3.10/site-packages/langchain_community/chat_models/__init__.py)
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/named_entity_extraction_parallel.py", line 7, in <module>
    from langchain_community.chat_models import ChatOllama, ChatLlamaCpp
ImportError: cannot import name 'ChatLlamaCpp' from 'langchain_community.chat_models' (/home/ai/example/.venv/lib/python3.10/site-packages/langchain_community/chat_models/__init__.py)
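For reference, this ImportError usually means the installed langchain-community release is too old to export ChatLlamaCpp. A minimal check along these lines (a sketch; it assumes the class is only shipped in newer releases) confirms whether the installed version has it:

```python
# Sketch: verify whether the installed langchain-community actually exports
# ChatLlamaCpp; older releases only ship ChatOllama.
import langchain_community
print(langchain_community.__version__)

try:
    from langchain_community.chat_models import ChatLlamaCpp  # noqa: F401
    print("ChatLlamaCpp is importable")
except ImportError:
    # Upgrading may help, e.g.: pip install -U langchain-community
    print("ChatLlamaCpp is not available in this version")
```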
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/create_graph.py", line 366, in <module>
    create_graph(dataset, extraction_type, extraction_model, retriever_name, processed_retriever_name, threshold, create_graph_flag, cosine_sim_edges)
  File "/home/ai/example/HippoRAG/src/create_graph.py", line 23, in create_graph
    maxsamples = np.max([int(file.split('{}'.format(extraction_model))[1].split('.json')[0]) for file in possible_files])
  File "/home/ai/example/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2810, in max
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/ai/example/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
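This ValueError is what np.max raises when it is handed an empty list, i.e. create_graph.py found no OpenIE output JSON files to read, which is consistent with the OpenIE script having aborted on the ImportError above. A minimal reproduction (sketch; the split string is a placeholder):

```python
import numpy as np

# If the OpenIE step wrote no "...<extraction_model>...<n>.json" files,
# possible_files is empty and the reduction below has nothing to operate on.
possible_files = []  # what the glob finds when the OpenIE output is missing
np.max([int(f.split('gpt')[1].split('.json')[0]) for f in possible_files])
# ValueError: zero-size array to reduction operation maximum which has no identity
```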
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/colbertv2_knn.py", line 68, in <module>
    string_df = pd.read_csv(string_filename, sep='\t')
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1024, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 618, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1618, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1878, in _make_engine
    self.handles = get_handle(
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'output/kb_to_kb.tsv'
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/colbertv2_knn.py", line 68, in <module>
    string_df = pd.read_csv(string_filename, sep='\t')
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1024, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 618, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1618, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1878, in _make_engine
    self.handles = get_handle(
  File "/home/ai/example/.venv/lib/python3.10/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'output/query_to_kb.tsv'
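Both of these TSVs appear to be intermediate artifacts written by the earlier steps of the indexing pipeline, so they cannot exist once those steps have failed. A defensive read along these lines (a sketch, not the repository's code) would surface that dependency more clearly:

```python
import os
import pandas as pd

# Fail early with a hint instead of the raw FileNotFoundError from pandas.
for string_filename in ("output/kb_to_kb.tsv", "output/query_to_kb.tsv"):
    if not os.path.exists(string_filename):
        raise FileNotFoundError(
            f"{string_filename} is missing -- the upstream extraction/graph "
            f"steps did not finish, so there is nothing for colbertv2_knn.py to read"
        )
    string_df = pd.read_csv(string_filename, sep="\t")
```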
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/create_graph.py", line 366, in <module>
    create_graph(dataset, extraction_type, extraction_model, retriever_name, processed_retriever_name, threshold, create_graph_flag, cosine_sim_edges)
  File "/home/ai/example/HippoRAG/src/create_graph.py", line 23, in create_graph
    maxsamples = np.max([int(file.split('{}'.format(extraction_model))[1].split('.json')[0]) for file in possible_files])
  File "/home/ai/example/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2810, in max
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/ai/example/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
[Aug 28, 11:23:50] #> Note: Output directory data/lm_vectors/colbert/sample/corpus/indexes/exp/colbertv2.0 already exists
[Aug 28, 11:23:50] #> Will delete 10 files already at data/lm_vectors/colbert/sample/corpus/indexes/exp/colbertv2.0 in 20 seconds...
> Starting...
nranks = 1   num_gpus = 1   device=0
{ "query_token_id": "[unused0]", "doc_token_id": "[unused1]", "query_token": "[Q]", "doc_token": "[D]", "ncells": null, "centroid_score_threshold": null, "ndocs": null, "load_index_with_mmap": false, "index_path": null, "index_bsize": 64, "nbits": 2, "kmeans_niters": 20, "resume": false, "similarity": "cosine", "bsize": 64, "accumsteps": 1, "lr": 1e-5, "maxsteps": 400000, "save_every": null, "warmup": 20000, "warmup_bert": null, "relu": false, "nway": 64, "use_ib_negatives": true, "reranker": false, "distillation_alpha": 1.0, "ignore_scores": false, "model_name": null, "query_maxlen": 32, "attend_to_mask_tokens": false, "interaction": "colbert", "dim": 128, "doc_maxlen": 180, "mask_punctuation": true, "checkpoint": "exp\/colbertv2.0", "triples": "\/future\/u\/okhattab\/root\/unit\/experiments\/2021.10\/downstream.distillation.round2.2_score\/round2.nway6.cosine.ib\/examples.64.json", "collection": "data\/lm_vectors\/colbert\/sample_corpus_3.tsv", "queries": "\/future\/u\/okhattab\/data\/MSMARCO\/queries.train.tsv", "index_name": "exp\/colbertv2.0", "overwrite": false, "root": "data\/lm_vectors\/colbert\/sample", "experiment": "corpus", "index_root": null, "name": "2024-08\/28\/11.23.48", "rank": 0, "nranks": 1, "amp": true, "gpus": 1, "avoid_fork_if_possible": false }
[Aug 28, 11:24:13] #> Loading collection...
0M
[Aug 28, 11:24:16] [0] # of sampled PIDs = 3   sampled_pids[:3] = [1, 0, 2]
[Aug 28, 11:24:16] [0] #> Encoding 3 passages..
[Aug 28, 11:24:17] [0] avg_doclen_est = 90.33333587646484   len(local_sample) = 3
[Aug 28, 11:24:17] [0] Creating 256 partitions.
[Aug 28, 11:24:17] [0] Estimated 271 embeddings.
[Aug 28, 11:24:17] [0] #> Saving the indexing plan to data/lm_vectors/colbert/sample/corpus/indexes/exp/colbertv2.0/plan.json ..
WARNING clustering 258 points to 256 centroids: please provide at least 9984 training points
Clustering 258 points in 128D to 256 clusters, redo 1 times, 20 iterations
  Preprocessing in 0.00 s
  Iteration 19 (0.29 s, search 0.05 s): objective=0.0608023 imbalance=1.008 nsplit=0
[Aug 28, 11:24:18] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
[Aug 28, 11:24:18] Loading packbits_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
[0.024, 0.04, 0.04, 0.033, 0.039, 0.047, 0.022, 0.026, 0.037, 0.056, 0.035, 0.04, 0.041, 0.038, 0.017, 0.035, 0.036, 0.025, 0.044, 0.03, 0.038, 0.039, 0.025, 0.04, 0.03, 0.051, 0.022, 0.043, 0.057, 0.052, 0.036, 0.038, 0.039, 0.042, 0.036, 0.054, 0.017, 0.041, 0.036, 0.02, 0.018, 0.048, 0.046, 0.048, 0.026, 0.042, 0.043, 0.044, 0.031, 0.041, 0.038, 0.039, 0.034, 0.019, 0.028, 0.049, 0.044, 0.024, 0.046, 0.027, 0.019, 0.039, 0.026, 0.033, 0.032, 0.03, 0.05, 0.024, 0.021, 0.023, 0.044, 0.039, 0.037, 0.036, 0.041, 0.026, 0.048, 0.033, 0.034, 0.038, 0.034, 0.033, 0.039, 0.034, 0.044, 0.054, 0.038, 0.028, 0.051, 0.035, 0.037, 0.019, 0.029, 0.034, 0.033, 0.038, 0.024, 0.045, 0.033, 0.049, 0.059, 0.045, 0.023, 0.047, 0.047, 0.03, 0.042, 0.036, 0.023, 0.02, 0.015, 0.025, 0.042, 0.034, 0.029, 0.024, 0.033, 0.027, 0.041, 0.022, 0.02, 0.046, 0.044, 0.047, 0.025, 0.036, 0.025, 0.038]
[Aug 28, 11:24:18] #> Got bucket_cutoffs_quantiles = tensor([0.2500, 0.5000, 0.7500], device='cuda:0') and bucket_weights_quantiles = tensor([0.1250, 0.3750, 0.6250, 0.8750], device='cuda:0')
[Aug 28, 11:24:18] #> Got bucket_cutoffs = tensor([-2.2236e-02, -7.6294e-06, 2.2995e-02], device='cuda:0') and bucket_weights = tensor([-0.0479, -0.0089, 0.0083, 0.0513], device='cuda:0')
[Aug 28, 11:24:18] avg_residual = 0.0355224609375
0it [00:00, ?it/s]
[Aug 28, 11:24:18] [0] #> Encoding 3 passages..
[Aug 28, 11:24:18] [0] #> Saving chunk 0: 3 passages and 271 embeddings. From #0 onward.
1it [00:00, 17.05it/s]
[Aug 28, 11:24:18] [0] #> Checking all files were saved...
[Aug 28, 11:24:18] [0] Found all files!
[Aug 28, 11:24:18] [0] #> Building IVF...
[Aug 28, 11:24:18] [0] #> Loading codes...
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3754.97it/s]
[Aug 28, 11:24:18] [0] Sorting codes...
[Aug 28, 11:24:18] [0] Getting unique codes...
[Aug 28, 11:24:18] #> Optimizing IVF to store map from centroids to list of pids..
[Aug 28, 11:24:18] #> Building the emb2pid mapping..
[Aug 28, 11:24:18] len(emb2pid) = 271
100%|█████████████████████████████████████████████████████████████████████████████| 256/256 [00:00<00:00, 284133.85it/s]
[Aug 28, 11:24:18] #> Saved optimized IVF to data/lm_vectors/colbert/sample/corpus/indexes/exp/colbertv2.0/ivf.pid.pt
[Aug 28, 11:24:18] [0] #> Saving the indexing metadata to data/lm_vectors/colbert/sample/corpus/indexes/exp/colbertv2.0/metadata.json ..
> Joined...
Traceback (most recent call last):
  File "/home/ai/example/HippoRAG/src/colbertv2_indexing.py", line 52, in <module>
    kb_phrase_dict = pickle.load(open(args.phrase, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'output/sample_facts_and_sim_graph_phrase_dict_ents_only_lower_preprocess_ner.v3.subset.p'
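The phrase-dict pickle looks like another artifact of the graph-creation step, so this last failure follows from the earlier ones as well. A guarded load (sketch; same path as in the traceback) makes that dependency explicit:

```python
import os
import pickle

phrase_path = (
    "output/sample_facts_and_sim_graph_phrase_dict_ents_only_"
    "lower_preprocess_ner.v3.subset.p"
)
# The pickle is only produced once the graph-creation step succeeds.
if not os.path.exists(phrase_path):
    raise FileNotFoundError(f"{phrase_path} not found -- rerun create_graph.py first")
with open(phrase_path, "rb") as f:
    kb_phrase_dict = pickle.load(f)
```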