OSU-NLP-Group / HippoRAG

HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
https://arxiv.org/abs/2405.14831
MIT License

ImportError: cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub' #35

Closed 66246764 closed 1 month ago

66246764 commented 1 month ago

Thanks for your wonderful work! I installed the environment following your steps and the latest requirements.txt, but I ran into a problem at the indexing stage. Here is the log:

ImportError: cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'

66246764 commented 1 month ago

It works after updating huggingface_hub to 0.23.4.
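For anyone hitting the same ImportError, a quick check like the one below can confirm whether the installed huggingface_hub actually exports the missing name before re-running indexing. This is only a minimal sketch: `check_symbol` is a hypothetical helper written for this thread, not part of HippoRAG or huggingface_hub, and it relies only on the standard library.

```python
import importlib
from importlib import metadata


def check_symbol(package: str, symbol: str) -> dict:
    """Report the installed version of `package` and whether it exposes `symbol`.

    Hypothetical helper for debugging; not part of HippoRAG.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Not installed as a distribution (or it is a stdlib module).
        version = None
    try:
        module = importlib.import_module(package)
        has_symbol = hasattr(module, symbol)
    except ImportError:
        has_symbol = False
    return {"package": package, "version": version, "has_symbol": has_symbol}


if __name__ == "__main__":
    # The name reported in the ImportError above.
    print(check_symbol("huggingface_hub", "split_torch_state_dict_into_shards"))
```

If `has_symbol` comes back False, upgrading the package (0.23.4 worked in this thread) is the first thing to try.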

66246764 commented 1 month ago

But then something else went wrong. I used the sample data and default settings, but at the indexing stage it printed "Correct Wiki Format: 0 out of 3", and processing has been stuck there for many minutes with no change:

(hipporag) [user11@localhost HippoRAG-main]$ bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API
ner_gpt-3.5-turbo-1106_3
100%|██████████| 1/1 [00:00<00:00, 9754.20it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
| 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 6523.02it/s]
100%|██████████| 1/1 [00:00<00:00, 6502.80it/s]
0it [00:00, ?it/s]
/opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
OpenIE saved to output/openie_sample_results_ner_gpt-3.5-turbo-1106_3.json
No queries will be processed for later retrieval.
cannot pickle '_thread.RLock' object
100%|██████████| 3/3 [00:00<00:00, 10180.35it/s]
Correct Wiki Format: 0 out of 3
0it [00:00, ?it/s]

[Jul 11, 00:25:13] #> Creating directory colbert/indexes/nbits_2

#> Starting...

nranks = 1 num_gpus = 4 device=0
{
    "query_token_id": "[unused0]",
    "doc_token_id": "[unused1]",
    "query_token": "[Q]",
    "doc_token": "[D]",
    "ncells": null,
    "centroid_score_threshold": null,
    "ndocs": null,
    "load_index_with_mmap": false,
    "index_path": null,
    "index_bsize": 64,
    "nbits": 2,
    "kmeans_niters": 20,
    "resume": false,
    "similarity": "cosine",
    "bsize": 64,
    "accumsteps": 1,
    "lr": 1e-5,
    "maxsteps": 400000,
    "save_every": null,
    "warmup": 20000,
    "warmup_bert": null,
    "relu": false,
    "nway": 64,
    "use_ib_negatives": true,
    "reranker": false,
    "distillation_alpha": 1.0,
    "ignore_scores": false,
    "model_name": null,
    "query_maxlen": 32,
    "attend_to_mask_tokens": false,
    "interaction": "colbert",
    "dim": 128,
    "doc_maxlen": 180,
    "mask_punctuation": true,
    "checkpoint": "exp\/colbertv2.0",
    "triples": "\/future\/u\/okhattab\/root\/unit\/experiments\/2021.10\/downstream.distillation.round2.2_score\/round2.nway6.cosine.ib\/examples.64.json",
    "collection": "data\/lm_vectors\/colbert\/corpus.tsv",
    "queries": "\/future\/u\/okhattab\/data\/MSMARCO\/queries.train.tsv",
    "index_name": "nbits_2",
    "overwrite": false,
    "root": "",
    "experiment": "colbert",
    "index_root": null,
    "name": "2024-07\/11\/00.25.12",
    "rank": 0,
    "nranks": 1,
    "amp": true,
    "gpus": 4,
    "avoid_fork_if_possible": false
}
[Jul 11, 00:25:20] #> Loading collection...
0M
[Jul 11, 00:25:22] [0] # of sampled PIDs = 28 sampled_pids[:3] = [13, 23, 0]
[Jul 11, 00:25:22] [0] #> Encoding 28 passages..
[Jul 11, 00:25:23] [0] avg_doclen_est = 7.5 len(local_sample) = 28
[Jul 11, 00:25:23] [0] Creating 128 partitions.
[Jul 11, 00:25:23] [0] Estimated 210 embeddings.
[Jul 11, 00:25:23] [0] #> Saving the indexing plan to colbert/indexes/nbits_2/plan.json ..

66246764 commented 1 month ago

I then got the following error messages, which look like https://github.com/OSU-NLP-Group/HippoRAG/issues/22:

RuntimeError: Error building extension 'decompress_residuals_cpp':
[1/3] c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/TH -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /opt/anaconda3_2022/envs/hipporag/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
FAILED: decompress_residuals.o
c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/TH -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /opt/anaconda3_2022/envs/hipporag/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
c++: error: unrecognized command line option ‘-std=c++14’
[2/3] /usr/local/cuda-11.7/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/TH -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /opt/anaconda3_2022/envs/hipporag/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -std=c++14 -c /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
FAILED: decompress_residuals.cuda.o
/usr/local/cuda-11.7/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/TH -isystem /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /opt/anaconda3_2022/envs/hipporag/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -std=c++14 -c /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
nvcc warning : The -std=c++14 flag is not supported with the configured host compiler. Flag will be ignored.
In file included from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/extension.h:4:0,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cu:6:
/opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++14 or later compatible compiler is required to use PyTorch.

 #error C++14 or later compatible compiler is required to use PyTorch.
……
 #error C++14 or later compatible compiler is required to use ATen.
  ^
In file included from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:11:0,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/types.h:12,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/torch/extension.h:4,
                 from /opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/colbert/indexing/codecs/decompress_residuals.cu:6:
/opt/anaconda3_2022/envs/hipporag/lib/python3.9/site-packages/torch/include/c10/util/LeftRight.h:8:24: fatal error: shared_mutex: No such file or directory
 #include <shared_mutex>
                        ^
compilation terminated.
ninja: build stopped: subcommand failed.

yhshu commented 1 month ago

Thanks for your feedback about the requirements. What are your Python, torch, and CUDA versions? If you have already installed the environment from requirements.txt, you may need to pay attention to this error:

c++: error: unrecognized command line option ‘-std=c++14’

Is your gcc too old to support this?
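One way to answer that question without rebuilding the ColBERT extension is to probe the compiler directly. The sketch below assumes the build uses the `c++` command on the PATH, as the log above shows. It tries to compile a tiny file that needs `-std=c++14` and the `<shared_mutex>` header, the two things the build complained about. `compiler_version` and `supports_cxx14` are hypothetical helper names, not part of HippoRAG, ColBERT, or PyTorch.

```python
import os
import shutil
import subprocess
import tempfile


def compiler_version(compiler: str = "c++"):
    """Return the first line of `<compiler> --version`, or None if it is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run([compiler, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def supports_cxx14(compiler: str = "c++") -> bool:
    """Return True if `compiler` accepts -std=c++14 and can find <shared_mutex>."""
    if shutil.which(compiler) is None:
        return False
    # Binary literals (0b1) are a C++14 feature; <shared_mutex> is the header
    # the nvcc step in the log failed to find.
    source = "#include <shared_mutex>\nint main() { auto x = 0b1; return x - 1; }\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write(source)
        result = subprocess.run(
            [compiler, "-std=c++14", "-c", src, "-o", obj],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    print("compiler:", compiler_version())
    print("supports C++14:", supports_cxx14())
```

If this prints False, the extension build will most likely keep failing until a newer compiler is the one found as `c++`. Passing another name, for example `supports_cxx14("g++")`, checks a different compiler on the same machine.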

66246764 commented 1 month ago

Thanks a lot! Maybe that's the problem. I chose the "Retrieval Encoder" way to run instead, and it works!