Closed PrateekSharma007 closed 3 months ago
Thanks for your interest!
We would love to help but need more information to reproduce this error. Did you run the indexing process as explained in the README at least with HotpotQA and ColBERTv2?
Hey, thanks for responding! After indexing I am getting this error. Can you tell me what's wrong? It says the model was not found. Thanks!
Hello, have you placed the colbertv2.0 checkpoint under the exp directory? The README.md explains how to do that.
Yes, I already did that, but I still get the same error: it says there is no such URL, or an invalid username and password. I did what the README says.
This is basically what I get:
Repository Not Found for url: https://huggingface.co/exp/colbertv2.0/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
This error is not happening on our side, could you please include the commands you are running in the screenshots so we can better assist you?
I cloned your code and followed the README. First I set up the environment and installed the required libraries:
conda create -n hipporag python=3.9
conda activate hipporag
pip install -r requirements.txt
GPU_DEVICES=0,1,2,3 #Replace with your own free GPU Devices
export OPENAI_API_KEY='Add your own OpenAI API key here.'
export TOGETHER_API_KEY='Add your own TogetherAI API key here.' # If you need to use TogetherAI models such as Llama-3 API
Then I downloaded ColBERTv2:
cd exp
wget https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar –xvzf colbertv2.0.tar.gz
When I index with ColBERTv2, I get this error:
Repository Not Found for url: https://huggingface.co/exp/colbertv2.0/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
exp/colbertv2.0 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>
I ran huggingface-cli login and set the token, but I still get the same error.
After that I tried indexing with the HuggingFace retrieval encoder for synonymy edges (i.e. Contriever), and that worked without errors.
Then I tried to run the test_hipporag.py script, which looks like this:
import argparse
from hipporag import HippoRAG

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset', type=str)
    parser.add_argument('--extraction_model', type=str, default='gpt-3.5-turbo-1106')
    parser.add_argument('--retrieval_model', type=str, choices=['facebook/contriever', 'colbertv2'])
    parser.add_argument('--doc_ensemble', action='store_true')
    args = parser.parse_args()

    hipporag = HippoRAG(args.dataset, args.extraction_model, args.retrieval_model, doc_ensemble=args.doc_ensemble)

    queries = ["Which Stanford University professor works on Alzheimer's"]
    for query in queries:
        ranks, scores, logs = hipporag.rank_docs(query, top_k=10)
        print(ranks)
        print(scores)
but I get this error:
(base) drops-ai-model@deeplearning-vm-f2-vm:~/HippoRAG/src$ python3 test_hipporag.py
Traceback (most recent call last):
File "/home/drops-ai-model/HippoRAG/src/test_hipporag.py", line 2, in <module>
This looks like a working directory or environment variable setup issue, where the environment doesn't recognize the HippoRAG root.
For example, in your log, after you run cd exp, you should return to the root.
@kartikkMindz can you tell us what command you ran for indexing using ColBERTv2? Did you run bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API?
@bernaljg Yes, I ran bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API for ColBERTv2.
Could you send us the whole output that appears after you run bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API?
If you can also print the bash variables using echo $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API and send us the output, that would be great.
This is the whole output when I run bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API:
Output:
(base) drops-ai-model@deeplearning-vm-f2-vm:~/HippoRAG$ bash src/setup_hipporag_colbert.sh $DATA $LLM $GPUS $SYNONYM_THRESH $LLM_API
ner_gpt-3.5-turbo-1106_3
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12052.60it/s]
0it [00:00, ?it/s]
| 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15420.24it/s]
0it [00:00, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10459.61it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
/opt/conda/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
OpenIE saved to output/openie_sample_results_ner_gpt-3.5-turbo-1106_3.json
Passage NER already saved to output/sample_queries.named_entity_output.tsv
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 7719.58it/s]
Correct Wiki Format: 0 out of 3
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9799.78it/s]
[Jun 07, 04:19:18] #> Note: Output directory colbert/indexes/nbits_2 already exists
nranks = 1 num_gpus = 4 device=0
{ "query_token_id": "[unused0]", "doc_token_id": "[unused1]", "query_token": "[Q]", "doc_token": "[D]", "ncells": null, "centroid_score_threshold": null, "ndocs": null, "load_index_with_mmap": false, "index_path": null, "index_bsize": 64, "nbits": 2, "kmeans_niters": 4, "resume": false, "similarity": "cosine", "bsize": 64, "accumsteps": 1, "lr": 3e-6, "maxsteps": 500000, "save_every": null, "warmup": null, "warmup_bert": null, "relu": false, "nway": 2, "use_ib_negatives": false, "reranker": false, "distillation_alpha": 1.0, "ignore_scores": false, "model_name": null, "query_maxlen": 32, "attend_to_mask_tokens": false, "interaction": "colbert", "dim": 128, "doc_maxlen": 220, "mask_punctuation": true, "checkpoint": "exp\/colbertv2.0", "triples": null, "collection": "data\/lm_vectors\/colbert\/corpus.tsv", "queries": null, "index_name": "nbits_2", "overwrite": false, "root": "", "experiment": "colbert", "index_root": null, "name": "2024-06\/07\/04.19.16", "rank": 0, "nranks": 1, "amp": true, "gpus": 4, "avoid_fork_if_possible": false }
[Jun 07, 04:19:24] #> Loading collection...
0M
Process Process-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/exp/colbertv2.0/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 385, in cached_file
resolved_file = hf_hub_download(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1368, in hf_hub_download
raise head_call_error
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
metadata = get_hf_file_metadata(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 323, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-66628a4c-3882575e4704b0d579ddcbc6;5c7c786e-637c-405c-ae02-401d484d99fb)
Repository Not Found for url: https://huggingface.co/exp/colbertv2.0/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.10/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
return_val = callee(config, *args)
File "/opt/conda/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 32, in encode
encoder = CollectionIndexer(config=config, collection=collection, verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 52, in __init__
self.checkpoint = Checkpoint(self.config.checkpoint, colbert_config=self.config)
File "/opt/conda/lib/python3.10/site-packages/colbert/modeling/checkpoint.py", line 19, in __init__
super().__init__(name, colbert_config)
File "/opt/conda/lib/python3.10/site-packages/colbert/modeling/colbert.py", line 21, in __init__
super().__init__(name, colbert_config)
File "/opt/conda/lib/python3.10/site-packages/colbert/modeling/base_colbert.py", line 36, in __init__
self.model = HF_ColBERT.from_pretrained(name_or_path, colbert_config=self.colbert_config)
File "/opt/conda/lib/python3.10/site-packages/colbert/modeling/hf_colbert.py", line 133, in from_pretrained
obj = super().from_pretrained(name_or_path, colbert_config=colbert_config)
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2926, in from_pretrained
resolved_config_file = cached_file(
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 406, in cached_file
raise EnvironmentError(
OSError: exp/colbertv2.0 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>
@kartikkMindz exp/colbertv2.0 is not a HuggingFace model; it is a local directory that you need to set up yourself. Could you check whether the working directory is set correctly so the transformers package can find it?
Yeah, basically make sure that tar -xvzf colbertv2.0.tar.gz ran smoothly and created the directory exp/colbertv2.0 with all the necessary model components.
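A quick way to check this before re-running the indexing script is a small sketch like the one below. It assumes the checkpoint should live at exp/colbertv2.0 relative to the current working directory (the "checkpoint" value in the config dump above) and that the directory should contain a config.json, since that is the file the failing request asked for. The helper name check_checkpoint is hypothetical and not part of HippoRAG or ColBERT.

```python
from pathlib import Path


def check_checkpoint(path="exp/colbertv2.0", base=None):
    """Report why a local checkpoint path might not be usable.

    Returns a list of human-readable problems; an empty list means the
    directory exists and contains config.json.
    """
    # Assumption: relative checkpoint paths are resolved against the cwd.
    base = Path(base) if base is not None else Path.cwd()
    ckpt = base / path
    problems = []
    if not ckpt.is_dir():
        problems.append(f"{ckpt} is not a directory (cwd is {Path.cwd()})")
    elif not (ckpt / "config.json").is_file():
        problems.append(f"{ckpt} exists but has no config.json")
    return problems


if __name__ == "__main__":
    issues = check_checkpoint()
    print("checkpoint looks fine" if not issues else "\n".join(issues))
```

If the directory is missing, the path string is treated as a Hub model identifier instead, which would explain why the request went to https://huggingface.co/exp/colbertv2.0/.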
Hi, here is a link to a video of what I did. I cropped out the part where I set the OpenAI API key. https://drive.google.com/file/d/11d3xlniz7SuR6ku1O7UaWyMXUJy3dbqd/view?usp=sharing
Thank you for your recording.
I think the ColBERT model was not extracted successfully from the tar.gz file.
This is probably because the command shown in the README uses the wrong dash character: an en dash (–) instead of a plain hyphen (-).
Please go into exp and extract the model again:
tar -xvzf colbertv2.0.tar.gz
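Lookalike characters of this kind are hard to spot by eye, so a short check such as the following sketch can help when a pasted command misbehaves. It assumes the offending character is an en dash (U+2013), as it appears in the tar command quoted earlier in this thread; the helper name find_non_ascii is hypothetical.

```python
import unicodedata


def find_non_ascii(command):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Assumption: the pasted command contained an en dash in place of the hyphen.
pasted = "tar \u2013xvzf colbertv2.0.tar.gz"
fixed = "tar -xvzf colbertv2.0.tar.gz"

print(find_non_ascii(pasted))  # [(4, '–', 'EN DASH')]
print(find_non_ascii(fixed))   # []
```

An empty list means the command contains only ASCII characters and is safe to paste into the shell as far as dashes and quotes are concerned.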
Ohh Thank you soo much
Hey, the model issue is resolved, but I am now facing what I guess is the last error.
It's good to hear that.
Please change your working directory to the HippoRAG root rather than src and try again. Thanks!
Yeah, I did this, but then it says the file is not found. test_hipporag.py is in the src folder.
Could you try adding your HippoRAG root path to the Python module search path?
One way to do this is to add these lines at the top of test_hipporag.py:
import sys
sys.path.append('.')
Or use any other way you'd like to add the path, such as the PYTHONPATH environment variable.
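A slightly more robust variant of the same idea, sketched below, resolves the root from the script's own location instead of relying on the current working directory. It assumes test_hipporag.py sits in src/ directly under the HippoRAG root, as in this thread; the helper name add_repo_root is hypothetical.

```python
import sys
from pathlib import Path


def add_repo_root(script_path, levels_up=1):
    """Put the directory `levels_up` above the script's folder on sys.path.

    For <root>/src/test_hipporag.py and levels_up=1 this adds <root>,
    so that `from src.hipporag import HippoRAG` can be resolved.
    """
    # parents[0] is the script's folder (src), parents[1] is its parent (root).
    root = Path(script_path).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In the script itself this would be called as: add_repo_root(__file__)
```

With this in place the import works regardless of which directory the script is launched from, although relative data paths used elsewhere in the code may still depend on the working directory.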
I have attached a screenshot. The error says it cannot find src.hipporag. Even if I change src.hipporag, many linked files show the same error. I made the change you suggested earlier.
Make sure you execute python test_hipporag.py when your working directory is the HippoRAG root, i.e., ~/HippoRAG in your case.
Yes, I did that; it says it is unable to find the file. I will add the screenshot.
Oh, you definitely need to change that to python src/test_hipporag.py when your working directory is the HippoRAG root.
Sure, I will check and report back.
Hey, do I need to change anything in the code, or is cloning the repo and running the steps enough? I want to try it out, so I used the data that was already provided.
For now, I think it's just a matter of the environment in which you're executing the code. Go ahead testing and post any questions you have, please.
I cloned the repo again, and the errors I was getting earlier are now fixed. I guess this is the last issue. I am using ColBERT, so I changed the retrieval model to that and updated the path of the dataset.
This is not a problem with this repo. If an argument is declared with required=True, you must pass it on the command line; default=some value is only a fallback for when an optional argument is omitted.
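The difference can be seen in a minimal argparse sketch. The flag names are borrowed from test_hipporag.py as quoted in this thread, but the required=True and default settings here are illustrative assumptions rather than the repository's actual configuration, and "sample" is a placeholder dataset name.

```python
import argparse

parser = argparse.ArgumentParser()
# Required option: argparse exits with an error if it is missing,
# so a default would never be used for it.
parser.add_argument("--dataset", type=str, required=True)
# Optional option: the default is used only when the flag is omitted.
parser.add_argument("--retrieval_model", type=str, default="colbertv2",
                    choices=["facebook/contriever", "colbertv2"])

# Passing an explicit list stands in for the real command line here.
args = parser.parse_args(["--dataset", "sample"])
print(args.dataset, args.retrieval_model)  # sample colbertv2
```

Calling parser.parse_args([]) with this parser would stop with an error about the missing --dataset argument, which is the situation described above.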
Yes, the problem was on my side.
When I run test_hipporag.py it gives me ranks, scores, and logs, but I want the answer. How do I print the answer?
import argparse
from src.hipporag import HippoRAG

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset', type=str)
    parser.add_argument('--extraction_model', type=str, default='gpt-3.5-turbo-1106')
    parser.add_argument('--retrieval_model', type=str, choices=['facebook/contriever', 'colbertv2'])
    parser.add_argument('--doc_ensemble', action='store_true')
    args = parser.parse_args()

    hipporag = HippoRAG(args.dataset, args.extraction_model, args.retrieval_model, doc_ensemble=args.doc_ensemble)

    queries = ["Which Stanford University professor works on Alzheimer's"]
    for query in queries:
        ranks, scores, logs = hipporag.rank_docs(query, top_k=10)
        print(ranks)
        print(scores)
@kartikkMindz This is a new issue, and you could start a new post discussing this. I've submitted a PR to update how to use QA. It'll be merged soon. Stay tuned and thanks.
How are you going to handle unstructured PDFs and other unstructured data? Right now it's quite specific.
Text is unstructured data. PDF is an important RAG application and we welcome any contributions to that.
When running the test_hipporag.py file, I am getting an error:
` python test_hipporag.py gpt-3.5-turbo-1106 colbertv2 hotpotqa ner
Traceback (most recent call last):
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/ner/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/transformers/utils/hub.py", line 385, in cached_file
resolved_file = hf_hub_download(
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1368, in hf_hub_download
raise head_call_error
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
metadata = get_hf_file_metadata(
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/envs/myenv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 323, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-665ef0e1-047320641ea765c23e417fb3;c716d4af-222e-4a04-8b9c-d50ae2d2ef54)
Repository Not Found for url: https://huggingface.co/ner/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated. `
Can you help me resolve this issue? I am a bit confused.