IntelLabs / RAGFoundry

Framework for enhancing LLMs for RAG tasks using fine-tuning.
https://intellabs.github.io/RAGFoundry/
Apache License 2.0

The same error occurs repeatedly #4

Closed PrathamKumar125 closed 1 week ago

PrathamKumar125 commented 1 month ago

python processing.py -cp configs/paper -cn processing-asqa-retrieval output_path=/result cache=true

[2024-08-09 22:06:34,733][__main__][INFO] - name: asqa_retrieval
cache: true
output_path: /result
steps:

[2024-08-09 22:06:34,733][root][INFO] - Caching state: True
0it [00:00, ?it/s]
[2024-08-09 22:06:39,275][root][INFO] - Processing step 0
[2024-08-09 22:06:39,275][root][INFO] - Running processing step: HFLoader
1it [00:07, 7.84s/it]
[2024-08-09 22:06:47,113][root][INFO] - Processing step 1
[2024-08-09 22:06:47,113][root][INFO] - Running processing step: HFLoader
2it [00:13, 6.31s/it]
[2024-08-09 22:06:52,357][root][INFO] - Processing step 2
[2024-08-09 22:06:52,357][root][INFO] - Loading cached datasets for ASQA
[2024-08-09 22:06:52,357][root][INFO] - Loading dataset from checkpoints {'train': '/result/asqa_retrieval_2_ASQA_train_dcd7520ea2bf04caf1de50e73da892f8.json', 'dev': '/result/asqa_retrieval_2_ASQA_dev_dcd7520ea2bf04caf1de50e73da892f8.json'}
3it [00:14, 4.23s/it]
[2024-08-09 22:06:54,097][root][INFO] - Processing step 3
[2024-08-09 22:06:54,097][root][INFO] - Running processing step: HaystackRetriever
[2024-08-09 22:06:54,129][haystack.core.pipeline.base][INFO] - Warming up component text_embedder...
[2024-08-09 22:06:54,129][sentence_transformers.SentenceTransformer][INFO] - Load pretrained SentenceTransformer: BAAI/llm-embedder
[2024-08-09 22:06:57,464][haystack.core.pipeline.pipeline][INFO] - Running component text_embedder
Batches: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.04it/s]
[2024-08-09 22:06:58,436][haystack.core.pipeline.pipeline][INFO] - Running component retriever
Map: 0%| | 0/4353 [00:04<?, ? examples/s]
3it [00:19, 6.52s/it]
Error executing job with overrides: ['output_path=/result', 'cache=true']
Traceback (most recent call last):
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
    yield
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
    raise exc from None
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_sync\connection.py", line 99, in handle_request
    raise exc
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_sync\connection.py", line 76, in handle_request
    stream = self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_sync\connection.py", line 122, in _connect
    stream = self._network_backend.connect_tcp(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_backends\sync.py", line 205, in connect_tcp
    with map_exceptions(exc_map):
  File "C:\Users\prath\AppData\Local\Programs\Python\Python312\Lib\contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: [Errno 11001] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api_client.py", line 106, in send_inner
    response = self._client.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
    with map_httpcore_exceptions():
  File "C:\Users\prath\AppData\Local\Programs\Python\Python312\Lib\contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "C:\Users\RAGFoundry\env\Lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\RAGFoundry\processing.py", line 16, in main
    pipeline.process()
  File "C:\Users\RAGFoundry\ragfoundry\processing\pipeline.py", line 135, in process
    step(self.datasets)
  File "C:\Users\RAGFoundry\ragfoundry\processing\step.py", line 55, in __call__
    self.process_inputs(datasets, **kwargs)
  File "C:\Users\RAGFoundry\ragfoundry\processing\step.py", line 59, in process_inputs
    self.process(dataset_name, datasets, **kwargs)
  File "C:\Users\RAGFoundry\ragfoundry\processing\step.py", line 79, in process
    datasets[dataset_name] = datasets[dataset_name].map(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\datasets\arrow_dataset.py", line 592, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\datasets\arrow_dataset.py", line 557, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\datasets\arrow_dataset.py", line 3093, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "C:\Users\RAGFoundry\env\Lib\site-packages\datasets\arrow_dataset.py", line 3446, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\datasets\arrow_dataset.py", line 3349, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\ragfoundry\processing\step.py", line 80, in <lambda>
    lambda item, index: self.process_item(item, index, datasets, **kwargs),
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\ragfoundry\processing\local_steps\retrievers\haystack.py", line 73, in process_item
    item[self.docs_key] = self.query(item[self.query_key])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\ragfoundry\processing\local_steps\retrievers\haystack.py", line 55, in query
    response = self.pipe.run(structure)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack\core\pipeline\pipeline.py", line 249, in run
    res: Dict[str, Any] = self._run_component(name, last_inputs[name])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack\core\pipeline\pipeline.py", line 76, in _run_component
    res: Dict[str, Any] = instance.run(**inputs)
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack_integrations\components\retrievers\qdrant\retriever.py", line 142, in run
    docs = self._document_store._query_by_embedding(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack_integrations\document_stores\qdrant\document_store.py", line 591, in _query_by_embedding
    points = self.client.search(
             ^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack_integrations\document_stores\qdrant\document_store.py", line 282, in client
    self._set_up_collection(
  File "C:\Users\RAGFoundry\env\Lib\site-packages\haystack_integrations\document_stores\qdrant\document_store.py", line 774, in _set_up_collection
    if recreate_collection or not self.client.collection_exists(collection_name):
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\qdrant_client.py", line 1768, in collection_exists
    return self._client.collection_exists(collection_name=collection_name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\qdrant_remote.py", line 2249, in collection_exists
    result: Optional[models.CollectionExistence] = self.http.collections_api.collection_exists(
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api\collections_api.py", line 1157, in collection_exists
    return self._build_for_collection_exists(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api\collections_api.py", line 87, in _build_for_collection_exists
    return self.api_client.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api_client.py", line 79, in request
    return self.send(request, type_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api_client.py", line 96, in send
    response = self.middleware(request, self.send_inner)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api_client.py", line 205, in __call__
    return call_next(request)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\RAGFoundry\env\Lib\site-packages\qdrant_client\http\api_client.py", line 108, in send_inner
    raise ResponseHandlingException(e)
qdrant_client.http.exceptions.ResponseHandlingException: [Errno 11001] getaddrinfo failed

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

danielfleischer commented 1 month ago

This example showcases a RAG workflow for ASQA. As we mentioned in the paper, we augmented the dataset using an external Wikipedia corpus. The configuration uses Qdrant, a vector DB, through Haystack, which defines a simple pipeline comprising an embedder and a retriever over Qdrant.
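
For readers unfamiliar with Haystack, a pipeline of that shape might look roughly like the sketch below. It assumes Haystack 2.x with the Qdrant integration installed; the component names `text_embedder` and `retriever` and the model `BAAI/llm-embedder` are taken from the log above, while the function name, URL, index name, `top_k` and the embedding size of 768 are illustrative assumptions, not values from the RAGFoundry configs.

```python
def build_retrieval_pipeline(url: str, index: str, top_k: int = 5):
    """Assemble an embedder -> Qdrant retriever pipeline (hypothetical helper)."""
    # Imported lazily so the sketch can be loaded without the packages installed.
    from haystack import Pipeline
    from haystack.components.embedders import SentenceTransformersTextEmbedder
    from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
    from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

    # Assumption: the index holds 768-dimensional vectors produced by the same model.
    store = QdrantDocumentStore(url=url, index=index, embedding_dim=768)

    pipe = Pipeline()
    pipe.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/llm-embedder"))
    pipe.add_component("retriever", QdrantEmbeddingRetriever(document_store=store, top_k=top_k))
    # The query embedding produced by the embedder feeds the retriever.
    pipe.connect("text_embedder.embedding", "retriever.query_embedding")
    return pipe


# Usage (requires a reachable Qdrant server that already holds the corpus):
#   pipe = build_retrieval_pipeline("http://localhost:6333", index="wikipedia")
#   result = pipe.run({"text_embedder": {"text": "some question"}})
#   docs = result["retriever"]["documents"]
```

In the traceback above, the Qdrant client is only created when the retriever first runs, which is consistent with the failure appearing at "Running component retriever" rather than when the pipeline is built.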

The error is probably due to the library not finding a Qdrant server at the provided URL.
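
On Windows, `[Errno 11001] getaddrinfo failed` is a name-resolution failure: the hostname in the configured Qdrant URL could not be looked up, so no connection was even attempted. A quick way to check this before running the pipeline is sketched below using only the standard library; the function names and the example URL are illustrative and do not come from RAGFoundry.

```python
import socket
from urllib.parse import urlparse


def qdrant_host(url: str) -> str:
    """Extract the hostname from a URL such as http://localhost:6333."""
    host = urlparse(url).hostname
    if host is None:
        # Typically a missing scheme, e.g. "localhost:6333".
        raise ValueError(f"no hostname found in URL: {url!r}")
    return host


def can_resolve(url: str) -> bool:
    """Return True if the URL's hostname resolves, False on a getaddrinfo failure."""
    try:
        socket.getaddrinfo(qdrant_host(url), None)
        return True
    except socket.gaierror:
        return False


# Usage: substitute the URL from your own configuration.
#   print(can_resolve("http://localhost:6333"))
```

If this returns False, the URL itself is the problem (a typo, a placeholder hostname, or a missing DNS entry) rather than anything in the retrieval step.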

If you want to reproduce the experiments in the paper, you need a Wikipedia corpus. For example, you can create an index using a script described in our fastRAG library, or you can use other frameworks to create a corpus.

rogerbarretocode commented 3 weeks ago

@danielfleischer I changed the URL to my Qdrant URL, but I am getting a 403 error. Where do I set the API key for Qdrant?

danielfleischer commented 3 weeks ago

Qdrant doesn't require authentication when running with default settings. Please consult their documentation and support.
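
To tell a 403 apart from the earlier connection failure, it can help to probe the server directly, outside of RAGFoundry. The sketch below sends a `GET /collections` request with the standard library and, if a key is supplied, passes it in the `api-key` header that Qdrant's REST API uses for secured deployments. The function names are hypothetical, and where the key belongs in a RAGFoundry or Haystack configuration is not covered here.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_probe(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /collections request, adding the api-key header if given."""
    req = urllib.request.Request(url.rstrip("/") + "/collections")
    if api_key:
        req.add_header("api-key", api_key)
    return req


def probe(url: str, api_key: Optional[str] = None, timeout: float = 5.0) -> str:
    """Return a short status string describing how the server responded."""
    try:
        with urllib.request.urlopen(build_probe(url, api_key), timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:
        # The server was reached but refused the request (403 lands here).
        return f"http {err.code}"
    except OSError as err:
        # DNS failures, refused connections and timeouts land here.
        return f"unreachable: {err}"


# Usage: substitute your own URL and key.
#   print(probe("http://localhost:6333"))
#   print(probe("https://your-qdrant-host", api_key="your-key"))
```

An `http 403` result with no key and `ok` with a key would suggest that the deployment has authentication enabled, which is the non-default case mentioned above.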