Marker-Inc-Korea / AutoRAG

RAG AutoML Tool - Find optimal RAG pipeline for your own data.

Can I use a specific tag of an Ollama model in the generator node? #636

Closed: khlee369 closed this issue 2 weeks ago

khlee369 commented 3 weeks ago

I want to use a specific model:tag provided by ollama.

Ref

However, if I use a tagged Ollama model as shown below, I get an error.

      modules:
        - module_type: llama_index_llm
          llm: ollama
          model: [llama3, qwen2:72b]
          temperature: 0.7
          batch: 1
error message:

```
[08/22/24 12:35:30] ERROR [__init__.py:73] >> Unexpected exception

Traceback (most recent call last):
  ....
ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  /root/workspace/corporate-llm/autorag/main.py:27 in <module>
    main()
  .../site-packages/click/core.py:1157 in __call__
    return self.main(*args, **kwargs)
  .../site-packages/click/core.py:1078 in main
    rv = self.invoke(ctx)
  .../site-packages/click/core.py:1434 in invoke
    return ctx.invoke(self.callback, **ctx.params)
  .../site-packages/click/core.py:783 in invoke
    return __callback(*args, **kwargs)
  /root/workspace/corporate-llm/autorag/main.py:23 in main
    evaluator.start_trial(config)
  .../site-packages/autorag/evaluator.py:98 in start_trial
    previous_result = run_node_line(node_line, node_line_dir, previous_result)
  .../site-packages/autorag/node_line.py:45 in run_node_line
    previous_result = node.run(previous_result, node_line_dir)
  .../site-packages/autorag/schema/node.py:57 in run
    return self.run_node(modules=input_modules, module_params=input_params, ...)
  .../site-packages/autorag/nodes/generator/run.py:43 in run_generator_node
    results, execution_times = zip(*map(lambda x: measure_speed(
        x[0], project_dir=project_dir, previous_result=previous_result, **x[1]),
        zip(modules, module_params)))
  .../site-packages/autorag/strategy.py:14 in measure_speed
    result = func(*args, **kwargs)
  .../site-packages/autorag/utils/util.py:56 in wrapper
    results = func(*args, **kwargs)
  .../site-packages/autorag/nodes/generator/base.py:43 in wrapper
    result = func(prompts=prompts, llm=llm_instance, batch=batch)
  .../site-packages/autorag/nodes/generator/llama_index_llm.py:30 in llama_index_llm
    results = loop.run_until_complete(process_batch(tasks, batch_size=batch))
  .../python3.10/asyncio/base_events.py:649 in run_until_complete
    return future.result()
  .../site-packages/autorag/utils/util.py:273 in process_batch
    batch_results = await asyncio.gather(*batch)
  .../site-packages/llama_index/core/instrumentation/dispatcher.py:290 in async_wrapper
    result = await func(*args, **kwargs)
  .../site-packages/llama_index/core/llms/callbacks.py:334 in wrapped_async_llm_predict
    f_return_val = await f(_self, *args, **kwargs)
  .../site-packages/llama_index/llms/ollama/base.py:399 in acomplete
    return await achat_to_completion_decorator(self.achat)(prompt, **kwargs)
  .../site-packages/llama_index/core/base/llms/generic_utils.py:221 in wrapper
    chat_response = await func(messages, **kwargs)
  .../site-packages/llama_index/core/instrumentation/dispatcher.py:290 in async_wrapper
    result = await func(*args, **kwargs)
  .../site-packages/llama_index/core/llms/callbacks.py:76 in wrapped_async_llm_chat
    f_return_val = await f(_self, messages, **kwargs)
  .../site-packages/llama_index/llms/ollama/base.py:369 in achat
    response = await self.async_client.chat(model=self.model, messages=ollama_messages, stream=False, ...)
  .../site-packages/ollama/_client.py:653 in chat
    return await self._request_stream('POST', '/api/chat', json={...})
  .../site-packages/ollama/_client.py:517 in _request_stream
    response = await self._request(*args, **kwargs)
  .../site-packages/ollama/_client.py:482 in _request
    response = await self._client.request(method, url, **kwargs)
  .../site-packages/httpx/_client.py:1574 in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  .../site-packages/httpx/_client.py:1661 in send
    response = await self._send_handling_auth(...)
  .../site-packages/httpx/_client.py:1689 in _send_handling_auth
    response = await self._send_handling_redirects(...)
  .../site-packages/httpx/_client.py:1726 in _send_handling_redirects
    response = await self._send_single_request(request)
  .../site-packages/httpx/_client.py:1763 in _send_single_request
    response = await transport.handle_async_request(request)
  .../site-packages/httpx/_transports/default.py:372 in handle_async_request
    with map_httpcore_exceptions():
  .../python3.10/contextlib.py:153 in __exit__
    self.gen.throw(typ, value, traceback)
  .../site-packages/httpx/_transports/default.py:86 in map_httpcore_exceptions
    raise mapped_exc(message) from exc

ReadTimeout
sys:1: RuntimeWarning: coroutine 'Dispatcher.span..async_wrapper' was never awaited
```

AutoRAG works when I use an untagged model:

      modules:
        - module_type: llama_index_llm
          llm: ollama
          model: llama3
          temperature: 0.7
          batch: 1
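
One way to isolate whether the tag itself is the problem is to call the LlamaIndex Ollama wrapper that the generator node uses, directly and outside AutoRAG. This is a minimal sketch, assuming the tag has already been pulled (e.g. `ollama pull qwen2:72b`) and the Ollama server is running on its default port; it is not AutoRAG code.

```python
# Standalone check outside AutoRAG: pass the tagged model name straight to
# LlamaIndex's Ollama wrapper. Assumes `ollama pull qwen2:72b` has been run
# and the Ollama server is listening on the default http://localhost:11434.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="qwen2:72b", temperature=0.7)
print(llm.complete("Say hello.").text)
```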
vkehfdl1 commented 3 weeks ago

@khlee369 Thanks for the bug report. We will investigate why this isn't working.

khlee369 commented 2 weeks ago

It seems the issue is caused by the default DEFAULT_REQUEST_TIMEOUT = 30.0 in llama_index.llms.ollama. When Ollama has to load a model and then run inference, any request that takes longer than 30 seconds is cut off, so the awaited coroutine never completes and the call fails with a ReadTimeout.

To work around this, you can either modify DEFAULT_REQUEST_TIMEOUT = 30.0 in llama_index.llms.ollama.base (e.g., change it to 3000.0) or set request_timeout for the generator node in config.yaml:

      modules:
        - module_type: llama_index_llm
          llm: ollama
          model: qwen2:72b
          temperature: 0.7
          request_timeout: 3000
          batch: 1

Ref
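
For context, the traceback above shows the generator node calling `generator_models[llm](**kwargs)` in autorag/nodes/generator/base.py after popping `batch`, so the remaining keys (model, temperature, request_timeout) are forwarded to the LlamaIndex LLM constructor. Setting request_timeout in the YAML is therefore equivalent to constructing the wrapper yourself. A minimal sketch of that equivalent call, not AutoRAG's actual code:

```python
# Equivalent of the config above, built by hand: the extra YAML keys become
# constructor kwargs for LlamaIndex's Ollama wrapper.
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="qwen2:72b",
    temperature=0.7,
    request_timeout=3000.0,  # overrides llama_index's DEFAULT_REQUEST_TIMEOUT (30.0 s)
)
print(llm.complete("Say hello.").text)
```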

vkehfdl1 commented 2 weeks ago

Closing this issue since this is not a code problem. You just have to set request_timeout to a higher value.