NVIDIA / workbench-example-hybrid-rag

An NVIDIA AI Workbench example project for Retrieval Augmented Generation (RAG)
Apache License 2.0

All searches fail if the vector database is enabled #13

Status: Closed (freemansoft closed this issue 3 months ago)

freemansoft commented 3 months ago

This is a regression.

Problem Report

I cleared the cache and rebuilt the container to pick up the latest Hugging Face update, which includes a fix for running the Microsoft model.

  1. Queries run fine if the vector database is disabled.
  2. Queries always throw an error if the vector database is enabled.
  3. The symptoms are the same for all models.

```
ERR: Unable to process query.

Message: Expecting value: line 1 column 1 (char 0)
```
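
For anyone trying to reproduce this outside the UI, the failure can be triggered by hitting the chain server's retrieval endpoint directly. A minimal sketch, assuming the API is listening on localhost:8000 as in the logs below, and that the request body only needs a `content` field (the traceback shows `chain_server/server.py` reading `data.content`; anything beyond that is a guess):

```python
import requests

# Hypothetical minimal repro against the /documentSearch endpoint seen in
# the logs. The "content" field matches what chain_server/server.py reads
# (data.content); any other fields the endpoint accepts are unknown here.
resp = requests.post(
    "http://localhost:8000/documentSearch",
    json={"content": "test query"},
    timeout=30,
)
print(resp.status_code)  # returns 500 whenever the vector database path fails
print(resp.text)
```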

INFO:     127.0.0.1:54790 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:08:03.213875Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.912333311s" validation_time="622.048µs" queue_time="59.019µs" inference_time="2.911652384s" time_per_token="43.457498ms" seed="Some(11963795027966861836)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2024-07-17 01:08:23.721437', 'RPC error': '2024-07-17 01:08:24.326594'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:44818 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application

and

  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>
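
The `Timestamp lag too large lag(26h11m50.855s) max(24h0m0s)` message suggests the Milvus channel checkpoint fell more than a day behind, which could happen if the container sat idle past the 24h maximum, and the follow-up `no available shard delegator found` means Milvus could not route the search to any query node for that shard. A quick way to check the collection's load state and re-trigger a load from the api-env, as a minimal sketch, assuming pymilvus >= 2.3 for `utility.load_state` and Milvus on localhost:19530 per the startup logs:

```python
from pymilvus import Collection, connections, utility

# Diagnostic sketch: inspect the collection Milvus claims it cannot serve.
# Assumes the default Milvus port this project exposes (localhost:19530).
connections.connect(host="localhost", port="19530")

print(utility.list_collections())          # should include "llamalection"
print(utility.load_state("llamalection"))  # e.g. <LoadState: NotLoad>

# Re-issue a load; if the QueryNode is healthy this recovers the collection.
Collection("llamalection").load()
```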
freemansoft commented 3 months ago

I just tried to start up a new container and got this in the log files:

stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
Starting Milvus
Starting API
Polling inference server. Awaiting status 200; trying again in 5s. 
Polling inference server. Awaiting status 200; trying again in 5s. 
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_id" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
Polling inference server. Awaiting status 200; trying again in 5s. 
INFO:     Started server process [527]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:56938 - "GET /health HTTP/1.1" 200 OK
Service reachable. Happy chatting!
Detected system cuda
Files are already present on the host. Skipping download.
INFO:     127.0.0.1:33868 - "GET /health HTTP/1.1" 200 OK
Error: 'http://localhost:19530/v1/vector/collections' returned HTTP code 200
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T00:59:41.839679Z  INFO text_generation_launcher: Args {
    model_id: "microsoft/Phi-3-mini-128k-instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        BitsandbytesNF4,
    ),
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4000,
    ),
    max_total_tokens: Some(
        5000,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "project-hybrid-rag",
    port: 9090,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data/",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.85,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
}
2024-07-17T00:59:41.840001Z  INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T00:59:41.978960Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T00:59:41.978993Z  INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T00:59:41.978998Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T00:59:41.979124Z  INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:43.077673Z  INFO text_generation_launcher: Detected system cuda
2024-07-17T00:59:44.391796Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T00:59:44.981997Z  INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:44.982184Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T00:59:46.299519Z  INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s. 
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T00:59:53.870205Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T00:59:53.891604Z  INFO shard-manager: text_generation_launcher: Shard ready in 8.908719315s rank=0
2024-07-17T00:59:53.989970Z  INFO text_generation_launcher: Starting Webserver
2024-07-17T00:59:54.042832Z  INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T00:59:54.043554Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T00:59:54.281214Z  INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:54.333862Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'    
2024-07-17T00:59:54.333902Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'    
2024-07-17T00:59:54.333906Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'    
2024-07-17T00:59:54.333907Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'    
2024-07-17T00:59:54.333908Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'    
2024-07-17T00:59:54.333910Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'    
2024-07-17T00:59:54.333911Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'    
2024-07-17T00:59:54.333918Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'    
2024-07-17T00:59:54.333919Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'    
2024-07-17T00:59:54.333920Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'    
2024-07-17T00:59:54.333922Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'    
2024-07-17T00:59:54.334189Z  INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T00:59:54.334208Z  INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T00:59:54.334211Z  WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T00:59:54.341850Z  INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
2024-07-17T00:59:55.465534Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T00:59:55.466324Z  INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T00:59:55.466344Z  INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T00:59:55.484829Z  INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:00:20.654016', 'RPC error': '2024-07-17 01:00:20.658720'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:49716 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
INFO:     127.0.0.1:50486 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:44.037608Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="795.391785ms" validation_time="1.243147ms" queue_time="379.27µs" inference_time="793.769498ms" time_per_token="52.917966ms" seed="Some(14154443803876825602)"}: text_generation_router::server: router/src/server.rs:511: Success
INFO:     127.0.0.1:45718 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:55.506821Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.33083849s" validation_time="461.861µs" queue_time="53.164µs" inference_time="2.330323585s" time_per_token="30.263942ms" seed="Some(7620285655049830806)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:01:19.640950', 'RPC error': '2024-07-17 01:01:19.643889'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:47148 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
[nltk_data] Downloading package punkt to /home/workbench/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/workbench/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
    httplib_response = super().getresponse()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 538, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 369, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/project/code/chatui/pages/converse.py", line 978, in _document_upload
    file_paths = utils.upload_file(files, client)
  File "/project/code/chatui/pages/utils.py", line 47, in upload_file
    client.upload_documents(file_paths)
  File "/project/code/chatui/chat_client.py", line 118, in upload_documents
    _ = requests.post(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)
INFO:     127.0.0.1:35626 - "POST /uploadDocument HTTP/1.1" 500 Internal Server Error
RPC error: [flush], <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #41: 
channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>, <Time:{'RPC start': '2024-07-17 01:02:37.353672', 'RPC error': '2024-07-17 01:05:28.429121'}>
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/project/code/chain_server/server.py", line 89, in upload_document
    chains.ingest_docs(file_path, upload_file)
  File "/project/code/chain_server/chains.py", line 402, in ingest_docs
    index.insert_nodes(nodes)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 287, in insert_nodes
    self._insert(nodes, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 278, in _insert
    self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 200, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 201, in add
    self.collection.flush()
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 314, in flush
    conn.flush([self.name], timeout=timeout, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 1283, in flush
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt 
#41: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>
INFO:     127.0.0.1:57544 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:05:34.575055Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="6.135800332s" validation_time="447.356µs" queue_time="47.348µs" inference_time="6.135305778s" time_per_token="32.985514ms" seed="Some(9174620117177924397)"}: text_generation_router::server: router/src/server.rs:511: Success
2024-07-17T01:05:37.492691Z  INFO text_generation_router::server: router/src/server.rs:1948: signal received, starting graceful shutdown
2024-07-17T01:05:37.504578Z  INFO text_generation_launcher: Terminating webserver
2024-07-17T01:05:37.504611Z  INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-07-17T01:05:37.504629Z  INFO text_generation_launcher: webserver terminated
2024-07-17T01:05:37.504631Z  INFO text_generation_launcher: Shutting down shards
2024-07-17T01:05:37.504927Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-07-17T01:05:37.504969Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-07-17T01:05:38.605920Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0
INFO:     127.0.0.1:48004 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
Detected system cuda
Files are already present on the host. Skipping download.
INFO:     127.0.0.1:49270 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
2024-07-17T01:06:48.495811Z  INFO text_generation_launcher: Args {
    model_id: "microsoft/Phi-3-mini-128k-instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        BitsandbytesNF4,
    ),
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4000,
    ),
    max_total_tokens: Some(
        5000,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "project-hybrid-rag",
    port: 9090,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data/",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.85,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
}
2024-07-17T01:06:48.495873Z  INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T01:06:48.495939Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T01:06:48.495956Z  INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T01:06:48.495958Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T01:06:48.496019Z  INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:49.639523Z  INFO text_generation_launcher: Detected system cuda
2024-07-17T01:06:50.981563Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T01:06:51.499219Z  INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:51.499394Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T01:06:52.831384Z  INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:57.461016Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T01:06:57.507517Z  INFO shard-manager: text_generation_launcher: Shard ready in 6.007455596s rank=0
2024-07-17T01:06:57.606256Z  INFO text_generation_launcher: Starting Webserver
2024-07-17T01:06:57.627085Z  INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T01:06:57.627118Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T01:06:57.831817Z  INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:57.871924Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'    
2024-07-17T01:06:57.871947Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'    
2024-07-17T01:06:57.871951Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'    
2024-07-17T01:06:57.871952Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'    
2024-07-17T01:06:57.871953Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'    
2024-07-17T01:06:57.871955Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'    
2024-07-17T01:06:57.871962Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'    
2024-07-17T01:06:57.871963Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'    
2024-07-17T01:06:57.871964Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'    
2024-07-17T01:06:57.871966Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'    
2024-07-17T01:06:57.871967Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'    
2024-07-17T01:06:57.872152Z  INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T01:06:57.872168Z  INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T01:06:57.872172Z  WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T01:06:57.874313Z  INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:58.809627Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T01:06:58.809944Z  INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T01:06:58.809966Z  INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T01:06:58.823937Z  INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
INFO:     127.0.0.1:54790 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:08:03.213875Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.912333311s" validation_time="622.048µs" queue_time="59.019µs" inference_time="2.911652384s" time_per_token="43.457498ms" seed="Some(11963795027966861836)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2024-07-17 01:08:23.721437', 'RPC error': '2024-07-17 01:08:24.326594'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:44818 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>
INFO:     127.0.0.1:45204 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:18:50.611618Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="4.686537286s" validation_time="554.105µs" queue_time="41.218µs" inference_time="4.685942523s" time_per_token="47.332752ms" seed="Some(14024934939529428721)"}: text_generation_router::server: router/src/server.rs:511: Success
Terminated
stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
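
The decisive failure in the log above is Milvus rejecting the search with `Timestamp lag too large lag(26h11m50.855s) max(24h0m0s)`: the query node's delegator has fallen more than the 24-hour guarantee window behind, which typically happens when the host or container is suspended while a standalone Milvus sits idle. A commonly reported workaround (not verified against this project) is to release and reload the affected collection so a fresh delegator is assigned. A minimal sketch, assuming a local standalone Milvus on its default port 19530 and the pymilvus 2.3.x ORM API; the collection name `llamalection` comes from the error output:

```python
# Hypothetical recovery sketch for the "Timestamp lag too large" error above.
# Assumptions: Milvus standalone reachable at localhost:19530 (the default),
# pymilvus 2.3.x ORM API, collection name taken from the failing search.
from pymilvus import Collection, connections, utility

connections.connect(alias="default", host="localhost", port="19530")

if utility.has_collection("llamalection"):
    col = Collection("llamalection")
    col.release()  # drop the stale query-node delegator
    col.load()     # reload so a fresh delegator serves searches
    print("collection reloaded; retry the document search")
else:
    print("collection not found; re-ingest the documents first")
```

If the lag persists after a reload, restarting the Milvus container itself is reported to clear it, since the delegator resyncs its timestamps on startup.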
freemansoft commented 3 months ago

Either this incremental library version update fixed it, or I smashed enough buttons that I don't know what happened.

I was finally able to get this working by changing milvus[client] from version 2.3.2 to version 2.3.5.
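
A quick way to confirm the environment actually picked up the bump before retesting; a sketch assuming the client in question is the `pymilvus` package (the one imported in the traceback above), a local Milvus on the default port, and a guessed embedding dimension:

```python
# Post-upgrade smoke test (a sketch, not part of the project).
# Assumptions: pymilvus is the bumped client, Milvus runs at localhost:19530,
# and the embedding dimension of "llamalection" is 1024 (adjust if different).
import pymilvus
from pymilvus import MilvusClient

print("pymilvus", pymilvus.__version__)  # expect 2.3.5 after the update

client = MilvusClient(uri="http://localhost:19530")
if "llamalection" in client.list_collections():
    hits = client.search(
        collection_name="llamalection",
        data=[[0.0] * 1024],  # throwaway query vector; dimension is assumed
        limit=1,
    )
    print("search ok:", len(hits[0]), "hit(s)")  # no MilvusException raised
```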