NVIDIA / workbench-example-hybrid-rag

An NVIDIA AI Workbench example project for Retrieval Augmented Generation (RAG)
Apache License 2.0

All searches fail if the vector database is enabled #13

Status: Closed (freemansoft closed this issue 3 months ago)

freemansoft commented 3 months ago

This is a regression.

Problem Report

I cleared the cache and rebuilt the container to pick up the latest Hugging Face update, which includes a fix for running the Microsoft model.

  1. Queries run fine if the vector database is disabled.
  2. Queries always throw an error if the vector database is enabled.
  3. The symptoms are the same for all models.

```
ERR: Unable to process query.

Message: Expecting value: line 1 column 1 (char 0)
```
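
For anyone trying to reproduce this outside the UI, the failure can be triggered by hitting the chain server's retrieval endpoint directly. A minimal sketch, assuming the API is listening on localhost:8000 as in the logs below, and that the request body only needs a `content` field (the traceback shows `chain_server/server.py` reading `data.content`; anything beyond that is a guess):

```python
import requests

# Hypothetical minimal repro against the /documentSearch endpoint seen in
# the logs. The "content" field matches what chain_server/server.py reads
# (data.content); any other fields the endpoint accepts are unknown here.
resp = requests.post(
    "http://localhost:8000/documentSearch",
    json={"content": "test query"},
    timeout=30,
)
print(resp.status_code)  # returns 500 whenever the vector database path fails
print(resp.text)
```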

INFO:     127.0.0.1:54790 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:08:03.213875Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.912333311s" validation_time="622.048µs" queue_time="59.019µs" inference_time="2.911652384s" time_per_token="43.457498ms" seed="Some(11963795027966861836)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2024-07-17 01:08:23.721437', 'RPC error': '2024-07-17 01:08:24.326594'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:44818 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application

and

  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>
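
The `Timestamp lag too large lag(26h11m50.855s) max(24h0m0s)` message suggests the Milvus channel checkpoint fell more than a day behind, which could happen if the container sat idle past the 24h maximum, and the follow-up `no available shard delegator found` means Milvus could not route the search to any query node for that shard. A quick way to check the collection's load state and re-trigger a load from the api-env, as a minimal sketch, assuming pymilvus >= 2.3 for `utility.load_state` and Milvus on localhost:19530 per the startup logs:

```python
from pymilvus import Collection, connections, utility

# Diagnostic sketch: inspect the collection Milvus claims it cannot serve.
# Assumes the default Milvus port this project exposes (localhost:19530).
connections.connect(host="localhost", port="19530")

print(utility.list_collections())          # should include "llamalection"
print(utility.load_state("llamalection"))  # e.g. <LoadState: NotLoad>

# Re-issue a load; if the QueryNode is healthy this recovers the collection.
Collection("llamalection").load()
```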
freemansoft commented 3 months ago

I just tried to start up a new container and got this in the log files:

stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
Starting Milvus
Starting API
Polling inference server. Awaiting status 200; trying again in 5s. 
Polling inference server. Awaiting status 200; trying again in 5s. 
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_id" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
Polling inference server. Awaiting status 200; trying again in 5s. 
INFO:     Started server process [527]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:56938 - "GET /health HTTP/1.1" 200 OK
Service reachable. Happy chatting!
Detected system cuda
Files are already present on the host. Skipping download.
INFO:     127.0.0.1:33868 - "GET /health HTTP/1.1" 200 OK
Error: 'http://localhost:19530/v1/vector/collections' returned HTTP code 200
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T00:59:41.839679Z  INFO text_generation_launcher: Args {
    model_id: "microsoft/Phi-3-mini-128k-instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        BitsandbytesNF4,
    ),
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4000,
    ),
    max_total_tokens: Some(
        5000,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "project-hybrid-rag",
    port: 9090,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data/",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.85,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
}
2024-07-17T00:59:41.840001Z  INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T00:59:41.978960Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T00:59:41.978993Z  INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T00:59:41.978998Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T00:59:41.979124Z  INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:43.077673Z  INFO text_generation_launcher: Detected system cuda
2024-07-17T00:59:44.391796Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T00:59:44.981997Z  INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:44.982184Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T00:59:46.299519Z  INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s. 
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T00:59:53.870205Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T00:59:53.891604Z  INFO shard-manager: text_generation_launcher: Shard ready in 8.908719315s rank=0
2024-07-17T00:59:53.989970Z  INFO text_generation_launcher: Starting Webserver
2024-07-17T00:59:54.042832Z  INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T00:59:54.043554Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T00:59:54.281214Z  INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:54.333862Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'    
2024-07-17T00:59:54.333902Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'    
2024-07-17T00:59:54.333906Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'    
2024-07-17T00:59:54.333907Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'    
2024-07-17T00:59:54.333908Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'    
2024-07-17T00:59:54.333910Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'    
2024-07-17T00:59:54.333911Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'    
2024-07-17T00:59:54.333918Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'    
2024-07-17T00:59:54.333919Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'    
2024-07-17T00:59:54.333920Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'    
2024-07-17T00:59:54.333922Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'    
2024-07-17T00:59:54.334189Z  INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T00:59:54.334208Z  INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T00:59:54.334211Z  WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T00:59:54.341850Z  INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
2024-07-17T00:59:55.465534Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T00:59:55.466324Z  INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T00:59:55.466344Z  INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T00:59:55.484829Z  INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:00:20.654016', 'RPC error': '2024-07-17 01:00:20.658720'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:49716 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
INFO:     127.0.0.1:50486 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:44.037608Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="795.391785ms" validation_time="1.243147ms" queue_time="379.27µs" inference_time="793.769498ms" time_per_token="52.917966ms" seed="Some(14154443803876825602)"}: text_generation_router::server: router/src/server.rs:511: Success
INFO:     127.0.0.1:45718 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:55.506821Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.33083849s" validation_time="461.861µs" queue_time="53.164µs" inference_time="2.330323585s" time_per_token="30.263942ms" seed="Some(7620285655049830806)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:01:19.640950', 'RPC error': '2024-07-17 01:01:19.643889'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:47148 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
[nltk_data] Downloading package punkt to /home/workbench/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/workbench/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
    httplib_response = super().getresponse()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 538, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 369, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/workbench/.local/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/project/code/chatui/pages/converse.py", line 978, in _document_upload
    file_paths = utils.upload_file(files, client)
  File "/project/code/chatui/pages/utils.py", line 47, in upload_file
    client.upload_documents(file_paths)
  File "/project/code/chatui/chat_client.py", line 118, in upload_documents
    _ = requests.post(
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)
INFO:     127.0.0.1:35626 - "POST /uploadDocument HTTP/1.1" 500 Internal Server Error
RPC error: [flush], <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #41: 
channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>, <Time:{'RPC start': '2024-07-17 01:02:37.353672', 'RPC error': '2024-07-17 01:05:28.429121'}>
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/project/code/chain_server/server.py", line 89, in upload_document
    chains.ingest_docs(file_path, upload_file)
  File "/project/code/chain_server/chains.py", line 402, in ingest_docs
    index.insert_nodes(nodes)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 287, in insert_nodes
    self._insert(nodes, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 278, in _insert
    self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 200, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 201, in add
    self.collection.flush()
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 314, in flush
    conn.flush([self.name], timeout=timeout, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 1283, in flush
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt 
#41: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>
INFO:     127.0.0.1:57544 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:05:34.575055Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="6.135800332s" validation_time="447.356µs" queue_time="47.348µs" inference_time="6.135305778s" time_per_token="32.985514ms" seed="Some(9174620117177924397)"}: text_generation_router::server: router/src/server.rs:511: Success
2024-07-17T01:05:37.492691Z  INFO text_generation_router::server: router/src/server.rs:1948: signal received, starting graceful shutdown
2024-07-17T01:05:37.504578Z  INFO text_generation_launcher: Terminating webserver
2024-07-17T01:05:37.504611Z  INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-07-17T01:05:37.504629Z  INFO text_generation_launcher: webserver terminated
2024-07-17T01:05:37.504631Z  INFO text_generation_launcher: Shutting down shards
2024-07-17T01:05:37.504927Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-07-17T01:05:37.504969Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-07-17T01:05:38.605920Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0
INFO:     127.0.0.1:48004 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
Detected system cuda
Files are already present on the host. Skipping download.
INFO:     127.0.0.1:49270 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
2024-07-17T01:06:48.495811Z  INFO text_generation_launcher: Args {
    model_id: "microsoft/Phi-3-mini-128k-instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        BitsandbytesNF4,
    ),
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4000,
    ),
    max_total_tokens: Some(
        5000,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "project-hybrid-rag",
    port: 9090,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data/",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.85,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
}
2024-07-17T01:06:48.495873Z  INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T01:06:48.495939Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T01:06:48.495956Z  INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T01:06:48.495958Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T01:06:48.496019Z  INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:49.639523Z  INFO text_generation_launcher: Detected system cuda
2024-07-17T01:06:50.981563Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T01:06:51.499219Z  INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:51.499394Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T01:06:52.831384Z  INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:57.461016Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T01:06:57.507517Z  INFO shard-manager: text_generation_launcher: Shard ready in 6.007455596s rank=0
2024-07-17T01:06:57.606256Z  INFO text_generation_launcher: Starting Webserver
2024-07-17T01:06:57.627085Z  INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T01:06:57.627118Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"    
2024-07-17T01:06:57.831817Z  INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:57.871924Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'    
2024-07-17T01:06:57.871947Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'    
2024-07-17T01:06:57.871951Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'    
2024-07-17T01:06:57.871952Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'    
2024-07-17T01:06:57.871953Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'    
2024-07-17T01:06:57.871955Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'    
2024-07-17T01:06:57.871962Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'    
2024-07-17T01:06:57.871963Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'    
2024-07-17T01:06:57.871964Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'    
2024-07-17T01:06:57.871966Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'    
2024-07-17T01:06:57.871967Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'    
2024-07-17T01:06:57.872152Z  INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T01:06:57.872168Z  INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T01:06:57.872172Z  WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T01:06:57.874313Z  INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
Polling inference server. Awaiting status 200; trying again in 5s. 
2024-07-17T01:06:58.809627Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T01:06:58.809944Z  INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T01:06:58.809966Z  INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T01:06:58.823937Z  INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
INFO:     127.0.0.1:54790 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:08:03.213875Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.912333311s" validation_time="622.048µs" queue_time="59.019µs" inference_time="2.911652384s" time_per_token="43.457498ms" seed="Some(11963795027966861836)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2024-07-17 01:08:23.721437', 'RPC error': '2024-07-17 01:08:24.326594'}>
Failed to search collection: llamalection
INFO:     127.0.0.1:44818 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/project/code/chain_server/server.py", line 134, in document_search
    nodes = retriever.retrieve(data.content)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
    res = self.milvusclient.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
    raise ex from ex
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
    res = conn.search(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
    return self._execute_search_requests(
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
    raise pre_err from pre_err
  File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>
INFO:     127.0.0.1:45204 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:18:50.611618Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="4.686537286s" validation_time="554.105µs" queue_time="41.218µs" inference_time="4.685942523s" time_per_token="47.332752ms" seed="Some(14024934939529428721)"}: text_generation_router::server: router/src/server.rs:511: Success
Terminated
stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
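
The decisive failure in the log above is Milvus rejecting the search with `Timestamp lag too large lag(26h11m50.855s) max(24h0m0s)`: the query node's delegator has fallen more than the 24-hour guarantee window behind, which typically happens when the host or container is suspended while a standalone Milvus sits idle. A commonly reported workaround (not verified against this project) is to release and reload the affected collection so a fresh delegator is assigned. A minimal sketch, assuming a local standalone Milvus on its default port 19530 and the pymilvus 2.3.x ORM API; the collection name `llamalection` comes from the error output:

```python
# Hypothetical recovery sketch for the "Timestamp lag too large" error above.
# Assumptions: Milvus standalone reachable at localhost:19530 (the default),
# pymilvus 2.3.x ORM API, collection name taken from the failing search.
from pymilvus import Collection, connections, utility

connections.connect(alias="default", host="localhost", port="19530")

if utility.has_collection("llamalection"):
    col = Collection("llamalection")
    col.release()  # drop the stale query-node delegator
    col.load()     # reload so a fresh delegator serves searches
    print("collection reloaded; retry the document search")
else:
    print("collection not found; re-ingest the documents first")
```

If the lag persists after a reload, restarting the Milvus container itself is reported to clear it, since the delegator resyncs its timestamps on startup.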
freemansoft commented 3 months ago

Either this incremental library version update fixed it, or I smashed enough buttons that I don't know what happened.

I was finally able to get this working by changing milvus[client] from version 2.3.2 to version 2.3.5.
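
A quick way to confirm the environment actually picked up the bump before retesting; a sketch assuming the client in question is the `pymilvus` package (the one imported in the traceback above), a local Milvus on the default port, and a guessed embedding dimension:

```python
# Post-upgrade smoke test (a sketch, not part of the project).
# Assumptions: pymilvus is the bumped client, Milvus runs at localhost:19530,
# and the embedding dimension of "llamalection" is 1024 (adjust if different).
import pymilvus
from pymilvus import MilvusClient

print("pymilvus", pymilvus.__version__)  # expect 2.3.5 after the update

client = MilvusClient(uri="http://localhost:19530")
if "llamalection" in client.list_collections():
    hits = client.search(
        collection_name="llamalection",
        data=[[0.0] * 1024],  # throwaway query vector; dimension is assumed
        limit=1,
    )
    print("search ok:", len(hits[0]), "hit(s)")  # no MilvusException raised
```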