freemansoft closed this issue 3 months ago.
I just tried to start up a new container and got the following in the log files:
stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
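
For what it's worth, those first three lines suggest the host Docker socket was never mounted into the container, so the entrypoint's docker group setup fails. A minimal sanity check (my own sketch, not part of the project):

```python
# Minimal sketch (assumption, not project code): verify the host Docker socket
# is mounted where the entrypoint expects it. If stat fails on this path, the
# groupadd/usermod steps fail exactly as logged above.
import os
import stat

SOCK = "/var/host-run/docker.sock"  # path taken from the log line above

if os.path.exists(SOCK) and stat.S_ISSOCK(os.stat(SOCK).st_mode):
    print(f"{SOCK} is a socket; the docker group setup should succeed")
else:
    # e.g. run the container with: -v /var/run/docker.sock:/var/host-run/docker.sock
    print(f"{SOCK} is missing; the docker group steps will fail as logged")
```
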
Starting Milvus
Starting API
Polling inference server. Awaiting status 200; trying again in 5s.
Polling inference server. Awaiting status 200; trying again in 5s.
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_id" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
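
The pydantic warning is harmless, and the message itself names the fix. A minimal sketch, assuming a pydantic v2 model with the `model_id` field from the warning (the class name is illustrative, not the project's):

```python
# Minimal sketch: silence pydantic's protected-namespace warning the way the
# message suggests, by clearing protected_namespaces on the model config.
from pydantic import BaseModel, ConfigDict

class GenerateRequest(BaseModel):
    model_config = ConfigDict(protected_namespaces=())
    model_id: str  # the field that collides with the "model_" namespace
```
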
Polling inference server. Awaiting status 200; trying again in 5s.
INFO: Started server process [527]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:56938 - "GET /health HTTP/1.1" 200 OK
Service reachable. Happy chatting!
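
The polling lines presumably come from the entrypoint script (likely curl); a rough Python equivalent of that loop, using the port 8000 `/health` endpoint shown in the uvicorn lines above:

```python
# Rough equivalent of the startup script's health polling; illustration only,
# the actual script probably shells out to curl.
import time
import requests

while True:
    try:
        if requests.get("http://localhost:8000/health", timeout=5).status_code == 200:
            print("Service reachable. Happy chatting!")
            break
    except requests.RequestException:
        pass  # server not up yet
    print("Polling inference server. Awaiting status 200; trying again in 5s.")
    time.sleep(5)
```
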
Detected system cuda
Files are already present on the host. Skipping download.
INFO: 127.0.0.1:33868 - "GET /health HTTP/1.1" 200 OK
Error: 'http://localhost:19530/v1/vector/collections' returned HTTP code 200
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-17T00:59:41.839679Z INFO text_generation_launcher: Args {
model_id: "microsoft/Phi-3-mini-128k-instruct",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: None,
quantize: Some(
BitsandbytesNF4,
),
speculate: None,
dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: Some(
4000,
),
max_total_tokens: Some(
5000,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "project-hybrid-rag",
port: 9090,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data/",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 0.85,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: None,
}
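
For reference, that Args dump maps back to text-generation-launcher flags roughly as follows. This is an illustration reconstructed from the logged values, not the project's actual launch command:

```python
# Illustration only: approximate launcher invocation implied by the Args dump.
import subprocess

subprocess.run([
    "text-generation-launcher",
    "--model-id", "microsoft/Phi-3-mini-128k-instruct",
    "--quantize", "bitsandbytes-nf4",      # quantize: Some(BitsandbytesNF4)
    "--max-input-length", "4000",          # max_input_length: Some(4000)
    "--max-total-tokens", "5000",          # max_total_tokens: Some(5000)
    "--port", "9090",
    "--trust-remote-code",                 # trust_remote_code: true
])
```
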
2024-07-17T00:59:41.840001Z INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"
2024-07-17T00:59:41.978960Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T00:59:41.978993Z INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T00:59:41.978998Z WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T00:59:41.979124Z INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:43.077673Z INFO text_generation_launcher: Detected system cuda
2024-07-17T00:59:44.391796Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T00:59:44.981997Z INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:44.982184Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T00:59:46.299519Z INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s.
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-17T00:59:53.870205Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T00:59:53.891604Z INFO shard-manager: text_generation_launcher: Shard ready in 8.908719315s rank=0
2024-07-17T00:59:53.989970Z INFO text_generation_launcher: Starting Webserver
2024-07-17T00:59:54.042832Z INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T00:59:54.043554Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"
2024-07-17T00:59:54.281214Z INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T00:59:54.333862Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'
2024-07-17T00:59:54.333902Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'
2024-07-17T00:59:54.333906Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'
2024-07-17T00:59:54.333907Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'
2024-07-17T00:59:54.333908Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'
2024-07-17T00:59:54.333910Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'
2024-07-17T00:59:54.333911Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'
2024-07-17T00:59:54.333918Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'
2024-07-17T00:59:54.333919Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'
2024-07-17T00:59:54.333920Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'
2024-07-17T00:59:54.333922Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'
2024-07-17T00:59:54.334189Z INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T00:59:54.334208Z INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T00:59:54.334211Z WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T00:59:54.341850Z INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
2024-07-17T00:59:55.465534Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T00:59:55.466324Z INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T00:59:55.466344Z INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T00:59:55.484829Z INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:00:20.654016', 'RPC error': '2024-07-17 01:00:20.658720'}>
Failed to search collection: llamalection
INFO: 127.0.0.1:49716 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/project/code/chain_server/server.py", line 134, in document_search
nodes = retriever.retrieve(data.content)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
nodes = self._retrieve(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
return self._get_nodes_with_embeddings(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
res = self.milvusclient.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
raise ex from ex
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
res = conn.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
return self._execute_search_requests(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
raise pre_err from pre_err
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
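
The search fails because Milvus reports the collection was never loaded into memory before querying. A workaround sketch, assuming pymilvus >= 2.4 and the collection name `llamalection` from the log:

```python
# Workaround sketch (assumes pymilvus >= 2.4): check the load state and load
# the collection before searching. "llamalection" comes from the log above.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
print(client.get_load_state(collection_name="llamalection"))  # expect NotLoad here
client.load_collection(collection_name="llamalection")        # load before search
```
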
INFO: 127.0.0.1:50486 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:44.037608Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="795.391785ms" validation_time="1.243147ms" queue_time="379.27µs" inference_time="793.769498ms" time_per_token="52.917966ms" seed="Some(14154443803876825602)"}: text_generation_router::server: router/src/server.rs:511: Success
INFO: 127.0.0.1:45718 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:00:55.506821Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.33083849s" validation_time="461.861µs" queue_time="53.164µs" inference_time="2.330323585s" time_per_token="30.263942ms" seed="Some(7620285655049830806)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>, <Time:{'RPC start': '2024-07-17 01:01:19.640950', 'RPC error': '2024-07-17 01:01:19.643889'}>
Failed to search collection: llamalection
INFO: 127.0.0.1:47148 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/project/code/chain_server/server.py", line 134, in document_search
nodes = retriever.retrieve(data.content)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
nodes = self._retrieve(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
return self._get_nodes_with_embeddings(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
res = self.milvusclient.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
raise ex from ex
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
res = conn.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
return self._execute_search_requests(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
raise pre_err from pre_err
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: collection=451127928589389281: collection not loaded: unrecoverable error)>
[nltk_data] Downloading package punkt to /home/workbench/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/workbench/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
Traceback (most recent call last):
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
httplib_response = super().getresponse()
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/http/client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
TimeoutError: timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 538, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 369, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/workbench/.local/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/home/workbench/.local/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/home/workbench/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/home/workbench/.local/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
response = f(*args, **kwargs)
File "/project/code/chatui/pages/converse.py", line 978, in _document_upload
file_paths = utils.upload_file(files, client)
File "/project/code/chatui/pages/utils.py", line 47, in upload_file
client.upload_documents(file_paths)
File "/project/code/chatui/chat_client.py", line 118, in upload_documents
_ = requests.post(
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/workbench/.conda/envs/ui-env/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=8000): Read timed out. (read timeout=120)
INFO: 127.0.0.1:35626 - "POST /uploadDocument HTTP/1.1" 500 Internal Server Error
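
The upload 500 is the UI side of the same problem: chat_client.py posts with a 120 s read timeout while the server is still stuck retrying the Milvus flush below. Illustration only (the URL comes from the log, the form field name is an assumption), but raising the client timeout at least lets the server-side error surface:

```python
# Illustration only: reproduce the upload POST with a longer (connect, read)
# timeout than the default 120 s the client times out at. The "file" field
# name is an assumption, not the project's actual API.
import requests

with open("document.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/uploadDocument",  # endpoint from the log above
        files={"file": f},
        timeout=(5, 600),  # 5 s to connect, 600 s to read the response
    )
print(resp.status_code)
```
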
RPC error: [flush], <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #41: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>, <Time:{'RPC start': '2024-07-17 01:02:37.353672', 'RPC error': '2024-07-17 01:05:28.429121'}>
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/project/code/chain_server/server.py", line 89, in upload_document
chains.ingest_docs(file_path, upload_file)
File "/project/code/chain_server/chains.py", line 402, in ingest_docs
index.insert_nodes(nodes)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 287, in insert_nodes
self._insert(nodes, **insert_kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 278, in _insert
self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 200, in _add_nodes_to_index
new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 201, in add
self.collection.flush()
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 314, in flush
conn.flush([self.name], timeout=timeout, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 1283, in flush
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=attempt #0: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #1: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #2: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #3: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #4: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #5: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #6: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #7: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #8: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #9: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #10: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #11: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #12: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #13: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #14: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #15: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #16: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #17: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #18: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #19: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #20: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #21: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #22: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #23: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #24: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #25: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #26: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #27: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #28: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #29: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #30: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #31: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #32: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #33: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #34: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #35: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #36: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #37: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #38: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #39: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #40: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #41: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #42: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #43: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #44: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #45: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #46: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #47: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #48: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #49: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #50: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #51: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #52: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #53: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #54: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #55: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #56: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #57: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #58: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found: attempt #59: channel=by-dev-rootcoord-dml_0_451127928589389281v0: channel not found)>
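
Every flush retry hits "channel not found" for the same collection ID, which points at stale collection metadata (e.g. a recycled Milvus volume). A destructive recovery sketch, again assuming pymilvus >= 2.4; documents must be re-ingested afterwards:

```python
# Destructive recovery sketch (an assumption, not an official fix): drop the
# stale collection so Milvus recreates its DML channels on the next ingest.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
if "llamalection" in client.list_collections():
    client.drop_collection(collection_name="llamalection")
# then re-upload documents so the chain server recreates the collection
```
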
INFO: 127.0.0.1:57544 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:05:34.575055Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="6.135800332s" validation_time="447.356µs" queue_time="47.348µs" inference_time="6.135305778s" time_per_token="32.985514ms" seed="Some(9174620117177924397)"}: text_generation_router::server: router/src/server.rs:511: Success
2024-07-17T01:05:37.492691Z INFO text_generation_router::server: router/src/server.rs:1948: signal received, starting graceful shutdown
2024-07-17T01:05:37.504578Z INFO text_generation_launcher: Terminating webserver
2024-07-17T01:05:37.504611Z INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-07-17T01:05:37.504629Z INFO text_generation_launcher: webserver terminated
2024-07-17T01:05:37.504631Z INFO text_generation_launcher: Shutting down shards
2024-07-17T01:05:37.504927Z INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-07-17T01:05:37.504969Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-07-17T01:05:38.605920Z INFO shard-manager: text_generation_launcher: shard terminated rank=0
INFO: 127.0.0.1:48004 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
Detected system cuda
Files are already present on the host. Skipping download.
INFO: 127.0.0.1:49270 - "GET /health HTTP/1.1" 200 OK
All URLs returned HTTP code 200
2024-07-17T01:06:48.495811Z INFO text_generation_launcher: Args {
model_id: "microsoft/Phi-3-mini-128k-instruct",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: None,
quantize: Some(
BitsandbytesNF4,
),
speculate: None,
dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: Some(
4000,
),
max_total_tokens: Some(
5000,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "project-hybrid-rag",
port: 9090,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data/",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 0.85,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: None,
}
2024-07-17T01:06:48.495873Z INFO hf_hub: Token file not found "/home/workbench/.cache/huggingface/token"
2024-07-17T01:06:48.495939Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4050
2024-07-17T01:06:48.495956Z INFO text_generation_launcher: Bitsandbytes doesn't work with cuda graphs, deactivating them
2024-07-17T01:06:48.495958Z WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `microsoft/Phi-3-mini-128k-instruct` do not contain malicious code.
2024-07-17T01:06:48.496019Z INFO download: text_generation_launcher: Starting check and download process for microsoft/Phi-3-mini-128k-instruct
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-17T01:06:49.639523Z INFO text_generation_launcher: Detected system cuda
2024-07-17T01:06:50.981563Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-07-17T01:06:51.499219Z INFO download: text_generation_launcher: Successfully downloaded weights for microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:51.499394Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-07-17T01:06:52.831384Z INFO text_generation_launcher: Detected system cuda
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-17T01:06:57.461016Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-07-17T01:06:57.507517Z INFO shard-manager: text_generation_launcher: Shard ready in 6.007455596s rank=0
2024-07-17T01:06:57.606256Z INFO text_generation_launcher: Starting Webserver
2024-07-17T01:06:57.627085Z INFO text_generation_router: router/src/main.rs:221: Using the Hugging Face API
2024-07-17T01:06:57.627118Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/workbench/.cache/huggingface/token"
2024-07-17T01:06:57.831817Z INFO text_generation_router: router/src/main.rs:497: Serving revision d548c233192db00165d842bf8edff054bb3212f8 of model microsoft/Phi-3-mini-128k-instruct
2024-07-17T01:06:57.871924Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'
2024-07-17T01:06:57.871947Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'
2024-07-17T01:06:57.871951Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'
2024-07-17T01:06:57.871952Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'
2024-07-17T01:06:57.871953Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'
2024-07-17T01:06:57.871955Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'
2024-07-17T01:06:57.871962Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'
2024-07-17T01:06:57.871963Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'
2024-07-17T01:06:57.871964Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'
2024-07-17T01:06:57.871966Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'
2024-07-17T01:06:57.871967Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'
2024-07-17T01:06:57.872152Z INFO text_generation_router: router/src/main.rs:334: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-07-17T01:06:57.872168Z INFO text_generation_router: router/src/main.rs:349: Using config Some(Phi3)
2024-07-17T01:06:57.872172Z WARN text_generation_router: router/src/main.rs:376: Invalid hostname, defaulting to 0.0.0.0
2024-07-17T01:06:57.874313Z INFO text_generation_router::server: router/src/server.rs:1577: Warming up model
Polling inference server. Awaiting status 200; trying again in 5s.
2024-07-17T01:06:58.809627Z INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-07-17T01:06:58.809944Z INFO text_generation_router::server: router/src/server.rs:1604: Using scheduler V3
2024-07-17T01:06:58.809966Z INFO text_generation_router::server: router/src/server.rs:1656: Setting max batch total tokens to 15104
2024-07-17T01:06:58.823937Z INFO text_generation_router::server: router/src/server.rs:1894: Connected
Service reachable. Happy chatting!
INFO: 127.0.0.1:54790 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:08:03.213875Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="2.912333311s" validation_time="622.048µs" queue_time="59.019µs" inference_time="2.911652384s" time_per_token="43.457498ms" seed="Some(11963795027966861836)"}: text_generation_router::server: router/src/server.rs:511: Success
RPC error: [search], <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2024-07-17 01:08:23.721437', 'RPC error': '2024-07-17 01:08:24.326594'}>
Failed to search collection: llamalection
INFO: 127.0.0.1:44818 - "POST /documentSearch HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/project/code/chain_server/server.py", line 134, in document_search
nodes = retriever.retrieve(data.content)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
nodes = self._retrieve(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 92, in _retrieve
return self._get_nodes_with_embeddings(query_bundle)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 168, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/llama_index/vector_stores/milvus.py", line 277, in query
res = self.milvusclient.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 259, in search
raise ex from ex
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 246, in search
res = conn.search(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 774, in search
return self._execute_search_requests(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search_requests
raise pre_err from pre_err
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 726, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to search: attempt #0: failed to search/query delegator 14 for channel by-dev-rootcoord-dml_0_451127928589389281v0: fail to Search, QueryNode ID=14, reason=Timestamp lag too large lag(26h11m50.855s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable)>
INFO: 127.0.0.1:45204 - "POST /generate HTTP/1.1" 200 OK
2024-07-17T01:18:50.611618Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3080-ti"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(256), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None, adapter_id: None } total_time="4.686537286s" validation_time="554.105µs" queue_time="41.218µs" inference_time="4.685942523s" time_per_token="47.332752ms" seed="Some(14024934939529428721)"}: text_generation_router::server: router/src/server.rs:511: Success
Terminated
stat: cannot statx '/var/host-run/docker.sock': No such file or directory
groupadd: invalid group ID 'docker'
usermod: group 'docker' does not exist
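To isolate the failure, it helped to hit Milvus directly, outside the FastAPI chain server. A minimal repro sketch, assuming the default localhost:19530 endpoint, the `llamalection` collection name from the logs, and a placeholder query vector (the real dimension depends on the embedding model):

```python
# Minimal repro: query Milvus directly, bypassing the chain server.
# Assumptions: Milvus at localhost:19530, collection "llamalection" (from the
# logs above), and a dummy vector whose dimension must match the collection schema.
from pymilvus import MilvusClient, MilvusException

client = MilvusClient(uri="http://localhost:19530")
try:
    hits = client.search(
        collection_name="llamalection",
        data=[[0.0] * 1024],  # placeholder; use the real embedding dimension
        limit=3,
    )
    print(hits)
except MilvusException as e:
    print(f"search failed: {e}")  # reproduces the "Timestamp lag too large" error
```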
Either this incremental library version update fixed it, or I smashed enough buttons that I don't know what happened.
I was finally able to get this working by changing milvus[client] from version 2.3.2 to version 2.3.5.
This is a regression.
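For anyone else hitting this, a quick sanity check after the dependency bump, to confirm which client version is actually installed and that the collection is still reachable (a sketch, assuming the same localhost endpoint and the `llamalection` collection from the logs):

```python
# Sanity check after pinning the Milvus client to 2.3.5.
import pymilvus
from pymilvus import MilvusClient

print(pymilvus.__version__)  # expect "2.3.5" after the dependency change

client = MilvusClient(uri="http://localhost:19530")  # default endpoint from the logs
print(client.list_collections())  # "llamalection" should be listed if the store is intact
```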
Problem Report
I cleared the cache and rebuilt the container to pick up the latest huggingface update with a fix for running the Microsoft model.
`ERR: Unable to process query.
Message: Expecting value: line 1 column 1 (char 0)`
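For what it's worth, `Expecting value: line 1 column 1 (char 0)` is the standard `json.JSONDecodeError` message raised when a response body is empty or not JSON at all, most likely here because the frontend tried to decode the 500 error page from the failed /documentSearch call. A minimal illustration:

```python
# "Expecting value: line 1 column 1 (char 0)" comes from json when the body is
# empty or non-JSON, such as an HTML error page from a failed upstream request.
import json

try:
    json.loads("")  # an empty response body
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```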
and