LMCache / lmcache-tests


No slave found for 'mymaster' in E2E tests with `test_lmcache_redis_sentinel` #15

Open Shaoting-Feng opened 1 month ago

Shaoting-Feng commented 1 month ago

When running test_lmcache_redis_sentinel in test/tests.py, an error occurs after issuing a new request (driver.py:118): OpenAI request failed: peer closed connection without sending complete message body (incomplete chunked read).

I checked the stderr log of the serving port. The error message there is No slave found for 'mymaster'. The full related traceback is as follows:

ERROR LMCache: Engine background task failed [2024-10-14 20:58:55,943.943] [line: 188] [file: vllm_injection.py]
Traceback (most recent call last):
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 112, in _wrapper
    return func(*args, **kwargs)
  File "/local/shaotingf/lmcache1/lmcache-vllm/lmcache_vllm/vllm_injection.py", line 32, in new_execute_model
    model_input = lmcache_retrieve_kv(self.model, model_input, kv_caches)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
  File "/local/shaotingf/lmcache1/lmcache-vllm/lmcache_vllm/vllm_adapter.py", line 278, in lmcache_retrieve_kv
    kv_tuple, num_computed_tokens = engine.retrieve(current_tokens)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/local/shaotingf/lmcache1/LMCache/lmcache/cache_engine.py", line 319, in retrieve
    for chunk in retrival_iterator:
  File "/local/shaotingf/lmcache1/LMCache/lmcache/storage_backend/abstract_backend.py", line 110, in batched_get
    if self.contains(key):  # Jiayi: This seems to be redundant?
  File "/local/shaotingf/lmcache1/LMCache/lmcache/storage_backend/remote_backend.py", line 114, in contains
    flag = self.connection.exists(self._combine_key(key))
  File "/local/shaotingf/lmcache1/LMCache/lmcache/storage_backend/connector/base_connector.py", line 81, in exists
    return self.connector.exists(key)
  File "/local/shaotingf/lmcache1/LMCache/lmcache/storage_backend/connector/redis_connector.py", line 98, in exists
    return self.slave.exists(key)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/commands/core.py", line 1729, in exists
    return self.execute_command("EXISTS", *names, keys=names)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/client.py", line 559, in execute_command
    return self._execute_command(*args, **options)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/client.py", line 565, in _execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/connection.py", line 1423, in get_connection
    connection.connect()
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/sentinel.py", line 58, in connect
    return self.retry.call_with_retry(self._connect_retry, lambda error: None)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/retry.py", line 67, in call_with_retry
    raise error
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/retry.py", line 62, in call_with_retry
    return do()
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/sentinel.py", line 50, in _connect_retry
    for slave in self.connection_pool.rotate_slaves():
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/redis/sentinel.py", line 134, in rotate_slaves
    raise SlaveNotFoundError(f"No slave found for {self.service_name!r}")
redis.sentinel.SlaveNotFoundError: No slave found for 'mymaster'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/shaotingf/lmcache1/lmcache-vllm/lmcache_vllm/vllm_injection.py", line 177, in new_log_task_completion
    return_value = task.result()
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 733, in run_engine_loop
    result = task.result()
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 673, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 340, in step_async
    outputs = await self.model_executor.execute_model_async(
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 185, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 327, in execute_model
    output = self.model_runner.execute_model(
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 125, in _wrapper
    pickle.dump(dumped_inputs, filep)
  File "/local/shaotingf/anaconda3/envs/lmcache1/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 563, in __reduce__
    raise RuntimeError("LLMEngine should not be pickled!")
RuntimeError: LLMEngine should not be pickled!
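For context, the SlaveNotFoundError is raised by redis-py's Sentinel connection pool: rotate_slaves() throws it when Sentinel reports no healthy replica for 'mymaster' (for example, the replica container is not up yet or never registered with Sentinel). One possible mitigation in redis_connector.py would be to fall back to the master for reads when no replica is available. Below is a minimal sketch of that pattern with stub clients; the class and function names are illustrative stand-ins, not LMCache's actual API:

```python
# Sketch (not LMCache's real code) of the read path that fails above:
# a read-from-replica call like `self.slave.exists(key)` needs a
# fallback to the master when Sentinel knows no healthy replica.

class SlaveNotFoundError(Exception):
    """Mirrors redis.sentinel.SlaveNotFoundError."""

class NoReplicaClient:
    """Stand-in for a slave_for() client whose pool finds no replica."""
    def exists(self, key):
        raise SlaveNotFoundError("No slave found for 'mymaster'")

class MasterClient:
    """Stand-in for a master_for() client that holds some keys."""
    def __init__(self, keys):
        self._keys = set(keys)
    def exists(self, key):
        # redis EXISTS returns an integer count of matching keys
        return 1 if key in self._keys else 0

def exists_with_fallback(slave, master, key):
    """Prefer the replica for reads, but fall back to the master
    when Sentinel has not (yet) registered any replica."""
    try:
        return slave.exists(key)
    except SlaveNotFoundError:
        return master.exists(key)

print(exists_with_fallback(NoReplicaClient(), MasterClient({"k1"}), "k1"))  # 1
```

With real redis-py clients, the equivalent would be wrapping slave_for('mymaster') reads and retrying on master_for('mymaster'); whether that trade-off (reads hitting the master) is acceptable here is a design choice for the connector.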
Shaoting-Feng commented 1 month ago

Check docker/redis-sentinel