NVIDIA / GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
https://nvidia.github.io/GenerativeAIExamples/latest/index.html
Apache License 2.0

chain-server container keeps crashing (rag-app-text-chatbot.yaml) #133

jbond00747 commented 2 weeks ago

I'm trying to deploy a basic RAG chatbot using the rag-app-text-chatbot.yaml file, but the chain-server container crashes shortly after startup. I believe I've followed the directions at https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html correctly, and I'm using the v0.6.0 tag of the GitHub repository. When I run docker logs on the chain-server container, here's the output I see:

===
INFO:     Started server process [1]
INFO:     Waiting for application startup.
/usr/local/lib/python3.10/dist-packages/langchain/embeddings/__init__.py:29: LangChainDeprecationWarning: Importing embeddings from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.embeddings import HuggingFaceEmbeddings`.

To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/__init__.py:35: LangChainDeprecationWarning: Importing vector stores from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.vectorstores import FAISS`.

To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/service_pb2_grpc.py:21: RuntimeWarning: The grpc package installed is at version 1.60.0, but the generated code in grpc_service_pb2_grpc.py depends on grpcio>=1.64.0. Please upgrade your grpc module to grpcio>=1.64.0 or downgrade your generated code using grpcio-tools<=1.60.0. This warning will become an error in 1.65.0, scheduled for release on June 25, 2024.
  warnings.warn(
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
INFO:RetrievalAugmentedGeneration.common.utils:Using huggingface as model engine and WhereIsAI/UAE-Large-V1 and model for embeddings
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: WhereIsAI/UAE-Large-V1
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO:RetrievalAugmentedGeneration.common.utils:Using triton-trt-llm as model engine for llm. Model name: ensemble
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 734, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 610, in __aenter__
    await self._router.startup()
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 713, in startup
    handler()
  File "/opt/RetrievalAugmentedGeneration/common/server.py", line 158, in import_example
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/RetrievalAugmentedGeneration/example/chains.py", line 56, in <module>
    set_service_context()
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
    return func(*args_hashable, **kwargs_hashable)
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 138, in set_service_context
    llm = LangChainLLM(get_llm(**kwargs))
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 131, in wrapper
    return func(*args_hashable, **kwargs_hashable)
  File "/opt/RetrievalAugmentedGeneration/common/utils.py", line 270, in get_llm
    trtllm = TensorRTLLM(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for TensorRTLLM
__root__
  Channel.unary_unary() got an unexpected keyword argument '_registered_method' (type=type_error)

ERROR:    Application startup failed. Exiting.
Exception ignored in: <function InferenceServerClient.__del__ at 0x7561548a9750>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
    self.close()
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 264, in close
    self.stop_stream()
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1811, in stop_stream
    if self._stream is not None:
AttributeError: 'InferenceServerClient' object has no attribute '_stream'
===
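For what it's worth, the RuntimeWarning near the top of the log looks like the likely root cause: the image ships grpcio 1.60.0, while the tritonclient-generated gRPC stubs declare they need grpcio>=1.64.0, and `_registered_method` is a keyword that newer generated code passes to `Channel.unary_unary()` but older grpcio doesn't accept. A minimal sketch of the version check that warning implies (the version strings come straight from the log; the comparison helper is my own, not part of the repo):

```python
# Compare the grpcio version reported in the chain-server log against the
# minimum version the generated tritonclient stubs say they depend on.

def version_tuple(version: str) -> tuple:
    """Turn a dotted version string like '1.60.0' into a comparable int tuple."""
    return tuple(int(part) for part in version.split("."))

installed = "1.60.0"  # from the RuntimeWarning in the log above
required = "1.64.0"   # minimum the generated grpc code depends on

compatible = version_tuple(installed) >= version_tuple(required)
print(f"grpcio {installed} satisfies >= {required}: {compatible}")  # → False
```

If that mismatch is what breaks the TensorRTLLM client construction, the warning itself already names the two candidate fixes: upgrade the container's grpcio to >=1.64.0, or regenerate the stubs with grpcio-tools<=1.60.0.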

Here's some docker ps output:

$ docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID   NAMES                  STATUS
f025cd96cc5c   milvus-standalone      Up 13 minutes
ca017bfe8648   milvus-etcd            Up 13 minutes (healthy)
b44caa6c6e9a   milvus-minio           Up 13 minutes (healthy)
4b812c48035b   rag-playground         Up 13 minutes
a686d2b3938f   chain-server           Exited (3) 13 minutes ago
7fe575e94855   llm-inference-server   Up 13 minutes
80f535f5a462   notebook-server        Up 13 minutes