Chainlit / chainlit

Build Conversational AI in minutes ⚡️
https://docs.chainlit.io
Apache License 2.0

Could not reach the server #274

Closed AIAnytime closed 1 year ago

AIAnytime commented 1 year ago

Why am I getting "Could not reach the server"?

willydouhard commented 1 year ago

It means that the UI can't connect to the server, either the server crashed or is not started.

Skisquaw commented 1 year ago

I'm getting this too now. Chainlit was working yesterday, and now after a couple of minutes it shows this message, but there are no errors in the console. The last console message is Batches: 100%, and the UI shows LLMChain Running.

Is there detailed debugging that can be enabled? It looks like something is timing out but I don't see any timeout settings.

willydouhard commented 1 year ago

Are you running local LLMs?

nysagarg commented 1 year ago

I'm getting the same issue. Yes, I am running a local LLM.

Skisquaw commented 1 year ago

I worked on this more last night. It appears to be something timing out in the Chainlit app. When I switched the LLM inference to gpt-3.5-turbo (a few seconds of response time), it worked fine. My local LLM takes about 5 minutes to respond, and the message "Could not reach the server" appears while waiting. My Python code uses an async def function with await, which has no timeout.

willydouhard commented 1 year ago

Yes, we might be hitting a default timeout here. Will investigate.

AnirudhKabra commented 1 year ago

> I'm getting the same issue. Yes, I am running a local LLM.

I am also getting the same issue.

willydouhard commented 1 year ago

Also, are you calling your local LLM in a synchronous way? If so, that would explain the issue. If you block the event loop for 5 minutes, the client will think that the server is dead.

You could use cl.make_async to run that long-running call in a separate thread.
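
For illustration, a minimal sketch of that pattern. The generate() function is a hypothetical stand-in for a slow, blocking local LLM call, and the handler signature follows the make_async docs example quoted later in this thread:

import time

import chainlit as cl


def generate(prompt: str) -> str:
    # Hypothetical stand-in for a slow, blocking local LLM call.
    time.sleep(300)  # e.g. a local model that takes ~5 minutes to answer
    return f"Answer to: {prompt}"


@cl.on_message
async def main(message: cl.Message):
    # cl.make_async runs the blocking function in a separate thread,
    # so the server's event loop stays responsive while waiting.
    answer = await cl.make_async(generate)(message.content)
    await cl.Message(content=answer).send()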

bsurya27 commented 1 year ago

I am having the same issue as well. I am running an LLM locally, and I get this error message after I give a prompt. It takes around 2-5 minutes before it can connect to the server again. How do I resolve this?

willydouhard commented 1 year ago

Are you using an LLM that runs locally? I would need more context to help.

bsurya27 commented 1 year ago

Yes, I am using an LLM that runs locally; I've already mentioned that. Any other details you need?

willydouhard commented 1 year ago

It would be helpful to see how you "load" your LLM in the Python process.

bsurya27 commented 1 year ago

I am using a GGML Llama 2 model via CTransformers, with an LLMChain to run them together.

willydouhard commented 1 year ago

Can I see your Chainlit code? Are you instantiating the LLM in on_chat_start or at the beginning of your app file?

AnirudhKabra commented 1 year ago

import os

import chainlit as cl
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

# Load environment variables from .env file
load_dotenv()

HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

DB_FAISS_PATH = "./../vectorstore/db_faiss"
MODEL_PATH = "./../model/llama-2-7b-chat.ggmlv3.q8_0.bin"

prompt_template = """Use the following pieces of context to answer the users question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
The "SOURCES" part should be a reference to the source of the document from which you got your answer.
Example of your response should be as follows:

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer: """


def set_custom_prompt():
    """Prompt template for QA retrieval for each vectorstore."""
    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    return prompt


def create_retrieval_qa_chain(llm, prompt, db):
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )
    return qa_chain


def load_model(
    model_path=MODEL_PATH,
    model_type="llama",
    max_new_tokens=512,
    temperature=0.7,
):
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"No model file found at {model_path}")

    # Additional error handling could be added here for corrupt files, etc.
    llm = CTransformers(
        model=model_path,
        model_type=model_type,
        max_new_tokens=max_new_tokens,  # type: ignore
        temperature=temperature,  # type: ignore
    )
    return llm


def create_retrieval_qa_bot(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    persist_dir=DB_FAISS_PATH,
    device="cpu",
):
    if not os.path.exists(persist_dir):
        raise FileNotFoundError(f"No directory found at {persist_dir}")

    try:
        embeddings = HuggingFaceEmbeddings(
            model_name=model_name,
            model_kwargs={"device": device},
        )
    except Exception as e:
        raise Exception(
            f"Failed to load embeddings with model name {model_name}: {str(e)}"
        )

    db = FAISS.load_local(folder_path=DB_FAISS_PATH, embeddings=embeddings)

    try:
        llm = load_model()  # Assuming this function exists and works as expected
    except Exception as e:
        raise Exception(f"Failed to load model: {str(e)}")

    qa_prompt = set_custom_prompt()  # Assuming this function exists and works as expected

    try:
        qa = create_retrieval_qa_chain(
            llm=llm, prompt=qa_prompt, db=db
        )  # Assuming this function exists and works as expected
    except Exception as e:
        raise Exception(f"Failed to create retrieval QA chain: {str(e)}")

    return qa


def retrieve_bot_answer(query):
    qa_bot_instance = create_retrieval_qa_bot()
    bot_response = qa_bot_instance({"query": query})
    return bot_response


@cl.on_chat_start
async def initialize_bot():
    qa_chain = create_retrieval_qa_bot()
    welcome_message = cl.Message(content="Starting the bot...")
    await welcome_message.send()
    welcome_message.content = (
        "Hi, Welcome to Chat With Documents using Llama2 and LangChain."
    )
    await welcome_message.update()

    cl.user_session.set("chain", qa_chain)


@cl.on_message
async def process_chat_message(message):
    qa_chain = cl.user_session.get("chain")
    callback_handler = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True, answer_prefix_tokens=["FINAL", "ANSWER"]
    )
    callback_handler.answer_reached = True
    response = await qa_chain.acall(message, callbacks=[callback_handler])
    bot_answer = response["result"]
    source_documents = response["source_documents"]

    if source_documents:
        bot_answer += "\nSources:" + str(source_documents)
    else:
        bot_answer += "\nNo sources found"

    await cl.Message(content=bot_answer).send()

willydouhard commented 1 year ago

If I understand correctly, you are trying to run Llama 2 7B on CPU? If that's the case, I think the tokens-per-second throughput is going to be extremely slow, with or without Chainlit.

willydouhard commented 1 year ago

This issue looks similar https://github.com/Chainlit/chainlit/issues/345. Might be helpful!

Skisquaw commented 1 year ago

There seems to be a timeout right around 30 seconds: I get a response back from ChatGPT, then call await cl.Message(content=answer).send(), and no data shows up in the UI. Since OpenAI has lags, one request took 44.24 seconds and the send did not show up in the UI; I typed the same question again, it took 21.75 seconds, and the send did show up in the UI.

Skisquaw commented 1 year ago

Is there some default timeout setting I'm missing that is automatically set to 30 seconds?

willydouhard commented 1 year ago

I don't think so. "Could not reach the server" always happens because the event loop of the server is stuck. This is often caused by a long-running synchronous task on the main thread. The solution is either to run async tasks or to wrap the task in make_async.

Skisquaw commented 1 year ago

There seems to be some default timeout of 30 seconds (or my code is wrong). This is a snippet of my code. If the chat completion takes more than 30 seconds, the send is ignored and nothing appears on the Chainlit screen. I was running tests, and 32.57 seconds was the closest one that failed to send; anything under 30 seconds was fine. The chat completion is definitely returning data. Note that there is no "Could not reach the server" message: the Chainlit screen is still working and I can type the request again.

In my @cl.on_message handler:

@cl.on_message
async def main(message):
    print("Sending ", user_message)
    start_time = time.time()
    res = openai.ChatCompletion.create(
        model=model_name,
        messages=[{"role": "user", "content": user_message}],
    )
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"The function took {elapsed_time:.2f} seconds to execute.")
    answer = res['choices'][0]['message']['content']
    await cl.Message(content=answer).send()

willydouhard commented 1 year ago

The problem is that you use the sync implementation of openai.ChatCompletion on the main thread. There are two solutions here: run this function in a different thread (with cl.make_async) or use the async implementation (example here).
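
A rough sketch of both options, assuming the pre-1.0 openai Python client used in the snippet above (model_name here is a placeholder, not from the original snippet):

import openai

import chainlit as cl

model_name = "gpt-3.5-turbo"  # placeholder model name


@cl.on_message
async def main(message: cl.Message):
    # Option 1: keep the sync call but run it in a worker thread via cl.make_async,
    # so the event loop is not blocked while OpenAI responds.
    res = await cl.make_async(openai.ChatCompletion.create)(
        model=model_name,
        messages=[{"role": "user", "content": message.content}],
    )

    # Option 2 (alternative): use the async implementation instead:
    # res = await openai.ChatCompletion.acreate(
    #     model=model_name,
    #     messages=[{"role": "user", "content": message.content}],
    # )

    answer = res["choices"][0]["message"]["content"]
    await cl.Message(content=answer).send()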

Skisquaw commented 1 year ago

That fixed it, thank you Willy! I even get the nice "Running..." spinning animation.

jerem64 commented 7 months ago

> The problem is that you use the sync implementation of openai.ChatCompletion on the main thread. There are two solutions here: run this function in a different thread (with cl.make_async) or use the async implementation (example here).

The link is dead 😭

cigotete commented 7 months ago

> The problem is that you use the sync implementation of openai.ChatCompletion on the main thread. There are two solutions here: run this function in a different thread (with cl.make_async) or use the async implementation (example here).

> The link is dead 😭

Hello @jerem64, I think the link is this now.

I confirm the 30-second issue. I am not sure what the reason for this 30-second limit is. I think that, in the short term, processes that exceed that time window will become more common.

In my case, the issue was resolved with the cl.make_async approach @willydouhard mentioned. The documentation at https://docs.chainlit.io/api-reference/make-async is clearer about how to implement the change; this is the example:

import time
import chainlit as cl

def sync_func():
    time.sleep(5)
    return "Hello!"

@cl.on_message
async def main(message: cl.Message):
    answer = await cl.make_async(sync_func)()
    await cl.Message(
        content=answer,
    ).send()
httplups commented 3 months ago

I am facing the same issue, but I am running all async functions from LlamaIndex. The LLM takes a long time to respond due to the number of documents.