LlamaEdge / rag-api-server

A RAG API server written in Rust following OpenAI specs
https://llamaedge.com/docs/user-guide/server-side-rag/quick-start
Apache License 2.0

Stuck at "Retrieving Context..." #3

Closed: suryyyansh closed this issue 4 months ago

suryyyansh commented 4 months ago

Borrowing from the LlamaEdge example, I put the chatbot-ui frontend in the root directory before starting the RAG API server.
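
For reference, the launch command looked roughly like this (a sketch along the lines of the quick-start guide; the exact model files, --nn-preload aliases, and --ctx-size values are from my setup and may differ from yours):

wasmedge --dir .:. \
    --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
    --nn-preload embedding:GGML:AUTO:all-MiniLM-L6-v2-ggml-model-f16.gguf \
    rag-api-server.wasm \
    --model-name Llama-2-7b-chat-hf-Q5_K_M,all-MiniLM-L6-v2-ggml-model-f16 \
    --prompt-template llama-2-chat \
    --ctx-size 4096,384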

This seems to work well initially: the --model-name arguments are reflected on the frontend, and the user prompt is passed properly as JSON to the server:

********************************** [LOG: RAG (Query user input)] **********************************

[+] Computing embeddings for user query ...
    * user query: nintendo 3ds

    * embedding request (json):

{
  "model": "all-MiniLM-L6-v2-ggml-model-f16",
  "input": [
    "nintendo 3ds"
  ]
}
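
For what it's worth, the same request can be reproduced directly against the OpenAI-compatible embedding endpoint (a sketch, assuming the server's default port 8080):

curl -X POST http://localhost:8080/v1/embeddings \
    -H 'Content-Type: application/json' \
    -d '{"model":"all-MiniLM-L6-v2-ggml-model-f16", "input":["nintendo 3ds"]}'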

The embeddings are also computed just fine:

...
[2024-05-07 02:55:44.550] [info] [WASI-NN] llama.cpp: llama_print_timings:       total time =   16056.25 ms /     6 tokens
    * chunk 1 done! (prompt tokens: 5)
[+] Embeddings computed successfully.

But the final output never comes:

[+] Retrieving context ...

The server gets stuck here and makes no further progress.

I tried this with the gemma-2b-it-Q5_K_M.gguf model as well as the Llama-2-7b-chat-hf-Q5_K_M.gguf model suggested in the README.md, but can't get past this point.

For context, the LlamaEdge quickstart command works perfectly.

suryyyansh commented 4 months ago

On sending the query through curl as described in the README.md:

➜  ~ curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:application/json' \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "What is the location of Paris, France along the Seine River?"}], "model":"llama-2-chat"}'
500 Internal Server Error: error sending request for url (http://localhost:6333/collections/default/points/search): channel closed

Similar output is seen when trying to use the embedding endpoint.

The Qdrant DB seems to be the issue.
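
Two quick checks against Qdrant directly (sketches, assuming the default REST port 6333 and the collection name from the error above; the search vector below is a dummy and a real query would need to match the collection's dimensionality, 384 for all-MiniLM-L6-v2):

# Is Qdrant reachable at all?
curl http://localhost:6333/collections

# Reproduce the search call the server makes at the "Retrieving context" step.
curl -X POST http://localhost:6333/collections/default/points/search \
    -H 'Content-Type: application/json' \
    -d '{"vector": [0.1, 0.2, 0.3], "limit": 3}'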

harsh-ps-2003 commented 4 months ago

I don't think you have Qdrant running yet! You need to set it up on port 6333!
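
For example, the standard Docker setup (the storage path is just a suggestion):

docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant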

suryyyansh commented 4 months ago

I don't think you have Qdrant running yet! You need to set it up on port 6333!

That seems to have been the issue. I installed Qdrant and uploaded paris.txt and it seems to be working fine now.
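
For anyone else who gets stuck here, this is roughly what I did after starting Qdrant (a sketch; I'm assuming the /v1/create/rag convenience endpoint described in the quick-start guide linked above, which chunks, embeds, and stores the file in one call):

curl -X POST http://localhost:8080/v1/create/rag -F "file=@paris.txt"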

alabulei1 commented 4 months ago

Hi @suryyyansh

Are you following this doc? https://llamaedge.com/docs/user-guide/server-side-rag

suryyyansh commented 4 months ago

Hi @suryyyansh

Are you following this doc? https://llamaedge.com/docs/user-guide/server-side-rag

Hey @alabulei1, I didn't realize there was an additional doc at the time, but I still think the bare minimum needed to get started should be available right here in the README.

Also, please check and let me know if the linked PR can be merged.

alabulei1 commented 4 months ago

Merged. Thanks for your PR.

BTW, we have archived the RAG-example repo.