LlamaEdge / rag-api-server

A RAG API server written in Rust following OpenAI specs
https://llamaedge.com/docs/user-guide/server-side-rag/quick-start
Apache License 2.0

Slow chunking of the text file #9

Open katopz opened 3 months ago

katopz commented 3 months ago

After trying the steps from the README:

curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt"

It took 590824.84 ms (nearly 10 minutes) just to chunk a 306-line (91 KB) file on an M3 Max.

Is this just me, or am I missing some flag?
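
For reference, one way to get a reproducible end-to-end timing for the upload (a sketch; it just wraps the same curl call from above with shell timing and curl's built-in %{time_total} report):

time curl -X POST http://127.0.0.1:8080/v1/create/rag \
    -F "file=@paris.txt" \
    -w "\ntotal: %{time_total}s\n"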

juntao commented 3 months ago

Can you perhaps try this?

https://docs.gaianet.ai/creator-guide/knowledge/text

katopz commented 3 months ago

> Can you perhaps try this?
>
> https://docs.gaianet.ai/creator-guide/knowledge/text

This one took 6.88 s, which does seem much faster. 🤔

juntao commented 3 months ago

Just make sure that the embedding model you used to generate the vector collection / snapshot is the same as the one rag-api-server starts with.
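
One quick way to verify which models the running server actually loaded (a sketch; it assumes rag-api-server exposes the OpenAI-style /v1/models endpoint, since the server follows the OpenAI spec):

curl http://127.0.0.1:8080/v1/models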

katopz commented 3 months ago

I'm not quite sure which line I have to check. I followed the steps from the README, which are:

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
    --nn-preload embedding:GGML:AUTO:all-MiniLM-L6-v2-ggml-model-f16.gguf \
    rag-api-server.wasm \
    --model-name Llama-2-7b-chat-hf-Q5_K_M,all-MiniLM-L6-v2-ggml-model-f16 \
    --ctx-size 4096,384 \
    --prompt-template llama-2-chat \
    --rag-prompt "Use the following pieces of context to answer the user's question.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n" \
    --log-prompts \
    --log-stat

and

curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt"

So it should be the same, right?

juntao commented 3 months ago

You started the rag-api-server with all-MiniLM-L6-v2-ggml-model-f16.gguf.

So, the command you used to create the embeddings should also use all-MiniLM-L6-v2-ggml-model-f16.gguf.

If you just ran the steps in the docs, you should be fine.
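
As a sanity check that the embedding side responds at all, you could hit the embeddings endpoint directly (a sketch; it assumes the OpenAI-compatible /v1/embeddings route and reuses the model name passed to --model-name above):

curl -X POST http://127.0.0.1:8080/v1/embeddings \
    -H 'Content-Type: application/json' \
    -d '{"model": "all-MiniLM-L6-v2-ggml-model-f16", "input": ["Paris is the capital of France."]}'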

katopz commented 3 months ago

Yes, I just ran 100% of the steps in the docs (many times by now), but it's still slow.

I think I'm missing something pretty obvious 🤔.