docker / genai-stack

Langchain + Docker + Neo4j + Ollama
Creative Commons Zero v1.0 Universal

pdf_bot on Mac (with Ollama) outputting gibberish #122

Closed by whalygood 4 months ago

whalygood commented 4 months ago

Trying out the GenAI stack on a MacBook Air M1 (8 GB) running Docker Desktop 4.27.1 and Ollama 0.1.23 with the llama2 model. Running into an issue where the pdf_bot example appears to vectorize a PDF document successfully into the Neo4j vector database, but outputs gibberish when asked a question about the PDF's contents.

A token like "dim" or "owner" is repeated over and over again in the output.

I attempted this with two PDF documents from different sources, one several pages long and another more than 100 pages long.
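For context, here is a rough sketch of what the pdf_bot path does as I understand it, using LangChain's public API. The chunk sizes, connection details, and model names below are illustrative assumptions, not the actual pdf_bot code:

```python
# Hypothetical sketch of the pdf_bot flow: chunk a PDF, embed the chunks with
# Ollama, store them in the Neo4j vector index, then answer questions over it.
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Extract raw text from the uploaded PDF.
text = "".join(page.extract_text() or "" for page in PdfReader("Title8.pdf").pages)

# Split into overlapping chunks before embedding (sizes are assumptions).
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_text(text)

# Embed the chunks with Ollama and store them in Neo4j.
# Connection details are placeholders, not the compose stack's exact values.
vectorstore = Neo4jVector.from_texts(
    chunks,
    OllamaEmbeddings(base_url="http://llm:11434", model="llama2"),
    url="bolt://database:7687",
    username="neo4j",
    password="password",
)

# Ask a question: retrieve similar chunks and let llama2 answer from them.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOllama(base_url="http://llm:11434", model="llama2"),
    retriever=vectorstore.as_retriever(),
)
print(qa_chain.invoke({"query": "What does the document say about directors?"})["result"])
```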

[Screenshot of the gibberish output, 2024-02-07]

There is a possibly relevant log message from the pdf_bot-1 container when this occurs:

pdf_bot-1 | /usr/local/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
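That warning is about the LangChain 0.1.x API migration rather than the gibberish itself; it is emitted for calls of the following form (`qa_chain` here is a hypothetical RetrievalQA chain):

```python
# Old style that triggers the LangChainDeprecationWarning in 0.1.x:
answer = qa_chain.run(question)

# Replacement suggested by the warning; RetrievalQA takes a "query" key
# and returns a dict with the answer under "result".
answer = qa_chain.invoke({"query": question})["result"]
```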

All the services in the compose stack started healthily. I have retried after restarting Ollama, the stack (docker compose down -v followed by docker compose up -d), and macOS, all to no avail.

Prompting the llama2 model on its own with ollama run llama2 in a console produces normal output.
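For reference, the same sanity check can be done against Ollama's HTTP API (default port 11434); this sketch assumes the standard /api/generate endpoint and an arbitrary test prompt:

```python
# Sketch: query the local Ollama server directly to confirm llama2 responds
# sensibly outside of the pdf_bot / LangChain pipeline.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Summarize the US Constitution in one sentence.",
        "stream": False,
    },
    timeout=120,
)
print(response.json()["response"])
```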

whalygood commented 4 months ago

Screenshot of the issue (incomprehensible, gibberish output)


jexp commented 4 months ago

Can you share the PDF you used?

whalygood commented 4 months ago

Sure, the PDF files I tested with were: https://delcode.delaware.gov/title8/Title8.pdf https://constitutioncenter.org/media/files/constitution.pdf

jexp commented 4 months ago

What kind of machine do you have, and are you using a local LLM (if so, which one) or OpenAI? It could be that your GPU/memory is too small for the local models. Can you try with OpenAI?
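For reference, switching the chain from the local Ollama model to OpenAI boils down to swapping the chat model; a minimal sketch is below. The environment variable and model names are assumptions for illustration, and the genai-stack itself drives this selection from its .env file:

```python
# Sketch of choosing between a local Ollama model and OpenAI in a LangChain chain.
import os

from langchain_community.chat_models import ChatOllama
from langchain_openai import ChatOpenAI

def load_llm(name: str):
    """Return a chat model: OpenAI if requested, otherwise a local Ollama model."""
    if name.startswith("gpt-"):
        # Requires OPENAI_API_KEY to be set in the environment.
        return ChatOpenAI(model=name, temperature=0)
    return ChatOllama(base_url=os.getenv("OLLAMA_BASE_URL", "http://llm:11434"), model=name)

llm = load_llm(os.getenv("LLM", "llama2"))
```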

whalygood commented 4 months ago

The machine I have is a MacBook Air M1 with 8 GB, and I'm using the llama2 model with Ollama 0.1.23. The local model does work properly when conversing with it directly using ollama run llama2, but are you suggesting that my system resources are insufficient to use the local model with embeddings? Using the same stack, I tested with OpenAI on a separate Windows machine, and that does work.

wolfieorama commented 4 months ago


Tested it on a MacBook Pro M3 with 18 GB and it worked with the same doc.

whalygood commented 4 months ago

Closing the issue, as I can confirm that it works on an Ubuntu VM with 17 GB of RAM and 4 cores / 8 threads (Ryzen 7 5000 series). It's a bit of a shame it won't work with the base M1 configuration, as I was hoping to see how it would perform on ARM, but that's understandable.