Leon-Sander / Local-Multimodal-AI-Chat


Timeout Error while loading /pull llava in the chatbar #33

Open Paramjethwa opened 1 month ago

Paramjethwa commented 1 month ago

This is what I got in the Streamlit app after about 5 minutes of waiting, after running /pull llava:

TimeoutError

Traceback:
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/app/app.py", line 171, in <module>
    main()
  File "/app/app.py", line 123, in main
    response = command(user_input)
  File "/app/utils.py", line 15, in command
    return pull_model_in_background(splitted_input[1])
  File "/app/utils.py", line 73, in pull_model_in_background
    return asyncio.run(pull_ollama_model_async(model_name, stream=stream))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/app/utils.py", line 39, in pull_ollama_model_async
    async with session.post(url, json=json_data) as response:
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 1353, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 684, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 994, in start
    with self._timer:
  File "/usr/local/lib/python3.10/site-packages/aiohttp/helpers.py", line 713, in __exit__
    raise asyncio.TimeoutError from None
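(For orientation, the failing frame is `pull_ollama_model_async` posting the pull request to the Ollama API via aiohttp. The sketch below is an assumption about what that code path roughly looks like, not a copy of utils.py; the key point is that without an explicit timeout, aiohttp's default client timeout of 300 seconds applies, which matches the roughly 5-minute wait before the error.)

```python
import asyncio
import aiohttp

async def pull_ollama_model_async(model_name: str, stream: bool = False):
    # Assumed Ollama endpoint; in the app the host comes from its configuration.
    url = "http://localhost:11434/api/pull"
    json_data = {"name": model_name, "stream": stream}
    # No explicit timeout here: aiohttp falls back to its default ClientTimeout
    # (total=300 s), so a large model such as llava cannot finish downloading
    # before the request is aborted with asyncio.TimeoutError.
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=json_data) as response:
            return await response.json()

# asyncio.run(pull_ollama_model_async("llava"))
```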

/pull nomic-embed-text loads up pretty quickly, but the problem is that when I upload a PDF file it gets stuck on "processing pdf..." and doesn't respond to anything I ask in the chat bar.

Leon-Sander commented 1 month ago

It seems you solved the problems from the other two issues you created; can you please write the solutions under them and close those issues?

As for this issue, it seems to be an Ollama-related problem, so looking into their issue tracker might help. Also, how many pages does your PDF file have?

Paramjethwa commented 1 month ago

Yes, I have posted the solutions under the previous issues.

My PDF is only 1.5 MB. When I try to /pull the llava model, it shows "processing" and then the timeout error.

/pull nomic-embed-text loads up pretty quickly, but the problem is that when I upload a PDF file it gets stuck on "processing pdf..." and doesn't respond to anything I ask in the chat bar.

Leon-Sander commented 1 month ago

I updated the code yesterday to track timings. Can you upload the PDF that is inside the pdf folder of this repository and then paste the command line output here, so I can see what's happening and how long it takes?
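(The timing change itself isn't quoted in this thread; purely as an illustration, tracking such timings can be as simple as a `time.perf_counter` wrapper like the hypothetical helper below.)

```python
import time
from functools import wraps

def timed(step_name: str):
    """Hypothetical helper: print how long a processing step takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            print(f"{step_name}: {time.perf_counter() - start:.2f}s")
            return result
        return wrapper
    return decorator

@timed("pdf processing")
def process_pdf(pdf_bytes: bytes) -> int:
    # Stand-in workload; the real handler would chunk and embed the PDF text.
    return len(pdf_bytes)
```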

Paramjethwa commented 1 month ago

OK, so I did update the code to your latest commit.

I ran the Streamlit app in my browser, did /pull nomic-embed-text, and uploaded the same PDF that is in the repository (hover.pdf).

But it's still the same issue: connection error / timeout.

This is what I got in my browser: ERROR FILE 1.txt

And this is what I got in my Ubuntu terminal: ERROR FILE 2.txt

Leon-Sander commented 1 month ago

In another issue you wrote that you changed the Ollama URL; did you also change it in the vectordb_handler.py file?

As a side note, you don't need to pull the same model over and over; after pulling it once, you have it locally. Also, Ollama did not recognize your GPU; make sure you have the latest version of the code and WSL set up if you work on Windows.
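(Regarding the Ollama URL change mentioned above: the idea is that the embedding client in vectordb_handler.py must point at the same address. A rough illustration, assuming a langchain-style OllamaEmbeddings client; the actual class and settings used in the repo may differ.)

```python
from langchain_community.embeddings import OllamaEmbeddings

# base_url must match wherever your Ollama server actually listens,
# e.g. the docker-compose service name instead of localhost.
embeddings = OllamaEmbeddings(
    base_url="http://localhost:11434",
    model="nomic-embed-text",
)
```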

Paramjethwa commented 1 month ago

I have solved the timeout error and will post the entire solution tomorrow in a proper format.

I've encountered a different issue: the /pull command for nomic-embed-text completes successfully and the PDF loads without any problems. However, nomic-embed-text does not appear in the model selection section, while other random models I've pulled do show up there, allowing me to choose them for chatting or asking questions about images.

Image for reference: llava-phi3 is working fine for both chat and images.

Also, how can I remove a model that I don't want to use anymore, and where is it stored locally? I can't find the models in my project directory.

Paramjethwa commented 1 month ago

> In another issue you wrote that you changed the Ollama URL; did you also change it in the vectordb_handler.py file?
>
> As a side note, you don't need to pull the same model over and over; after pulling it once, you have it locally. Also, Ollama did not recognize your GPU; make sure you have the latest version of the code and WSL set up if you work on Windows.

Yes, I did change it in the vectordb file as well, but with the latest repo it now works with the Ollama URL too. I don't know why it is not recognizing the GPU; I do have WSL2 (Ubuntu). But even on CPU only, the PDF now loads in around 30 seconds and the response time is 10-15 seconds.

Leon-Sander commented 1 month ago

> However, 'nomic-embed-text' does not appear in the model selection section

I built a filter to hide models that have "embed" in their name, because you can't chat with embedding models, so it makes no sense to have them in the model selection. Also, an embedding model is usually chosen once and then used continuously. You can define which embedding model to use in the config.yaml file.
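(Roughly, such a filter works like the sketch below: list the locally available models from Ollama's /api/tags endpoint and drop anything with "embed" in the name. Function and variable names here are illustrative, not the repo's.)

```python
import requests

def list_chat_models(ollama_url: str = "http://localhost:11434") -> list[str]:
    """Return locally available models, hiding embedding models from the dropdown."""
    response = requests.get(f"{ollama_url}/api/tags", timeout=10)
    response.raise_for_status()
    names = [m["name"] for m in response.json().get("models", [])]
    # Embedding models can't be chatted with, so filter them out of the selection.
    return [name for name in names if "embed" not in name.lower()]
```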

Paramjethwa commented 1 month ago

> However, 'nomic-embed-text' does not appear in the model selection section
>
> I built a filter to hide models that have "embed" in their name, because you can't chat with embedding models, so it makes no sense to have them in the model selection. Also, an embedding model is usually chosen once and then used continuously. You can define which embedding model to use in the config.yaml file.

Oh okay, that makes sense now. So, to summarize or chat with the PDF, which model is used after the PDF has been embedded?

Also, so to remove a model I don't want to use, I have to delete it from the .ollama directory?

Leon-Sander commented 1 month ago

Yes, you would need to delete it from the directory you defined, but it's a bit complicated since the files are not named after the model names. The model used for the chat is the one you have chosen in the "Select a Model" dropdown.
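(As an alternative to hunting for the hash-named blob files, Ollama also exposes a delete endpoint, equivalent to `ollama rm <model>`; a small sketch, assuming the default localhost address:)

```python
import requests

def remove_ollama_model(model_name: str, ollama_url: str = "http://localhost:11434") -> None:
    """Ask the Ollama server to delete a locally stored model by name."""
    response = requests.delete(
        f"{ollama_url}/api/delete",
        json={"name": model_name},
        timeout=30,
    )
    response.raise_for_status()

# remove_ollama_model("llava")
```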

Also a side note: the code is not built to directly summarize the PDF. It's a RAG approach: the three text snippets most similar to your query are retrieved from the vector database and given to the LLM as context to answer your question.
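(As a toy illustration of that retrieval step, with in-memory vectors standing in for the vector database and the configured embedding model:)

```python
import numpy as np

def retrieve_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray,
                   chunks: list[str], k: int = 3) -> list[str]:
    """Return the k text chunks whose embeddings are most similar to the query."""
    # Cosine similarity between the query and every stored chunk embedding.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

# The retrieved chunks are prepended to the prompt as context,
# and the chat model selected in the dropdown answers from that context.
```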

Leon-Sander commented 1 month ago

Okay, so I switched to Windows and tested it myself. Llama3.1 would not load on GPU for me; I also got a timeout error. Loading smaller models worked, but took a long time, for example 5 minutes to load llava on GPU while it took only 5 seconds on Linux. Without GPU support it worked, though.

Also, this is an Ollama-related issue; you might want to look for answers over there: https://github.com/ollama/ollama/issues/4427

Paramjethwa commented 1 month ago

Okay, sure, I'll check out the Ollama thread for a solution to the GPU usage.

Also, once you upload the PDF using the nomic-embed-text model, which model do you use to chat with the PDF?

And can you suggest a model similar to llava but smaller in size, so that we don't get a timeout error for it?

Leon-Sander commented 1 month ago

You can choose whatever model you want for chatting. Look into the Ollama library (https://ollama.com/library) and try what fits your system best, for example gemma2:2b or qwen2.5:3b.

Paramjethwa commented 1 month ago

So here is the solution for the HTTP connection error / connection timeout error:

What I did is change the permission settings for the .ollama folder: go into Properties, then the Security tab, and give your user (your laptop account) Full Control, so that it can automatically remove the unnecessary blob files in the .ollama/model folder.

After doing this, restart your PC and the HTTP error should be gone.

Although whenever I load/pull a bigger model (3 GB to 5 GB) I still get the timeout error, it works fine with small models.

> OK, so I did update the code to your latest commit.
>
> I ran the Streamlit app in my browser, did /pull nomic-embed-text, and uploaded the same PDF that is in the repository (hover.pdf).
>
> But it's still the same issue: connection error / timeout.
>
> This is what I got in my browser: ERROR FILE 1.txt
>
> And this is what I got in my Ubuntu terminal: ERROR FILE 2.txt

Paramjethwa commented 1 month ago

> You can choose whatever model you want for chatting. Look into the Ollama library (https://ollama.com/library) and try what fits your system best, for example gemma2:2b or qwen2.5:3b.

So I tried to pull these models. I was getting the timeout error at first, but then I plugged in the charger, set the system to performance/turbo mode, and they loaded successfully.

However, I still cannot load the llava:7b model; it ends up with a timeout error every time. I guess it's because of its 4-5 GB size, while the smaller models load perfectly.

Edit: should I download the llava model manually and put it in the .ollama/model folder? Would it then be available in the "Select a Model" dropdown in the Streamlit app?

Paramjethwa commented 1 month ago

AUDIO ERROR: I am getting this when I record audio in the Streamlit app.

LibsndfileError: Error opening <_io.BytesIO object at 0x7f71cd288f40>: Format not recognised.

I feel like the recorded audio is not in WAV or another acceptable format, hence this error.

Audioerror.txt

Edit: when I upload an audio file (OGG format) and summarize it, it works fine but takes a few minutes to load and process, whereas when I record directly via the record audio button the above error occurs.
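(One common workaround for a "Format not recognised" error from libsndfile, sketched here only as an idea and not as the fix that was pushed, is to decode the recorded bytes with pydub and re-export them as WAV before handing them on:)

```python
import io
from pydub import AudioSegment  # needs ffmpeg available for non-WAV input

def to_wav_bytes(raw_audio: bytes) -> io.BytesIO:
    """Re-encode whatever format the recorder produced as WAV so that
    libsndfile-backed readers (e.g. soundfile) can open it."""
    segment = AudioSegment.from_file(io.BytesIO(raw_audio))
    wav_buffer = io.BytesIO()
    segment.export(wav_buffer, format="wav")
    wav_buffer.seek(0)
    return wav_buffer
```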

Leon-Sander commented 1 month ago

@Paramjethwa Just pushed a fix for the audio error and the timeout error. Make sure to rebuild the Docker image for the app. Can you test it and give feedback on whether it fixed the problem?

Paramjethwa commented 1 month ago

@Leon-Sander I have tested the commit. At first the audio didn't work, but after I started a normal chat with the keyboard and then used record audio, it worked perfectly fine.

As for the timeout error on /pull llava, it still exists; I tried a few times and ended up with the same asyncio.exceptions.TimeoutError. ERROR FILE: TIMEOUT ERROR.txt

Leon-Sander commented 1 month ago

Okay, I think I got it now; it is an aiohttp error. I pushed a fix on a new branch, timout_error_fix. On line 53 in utils.py you can set how long to wait before a timeout error is thrown; I set it to 30 minutes now. Can you please test whether this fixes your problem, and adjust it if you think you need more time to download the model?
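(For reference, raising that limit in aiohttp looks roughly like this; the exact variable names on line 53 of utils.py may differ:)

```python
import aiohttp

# 30 minutes instead of aiohttp's 5-minute default; increase further if the
# model download needs more time on your connection.
timeout = aiohttp.ClientTimeout(total=30 * 60)

async def pull_with_timeout(url: str, json_data: dict):
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, json=json_data) as response:
            return await response.json()
```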

If this does not work, I have another workaround idea for you that will surely work.

Paramjethwa commented 1 month ago

Hey Leon, it took 15-20 minutes to load the model and it completed successfully; this works perfectly fine now. You are amazing!! I aspire to become a skilled problem solver like you!

Now the chat app is running everything successfully. The only remaining issue is that Ollama is not detecting/using my GPU to load models or answer queries, although I am trying to find a solution in the Ollama GitHub issues; I hope I find it soon.

Leon-Sander commented 1 month ago

Thanks, happy to hear that.

Do you have WSL enabled in Docker?

Paramjethwa commented 1 month ago

> Thanks, happy to hear that.
>
> Do you have WSL enabled in Docker?

Yes, WSL is already enabled in Docker Desktop.

Paramjethwa commented 1 month ago

Hi Leon, I finally found the solution for GPU usage. What I did:

  1. Reinstalled Docker completely and removed all the tmp.json files from the .docker folder (there were multiple temporary files causing conflicts).
  2. Ran Docker Desktop as administrator.
  3. Made sure the CUDA driver (nvcc --version) and nvidia-smi were responding correctly.
  4. Checked and enabled the WSL integration.
  5. Restarted the system.
  6. Rebuilt the image: docker compose up --build
  7. Finally, it is recognizing the GPU and using it properly while running the project.

Major credit and thanks to you, Leon, for helping me solve every single problem. Super grateful!