Leon-Sander / Local-Multimodal-AI-Chat

GNU General Public License v3.0

PDF AND IMAGE LOADING AND RESPONSE TOO SLOW! #30

Closed Paramjethwa closed 4 days ago

Paramjethwa commented 1 week ago

I'm facing an issue where a 1.5 MB PDF file took 30 minutes to load and a few more minutes to respond, while image processing took around 10 minutes per image from loading to getting a response.

Voice loading is pretty decent, with 3-4 minutes of processing to answer the given question.

Normal chat with the model works pretty well, giving around a 20-25 second response time compared to the others.

Also, I can't get my GPU to work for this: PDF loading shows zero GPU usage, as does image loading.

Meanwhile, image loading takes 90%+ of my CPU.

I am using Windows 11 and checked performance in Task Manager, where GPU usage is nearly zero and CPU is around 4-8 percent while running the project. I have an Nvidia 4050 GPU with the latest Intel 13th gen 13700HX processor. Is there any solution for this?

This is a snippet of the terminal output in my VSCode, showing how the process looks while loading the 1.5 MB PDF:

```
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [04:23<00:00, 3.02MB/s]
model.onnx: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [07:26<00:00, 2.99MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [07:29<00:00, 2.98MB/s]
sentence_bert_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████| 52.0/52.0 [00:00<00:00, 51.8kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 125/125 [00:00<?, ?B/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 944kB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 366/366 [00:00<00:00, 338kB/s]
vocab.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 616kB/s]
modules.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 375kB/s]
load INSTRUCTOR_Transformer max_seq_length 512
Documents added to db.
```

JTMarsh556 commented 1 week ago

I suggest verifying that you have PyTorch installed in your environment; a lot of the embedding methods out there rely on it. It has been a while since I used this application, but I have found that in many cases this leads to slow embedding, because without it you are forced to run the embeddings on the CPU, which is very slow. Again, I have not used this application in a while, so this may not be the problem here, but it is extremely common.

https://pytorch.org/get-started/locally/
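A quick way to confirm the point above: if a CUDA-enabled PyTorch build is installed and can see the 4050, the following prints `True`; a `False` here means the embeddings will silently run on the CPU (this is a generic diagnostic, not code from this repo):

```python
# Check whether PyTorch was built with CUDA support and can see the GPU.
import torch

print(torch.__version__)                  # build string ends in "+cpu" for CPU-only installs
print(torch.cuda.is_available())          # must be True for GPU embedding
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the RTX 4050
```

If this prints `False` on a machine with an Nvidia GPU, reinstalling PyTorch with the CUDA-enabled wheel from the link above usually fixes it.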

Leon-Sander commented 1 week ago

I just updated the code to work with Ollama, which makes everything faster. I would suggest downloading the newest code version and using docker compose to set it up; everything is explained in the readme. I tested it with CPU only on Windows and it took 30 seconds for a 3 MB PDF. With a GPU it is of course much faster.

You need Docker installed, and on Windows also WSL, to run Docker with GPU support. I would recommend always using the GPU when possible.
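For reference, exposing an Nvidia GPU to a container is done with a device reservation in the compose file. A minimal sketch (the service name `app` is a placeholder; the actual service names and settings in this repo's compose file may differ):

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or a specific number of GPUs
              capabilities: [gpu]
```

On Windows this requires Docker Desktop with the WSL2 backend and a recent Nvidia driver; inside native Linux it additionally requires the NVIDIA Container Toolkit.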

Leon-Sander commented 4 days ago

Something I just realized after looking at your screenshot again: you seem to have a slow internet connection; it took around 20 minutes to download the embedding model. The same probably applies to voice messages. This download only happens once, so subsequent runs will be faster. Also, working on Windows is generally slower than on Linux.
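The 20-minute figure can be sanity-checked against the progress bars in the log above: three ~1.34 GB model files downloading at roughly 3 MB/s each:

```python
# Back-of-the-envelope check of the download time implied by the log.
size_bytes = 1.34e9          # each large model file is ~1.34 GB
speed_bytes_per_s = 3e6      # the log shows ~3 MB/s

per_file_min = size_bytes / speed_bytes_per_s / 60
total_min = 3 * per_file_min

print(round(per_file_min, 1))  # 7.4 minutes per file
print(round(total_min, 1))     # 22.3 minutes total, matching the log
```

So most of the reported "PDF loading" time was really a one-time model download, not embedding work.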