Closed · aug2umbc closed this issue 2 months ago
Hi, does this happen with all PDF files? Can you try several different PDF files? I suspect there is something wrong with the second PDF. Would you mind sending us the second file so we can investigate further?
I have uploaded PDFs to the same Google Drive folder: https://drive.google.com/drive/folders/1fdHT5cKTsxkLpF3aXuv27VhxoY07fSgt?usp=sharing
There are a total of 4 PDFs that I was doing trial and error with.
> Hi, does this happen with all PDF files? Can you try several different PDF files? I suspect there is something wrong with the second PDF. Would you mind sending us the second file so we can investigate further?
Yes, the issue is happening with all PDF files. I have just confirmed that with Ollama this issue occurs no matter the size of the PDFs or the order in which they are uploaded. I have tried both nomic-embed-text:latest and mxbai-embed-large:latest as the embedding model, with the same outcome as shown in my screen recordings (https://drive.google.com/drive/folders/1o_JQxq-Qp8FZMz4q5Tp1BLYjYKmz7MOQ?usp=sharing).
Thank you for your quick reply.
@phv2312 can I fix this issue?
It seems this issue happens with Ollama on Windows. Please check if this is reproducible @phv2312. Thanks for your report @aug2umbc.
I can check @phv2312
Sure @matiasdev30 , your help is more than welcome
Hi @aug2umbc , sorry for the late reply. Can you try installing the following:
pip install chromadb==0.5.0
I found a similar problem reported on Chroma here: https://github.com/chroma-core/chroma/issues/2513 . It suggests that downgrading to chromadb 0.5.0 and chroma-hnswlib 0.7.3 will fix it.
I have tried it on my machine and it works. Can you try it on your machine too?
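Not part of the fix itself, but a small helper like this can tell you whether your locally installed version is newer than the known-good pin before you reinstall. This is just a sketch: it assumes plain numeric "X.Y.Z" version strings, and `needs_downgrade` is a hypothetical helper name, not part of chromadb.

```python
def parse_version(v: str) -> tuple:
    """Turn a plain 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def needs_downgrade(installed: str, max_ok: str = "0.5.0") -> bool:
    """True if the installed version is newer than the version known to work."""
    return parse_version(installed) > parse_version(max_ok)

# A later release (e.g. 0.5.5) would need the downgrade; 0.5.0 itself would not.
print(needs_downgrade("0.5.5"))   # True
print(needs_downgrade("0.5.0"))   # False
```

You can feed it the output of `pip show chromadb` (the Version line) to decide whether to run the downgrade command above.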
Installing chromadb==0.5.0 worked!
Thank you so much. This will help others as well.
> Installing chromadb==0.5.0 worked!

Bash into the Docker container to verify:
root@3e9c3d102a33:/app# pip show chromadb | grep Version
Version: 0.5.0
Dockerfile:
RUN --mount=type=ssh pip install --no-cache-dir -e "libs/kotaemon[all]" \
&& pip install --no-cache-dir -e "libs/ktem" \
&& pip install --no-cache-dir graphrag future \
&& pip install --no-cache-dir "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements" \
&& pip install --no-cache-dir llama-index-vector-stores-milvus \
&& pip install --no-cache-dir chromadb==0.5.0
@phv2312 I think this is embedding as expected? It would still be nice to have an easier way to let the user know it has been indexed properly, perhaps a simple checkmark.
The error:
❌ | fall21-bs-knowlgeandskills.pdf: RetryError[<Future at 0x7054283420b0 state=finished raised APIConnectionError>]
I suppose this is more of a local Ollama issue.
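Since an APIConnectionError usually points at the embedding server rather than the file being indexed, a minimal reachability check against Ollama's default endpoint can help narrow it down. This is a sketch, not part of kotaemon: it assumes Ollama's default port 11434, and `ollama_reachable` is a hypothetical helper name.

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an HTTP server answers at base_url.

    Ollama's root endpoint replies with "Ollama is running" when the
    server is up; 11434 is its default port.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False
```

If this returns False, embedding and chat calls will surface connection errors like the RetryError/APIConnectionError above, regardless of which PDF is being indexed.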
Description
I have tried this multiple times, both with and without "Forced Index file" checked. Each time the outcome is the same: the first PDF is indexed fine, but the second one causes a crash. This occurs even if I upload and index the files one at a time.
Any help would be appreciated; it would let me actually use this application.
I am using Ollama for both embedding and chat. Ollama works great for me, but uploading more than one PDF causes issues. My screenshots are linked below:
Reproduction steps
Screenshots
No response
Logs
No response
Browsers
Firefox, Chrome, Microsoft Edge
OS
Windows
Additional information
A screen recording of the error (GIF and MP4) is at the following link: https://drive.google.com/drive/folders/1o_JQxq-Qp8FZMz4q5Tp1BLYjYKmz7MOQ?usp=sharing