Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0
14.84k stars 1.15k forks

[BUG] - Chat with files error. #342

Open Smile-L-up opened 1 month ago

Smile-L-up commented 1 month ago

Description

Thanks for your help. The web interface opens normally, and chatting with the LLM directly works. The problem appears when I upload a file and ask questions about it: the related documents are retrieved (the top relevance score is 0.8), but the reply is always (Sorry, I don't know), no matter what I ask. I don't know what is causing this. Do you know why?

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Logs

```
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f8bbed2e350>, FSPath=PosixPath('/root/autodl-tmp/Smile_L/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f8bbed2e770>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8ba6ff3d90>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8ba6ff33a0>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8ba6ff34c0>), mmr=False, rerankers=[], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f8c4e665210>, FSPath=<theflow.base.unset_ object at 0x7f8c4e665210>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f8c4e665210>, VS=<theflow.base.unset_ object at 0x7f8c4e665210>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7f8c4e665210>)]
searching in doc_ids ['38b223a8-86b8-4a92-9516-51a25860169a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters'])
Got 100 from vectorstore
Got 1 from docstore
Got raw 10 retrieved documents
thumbnail docs 0 non-thumbnail docs 10 raw-thumbnail docs 0
retrieval step took 0.12580156326293945
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Got 9 retrieved documents
len (original) 6847
len (trimmed) 6847
Got 0 images
Trying LLM streaming
CitationPipeline: invoking LLM
CitationPipeline: finish invoking LLM
LLM rerank scores [0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Got 0 cited docs
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Generate nothing: (Sorry, I don't know)
```
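
To make the mismatch in the logs above easier to see (a top rerank score of 0.8 alongside 0 cited docs), here is a minimal sketch that pulls those two values out of a log excerpt. It assumes the log lines look exactly like the ones pasted here; `summarize_log` is a hypothetical helper written for illustration and is not part of kotaemon.

```python
import ast
import re


def summarize_log(log_text):
    """Extract rerank scores and the cited-doc count from a log excerpt.

    Assumes the line formats seen in the logs above; hypothetical helper.
    """
    scores_match = re.search(r"LLM rerank scores (\[.*?\])", log_text)
    cited_match = re.search(r"Got (\d+) cited docs", log_text)
    # literal_eval safely parses the printed Python list of floats
    scores = ast.literal_eval(scores_match.group(1)) if scores_match else []
    cited = int(cited_match.group(1)) if cited_match else None
    return {
        "max_score": max(scores, default=None),
        "nonzero_scores": sum(1 for s in scores if s > 0),
        "cited": cited,
    }


sample = """LLM rerank scores [0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Got 0 cited docs"""
summary = summarize_log(sample)
print(summary)
```

If `max_score` is high while `cited` is 0, retrieval found relevant text but the answering step did not use it, which narrows where to look.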

Browsers

Chrome

OS

Linux

Additional information

Linux autodl-container-52214cb424-950f8e01 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

cin-jimmy commented 2 weeks ago

Hi @Smile-L-up (cc: @cin-albert), it looks like the file you uploaded was not a parseable PDF or another supported format. Could you please upload the correct file or try a different one? If the issue persists, please share more details with us.
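
One quick way to check an upload before retrying is to look at the file's first bytes: a well-formed PDF starts with the header `%PDF-`. The sketch below is only a local sanity check under that assumption; `looks_like_pdf` is a hypothetical helper, not a kotaemon function, and it says nothing about how kotaemon itself decides whether a document is a PDF.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file begins with the PDF header bytes '%PDF-'."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Demo on two throwaway files (names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    pdf_path = os.path.join(tmp, "sample.pdf")
    txt_path = os.path.join(tmp, "notes.txt")
    with open(pdf_path, "wb") as f:
        f.write(b"%PDF-1.7\n")
    with open(txt_path, "wb") as f:
        f.write(b"plain text\n")
    pdf_ok = looks_like_pdf(pdf_path)
    txt_ok = looks_like_pdf(txt_path)

print(pdf_ok, txt_ok)
```

A file that fails this check may still be a supported non-PDF format, so the result is a hint about the file type rather than a verdict on whether it can be indexed.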