Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0
16.44k stars, 1.27k forks

[BUG] Unable to Chat - Option never appears #463

Open meltedhead opened 1 day ago

meltedhead commented 1 day ago

Description

I can connect to a local Ollama model for embeddings etc. I can then upload documents, and they are indexed successfully, but whenever I try to chat, the chat section never appears. Whether I select a specific document to chat with or search all, I still don't see the option to chat. Any ideas? The attached screenshots show everything set up: Screenshot 2024-11-04 160156, Screenshot 2024-11-04 155954, Screenshot 2024-11-04 155909, Screenshot 2024-11-04 155848.

Reproduction steps

I run the following:

# optional (setup env)
conda create -n kotaemon python=3.10
conda activate kotaemon

# clone this repo
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon

pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"

I then install and unzip PDF_JS_DIST.

My .env file is as follows:

# this is an example .env file, use it to create your own .env file and place it in the root of the project

# settings for OpenAI
#OPENAI_API_BASE=https://api.openai.com/v1
#OPENAI_API_KEY=<YOUR_OPENAI_KEY>
#OPENAI_CHAT_MODEL=gpt-3.5-turbo
#OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
#AZURE_OPENAI_ENDPOINT=
#AZURE_OPENAI_API_KEY=
#OPENAI_API_VERSION=2024-02-15-preview
#AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
#AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text
LOCAL_EMBEDDING_MODEL_DIM = 768
LOCAL_EMBEDDING_MODEL_MAX_TOKENS = 8192

# settings for GraphRAG
GRAPHRAG_API_KEY=openai_key
GRAPHRAG_LLM_MODEL=gpt-4o-mini
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small

# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=false

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"

Then I start the app. I can test the LLM and embeddings, and everything works. I can upload files and they are indexed, but I can't seem to chat. The chat option just never appears.

Logs

No response

Browsers

Chrome

OS

Linux

Additional information

I am running this in a Google Cloud Workstation.

meltedhead commented 18 hours ago

Which versions of graphrag and future should be installed? Could this be the cause? When I try to install the latest versions, it causes lots of issues.

meltedhead commented 17 hours ago

I have tried installing with the run_linux.sh script and get the same issue. The only problem I can see is the warning below. When I try to install graphrag and future, I end up with lots of library conflicts and the app doesn't start. I keep retrying with various versions of both and can't resolve it.

******************************************************
Launching Kotaemon in your browser, please wait...
******************************************************

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/user/kotaemon/install_dir/env/lib/python3.10/sit
[nltk_data]     e-packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
GraphRAG dependencies not installed. Try `pip install graphrag future` to install. GraphRAG retriever pipeline will not work properly.
Nano-GraphRAG dependencies not installed. Try `pip install nano-graphrag` to install. Nano-GraphRAG retriever pipeline will not work properly.
User "admin" created successfully
Setting up quick upload event
Running on local URL:  http://127.0.0.1:7860
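The two warnings in the startup log above can be checked directly from the environment that launches the app. What follows is a minimal sketch, not part of kotaemon: `check_optional` is a hypothetical helper, and the import names `graphrag`, `future`, and `nano_graphrag` are assumptions inferred from the package names in the warnings. It reports whether each optional package is importable and, if so, which version is installed.

```python
from importlib import metadata, util

# Assumption: distribution name -> import name, inferred from the startup warnings.
OPTIONAL_MODULES = {
    "graphrag": "graphrag",
    "future": "future",
    "nano-graphrag": "nano_graphrag",
}


def check_optional(modules):
    """Map each distribution name to its installed version.

    The value is None when the module cannot be imported, and "unknown"
    when the module is importable but no distribution metadata is found.
    """
    report = {}
    for dist_name, module_name in modules.items():
        if util.find_spec(module_name) is None:
            report[dist_name] = None
            continue
        try:
            report[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[dist_name] = "unknown"
    return report


for name, version in check_optional(OPTIONAL_MODULES).items():
    print(f"{name}: {version or 'not installed'}")
```

It should be run with the same interpreter that starts the app (here, presumably the one under install_dir/env), since a system-wide Python may report a different set of packages.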

meltedhead commented 17 hours ago

I have managed to install version 3.0.5 of futures and version 0.1.1 of graphrag, but there is still no chat when I launch. When I run pip check I get:

 pip check
ipykernel 6.29.5 requires pyzmq, which is not installed.
jupyter-client 8.6.3 requires pyzmq, which is not installed.
gradio 4.39.0 has requirement aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0.

I then uninstall aiofiles and try to reinstall it, but I get:

 pip install "aiofiles<24.0"     
Collecting aiofiles<24.0
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Installing collected packages: aiofiles
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
graphrag 0.1.1 requires aiofiles<25.0.0,>=24.1.0, but you have aiofiles 23.2.1 which is incompatible.
Successfully installed aiofiles-23.2.1

Is this the cause? How can I get around this?
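The two aiofiles requirements quoted in the pip output above can be compared directly. The sketch below uses only the standard library; `parse_version` and `satisfies` are hypothetical helpers that handle plain dotted numeric versions only, not the full version specifier syntax that pip implements. Under those assumptions it suggests that the ranges required by gradio 4.39.0 and graphrag 0.1.1 do not overlap, so reinstalling aiofiles alone can satisfy only one of them.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_version(text):
    """Turn '24.1.0' into (24, 1, 0). Plain dotted numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that 24.0 and 24.0.0 compare as equal.
        width = max(len(candidate), len(bound))
        left = candidate + (0,) * (width - len(candidate))
        right = bound + (0,) * (width - len(bound))
        if not OPS[op](left, right):
            return False
    return True


# Requirement strings copied from the pip output in this thread.
GRADIO_SPEC = "<24.0,>=22.0"        # gradio 4.39.0
GRAPHRAG_SPEC = "<25.0.0,>=24.1.0"  # graphrag 0.1.1

for candidate in ("23.2.1", "24.1.0"):
    print(candidate, satisfies(candidate, GRADIO_SPEC), satisfies(candidate, GRAPHRAG_SPEC))
# prints:
# 23.2.1 True False
# 24.1.0 False True
```

Since gradio requires a version below 24.0 and graphrag requires at least 24.1.0, one of the two packages would have to move to a different release for pip check to pass.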

meltedhead commented 17 hours ago

Despite the pip issues above, I can now see the chat input, but the compatibility issues remain. The chat answers are totally irrelevant, though, as the LLM can't access my files. (Screenshot 2024-11-05 120101)

meltedhead commented 17 hours ago

Various errors are showing in the log. When I try to upload files for GraphRAG, I get the errors below.

Traceback (most recent call last):
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 203, in query_vectorstore
    vs_docs = self.doc_store.get(vs_ids)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/storages/docstores/lancedb.py", line 109, in get
    .to_list()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 303, in to_list
    return self.to_arrow().to_pylist()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 760, in to_arrow
    return ds.to_table(
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 435, in to_table
    ).to_table()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 2202, in to_table
    return self.to_reader().read_all()
  File "pyarrow/ipc.pxi", line 757, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Io error: Execution error: External error: Execution error: ExecNode(Take): thread panicked: task 10 panicked
Got 0 from vectorstore
Got 0 from docstore
Cohere API key not found. Skipping rerankings.
Got raw 0 retrieved documents
thumbnail docs 0 non-thumbnail docs 0 raw-thumbnail docs 0
retrieval step took 1.051011562347412
Got 0 retrieved documents
len (original) 0
Got 0 images
Trying LLM streaming
Got 0 cited docs
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
No row was found when one was required
/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/gradio/components/dropdown.py:188: UserWarning:

The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include:  or set allow_custom_value=True.

User-id: 1, can see public conversations: True
Session reasoning type simple
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7ee7d83fcb80>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7ee7d83fc970>, get_extra_table=True, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7a61d0>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7a6650>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7a57b0>), mmr=True, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7ee83468a380>, FSPath=<theflow.base.unset_ object at 0x7ee83468a380>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7ee83468a380>, VS=<theflow.base.unset_ object at 0x7ee83468a380>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7ee83468a380>)]
searching in doc_ids []
Got 0 retrieved documents
len (original) 0
Got 0 images
Trying LLM streaming
Got 0 cited docs
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
No row was found when one was required
User-id: 1, can see public conversations: True
Session reasoning type simple
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7ee7d83fcb80>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7ee7d83fc970>, get_extra_table=True, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d4040>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d4a00>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d4af0>), mmr=True, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7ee83468a380>, FSPath=<theflow.base.unset_ object at 0x7ee83468a380>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7ee83468a380>, VS=<theflow.base.unset_ object at 0x7ee83468a380>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7ee83468a380>)]
searching in doc_ids ['0a1908f7-2b7b-4504-8558-e5f30e1a09f7', '0c28dda8-b4ed-42ad-8fac-91e4ddba8af7', 'a3d6d5be-3833-4100-9342-0827c97b75c7']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Number of requested results 100 is greater than number of elements in index 74, updating n_results = 74
thread 'lance_background_thread' panicked at /home/runner/work/lance/lance/rust/lance-encoding/src/decoder.rs:686:29:
Expected a list column
Exception in thread Thread-5 (query_vectorstore):
Traceback (most recent call last):
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 203, in query_vectorstore
    vs_docs = self.doc_store.get(vs_ids)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/storages/docstores/lancedb.py", line 109, in get
    .to_list()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 303, in to_list
    return self.to_arrow().to_pylist()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 760, in to_arrow
    return ds.to_table(
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 435, in to_table
    ).to_table()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 2202, in to_table
    return self.to_reader().read_all()
  File "pyarrow/ipc.pxi", line 757, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Io error: Execution error: External error: Execution error: ExecNode(Take): thread panicked: task 37 panicked
Got 0 from vectorstore
Got 0 from docstore
Cohere API key not found. Skipping rerankings.
Got raw 0 retrieved documents
thumbnail docs 0 non-thumbnail docs 0 raw-thumbnail docs 0
retrieval step took 0.4497072696685791
Got 0 retrieved documents
len (original) 0
Got 0 images
Trying LLM streaming
Got 0 cited docs
Session reasoning type simple
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7ee7d83fcb80>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7ee7d83fc970>, get_extra_table=True, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d6470>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d6590>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7ee7cb7d6680>), mmr=True, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7ee83468a380>, FSPath=<theflow.base.unset_ object at 0x7ee83468a380>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7ee83468a380>, VS=<theflow.base.unset_ object at 0x7ee83468a380>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7ee83468a380>)]
searching in doc_ids ['0a1908f7-2b7b-4504-8558-e5f30e1a09f7', '0c28dda8-b4ed-42ad-8fac-91e4ddba8af7', 'a3d6d5be-3833-4100-9342-0827c97b75c7']
User-id: 1, can see public conversations: True
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Number of requested results 100 is greater than number of elements in index 74, updating n_results = 74
thread 'lance_background_thread' panicked at /home/runner/work/lance/lance/rust/lance-encoding/src/decoder.rs:686:29:
Expected a list column
Exception in thread Thread-7 (query_vectorstore):
Traceback (most recent call last):
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 203, in query_vectorstore
    vs_docs = self.doc_store.get(vs_ids)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/storages/docstores/lancedb.py", line 109, in get
    .to_list()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 303, in to_list
    return self.to_arrow().to_pylist()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lancedb/query.py", line 760, in to_arrow
    return ds.to_table(
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 435, in to_table
    ).to_table()
  File "/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/lance/dataset.py", line 2202, in to_table
    return self.to_reader().read_all()
  File "pyarrow/ipc.pxi", line 757, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Io error: Execution error: External error: Execution error: ExecNode(Take): thread panicked: task 64 panicked
Got 0 from vectorstore
Got 0 from docstore
Cohere API key not found. Skipping rerankings.
Got raw 0 retrieved documents
thumbnail docs 0 non-thumbnail docs 0 raw-thumbnail docs 0
retrieval step took 0.3767068386077881
Got 0 retrieved documents
len (original) 0
Got 0 images
Trying LLM streaming
Got 0 cited docs
use_quick_index_mode False
reader_mode default
Using reader <kotaemon.loaders.excel_loader.PandasExcelReader object at 0x7ee7cb7b73a0>
/home/user/kotaemon/libs/kotaemon/kotaemon/loaders/excel_loader.py:87: FutureWarning:

Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.

Got 0 page thumbnails
Adding documents to doc store
[2024-11-05T12:05:12Z WARN  lance::dataset] No existing dataset at /home/user/kotaemon/ktem_app_data/user_data/docstore/index_2.lance, it will be created
indexing step took 0.3589944839477539
Using reader <kotaemon.loaders.pdf_loader.PDFThumbnailReader object at 0x7ee7cb702050>
/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/pypdf/_crypt_providers/_cryptography.py:32: CryptographyDeprecationWarning:

ARC4 has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.ARC4 and will be removed from this module in 48.0.0.

Page numbers: 31
Got 31 page thumbnails
Adding documents to doc store
indexing step took 0.8728067874908447
Using reader <kotaemon.loaders.pdf_loader.PDFThumbnailReader object at 0x7ee7cb702050>
Page numbers: 202
Got 202 page thumbnails
Adding documents to doc store
indexing step took 2.9457244873046875
Initializing project at /home/user/kotaemon/ktem_app_data/user_data/files/graphrag/744998a0-91f0-484a-a8d6-24cd005cacc2

/home/user/kotaemon/install_dir/env/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)

meltedhead commented 15 hours ago

I feel like this might be something simple. Any help would be much appreciated, as I am spending a lot of time trying to fix this.