Closed thistleknot closed 1 day ago
If you already have OpenAI API-compatible endpoints, you can add a new LLM and a new embedding model that use those endpoints by following these steps: https://cinnamon.github.io/kotaemon/usage/#1-add-your-ai-models (also available in the Help tab within the UI). Please ignore the .env file: it is only used the first time you start the app and has no effect on subsequent runs. Please give it a try.
Anyway, our team is working on both the setup scripts and clearer documentation for setting up local models (Ollama, GGUF) directly in the README. Stay tuned.
Meanwhile you can try the suggestion from @lone17 here first.
If you followed this guide https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API to set up your endpoint, the endpoint URL you need to add to kotaemon would be http://127.0.0.1:5000/v1/ . Something like this
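For anyone unsure how that base URL gets used: OpenAI-style clients append a route such as `chat/completions` or `embeddings` to it. Here is a minimal sketch of that joining step. The helper name `endpoint` is made up for illustration and is not part of kotaemon or text-generation-webui.

```python
from urllib.parse import urljoin

# Base URL from this thread for a local text-generation-webui server.
BASE_URL = "http://127.0.0.1:5000/v1/"


def endpoint(base_url: str, route: str) -> str:
    """Join an OpenAI-style base URL with a route such as 'embeddings'.

    Hypothetical helper: it only normalizes slashes so the '/v1' segment
    is kept whether or not the base URL ends with '/'.
    """
    base = base_url.rstrip("/") + "/"
    return urljoin(base, route.lstrip("/"))


print(endpoint(BASE_URL, "chat/completions"))
print(endpoint(BASE_URL, "embeddings"))
```

If the base URL is missing the `/v1` part, the resulting routes will not match what an OpenAI-compatible server normally exposes, which is an easy mistake to make when filling in the form.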
I tried to follow the instructions at that URL. They don't really explain how to set up GGUF properly. Pictures would be nice. For example, in this thread you say not to use the .env file, but the URL says to use the .env file, and then it shows a bunch of parameters for llama.cpp, but there are no input fields for those parameters. It is not clear how, by adding only a name, the model path will be inferred locally (for example, if I choose a GGUF model, how does the UI know which quantized version I want, or where the path is locally, without asking for it in the UI?).
How I tried to set up my API endpoint using text-generation-webui
If you followed this guide https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API to set up your endpoint, the endpoint URL you need to add to kotaemon would be http://127.0.0.1:5000/v1/ . Something like this
thank you. This got me past my error(s)!
well, that did get me past one error, but not the embedding error
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7fc670142b60>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7fc670142e00>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df520>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df640>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df730>), mmr=True, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, FSPath=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, VS=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7fc6b0aa0a30>)]
searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Traceback (most recent call last):
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream
docs, infos = self.retrieve(message, history)
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve
retriever_docs = retriever_node(text=query)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run
docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run
emb = self.embedding(text)[0].embedding
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in __call__
return self._create_callable(getattr(self.ff_original_obj, "__call__"))(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper
output = callable_obj(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run
return self.invoke(text, *args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke
resp = self.openai_response(client, input=input_, **kwargs).dict()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fc66c216620 state=finished raised InternalServerError>]
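A side note on reading this traceback: the last line only shows the `RetryError` wrapper. Since the wrapper is raised with `raise retry_exc from fut.exception()`, the underlying `InternalServerError` should be reachable through the exception's `__cause__`. Below is a small sketch of walking that chain. The two `Fake...` exception classes and `failing_call` are stand-ins made up so the example runs without tenacity or a server.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the `raise ... from ...` chain down to the first exception."""
    seen = set()
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against a cyclic chain
        exc = exc.__cause__
    return exc


class FakeInternalServerError(Exception):
    """Stand-in for the HTTP 500 error raised by the API client."""


class FakeRetryError(Exception):
    """Stand-in for the wrapper raised once retries are exhausted."""


def failing_call() -> None:
    # Mimic the pattern seen in the traceback: wrap the real error.
    try:
        raise FakeInternalServerError("500 from embedding endpoint")
    except FakeInternalServerError as inner:
        raise FakeRetryError("retries exhausted") from inner


try:
    failing_call()
except FakeRetryError as err:
    cause = root_cause(err)
    print(type(cause).__name__, "-", cause)
```

Printing the root cause (or checking the server-side log of the endpoint) usually says more about why the embedding request failed than the retry wrapper does.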
@thistleknot yeah sorry for that confusing instruction. The .env section was a different way to set up GGUF models, if you already have an API endpoint deployed using text-generation-webui, please ignore that part.
Please try it out, sorry for not having clear instructions, the team is working on it.
this is what I had set but I'll try to follow the guidelines you just pasted as well as try this fast embeddings option that uses a local model. I will likely try this instead of trying to route through text-generation-webui (the message below). I'm not sure if I need to do anything else, I set the same model in my settings.yaml for text-generation-webui
relevant text-generation-webui settings.yaml
openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
openai-sd_webui_url: http://192.168.3.17:7861
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f435c882e90>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f435c883130>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f4358321900>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f4358320f40>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f43583227a0>), mmr=True, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f439d0a0a30>, FSPath=<theflow.base.unset_ object at 0x7f439d0a0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f439d0a0a30>, VS=<theflow.base.unset_ object at 0x7f439d0a0a30>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7f439d0a0a30>)]
searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Traceback (most recent call last):
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream
docs, infos = self.retrieve(message, history)
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve
retriever_docs = retriever_node(text=query)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run
docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run
emb = self.embedding(text)[0].embedding
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in __call__
return self._create_callable(getattr(self.ff_original_obj, "__call__"))(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper
output = callable_obj(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run
return self.invoke(text, *args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke
resp = self.openai_response(client, input=input_, **kwargs).dict()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f435812e590 state=finished raised InternalServerError>]
tried the fast embeddings route using the default fast embedding option
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f435c882e90>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f435c883130>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812d300>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812cdc0>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812fb50>), mmr=True, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f439d0a0a30>, FSPath=<theflow.base.unset_ object at 0x7f439d0a0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f439d0a0a30>, VS=<theflow.base.unset_ object at 0x7f439d0a0a30>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7f439d0a0a30>)]
searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Traceback (most recent call last):
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream
docs, infos = self.retrieve(message, history)
File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve
retriever_docs = retriever_node(text=query)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run
docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run
emb = self.embedding(text)[0].embedding
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in __call__
return self._create_callable(getattr(self.ff_original_obj, "__call__"))(
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper
output = callable_obj(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run
return self.invoke(text, *args, **kwargs)
File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke
resp = self.openai_response(client, input=input_, **kwargs).dict()
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f43582c9120 state=finished raised InternalServerError>]
After changing the embedding for the File Collection as mentioned in https://github.com/Cinnamon/kotaemon/issues/143#issuecomment-2315620503, have you tried restarting the app? Sorry, I can't test your setup right now, so I will briefly describe how the app works; maybe it will help you with debugging.
Basically there are 3 main components we need to care about:
In your fast embedding logs, it looks like the app still uses the OpenAIEmbeddings connector, so I guess you either haven't changed the embedding model of the File Collection or haven't restarted the app. (The app needs to be restarted because these settings are stored in a db. When the app starts, it reads from the db to set up the corresponding models. When you change the settings/specifications, it updates the db but doesn't reload the model, hence the need for a restart.)
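To make the restart point concrete, here is a toy model of the behaviour described above: settings live in a db, are read once at startup, and a later change is saved but not reloaded. This is not kotaemon's code. `ToyApp`, the `settings` table, and the `'embedding'` key are all made up for illustration.

```python
import sqlite3


class ToyApp:
    """Hypothetical app that reads its embedding setting only at startup."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        row = db.execute(
            "SELECT value FROM settings WHERE key = 'embedding'"
        ).fetchone()
        self.embedding = row[0]  # loaded once, never refreshed

    def update_setting(self, value: str) -> None:
        # Persist the change, but leave the in-memory value untouched.
        self.db.execute(
            "UPDATE settings SET value = ? WHERE key = 'embedding'", (value,)
        )
        self.db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO settings VALUES ('embedding', 'openai')")
db.commit()

app = ToyApp(db)
app.update_setting("local")
print(app.embedding)        # value loaded at startup

restarted = ToyApp(db)      # simulates restarting the app
print(restarted.embedding)  # value read from the db after the change
```

In this toy version the running instance keeps using the old connector until a new instance is created, which matches the symptom of the logs still going through the OpenAI embedding code path.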
Hope that helps. The team is working on a better doc, I hope it will come out soon. Note I'm not a part of the team so I cannot promise anything on their behalf, but I'm sure they are working hard on it. Thank you for your interest in this project, I hope this hiccup doesn't discourage you from supporting it in the future.
I got further
I went back to the basics and constructed a minimum viable example of a curl call
curl http://127.0.0.1:5000/v1/embeddings -H "Content-Type: application/json" -H "Authorization: Bearer blahblahblah" -d '{
"input": "Your text goes here",
"model": "text-embedding-ada-002"
}'
This told me I had to change the name of the model to text-embedding-ada-002, despite what is in settings.yaml. (I think you still need to specify in settings.yaml the backend model that will actually be used in its place; the port is still 5000 for text-generation-webui.)
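For anyone who prefers to test from Python, here is a rough equivalent of the curl call above using only the standard library. The function names are made up for this sketch, and the response shape (`data[0].embedding`) is assumed to follow the OpenAI embeddings format; the URL, model name, and dummy key are the ones from the curl example.

```python
import json
import urllib.request


def build_embedding_request(text, model="text-embedding-ada-002",
                            base_url="http://127.0.0.1:5000/v1",
                            api_key="blahblahblah"):
    """Build (but do not send) a POST to the OpenAI-compatible /embeddings route."""
    payload = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embedding(text):
    """Send the request; needs the server from the curl example to be running."""
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style responses keep the vector under data[i].embedding.
    return body["data"][0]["embedding"]


req = build_embedding_request("Your text goes here")
print(req.get_method(), req.full_url)
```

Calling `fetch_embedding("Your text goes here")` should return a list of floats if the endpoint is up; an HTTP error at this stage points at the server rather than at kotaemon.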
Then I went into the file settings, as someone had pointed out, and swapped 'openai' with 'local'.
so far no errors, but no inference either
Changing settings and then being asked to restart is a no-go with a docker container, FYI. In fact, restarting the docker container wipes out everything.
I can't set the LLM for relevance scoring. I think that's what is causing the error below in docker (while everything else seems to work); it stops the response short at just a single word.
retrieval step took 8.569526195526123
[2024-09-01 01:55:26,512][INFO] - Downloading https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz to lid.176.ftz (916.0K)
100%|█████████████████████████████████████████████████████████████| 916k/916k [00:00<00:00, 1.48MB/s]
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text SHOP2: An HTN Planning System true
lang en
highlight_text Table 3: Resolution Methods for Task Interactions in true
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text consider when analysing the planners a resource hierarchy, as shown in Figure 8. Each element of the hierarchy is true
lang en
highlight_text hyper-parameter, a multinomial distribution can be true
lang en
highlight_text 3.1.2 Expressiveness true
lang en
highlight_text SHOP2: An HTN Planning System true
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text •Variables can be allowed or not in the planning problem. true
Got 10 retrieved documents
len (original) 29626
len (trimmed) 29626
Got 3 images
Trying LLM streaming
CitationPipeline: invoking LLM
Exception in thread Thread-13 (generate_relevant_scores):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/app/libs/ktem/ktem/reasoning/simple.py", line 668, in generate_relevant_scores
docs = self.retrievers[0].generate_relevant_scores(message, docs)
File "/app/libs/ktem/ktem/index/file/pipelines.py", line 206, in generate_relevant_scores
else self.llm_scorer(documents=documents, query=query)
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 150, in run
results = [future.result() for future in futures]
File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 150, in <listcomp>
results = [future.result() for future in futures]
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 146, in llm_call
return self.llm(messages).text
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/app/libs/kotaemon/kotaemon/llms/base.py", line 25, in run
return self.invoke(*args, **kwargs)
File "/app/libs/kotaemon/kotaemon/llms/chats/openai.py", line 204, in invoke
resp = self.openai_response(
File "/app/libs/kotaemon/kotaemon/llms/chats/openai.py", line 313, in openai_response
return client.chat.completions.create(**params)
File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 668, in create
return self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: openai_key. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
User-id: 1, can see public conversations: True
^CKeyboard interruption in main thread... closing server.
^CTraceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2690, in block_thread
time.sleep(0.1)
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/app.py", line 17, in <module>
demo.queue().launch(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2595, in launch
self.block_thread()
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2694, in block_thread
self.server.close()
File "/usr/local/lib/python3.10/site-packages/gradio/http_server.py", line 68, in close
self.thread.join(timeout=5)
File "/usr/local/lib/python3.10/threading.py", line 1100, in join
self._wait_for_tstate_lock(timeout=max(timeout, 0))
File "/usr/local/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
KeyboardInterrupt
^CException ignored in atexit callback: <function exit_cacert_ctx at 0x7fab042de290>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/certifi/core.py", line 10, in exit_cacert_ctx
Exception ignored in atexit callback: <function _exit_function at 0x7f8454b14310>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/multiprocessing/util.py", line 320, in _exit_function
def exit_cacert_ctx() -> None:
KeyboardInterrupt:
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
Hi @thistleknot, you can persist app state by mounting the /app/ktem_app_data folder outside the container.
Something like:
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-v ./ktem_app_data:/app/ktem_app_data \
-p 7860:7860 -it \
taprosoft/kotaemon-demo:v1.0
All of your data and app state will be stored in the path ./ktem_app_data so it persists across runs.
For your LLM relevance score problem, please check our Local model setup guide: https://github.com/Cinnamon/kotaemon/blob/main/docs/local_model.md
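If you launch the container from a script, the same mount can be assembled in Python. This is only a sketch: the helper name `kotaemon_docker_args` is made up, and resolving the host folder to an absolute path is my own choice, made so the mount does not depend on the directory the command is run from. The image name and environment variables are the ones from the command above.

```python
from pathlib import Path


def kotaemon_docker_args(host_dir, image="taprosoft/kotaemon-demo:v1.0", port=7860):
    """Return the argument list for a docker run that keeps app data on the host."""
    # Resolve to an absolute path so the -v mount does not depend on the cwd.
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run",
        "-e", "GRADIO_SERVER_NAME=0.0.0.0",
        "-e", f"GRADIO_SERVER_PORT={port}",
        "-v", f"{host_path}:/app/ktem_app_data",
        "-p", f"{port}:{port}",
        "-it", image,
    ]


args = kotaemon_docker_args("./ktem_app_data")
print(" ".join(args))
# To actually launch it: subprocess.run(args, check=True)
```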
docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -v /data/ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm taprosoft/kotaemon:v1.0
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 785, in chat_fn
pipeline, reasoning_state = self.create_pipeline(
File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 752, in create_pipeline
iretrievers = index.get_retriever_pipelines(
File "/app/libs/ktem/ktem/index/file/index.py", line 423, in get_retriever_pipelines
obj = cls.get_pipeline(stripped_settings, self.config, selected_ids)
File "/app/libs/ktem/ktem/index/file/pipelines.py", line 281, in get_pipeline
embedding=embedding_models_manager[
File "/app/libs/ktem/ktem/embeddings/manager.py", line 65, in __getitem__
return self._models[key]
KeyError: 'local'
I think having a video of starting from scratch would help, but that's just me. That way there is no way a step was missed.
edit: nm, the error was due to a malformed gguf
however, after seeing my gpu being used... eventually no output, just 'thinking'
I got a little bit further
but it stops after one word.
docker run --name kotaemon -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -v /data/ktem_app_data:/app/ktem_app_data -p 7860:7860 -it taprosoft/kotaemon:v1.0
persists the settings, but not the document store
I'm going to close this
I did eventually get kotaemon working =D
thank you
I think I had my embedding model spelled incorrectly
for others
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-p 7860:7860 -it --rm \
-v /data/ktem_app_data:/app/ktem_app_data \
ghcr.io/cinnamon/kotaemon:main-full
how I started my server.py
python server.py --api --listen --n-gpu-layers 32 --threads 8 --numa --tensorcores --trust-remote-code
(textgen) [root@pve-m7330 text-generation-webui]# head settings.yaml -n 20
openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
openai-sd_webui_url: http://192.168.3.17:7861
openai-debug: 1
Reference Issues
No response
Summary
instructions for using gguf locally for both embeddings and model.
Basic Example
I tried to get this setup to work with the text-generation-webui OpenAI API endpoint, but it threw errors on the embeddings. I see that local GGUF models are supported, but there are no clear instructions on how to set them up (how and where do I specify them, in .env or in the UI?), so I'm not sure how to proceed.
Drawbacks
no drawbacks, simply asking for updated documentation
Additional information
No response