[REQUEST] - Update documentation for gguf models for both embeddings and inference

thistleknot commented 2 weeks ago

Reference Issues

No response

Summary

instructions for using gguf locally for both embeddings and model.

Basic Example

I tried to get this setup to work with text-generation-webui openai api endpoint, but it threw errors on the embeddings... I see gguf local model support is supported, but without clear instructions on how to implement it (how/where do I specify in .env or in ui) I'm not sure how to implement it.

Drawbacks

no drawbacks, simply asking for updated documentation

Additional information

No response

lone17 commented 2 weeks ago

If you already have openai api compatible endpoints, you can add a new LLM model and a new Embedding model using those endpoints by following these steps https://cinnamon.github.io/kotaemon/usage/#1-add-your-ai-models (also available in the Help tab within the UI). Please ignore the .env file, it's only used for the first time you start the app, it won't have any effect in subsequent runs. Please give it a try.

taprosoft commented 2 weeks ago

Anyway, our team is workin on both the setup scripts and more clear document to setup local model (Ollama, GGUF) in the README directly. Stay tuned.

taprosoft commented 2 weeks ago

Meanwhile you can try the suggestion from @lone17 here first.

lone17 commented 2 weeks ago

If you followed this guide https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API to set up your endpoint, the endpoint URL you need to add to kotaemon would be http://127.0.0.1:5000/v1/. Something like this

thistleknot commented 2 weeks ago

I tried to follow those instructions in the url. They don't really explain how to setup gguf properly. Pictures would be nice. For example, in this thread, you say not to use the .env, but in the url it says to use the .env,

.env

and then they show a bunch of parms for llama.cpp but there is no input fields for those parms. It is not clear by how adding a name only, the model path will be inferred locally (for example... if I choose a gguf model... how does the ui know what quantized version I want or know where the path is locally without asking for it in the ui?)

how I tried to setup my api endpoint using text-generation-webui

thistleknot commented 2 weeks ago

If you followed this guide https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API to set up your endpoint, the endpoint URL you need to add to kotaemon would be http://127.0.0.1:5000/v1/. Something like this

thank you. This got me past my error(s)!

thistleknot commented 2 weeks ago

well, that did get me past one error, but not the embedding error

Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7fc670142b60>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7fc670142e00>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df520>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df640>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7fc66c1df730>), mmr=True, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, FSPath=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, VS=<theflow.base.unset_ object at 0x7fc6b0aa0a30>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7fc6b0aa0a30>)]
searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream
    docs, infos = self.retrieve(message, history)
  File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve
    retriever_docs = retriever_node(text=query)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run
    docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run
    emb = self.embedding(text)[0].embedding
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in __call__
    return self._create_callable(getattr(self.ff_original_obj, "__call__"))(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper
    output = callable_obj(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run
    return self.invoke(text, *args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke
    resp = self.openai_response(client, input=input_, **kwargs).dict()
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fc66c216620 state=finished raised InternalServerError>]

lone17 commented 2 weeks ago

@thistleknot yeah sorry for that confusing instruction. The .env section was a different way to set up GGUF models, if you already have an API endpoint deployed using text-generation-webui, please ignore that part.

After you change the embedding model, you also need to switch to using it in the File Index Collection like described here https://github.com/Cinnamon/kotaemon/issues/143#issuecomment-2315620503
You might also need to do this step as well https://github.com/Cinnamon/kotaemon/issues/143#issuecomment-2315646439

Please try it out, sorry for not having clear instructions, the team is working on it.

thistleknot commented 2 weeks ago

this is what I had set but I'll try to follow the guidelines you just pasted as well as try this fast embeddings option that uses a local model. I will likely try this instead of trying to route through text-generation-webui (the message below). I'm not sure if I need to do anything else, I set the same model in my settings.yaml for text-generation-webui

relevant text-generation-webui settings.yaml

openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
openai-sd_webui_url: http://192.168.3.17:7861


![Uploading image.png…]()

Thinking ... Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore obj ect at 0x7f435c882e90>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=< kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f435c883130>, get_extra_table=Fal se, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts .template.PromptTemplate object at 0x7f4358321900>, system_prompt_template=<kotaemon.llms.prompts.temp late.PromptTemplate object at 0x7f4358320f40>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.te mplate.PromptTemplate object at 0x7f43583227a0>), mmr=True, rerankers=[CohereReranking(cohere_api_key= '', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, userid=1), GraphRAGRe trieverPipeline(DS=<theflow.base.unset object at 0x7f439d0a0a30>, FSPath=<theflow.base.unset object at 0x7f439d0a0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset obje ct at 0x7f439d0a0a30>, VS=<theflow.base.unset_ object at 0x7f439d0a0a30>, file_ids=[], userid=<theflo w.base.unset object at 0x7f439d0a0a30>)] searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a'] retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold']) Traceback (most recent call last): File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575 , in process_events response = await route_utils.call_process_api( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api output = await app.get_blocks().process_api( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api result = await self.call_function( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function prediction = await utils.async_iteration(iterator) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, i n async_iteration return await iterator.anext() File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, i n anext return await anyio.to_thread.run_sync( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread return await future File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run result = context.run(func, args) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, i n run_sync_iterator_async return next(iterator) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, i n gen_wrapper response = next(iterator) File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/init.py", line 804, in chat_fn for response in pipeline.stream(chat_input, conversation_id, chat_history): File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream docs, infos = self.retrieve(message, history) File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve retriever_docs = retriever_node(text=query) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in call output = self.fl.exec(func, args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", li ne 151, in exec return run(args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in call _output = self.next_call(*args, *kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in call return self.next_call(args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx return self.run(args, kwargs) File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run docs = self.vector_retrieval(text=text, top_k=self.top_k, retrieval_kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec return child(args, kwargs, fl_runstates=fl_runstates) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in call output = self.fl.exec(func, args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", li ne 151, in exec return run(*args, *kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in call _output = self.next_call(args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in call return self.next_call(*args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx return self.run(*args, *kwargs) File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run emb = self.embedding(text)[0].embedding File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec return child(args, kwargs, fl_runstates=fl_runstates) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in call return self._create_callable(getattr(self.ff_original_obj, "call"))( File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper output = callable_obj(*args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in call _output = self.next_call(*args, *kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in call return self.next_call(args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec return child(*args, kwargs, fl_runstates=fl_runstates) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in call output = self.fl.exec(func, args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", li ne 151, in exec return run(*args, *kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in call raise e from None File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in call _output = self.next_call(args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in call return self.next_call(*args, kwargs) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx return self.run(*args, *kwargs) File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run return self.invoke(text, args, kwargs) File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke resp = self.openairesponse(client, input=input, *kwargs).dict() File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/init.py", line 2 89, in wrapped_f return self(f, args, **kw) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/init.py", line 3 79, in call do = self.iter(retry_state=retry_state) File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/init.py", line 3 26, in iter raise retry_exc from fut.exception() tenacity.RetryError: RetryError[<Future at 0x7f435812e590 state=finished raised InternalServerError>]

thistleknot commented 2 weeks ago

tried the fast embeddings route using the default fast embedding option

Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f435c882e90>, FSPath=PosixPath('/home/user/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f435c883130>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812d300>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812cdc0>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f435812fb50>), mmr=True, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f439d0a0a30>, FSPath=<theflow.base.unset_ object at 0x7f439d0a0a30>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f439d0a0a30>, VS=<theflow.base.unset_ object at 0x7f439d0a0a30>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7f439d0a0a30>)]
searching in doc_ids ['4914ae98-3916-4aef-8e95-f7b67971b6f2', '0d4527a2-567b-4971-a52f-46a9ee38dd8a', '3f0fc6c2-c402-445d-9a12-45f15780699a']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters', 'mode', 'mmr_threshold'])
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/home/user/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 804, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 660, in stream
    docs, infos = self.retrieve(message, history)
  File "/home/user/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 488, in retrieve
    retriever_docs = retriever_node(text=query)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/ktem/ktem/index/file/pipelines.py", line 162, in run
    docs = self.vector_retrieval(text=text, top_k=self.top_k, **retrieval_kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 188, in run
    emb = self.embedding(text)[0].embedding
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1675, in __call__
    return self._create_callable(getattr(self.ff_original_obj, "__call__"))(
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1663, in wrapper
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1661, in wrapper
    output = callable_obj(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1261, in exec
    return child(*args, **kwargs, __fl_runstates__=__fl_runstates__)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/base.py", line 10, in run
    return self.invoke(text, *args, **kwargs)
  File "/home/user/kotaemon/libs/kotaemon/kotaemon/embeddings/openai.py", line 104, in invoke
    resp = self.openai_response(client, input=input_, **kwargs).dict()
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/user/miniconda3/envs/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f43582c9120 state=finished raised InternalServerError>]

lone17 commented 2 weeks ago

After changing the embedding for the File Collection like mentioned in https://github.com/Cinnamon/kotaemon/issues/143#issuecomment-2315620503, have you tried restarting the app ? Sorry I couldn't test your set up now, so I will briefly describe how the app works, maybe it could help you with debugging.

Basically there are 3 main components we need to care about:

the LLM model which is used for generation, scoring, reranking, etc. If you have an openai compatible endpoint, you can simply use the OpenAIChat option and then change the URL and the key to your local ones.
the Embedding model which is used for turning text into embedding vectors. Similarly to the LLM, you can simply drop in your openai compatible endpoint to use it.
then there is the File Collection, which is the thing that processes your documents and stores the embeddings to use for retrieval later. Each Collection uses an Embedding model. This embedding model is configured once when the Collection is created (a default File Collection is created as you start the app for the first time), and shouldn't be changed after such creation (because you don't want to store embeddings produced by different models together as they would have different distributions). But technically you can still change it by following https://github.com/Cinnamon/kotaemon/issues/143#issuecomment-2315620503, but you would need to restart the app for the change to take effect.
there's also the Graph Collection but let's ignore it for now as setting it up locally is rather tricky.

In your fast embedding logs, it looks like that the app still uses the OpenAIEmbeddings connector, so I guess you haven't changed the embedding model of the File Collection or you haven't restart the app. (the app needs to be restarted because these settings are stored in a db, when the app starts it will read from the db to set up the corresponding models. When you make changes to the settings/specifications, it will update the db but doesn't reload the model, hence the need for a restart).

Hope that helps. The team is working on a better doc, I hope it will come out soon. Note I'm not a part of the team so I cannot promise anything on their behalf, but I'm sure they are working hard on it. Thank you for your interest in this project, I hope this hiccup doesn't discourage you from supporting it in the future.

thistleknot commented 2 weeks ago

I got further

I went back to the basics and constructed a minimum viable example of a curl call

curl http://127.0.0.1:5000/v1/embeddings   -H "Content-Type: application/json"   -H "Authorization: Bearer blahblahblah"   -d '{
    "input": "Your text goes here",
    "model": "text-embedding-ada-002"
  }'

this told me I had to change the name of the model to text-embedding-ada-002 (despite what is in settings.yaml, although I think in settings.yaml, you still need to specify the backend model that will be used in place, the port is still 5000 for text-generation-webui)

then I went in the file setting as someone had pointed out and swapped 'openai' with 'local'

so far no errors, but no inference either

thistleknot commented 2 weeks ago

changing settings and asking to restart is a no go with a docker container fyi in fact, restarting the docker container wipes out everything

I can't set the llm for relevant scoring, I think that's what is causing the below error in docker (while everything else seems to work) and stops the response short at just a single word

retrieval step took 8.569526195526123
[2024-09-01 01:55:26,512][INFO] - Downloading https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz to lid.176.ftz (916.0K)
100%|█████████████████████████████████████████████████████████████| 916k/916k [00:00<00:00, 1.48MB/s]
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text SHOP2: An HTN Planning System true
lang en
highlight_text Table 3: Resolution Methods for Task Interactions in true
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text consider when analysing the planners a resource hierarchy, as shown in Figure 8. Each element of the hierarchy is true
lang en
highlight_text hyper-parameter, a multinomial distribution can be true
lang en
highlight_text 3.1.2 Expressiveness true
lang en
highlight_text SHOP2: An HTN Planning System true
lang en
highlight_text Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman true
lang en
highlight_text •Variables can be allowed or not in the planning problem. true
Got 10 retrieved documents
len (original) 29626
len (trimmed) 29626
Got 3 images
Trying LLM streaming
CitationPipeline: invoking LLM
Exception in thread Thread-13 (generate_relevant_scores):
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/app/libs/ktem/ktem/reasoning/simple.py", line 668, in generate_relevant_scores
    docs = self.retrievers[0].generate_relevant_scores(message, docs)
  File "/app/libs/ktem/ktem/index/file/pipelines.py", line 206, in generate_relevant_scores
    else self.llm_scorer(documents=documents, query=query)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 150, in run
    results = [future.result() for future in futures]
  File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 150, in <listcomp>
    results = [future.result() for future in futures]
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/libs/kotaemon/kotaemon/indices/rankings/llm_trulens.py", line 146, in llm_call
    return self.llm(messages).text
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
    return run(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
    raise e from None
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "/app/libs/kotaemon/kotaemon/llms/base.py", line 25, in run
    return self.invoke(*args, **kwargs)
  File "/app/libs/kotaemon/kotaemon/llms/chats/openai.py", line 204, in invoke
    resp = self.openai_response(
  File "/app/libs/kotaemon/kotaemon/llms/chats/openai.py", line 313, in openai_response
    return client.chat.completions.create(**params)
  File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 668, in create
    return self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: openai_key. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
^[[DUser-id: 1, can see public conversations: True

^CKeyboard interruption in main thread... closing server.
^CTraceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2690, in block_thread
    time.sleep(0.1)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/app.py", line 17, in <module>
    demo.queue().launch(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2595, in launch
    self.block_thread()
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2694, in block_thread
    self.server.close()
  File "/usr/local/lib/python3.10/site-packages/gradio/http_server.py", line 68, in close
    self.thread.join(timeout=5)
  File "/usr/local/lib/python3.10/threading.py", line 1100, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/local/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt
^CException ignored in atexit callback: <function exit_cacert_ctx at 0x7fab042de290>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/certifi/core.py", line 10, in exit_cacert_ctx
Exception ignored in atexit callback: <function _exit_function at 0x7f8454b14310>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/util.py", line 320, in _exit_function
    def exit_cacert_ctx() -> None:
KeyboardInterrupt:
    def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,

taprosoft commented 2 weeks ago

Hi @thistleknot, you can persist app state by mounting /app/ktem_app_data folder outside. Something like:

docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-v ./ktem_app_data:/app/ktem_app_data \
-p 7860:7860 -it \
taprosoft/kotaemon-demo:v1.0

All of your data and app state will be stored in the path ./ktem_app_data so it persists across runs.

For your LLM relevant score problem please check our Local model setup guide.

thistleknot commented 2 weeks ago

docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -v /data/ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm taprosoft/kotaemon:v1.0

Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 785, in chat_fn
    pipeline, reasoning_state = self.create_pipeline(
  File "/app/libs/ktem/ktem/pages/chat/__init__.py", line 752, in create_pipeline
    iretrievers = index.get_retriever_pipelines(
  File "/app/libs/ktem/ktem/index/file/index.py", line 423, in get_retriever_pipelines
    obj = cls.get_pipeline(stripped_settings, self.config, selected_ids)
  File "/app/libs/ktem/ktem/index/file/pipelines.py", line 281, in get_pipeline
    embedding=embedding_models_manager[
  File "/app/libs/ktem/ktem/embeddings/manager.py", line 65, in __getitem__
    return self._models[key]
KeyError: 'local'

I think having a video of starting from scratch would help, but that's just me. That way there is no way a step was missed.

edit: nm, the error was due to a malformed gguf

however, after seeing my gpu being used... eventually no output, just 'thinking'

thistleknot commented 2 weeks ago

I got a little bit further

but it stops after one word.

thistleknot commented 2 weeks ago

docker run --name kotaemon -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -v /data/ktem_app_data:/app/ktem_app_data -p 7860:7860 -it taprosoft/kotaemon:v1.0

persists the settings, but not the document store

On Sun, Sep 1, 2024 at 2:02 AM Tuan Anh Nguyen Dang (Tadashi_Cin) < @.***> wrote:

Hi @thistleknot https://github.com/thistleknot, you can persist app state by mounting /app/ktem_app/data folder outside. Something like:

docker run \ -e GRADIO_SERVER_NAME=0.0.0.0 \ -e GRADIO_SERVER_PORT=7860 \ -v ./ktem_app_data:/app/ktem_app_data \ -p 7860:7860 -it \ taprosoft/kotaemon-demo:v1.0

All of your data and app state will be stored in the path ./ktem_app_data so it persists across runs.

For your LLM relevant score problem please check our Local model setup guide https://github.com/Cinnamon/kotaemon/blob/main/docs/local_model.md .

— Reply to this email directly, view it on GitHub https://github.com/Cinnamon/kotaemon/issues/150#issuecomment-2323242283, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHKKOV5RJTEPO2JUWWC6D3ZULJZPAVCNFSM6AAAAABNKWGKQWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRTGI2DEMRYGM . You are receiving this because you were mentioned.Message ID: @.***>

thistleknot commented 1 day ago

I'm going to close this

I did eventually get kotaemon working =D

thank you

I think I had my embedding model spelled incorrectly

for others

    docker run \
    -e GRADIO_SERVER_NAME=0.0.0.0 \
    -e GRADIO_SERVER_PORT=7860 \
    -p 7860:7860 -it --rm \
    -v /data/ktem_app_data:/app/ktem_app_data \
    -it ghcr.io/cinnamon/kotaemon:main-full

how I started my server.py

python server.py --api --listen --n-gpu-layers 32 --threads 8 --numa --tensorcores --trust-remote-code

(textgen) [root@pve-m7330 text-generation-webui]# head settings.yaml -n 20 openai-embedding_device: cuda openai-embedding_model: "sentence-transformers/all-MiniLM-L6-v2" openai-sd_webui_url: http://192.168.3.17:7861 openai-debug: 1

Cinnamon / kotaemon