Storia-AI / sage

Chat with any codebase in under two minutes | Fully local or via third-party APIs
https://sage.storia.ai
Apache License 2.0
1.09k stars, 91 forks

KeyError: 'filename' #27

Closed: marksher closed this issue 2 months ago

marksher commented 2 months ago

I'm worried I'm missing a step or a configuration setting. Any obvious ideas?

index Storia-AI/repo2vec \
    --embedder-type=marqo \
    --vector-store-type=marqo \
    --index-name=test

chat Storia-AI/repo2vec \
    --vector-store-type=marqo \
    --index-name=test \
    --llm-provider=ollama \
    --llm-model=llama3.1

I can get everything running, but I get an error in the web UI. The meat of the error is:

File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/repo2vec/vector_store.py", line 90, in patched_method
    documents.append(Document(page_content=res["text"], metadata={"filename": res["filename"]}))
                                                                              ~~~^^^^^^^^^^^^
KeyError: 'filename'
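
For readers unfamiliar with this failure mode, here is a minimal sketch of what the line above is doing and why it can fail: a search hit that lacks a "filename" field raises KeyError on `res["filename"]`. The `Doc` class, the `build_documents` helper, and the sample hits are all hypothetical stand-ins, not repo2vec's or LangChain's actual code, and the `.get()` fallback is only one defensive option, not necessarily how the project addressed it.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain-style Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_documents(hits):
    """Build Doc objects from search hits, tolerating a missing 'filename'."""
    documents = []
    for res in hits:
        # res["filename"] would raise KeyError when the field is absent;
        # .get() falls back to a placeholder instead.
        filename = res.get("filename", "<unknown>")
        documents.append(Doc(page_content=res["text"], metadata={"filename": filename}))
    return documents


# Hypothetical hits: the second one lacks "filename", as in the traceback.
hits = [
    {"text": "def foo(): ...", "filename": "foo.py"},
    {"text": "def bar(): ..."},
]
docs = build_documents(hits)
print([d.metadata["filename"] for d in docs])
```

A fallback like this hides the symptom rather than explaining why the field is missing from the index, so it is better suited to diagnosis than to a permanent fix.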

Here's the whole trace:

(venv) ➜  repo2vec_test git:(main) index Storia-AI/repo2vec \
    --embedder-type=marqo \
    --vector-store-type=marqo \
    --index-name=mark

chat Storia-AI/repo2vec \
    --vector-store-type=marqo \
    --index-name=mark \
    --llm-provider=ollama \
    --llm-model=llama3.1
WARNING:root:Marqo enforces a limit of 64 chunks per batch. Setting --chunks_per_batch to 64.
INFO:root:Cloning the repository...
INFO:root:Embedding the repo...
INFO:root:Successfully embedded 35 chunks.
INFO:root:Done!
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Traceback (most recent call last):
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1336, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/utils.py", line 726, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 507, in _submit_fn
    response = await anyio.to_thread.run_sync(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/repo2vec/chat.py", line 109, in _predict
    response = rag_chain.invoke({"input": message, "chat_history": history_langchain_format})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5092, in invoke
    return self.bound.invoke(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2876, in invoke
    input = context.run(step.invoke, input, config, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/passthrough.py", line 495, in invoke
    return self._call_with_config(self._invoke, input, config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 1785, in _call_with_config
    context.run(
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/config.py", line 398, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/passthrough.py", line 482, in _invoke
    **self.mapper.invoke(
      ^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3579, in invoke
    output = {key: future.result() for key, future in zip(steps, futures)}
                   ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3563, in _invoke_step
    return context.run(
           ^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5092, in invoke
    return self.bound.invoke(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/branch.py", line 239, in invoke
    output = self.default.invoke(
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2878, in invoke
    input = context.run(step.invoke, input, config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/retrievers.py", line 253, in invoke
    raise e
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/retrievers.py", line 246, in invoke
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_core/vectorstores/base.py", line 1042, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/langchain_community/vectorstores/marqo.py", line 167, in similarity_search
    documents = self._construct_documents_from_results_without_score(results)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/working/repo2vec_test/venv/lib/python3.12/site-packages/repo2vec/vector_store.py", line 90, in patched_method
    documents.append(Document(page_content=res["text"], metadata={"filename": res["filename"]}))
                                                                              ~~~^^^^^^^^^^^^
KeyError: 'filename'

iuliaturc commented 2 months ago

Hi @marksher, thanks for filing this issue. Would you mind checking what version you're using? The latest one (0.1.5) shouldn't have this problem.
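
One way to check the installed version from inside the same virtual environment is sketched below. It uses the standard library's `importlib.metadata`; the distribution name "repo2vec" is an assumption inferred from the site-packages path in the traceback, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "repo2vec" is an assumed distribution name, taken from the traceback paths.
print(installed_version("repo2vec"))
```

If this reports something older than 0.1.5, upgrading with pip (for example `pip install --upgrade repo2vec`, again assuming that distribution name) would be the next thing to try.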