NixOS / nixpkgs

Nix Packages collection & NixOS

open-webui missing additional dependencies #361550

Open mentalblock opened 1 day ago

mentalblock commented 1 day ago

Describe the bug

Attempting to upload an epub document for RAG fails with an error about the following Python modules not being available: iso639 and langdetect (see the logs below).

This issue is similar to https://github.com/NixOS/nixpkgs/issues/361295 and was discovered after that issue was resolved. It turns out the dependencies listed above are also needed.

Logs

Dec 03 14:18:28 shonuff open-webui[83563]: INFO  [open_webui.apps.webui.routers.files] file.content_type: application/epub+zip
Dec 03 14:18:32 shonuff open-webui[83563]: ERROR [open_webui.apps.retrieval.main] No module named 'iso639'
Dec 03 14:18:32 shonuff open-webui[83563]: Traceback (most recent call last):
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/m9kv04nnad2mr3xg48ykpdskq4jdqafi-open-webui-0.4.7/lib/python3.12/site-packages/open_webui/apps/retrieval/main.py", line 960, in process_file
Dec 03 14:18:32 shonuff open-webui[83563]:     docs = loader.load(
Dec 03 14:18:32 shonuff open-webui[83563]:            ^^^^^^^^^^^^
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/m9kv04nnad2mr3xg48ykpdskq4jdqafi-open-webui-0.4.7/lib/python3.12/site-packages/open_webui/apps/retrieval/loaders/main.py", line 125, in load
Dec 03 14:18:32 shonuff open-webui[83563]:     docs = loader.load()
Dec 03 14:18:32 shonuff open-webui[83563]:            ^^^^^^^^^^^^^
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/0i914d1rx3vz29w62hxnhjsdyanvfdi7-python3.12-langchain-core-0.3.15/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", li>
Dec 03 14:18:32 shonuff open-webui[83563]:     return list(self.lazy_load())
Dec 03 14:18:32 shonuff open-webui[83563]:            ^^^^^^^^^^^^^^^^^^^^^^
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/0c3n3gqnannd6hsgdzdn7n5fiyhvlyjb-python3.12-langchain-community-0.3.6/lib/python3.12/site-packages/langchain_community/document_loaders/uns>
Dec 03 14:18:32 shonuff open-webui[83563]:     elements = self._get_elements()
Dec 03 14:18:32 shonuff open-webui[83563]:                ^^^^^^^^^^^^^^^^^^^^
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/0c3n3gqnannd6hsgdzdn7n5fiyhvlyjb-python3.12-langchain-community-0.3.6/lib/python3.12/site-packages/langchain_community/document_loaders/epu>
Dec 03 14:18:32 shonuff open-webui[83563]:     from unstructured.partition.epub import partition_epub
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/epub.py", line 9, in <m>
Dec 03 14:18:32 shonuff open-webui[83563]:     from unstructured.partition.common.metadata import get_last_modified_date
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/common/metadata.py", li>
Dec 03 14:18:32 shonuff open-webui[83563]:     from unstructured.partition.common.lang import apply_lang_metadata
Dec 03 14:18:32 shonuff open-webui[83563]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/common/lang.py", line 6>
Dec 03 14:18:32 shonuff open-webui[83563]:     import iso639  # pyright: ignore[reportMissingTypeStubs]
Dec 03 14:18:32 shonuff open-webui[83563]:     ^^^^^^^^^^^^^
Dec 03 14:18:32 shonuff open-webui[83563]: ModuleNotFoundError: No module named 'iso639'
Dec 03 14:39:09 shonuff open-webui[92141]: ERROR [open_webui.apps.retrieval.main] No module named 'langdetect'
Dec 03 14:39:09 shonuff open-webui[92141]: Traceback (most recent call last):
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/vljlzmyyhghb4ng5zzl698zj98gf603d-open-webui-0.4.7/lib/python3.12/site-packages/open_webui/apps/retrieval/main.py", line 960, in process_file
Dec 03 14:39:09 shonuff open-webui[92141]:     docs = loader.load(
Dec 03 14:39:09 shonuff open-webui[92141]:            ^^^^^^^^^^^^
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/vljlzmyyhghb4ng5zzl698zj98gf603d-open-webui-0.4.7/lib/python3.12/site-packages/open_webui/apps/retrieval/loaders/main.py", line 125, in load
Dec 03 14:39:09 shonuff open-webui[92141]:     docs = loader.load()
Dec 03 14:39:09 shonuff open-webui[92141]:            ^^^^^^^^^^^^^
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/0i914d1rx3vz29w62hxnhjsdyanvfdi7-python3.12-langchain-core-0.3.15/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", li>
Dec 03 14:39:09 shonuff open-webui[92141]:     return list(self.lazy_load())
Dec 03 14:39:09 shonuff open-webui[92141]:            ^^^^^^^^^^^^^^^^^^^^^^
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/0c3n3gqnannd6hsgdzdn7n5fiyhvlyjb-python3.12-langchain-community-0.3.6/lib/python3.12/site-packages/langchain_community/document_loaders/uns>
Dec 03 14:39:09 shonuff open-webui[92141]:     elements = self._get_elements()
Dec 03 14:39:09 shonuff open-webui[92141]:                ^^^^^^^^^^^^^^^^^^^^
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/0c3n3gqnannd6hsgdzdn7n5fiyhvlyjb-python3.12-langchain-community-0.3.6/lib/python3.12/site-packages/langchain_community/document_loaders/epu>
Dec 03 14:39:09 shonuff open-webui[92141]:     from unstructured.partition.epub import partition_epub
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/epub.py", line 9, in <m>
Dec 03 14:39:09 shonuff open-webui[92141]:     from unstructured.partition.common.metadata import get_last_modified_date
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/common/metadata.py", li>
Dec 03 14:39:09 shonuff open-webui[92141]:     from unstructured.partition.common.lang import apply_lang_metadata
Dec 03 14:39:09 shonuff open-webui[92141]:   File "/nix/store/8mnjxgnisazmnrai3zy6lhpq6yvq0c7w-python3.12-unstructured-0.16.8/lib/python3.12/site-packages/unstructured/partition/common/lang.py", line 7>
Dec 03 14:39:09 shonuff open-webui[92141]:     from langdetect import (  # pyright: ignore[reportMissingTypeStubs]
Dec 03 14:39:09 shonuff open-webui[92141]: ModuleNotFoundError: No module named 'langdetect'
Dec 03 14:39:09 shonuff open-webui[92141]: ERROR [open_webui.apps.webui.routers.files] 400: No module named 'langdetect'

Steps To Reproduce

Steps to reproduce the behavior:

  1. When starting a new chat, hit the + icon and upload an epub.

or

  1. Create a new knowledge base in the Workspace and upload an epub.

An error pops up in the UI, and the logs show errors like those above.
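The missing modules can also be checked directly, without uploading a file. The following is a minimal sketch, assuming it is run with the same Python environment that open-webui uses; the names `REQUIRED_MODULES` and `find_missing` are hypothetical and only for illustration. The module names come from the two ModuleNotFoundError lines in the logs.

```python
import importlib.util

# Module names taken from the ModuleNotFoundError lines in the logs.
REQUIRED_MODULES = ["iso639", "langdetect"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all modules importable")
```

Unlike the traceback, which stops at the first failed import, this reports every missing module in one pass.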

Notify maintainers

@shivaraj-bh @drupol



mentalblock commented 1 day ago

I have to revert https://github.com/NixOS/nixpkgs/pull/361572. After investigating more closely, it seems like these dependencies actually need to be configured for the unstructured package (https://github.com/NixOS/nixpkgs/blob/55d15ad12a74eb7d4646254e13638ad0c4128776/pkgs/development/python-modules/unstructured/default.nix#L1).

Here are the dependency references: https://github.com/Unstructured-IO/unstructured/blob/0fb814db6188813df61fd3aeb1905a9dae21771c/requirements/base.txt#L97 and https://github.com/Unstructured-IO/unstructured/blob/0fb814db6188813df61fd3aeb1905a9dae21771c/requirements/base.txt#L63

@happysalada

happysalada commented 1 day ago

I've had it on my list to fix the dependencies for some time. I made a PR a while ago that was sloppy and never got around to cleaning it up. If you have time to make a PR, I'd be happy to review it. Otherwise I'll try to fit it in this weekend.

drupol commented 21 hours ago

I won't be able to take care of it this time, feel free to go ahead. It would also be nice to report the issue upstream. Thanks in advance!