aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
https://sycamore.readthedocs.io
Apache License 2.0

ModuleNotFoundError: No module named 'resource' #960

Open thepowerfulwoz opened 6 days ago

thepowerfulwoz commented 6 days ago

Describe the bug

Sycamore appears not to work on Windows because it imports the resource module, which is only available on Unix. I am attempting to run it locally. Are there any workarounds?

To Reproduce

Steps to reproduce the behavior:

  1. Try to import sycamore

Expected behavior

Sycamore loads on Windows.

Additional context

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\sycamore_explore\explore.py", line 1, in <module>
    import sycamore
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\__init__.py", line 2, in <module>
    from sycamore.docset import DocSet
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\docset.py", line 11, in <module>
    from sycamore.functions.tokenizer import Tokenizer
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\functions\__init__.py", line 3, in <module>
    from sycamore.functions.document import split_and_convert_to_image, DrawBoxes
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\functions\document.py", line 11, in <module>
    from sycamore.utils.time_trace import timetrace
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\utils\time_trace.py", line 4, in <module>
    import resource
ModuleNotFoundError: No module named 'resource'

HenryL27 commented 6 days ago

We don't quite support Windows at the moment. Can I suggest using WSL instead?

@alexaryn is there any way we can disable timetrace, or make the unconditional resource import conditional, on Windows?
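For readers hitting the same error, here is a minimal sketch of a guarded import of `resource`. It assumes the module is only needed for per-process CPU accounting via `resource.getrusage`; the helper names `cpu_seconds` and `timed` are hypothetical, are not Sycamore's timetrace API, and this is not necessarily how #962 solves it. On Windows it falls back to `os.times()`, which exists there but only populates the user and system fields.

```python
import os
import time

try:
    import resource  # Unix-only; raises ModuleNotFoundError on Windows
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    resource = None


def cpu_seconds() -> tuple[float, float]:
    """Return (user, system) CPU seconds consumed by this process so far."""
    if resource is not None:
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage.ru_utime, usage.ru_stime
    # Fallback path (e.g. Windows): os.times() is portable.
    t = os.times()
    return t.user, t.system


def timed(fn, *args, **kwargs):
    """Hypothetical helper: run fn, return (result, wall, user, system) seconds."""
    wall0 = time.perf_counter()
    user0, sys0 = cpu_seconds()
    result = fn(*args, **kwargs)
    user1, sys1 = cpu_seconds()
    wall = time.perf_counter() - wall0
    return result, wall, user1 - user0, sys1 - sys0
```

For example, `timed(sum, range(1000))` returns the sum plus the elapsed wall and CPU times. Because the import failure is caught once at module load, the rest of the code only has to check `resource is not None`.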

alexaryn commented 6 days ago

This can be done. I'll try to get it prioritized.

alexaryn commented 6 days ago

I don't actually have a Windows box to test on, but #962 may fix this. @thepowerfulwoz can you try it out?

thepowerfulwoz commented 5 days ago

Yeah, I'll give it a shot

thepowerfulwoz commented 5 days ago

@alexaryn When trying to pip install the feature branch with pip install git+https://github.com/aryn-ai/sycamore.git@alex_tt_windows, I get:

 fatal: clone of 'git@github.com:aryn-ai/opensearch-remote-processor.git' into submodule path 'C:/Users/User/AppData/Local/Temp/pip-req-build-e0o7t59b/lib/remote-processors/opensearch-remote-processor' failed
  Failed to clone 'lib/remote-processors/opensearch-remote-processor'. Retry scheduled
  git@github.com: Permission denied (publickey).

I don't think this is an issue on my end, but I could be wrong.

HenryL27 commented 5 days ago

I think you'll need to do pip install git+https://github.com/aryn-ai/sycamore.git@alex_tt_windows#subdirectory=lib/sycamore

thepowerfulwoz commented 5 days ago

No dice there, but switching from SSH to HTTPS in my git config fixed it.

thepowerfulwoz commented 5 days ago

Update: #962 fixes this issue, but the same problem occurs with the pwd import:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\sycamore_explore\explore.py", line 1, in <module>
    from sycamore.transforms.detr_partitioner import ArynPDFPartitioner
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\__init__.py", line 2, in <module>
    from sycamore.docset import DocSet
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\docset.py", line 18, in <module>
    from sycamore.transforms.augment_text import TextAugmentor
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\transforms\__init__.py", line 7, in <module>
    from sycamore.transforms.partition import Partition, Partitioner
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\transforms\partition.py", line 21, in <module>
    from sycamore.transforms.detr_partitioner import (
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\transforms\detr_partitioner.py", line 10, in <module>
    import pwd
ModuleNotFoundError: No module named 'pwd'

alexaryn commented 5 days ago

I updated #962 to avoid using pwd. Luckily it was only in one place. @thepowerfulwoz please try again. Thanks.
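The thread doesn't say what detr_partitioner.py used pwd for. As an assumption, two common uses are looking up the current user name and the home directory; the standard library has portable replacements for both. The helper names below are hypothetical and this is only a sketch, not the change made in #962.

```python
import getpass
from pathlib import Path


def current_user(default: str = "unknown") -> str:
    """Portable stand-in for pwd.getpwuid(os.getuid()).pw_name."""
    try:
        # getpass.getuser() checks LOGNAME, USER, LNAME and USERNAME first and
        # only consults the Unix-only pwd database if none of them is set.
        return getpass.getuser()
    except (ImportError, KeyError, OSError):
        # Raised on some platforms/versions when no user name can be found.
        return default


def home_dir() -> Path:
    """Portable stand-in for pwd.getpwuid(os.getuid()).pw_dir."""
    return Path.home()
```

Both functions work on Windows and Unix without importing pwd at module load time.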

thepowerfulwoz commented 4 days ago

The module loads now. I have found another Windows bug related to temporary file creation:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\sycamore_explore\explore.py", line 9, in <module>
    b = a.partition_pdf(file, extract_table_structure=True, use_partitioning_service=False, extract_images=True)
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\transforms\detr_partitioner.py", line 184, in partition_pdf
    temp = self._partition_pdf_batched(
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\transforms\detr_partitioner.py", line 388, in _partition_pdf_batched
    file_hash = Cache.get_hash_context_file(pdffile.name)
  File "C:\Users\User\AppData\Local\miniforge3\envs\sycamore_explore\lib\site-packages\sycamore\utils\cache.py", line 72, in get_hash_context_file
    with open(file_path, "rb") as file:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\User\\AppData\\Local\\Temp\\detr-pdf-input-uom50y59'

I believe the cause is explained in this Stack Overflow answer. I can also open a separate issue for this if you prefer.

alexaryn commented 4 days ago

We can keep the Windows stuff all in one issue for now.

I made another update to work around the Windows behavior of temporary files. @thepowerfulwoz please iterate again.
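The Python documentation for `tempfile.NamedTemporaryFile` notes that whether the file's name can be used to open it a second time, while the original handle is still open, varies by platform: it works on Unix but not on Windows. That matches the traceback above, where the cache reopens `pdffile.name` to hash it. Below is a minimal sketch of one common workaround: create the file with `delete=False`, close it before reopening by name, and delete it manually. The function names, the SHA-256 choice, and the chunk size are assumptions for illustration, not Sycamore's `Cache.get_hash_context_file` implementation.

```python
import hashlib
import os
import tempfile


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: SHA-256 of a file, reopened by path and read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_bytes_via_tempfile(data: bytes) -> str:
    """Write data to a named temp file, then hash it by name (Windows-safe)."""
    # delete=False so closing the handle does not remove the file.
    tmp = tempfile.NamedTemporaryFile(prefix="detr-pdf-input-", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # release the handle so Windows permits a second open
        return hash_file(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        os.unlink(tmp.name)  # we own cleanup because delete=False
```

On Python 3.12 and later, `NamedTemporaryFile(delete=True, delete_on_close=False)` is another option: the file survives `close()` and is removed when the context manager exits.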