emcf / thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡
https://thepi.pe
MIT License
814 stars, 61 forks

Running "Locally" #20

Open skyler14 opened 2 months ago

skyler14 commented 2 months ago

Multiple questions: What resources are recommended/required for local extraction?

When running locally, could you give us the option to expose a port and receive POST requests? That way we could have an on-prem machine that works interchangeably with your API for client machines.

emcf commented 2 months ago

Hi @skyler14, I have not yet tested minimum requirements for local extraction. If it helps -- I got it working on a 4 GB machine with no GPU. A GPU is highly recommended for video and audio files.

As for setting up a local REST API to send POST requests to, this is not on the roadmap at the moment. You can do this yourself with a Python web framework such as Flask or FastAPI.
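
For anyone who wants to try this, here is a minimal sketch of such a wrapper. It uses only Python's standard library so that it runs without extra dependencies; Flask or FastAPI would follow the same shape. Everything in it is an assumption rather than part of thepipe: the `/extract` route, the `{"source": ...}` request body, the `{"chunks": [...]}` response, and the helper names are all hypothetical. The `thepipe_api` import and the `extract_from_source(source=..., local=True)` call mirror the calls visible in the traceback later in this thread and may differ between versions.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def handle_extract(raw_body, extract_fn):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        source = json.loads(raw_body or b"{}")["source"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON body with a 'source' field"}
    try:
        chunks = extract_fn(source)
    except Exception as exc:  # report extraction failures to the client
        return 500, {"error": str(exc)}
    texts = []
    for chunk in chunks:
        # Assumption: chunks expose a .text attribute; fall back to str().
        text = getattr(chunk, "text", None)
        texts.append(str(chunk) if text is None else text)
    return 200, {"chunks": texts}


def make_handler(extract_fn):
    """Build a request handler class bound to the given extraction function."""

    class ExtractHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/extract":  # hypothetical route name
                self._send(404, {"error": "not found"})
                return
            length = int(self.headers.get("Content-Length") or 0)
            status, body = handle_extract(self.rfile.read(length), extract_fn)
            self._send(status, body)

        def _send(self, status, body):
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ExtractHandler


def thepipe_extract(source):
    # Assumption: import path and keyword arguments are taken from the
    # traceback in this thread; check them against your installed version.
    from thepipe_api import extractor
    return extractor.extract_from_source(source=source, local=True)


def serve(extract_fn=thepipe_extract, host="127.0.0.1", port=8000):
    """Serve POST /extract until interrupted."""
    server = ThreadingHTTPServer((host, port), make_handler(extract_fn))
    server.serve_forever()
```

Calling `serve()` starts the server on `127.0.0.1:8000`; a client would then POST a JSON body such as `{"source": "/path/to/example.md"}` to `/extract`. Note that `source` here is a path or URL that the server machine itself must be able to read, so uploading files from client machines would need extra handling. Keeping the request logic in `handle_extract` makes it easy to test without opening a socket.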

skyler14 commented 1 month ago

These are some of the errors I get:

======================================================================
ERROR: test_extract_api (test_thepipe.test_thepipe.test_extract_api)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dylan/Documents/thepipe/tests/test_thepipe.py", line 200, in test_extract_api
    chunks = extractor.extract_from_source(source=self.files_directory+"/example.md", local=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dylan/anaconda3/envs/thepipe/lib/python3.11/site-packages/thepipe_api/extractor.py", line 58, in extract_from_source
    return extract_from_file(file_path=source, source_type=source_type, verbose=verbose, ai_extraction=ai_extraction, text_only=text_only, local=local)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dylan/anaconda3/envs/thepipe/lib/python3.11/site-packages/thepipe_api/extractor.py", line 76, in extract_from_file
    raise ValueError(f"{response['error']}")
ValueError: No valid API key given. Visit https://thepi.pe/docs to learn more.

======================================================================
FAIL: test_compress_spreadsheet (test_thepipe.test_thepipe.test_compress_spreadsheet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dylan/Documents/thepipe/tests/test_thepipe.py", line 156, in test_compress_spreadsheet
    self.assertLess(len(all_new_text), len(all_text))
AssertionError: 194 not less than 194

======================================================================
FAIL: test_compress_with_ctags (test_thepipe.test_thepipe.test_compress_with_ctags)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dylan/Documents/thepipe/tests/test_thepipe.py", line 173, in test_compress_with_ctags
    self.assertLess(len(new_chunks[0].text), len(chunks[0].text))
AssertionError: 90 not less than 90

----------------------------------------------------------------------
Ran 18 tests in 37.331s

FAILED (failures=2, errors=1, skipped=2)