emcf / thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡
https://thepi.pe
MIT License
814 stars 61 forks

Directory extraction fails if one file or any files fail #25

Closed SpencerRightsma closed 2 weeks ago

SpencerRightsma commented 2 weeks ago

Directory extraction fails entirely if any single file fails to extract. It should instead output whatever it has been able to extract.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python312\Scripts\thepipe.exe\__main__.py", line 7, in <module>
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\thepipe.py", line 60, in main
    chunks = extractor.extract_from_source(source=args.source, match=args.match, ignore=args.ignore, limit=args.limit, verbose=args.verbose, ai_extraction=args.ai_extraction, text_only=args.text_only, local=args.local)
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\extractor.py", line 53, in extract_from_source
    return extract_github(github_url=source, file_path='', match=match, ignore=ignore, text_only=text_only, verbose=verbose, ai_extraction=ai_extraction, branch='master')
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\extractor.py", line 395, in extract_github
    files_contents = extract_from_directory(dir_path=temp_dir, match=match, ignore=ignore, verbose=verbose, ai_extraction=ai_extraction, text_only=text_only)
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\extractor.py", line 195, in extract_from_directory
    for result in results:
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\extractor.py", line 194, in <lambda>
    results = executor.map(lambda file_path: extract_from_source(source=file_path, match=match, ignore=ignore, verbose=verbose, ai_extraction=ai_extraction, text_only=text_only, limit=limit, local=local), file_paths)
  File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\extractor.py", line 46, in extract_from_source
    raise ValueError(f"Could not detect source type for {source}.")
ValueError: Could not detect source type for C:\Users\Spenc\AppData\Local\Temp\tmptlxpz0r\Changelog.rst.

C:\Users\Spenc>

emcf commented 2 weeks ago

This should now be fixed in this commit
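
For anyone hitting the same traceback on an older version: iterating the results of `executor.map` re-raises the first exception from any worker, so one unsupported file (here `Changelog.rst`) aborts the whole directory. Below is a minimal sketch of the behaviour requested in this issue, not necessarily what the commit does. The names `extract_one` and `extract_all_tolerant` are hypothetical and are not part of thepipe's API; `extract_one` is only a stand-in for the real per-file extractor.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_one(path):
    # Hypothetical stand-in for the per-file extractor: it rejects .rst
    # files the same way the traceback above does.
    if path.endswith(".rst"):
        raise ValueError(f"Could not detect source type for {path}.")
    return [f"chunk from {path}"]


def extract_all_tolerant(file_paths, extract=extract_one, verbose=False):
    """Extract every file, recording failures instead of aborting."""
    chunks, failures = [], []
    with ThreadPoolExecutor() as executor:
        # submit() returns one future per file, so each result can be
        # collected (and each exception handled) independently.
        futures = [(path, executor.submit(extract, path)) for path in file_paths]
        for path, future in futures:
            try:
                chunks.extend(future.result())
            except Exception as exc:
                failures.append((path, exc))
                if verbose:
                    print(f"Skipping {path}: {exc}")
    return chunks, failures


if __name__ == "__main__":
    chunks, failures = extract_all_tolerant(
        ["a.py", "Changelog.rst", "b.md"], verbose=True
    )
    print(chunks)
    print([path for path, _ in failures])
```

Collecting results in submission order keeps the output order stable, and returning the failures alongside the chunks lets the caller decide whether to warn or exit non-zero.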