Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

Fix bug - Auto partition fails on text files which are empty or contain only whitespaces #3675

Closed: tc360950 closed this 1 month ago

tc360950 commented 1 month ago

This PR fixes the following bug.

File type inference for a .txt file fails if the file is empty or contains only whitespace.

To Reproduce:

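from tempfile import NamedTemporaryFile

from unstructured.partition.auto import partition

# Create a .txt file that contains only whitespace, then auto-partition it.
with NamedTemporaryFile(mode="w", suffix=".txt") as f:
    f.write("   \n")
    # seek() also flushes the buffered write, so the content is on disk
    # before partition() opens the file by name.
    f.seek(0)
    elements = partition(filename=f.name)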
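
To illustrate the failure mode, here is a minimal, self-contained sketch of one way a detector can avoid it: when the content gives no signal (empty or whitespace-only), fall back to the file extension. This is not the actual change in this PR and not the library's API; `guess_type_from_content`, `detect_file_type`, and `EXTENSION_TO_TYPE` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table for this sketch; not the library's real mapping.
EXTENSION_TO_TYPE = {".txt": "text/plain", ".md": "text/markdown"}


def guess_type_from_content(data: bytes) -> Optional[str]:
    """Toy content sniffer: returns None when there is nothing to inspect."""
    text = data.decode("utf-8", errors="ignore")
    if not text.strip():
        # Empty or whitespace-only content carries no type signal.
        return None
    if text.lstrip().startswith(("{", "[")):
        return "application/json"
    return "text/plain"


def detect_file_type(filename: str, data: bytes) -> Optional[str]:
    """Prefer content sniffing, then fall back to the file extension."""
    content_type = guess_type_from_content(data)
    if content_type is not None:
        return content_type
    return EXTENSION_TO_TYPE.get(Path(filename).suffix.lower())
```

With this fallback, `detect_file_type("blank.txt", b"   \n")` resolves through the `.txt` extension instead of failing, while a file with an unknown extension and no usable content still yields `None` for the caller to handle.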

cragwolfe commented 1 month ago

CI passed in https://github.com/Unstructured-IO/unstructured/pull/3676 (that PR was needed so CI could run with the extra creds), so this is merged.

cragwolfe commented 1 month ago

thanks for the contribution, @tc360950!