Closed MthwRobinson closed 10 months ago
@MthwRobinson
I can confirm that python-magic-bin must be installed on Windows. However, note that the tests do not pass with it. Notably:
FAILED test_unstructured/partition/test_auto.py::test_auto_partition_email_from_file - ValueError: Invalid file. File type not support in partition.
FAILED test_unstructured/partition/test_auto.py::test_auto_partition_html_from_file - ValueError: Invalid file. File type not support in partition.
FAILED test_unstructured/partition/test_auto.py::test_auto_partition_text_from_file - ValueError: Invalid file. File type not support in partition.
FAILED test_unstructured/staging/test_base_staging.py::test_convert_to_isd_serializes_with_posix_paths - NotImplementedError: cannot instantiate 'PosixPath' on your system
(Note: Not an exhaustive list of test failures)
For the first three failures, fake-text.txt, fake-html.html, and fake-email.eml are all detected by libmagic as the application/octet-stream MIME type, after which unstructured checks whether the file might be a docx, xlsx, or pptx. When that check fails, it assigns the unknown file type.
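One possible way to avoid the unknown file type is to fall back to the file extension whenever libmagic returns only the generic binary type. The sketch below is an illustration, not unstructured's actual detection code: `guess_mime_with_fallback` and `GENERIC_MIME` are hypothetical names, and it relies only on the standard-library `mimetypes` module.

```python
import mimetypes
from typing import Optional

# The generic type libmagic reports when it cannot identify the content.
GENERIC_MIME = "application/octet-stream"


def guess_mime_with_fallback(filename: str, magic_mime: Optional[str]) -> Optional[str]:
    """Hypothetical helper: trust libmagic unless it only gave the generic type.

    `magic_mime` is whatever the libmagic binding reported for the file.
    """
    if magic_mime and magic_mime != GENERIC_MIME:
        return magic_mime
    # Fall back to an extension-based guess from the standard library.
    guessed, _encoding = mimetypes.guess_type(filename)
    # If the extension is unknown too, keep libmagic's original answer.
    return guessed or magic_mime


if __name__ == "__main__":
    for name in ("fake-text.txt", "fake-html.html", "fake-email.eml"):
        print(name, guess_mime_with_fallback(name, GENERIC_MIME))
```

An extension-based guess is weaker than content sniffing, so it would only make sense as a last resort before declaring the file type unknown.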
Lastly, a PosixPath simply can't be instantiated on Windows.
I'll open a PR for the last issue.
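For context, pathlib's pure path classes can be instantiated on any OS, unlike the concrete `PosixPath`. The snippet below is a minimal sketch of producing POSIX-style path strings in a portable way; `to_posix_string` is a hypothetical helper and not necessarily what the PR does.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_posix_string(path_str: str) -> str:
    """Hypothetical helper: render a path with forward slashes on any OS.

    PureWindowsPath accepts both backslashes and forward slashes as
    separators, and as_posix() always emits forward slashes.
    """
    return PureWindowsPath(path_str).as_posix()


if __name__ == "__main__":
    # Pure paths do no filesystem access, so this works on Windows too.
    print(PurePosixPath("example-docs/fake-text.txt"))
    print(to_posix_string("example-docs\\fake-text.txt"))
```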
Currently, Windows users have difficulty with file detection because Windows needs python-magic-bin installed instead of python-magic. The goal of this issue is to see if we can install python-magic-bin instead of python-magic if the user's OS is Windows. See this comment for details.
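One way this could be expressed is with PEP 508 environment markers in the package's dependency list, so that pip picks the right package per platform. The sketch below is an assumption about how it might look, not the project's actual packaging configuration; `INSTALL_REQUIRES` and `magic_requirement` are illustrative names.

```python
import sys

# Requirement strings with PEP 508 environment markers. An installer such
# as pip evaluates the marker and skips requirements that do not match.
INSTALL_REQUIRES = [
    'python-magic; sys_platform != "win32"',
    'python-magic-bin; sys_platform == "win32"',
]


def magic_requirement(platform: str = sys.platform) -> str:
    """Hypothetical helper mirroring the markers above in plain Python."""
    return "python-magic-bin" if platform == "win32" else "python-magic"


if __name__ == "__main__":
    print(magic_requirement())
```

The same marker syntax also works in a requirements file, which keeps the platform choice declarative rather than relying on install-time code.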