Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

enhancement: make tempfiles windows friendly #3108

Closed: Blaxzter closed this PR 3 weeks ago

Blaxzter commented 1 month ago

Summary

Updates handling of tempfiles so that they work on Windows systems.
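Below is a minimal sketch of the usual cross-platform pattern. It assumes the fix follows the common `delete=False` approach; the PR's actual diff to `ocr.py` is not shown in this thread. The Python `tempfile` docs note that reopening a `NamedTemporaryFile` by name while it is still open works on Unix but not on Windows, so another reader (e.g. an OCR step given only the path) fails there. The helper `run_with_temp_file` and the `.png` suffix are hypothetical.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_temp_file(data: bytes, consumer: Callable[[str], T]) -> T:
    """Write `data` to a named temp file, pass its path to `consumer`, then clean up.

    Hypothetical helper illustrating the general pattern, not the PR's code.
    """
    # delete=False: closing the handle does NOT delete the file, so we can
    # close it before anyone else opens the path (required on Windows).
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)  # suffix is an assumption
    try:
        with tmp:
            tmp.write(data)
        # The handle is closed here, so the path can be reopened on any OS.
        return consumer(tmp.name)
    finally:
        # We own cleanup now, since automatic deletion was turned off.
        os.remove(tmp.name)
```

The trade-off is that cleanup becomes the caller's job, which is why the `finally` block removes the file even if `consumer` raises.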

MthwRobinson commented 4 weeks ago

@Blaxzter - Are you still working on this PR? I saw that @cragwolfe gave some feedback that's still pending.

Blaxzter commented 3 weeks ago

Added the file-exists check, @MthwRobinson.
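The thread doesn't show where the check was added. A plausible reading is that it guards temp-file cleanup. Here is a minimal sketch under that assumption; `remove_temp_file` is a hypothetical name, not a function from `unstructured`.

```python
import os
import tempfile


def remove_temp_file(path: str) -> bool:
    """Delete `path` if it still exists; return True if a file was removed.

    Hypothetical helper: checking first avoids FileNotFoundError when the
    file was never created or was already removed elsewhere.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


# Usage: create a temp file that is not auto-deleted, then clean it up twice.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
print(remove_temp_file(tmp.name))  # True: the file existed and was removed
print(remove_temp_file(tmp.name))  # False: nothing left to remove
```

A check-then-remove sequence has a small race window. Wrapping `os.remove` in `contextlib.suppress(FileNotFoundError)` is an equivalent alternative that avoids it.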

MthwRobinson commented 3 weeks ago

Clone for running CI is #3221

MthwRobinson commented 3 weeks ago

@Blaxzter - Could you add a CHANGELOG entry for this?

Blaxzter commented 3 weeks ago

Should I just add it under 0.14.7-dev1 -> Enhancements, or to a new version (I wouldn't know how to increase the version)? Otherwise, I would probably write:

Make the creation of a temp file in unstructured/partition/pdf_image/ocr.py Windows compatible.

MthwRobinson commented 3 weeks ago

Can you run `make tidy`? There was a linting failure in this job:

Run source .venv/bin/activate
ruff check .
unstructured/partition/pdf_image/ocr.py:72:1: W293 [*] Blank line contains whitespace
unstructured/partition/pdf_image/ocr.py:87:1: W293 [*] Blank line contains whitespace
Found 2 errors.
[*] 2 fixable with the `--fix` option.
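
In practice the fix is `make tidy` or `ruff check --fix .`, as the log suggests. To show what W293 means, here is a small illustrative sketch that empties lines containing only whitespace. The function name is hypothetical, and this is not how ruff implements the fix.

```python
def strip_whitespace_only_lines(source: str) -> str:
    """Empty out lines that contain only whitespace (the pattern W293 flags).

    Lines with code are left untouched; original line endings are preserved.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if body and not body.strip():
            # Blank-but-not-empty line: keep only its newline.
            fixed.append(ending)
        else:
            fixed.append(line)
    return "".join(fixed)


before = "def f():\n    x = 1\n    \n    return x\n"
after = strip_whitespace_only_lines(before)
print(repr(after))  # 'def f():\n    x = 1\n\n    return x\n'
```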

MthwRobinson commented 3 weeks ago

> Should I just add it under 0.14.7-dev1 -> Enhancements, or to a new version (I wouldn't know how to increase the version)? Otherwise, I would probably write:
>
> Make the creation of a temp file in unstructured/partition/pdf_image/ocr.py Windows compatible.

Yeah, that entry looks good to me. For the version, you can just increment the number after -dev (e.g. 0.14.7-dev1 -> 0.14.7-dev2) and update the version in unstructured/__version__.py to match.
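Here is a minimal sketch of the increment described above. It assumes versions always end in `-devN`. `bump_dev_version` is a hypothetical helper, not part of the repo's tooling. The same new string would go in both the CHANGELOG heading and unstructured/__version__.py.

```python
import re

# Hypothetical pattern: everything up to and including "-dev", then digits to the end.
_DEV_SUFFIX = re.compile(r"^(?P<base>.+-dev)(?P<num>\d+)$")


def bump_dev_version(version: str) -> str:
    """Increment the trailing -devN number, e.g. '0.14.7-dev1' -> '0.14.7-dev2'."""
    match = _DEV_SUFFIX.match(version)
    if match is None:
        raise ValueError(f"not a -devN version: {version!r}")
    return f"{match['base']}{int(match['num']) + 1}"


print(bump_dev_version("0.14.7-dev1"))  # 0.14.7-dev2
```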

Blaxzter commented 3 weeks ago

As I haven't checked the branch out locally, I removed the whitespace by hand. Let's hope that worked.

MthwRobinson commented 3 weeks ago

Thanks @Blaxzter! Updated the clone on https://github.com/Unstructured-IO/unstructured/pull/3221 and CI is running now.

MthwRobinson commented 3 weeks ago

@Blaxzter - Could you add a description to this PR explaining why the change you made enables tempfiles to work on Windows? Otherwise this looks good, and I'll approve once CI passes.