Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.44k stars 580 forks

feat/less strict Python version #3193

Closed egeres closed 2 days ago

egeres commented 3 weeks ago

Is your feature request related to a problem? Please describe. Unstructured is using >=3.9.0,<3.13 (link). I think in the long term it would be better to use >=3.10,<4.0. This also matches Langchain's Python version, which is currently set to >=3.10,<4.0 (link), so when using unstructured for document loading within Langchain, both packages are easier to set up.

Describe the solution you'd like Switch the python_requires=">=3.9.0,<3.13" in setup.py to use >=3.10,<4.0
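To make the difference between the two ranges concrete, here is a minimal sketch of which interpreter versions each one admits. It is a simplified, standard-library-only checker written for illustration, not how pip evaluates python_requires; the names parse_version, satisfies, CURRENT_RANGE, and PROPOSED_RANGE are hypothetical, and only plain numeric versions with the operators >=, <=, ==, >, and < are handled.

```python
import operator

# Supported operators; two-character ones come first so ">=" is not read as ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text, width=3):
    """Turn '3.10' into (3, 10, 0) so versions compare component by component."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            # No known operator matched this clause.
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


CURRENT_RANGE = ">=3.9.0,<3.13"   # value quoted from setup.py in this issue
PROPOSED_RANGE = ">=3.10,<4.0"    # value proposed in this issue

for python in ("3.9.0", "3.12.4", "3.13.0"):
    print(python, satisfies(python, CURRENT_RANGE), satisfies(python, PROPOSED_RANGE))
```

Under these assumptions, the proposed range differs from the current one at both ends: it would admit 3.13 and later 3.x releases, and it would no longer admit 3.9.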

MthwRobinson commented 2 weeks ago

@egeres - Is this causing installation issues for you or anything like that? I'd be okay with this update, though we only test up to Python 3.12 in CI, so any later versions would be "use at your own risk".

scanny commented 2 weeks ago

Given the number of third-party dependencies and the non-trivial nature of supporting a newer Python version, I'd be inclined to limit folks to the latest version we know works with all the dependencies.

Problems encountered when we go to add support for a new Python version have always been related to third-party package dependencies, not the unstructured code itself. If we lead folks to believe they can get a clean install with a new version and then that fails because of a pip error, I'd say it was on us for not letting them know up front.

egeres commented 2 weeks ago

@MthwRobinson nothing critical, it was a tiny hassle with langchain because I had to change my package's Python version constraint to stay under unstructured's <3.13 upper bound. Generally speaking I have seen more packages setting the bounds to >=3.10,<4.0 than otherwise, but the idea of being cautious, as @scanny says, seems reasonable as well.

MthwRobinson commented 2 days ago

Going to close this out and we'll keep this as is (keeping the upper bound on the Python version explicit). We can revisit this though if the stricter range becomes a problem for people.