Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

Update Docker images to use Python 3.12 #3051

Open MthwRobinson opened 1 month ago

MthwRobinson commented 1 month ago

Currently the AMD image uses Python 3.11 and the ARM image uses Python 3.10. Since we support Python 3.12 as of #3047, we can now update these containers to use Python 3.12 instead. This will keep us on the latest version and reduce the risk that our build will break if Python 3.11 is dropped from wolfi-base:latest.

MthwRobinson commented 1 month ago

Tried this but, as seen in this job, it says unstructured_inference is not installed for some reason.

python3.12 -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')":
1.070 [nltk_data] Downloading package punkt to /home/nonroot/nltk_data...
1.122 [nltk_data]   Unzipping tokenizers/punkt.zip.
2.206 [nltk_data] Downloading package averaged_perceptron_tagger to
2.206 [nltk_data]     /home/nonroot/nltk_data...
2.228 [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
2.481 Traceback (most recent call last):
2.481   File "<string>", line 1, in <module>
2.481   File "/app/unstructured/partition/model_init.py", line 3, in <module>
2.481     from unstructured_inference.models.base import get_model
2.481 ModuleNotFoundError: No module named 'unstructured_inference'
------
Dockerfile-amd64:36
--------------------
  35 |     
  36 | >>> RUN python3.12 -c "import nltk; nltk.download('punkt')" && \
  37 | >>>   python3.12 -c "import nltk; nltk.download('averaged_perceptron_tagger')" && \
  38 | >>>   python3.12 -c "from unstructured.partition.model_init import initialize; initialize()" && \
  39 | >>>   python3.12 -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
  40 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3.12 -c \"import nltk; nltk.download('punkt')\" &&   python3.12 -c \"import nltk; nltk.download('averaged_perceptron_tagger')\" &&   python3.12 -c \"from unstructured.partition.model_init import initialize; initialize()\" &&   python3.12 -c \"from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')\"" did not complete successfully: exit code: 1
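One way to narrow down a failure like this is to ask the target interpreter which of the expected packages it can actually see before running the model-initialization step. The snippet below is only a minimal sketch: the helper name `check_modules` is hypothetical, and the module list is taken from the traceback and Dockerfile step above rather than from the project's requirements files.

```python
import importlib.util


def check_modules(names):
    """Return {name: bool} saying whether each top-level module can be located."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised packages.
            found[name] = False
    return found


if __name__ == "__main__":
    # Module names copied from the log above; adjust as needed.
    modules = ["nltk", "unstructured", "unstructured_inference"]
    for name, ok in check_modules(modules).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run with the same interpreter the image uses (here `python3.12`) so the result reflects that interpreter's site-packages and not another Python on the path.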

MthwRobinson commented 1 month ago

Looks like the issue may be pycocotools. Its wheel fails to build because gcc cannot find math.h:

  Building wheel for pycocotools (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pycocotools (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-312
      creating build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/__init__.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      running build_ext
      /tmp/pip-build-env-4ptm5umq/overlay/lib/python3.12/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-02k6iv2j/pycocotools_341f2d6f5e184f7499e7fde6f3c47217/pycocotools/_mask.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      Compiling pycocotools/_mask.pyx because it changed.
      [1/1] Cythonizing pycocotools/_mask.pyx
      building 'pycocotools._mask' extension
      creating build/temp.linux-x86_64-cpython-312
      creating build/temp.linux-x86_64-cpython-312/common
      creating build/temp.linux-x86_64-cpython-312/pycocotools
      x86_64-pc-linux-gnu-gcc -fno-strict-overflow -DNDEBUG -g -O3 -Wall -O2 -Wall -fomit-frame-pointer -march=x86-64-v2 -mtune=broadwell -O2 -Wall -fomit-frame-pointer -march=x86-64-v2 -mtune=broadwell -fPIC -I/tmp/pip-build-env-4ptm5umq/overlay/lib/python3.12/site-packages/numpy/core/include -I./common -I/usr/include/python3.12 -c ./common/maskApi.c -o build/temp.linux-x86_64-cpython-312/./common/maskApi.o -Wno-cpp -Wno-unused-function -std=c99
      ./common/maskApi.c:8:10: fatal error: math.h: No such file or directory
          8 | #include <math.h>
            |          ^~~~~~~~
      compilation terminated.
      error: command '/usr/bin/x86_64-pc-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pycocotools
Failed to build pycocotools
ERROR: Could not build wheels for pycocotools, which is required to install pyproject.toml-based projects
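
The `math.h: No such file or directory` error suggests the C library headers are not present in the image. A quick check along these lines could confirm that before attempting the wheel build. This is a rough sketch under stated assumptions: `find_header` is a hypothetical helper, and the default directories are a guess at common include locations, not the compiler's real search path.

```python
from pathlib import Path

# Assumed common include locations; the compiler's actual search path may differ.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(header, include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first existing path to `header` in `include_dirs`, else None."""
    for directory in include_dirs:
        candidate = Path(directory) / header
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    path = find_header("math.h")
    print(f"math.h: {path if path else 'not found in the assumed include dirs'}")
```

If the header is reported as missing, the image most likely lacks the C library development headers that source builds such as pycocotools need.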