conda-forge / tesseract-feedstock

A conda-smithy repository for tesseract.
BSD 3-Clause "New" or "Revised" License

Reduced tesseract capabilities because of missing libraries for build process #63

Open stweil opened 2 months ago

stweil commented 2 months ago

Solution to issue cannot be found in the documentation.

Issue

The Tesseract package is built without libarchive and libcurl:

tesseract --version
tesseract 5.3.4
 leptonica-1.83.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.39 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
 Found NEON

Therefore some functionality (for example running OCR with an image URL) is missing.

Installed packages

not relevant

Environment info

not relevant

carlodri commented 1 month ago

@stweil can you open a PR adding the missing libraries?

stweil commented 1 month ago

See #66 which adds libcurl. As this is my first PR here I might have missed something.

libarchive was already in the package list, but the build process did not find it. I still have no solution for this.

scw commented 1 month ago

@stweil Thanks for adding libcurl! On Windows, this package still fails to import because libcurl is still missing. As for libarchive, it does show up once I manually add libcurl to the environment on Windows:

> tesseract --version
tesseract 5.4.1
 leptonica-1.83.1 (Oct 11 2023, 07:58:44) [MSC v.1937 LIB Release x64]
  libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libopenjp2 2.5.2
 Found AVX
 Found SSE4.1
 Found libarchive 3.7.4 zlib/1.2.13 liblzma/5.2.6 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.6
 Found libcurl/8.8.0 Schannel zlib/1.3.1 libssh2/1.11.0
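To check this across platforms without reading the output by eye, the "Found libarchive" and "Found libcurl" lines can be parsed. The following is a minimal Python sketch, assuming the line format stays as in the output quoted in this thread; the function name `detect_optional_libs` is hypothetical and not part of Tesseract or this feedstock.

```python
import re


def detect_optional_libs(version_output):
    """Return the reported libarchive / libcurl versions, or None if absent.

    Assumes lines shaped like the ones quoted in this thread:
      " Found libarchive 3.7.4 zlib/1.2.13 ..."
      " Found libcurl/8.8.0 Schannel ..."
    """
    found = {"libarchive": None, "libcurl": None}
    for line in version_output.splitlines():
        # The name is followed by a space (libarchive) or a slash (libcurl).
        match = re.match(r"\s*Found (libarchive|libcurl)[ /](\S+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return found
```

The text passed in would be the captured output of `tesseract --version`. Capturing both stdout and stderr is the safer choice, since which stream the version text goes to is not confirmed in this thread.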

However, on macOS I also see no mention of libarchive or libcurl. If you build this locally, the configuration step should show where it tries to identify the packages available in the build environment. On Windows, this looks like:

-- Found LibArchive: C:/conda/conda-bld/tesseract/_h_env/Library/lib/archive.lib (found version "3.6.2")
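A saved build log can be scanned for lines of this shape to see which packages the configuration step picked up. This is a rough Python sketch, assuming only the `-- Found <Name>: <path> (found version "<x>")` form shown above. CMake prints other variants that it does not handle, and `found_packages` is a hypothetical helper name.

```python
import re

# Matches e.g.: -- Found LibArchive: C:/.../archive.lib (found version "3.6.2")
# The version suffix is treated as optional.
FOUND_RE = re.compile(r'^-- Found (\w+): (.+?)(?: \(found version "([^"]+)"\))?$')


def found_packages(cmake_log):
    """Map each package name to a (path, version) tuple; version may be None."""
    packages = {}
    for line in cmake_log.splitlines():
        match = FOUND_RE.match(line.strip())
        if match:
            name, path, version = match.groups()
            packages[name] = (path, version)
    return packages
```

If `LibArchive` is missing from the result for a build log, the configuration step did not find the library in that environment.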

On Windows, this package builds with CMake, but on POSIX systems it uses configure / make. Perhaps it can switch to CMake on Linux, or you can try passing configure flags such as --with-curl. In the CMake build, all the library / include locations are set to reference the environment itself; it should be possible to do the same for configure-based builds.
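As an illustration of pointing a configure-based build at the environment, here is a small Python sketch that assembles the command line and environment variables. It is an assumption-laden outline, not the feedstock's actual build script. `--with-curl` is the flag named above and is not verified here against Tesseract's configure script. `CPPFLAGS`, `LDFLAGS` and `PKG_CONFIG_PATH` are the usual autoconf conventions, and `configure_command` is a hypothetical helper.

```python
def configure_command(prefix):
    """Build the configure arguments and extra environment for a given prefix.

    `prefix` stands for the environment root (conda-build exposes it as PREFIX).
    """
    args = [
        "./configure",
        "--prefix=" + prefix,
        "--with-curl",  # flag suggested in this thread; treat as an assumption
    ]
    env = {
        # Let the compiler, linker and pkg-config look inside the environment.
        "CPPFLAGS": "-I" + prefix + "/include",
        "LDFLAGS": "-L" + prefix + "/lib",
        "PKG_CONFIG_PATH": prefix + "/lib/pkgconfig",
    }
    return args, env
```

The returned pair could be handed to `subprocess.run(args, env={**os.environ, **env}, check=True)` from the source directory. Whether that alone lets configure find libarchive on Linux and macOS remains untested.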

stweil commented 1 month ago

libarchive is less important (it supports zipped model files, but currently there are no such files as far as I know), so it would not matter much if some platforms don't find it.

libcurl is more important because it allows OCR with image URLs.