UB-Mannheim / tesseract

Tesseract Open Source OCR Engine (main repository)
Apache License 2.0

libwebp in windows executable #73

Closed: dhouse13 closed this issue 1 year ago

dhouse13 commented 1 year ago

Current Behavior

When installing the Windows versions provided here (https://github.com/UB-Mannheim/tesseract/wiki), we found that the newest version (5.3.1.20230401) contains libwebp 1.3.0, which has a zero-day vulnerability (CVE-2023-4863).

Similarly, the version we use (5.0.0-alpha.20190708) contains libwebp 0.6.1, which is also affected.

We do not use the WebP functionality directly, but removing the DLL or replacing it with a fixed version breaks our tooling.

Expected Behavior

Update the supported versions of the Tesseract Windows installer to bundle a non-vulnerable version of libwebp.

Suggested Fix

No response

tesseract -v

tesseract v5.3.1.20230401
 leptonica-1.83.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.3.0 : libopenjp2 2.5.0
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.2.9 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.2
 Found libcurl/8.0.1 Schannel zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0


tesseract v5.0.0-alpha.20190708
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
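The libwebp version bundled with a given installer can be read straight from the `tesseract -v` output above. The following is a minimal Python sketch, not part of Tesseract or the installer: it extracts the `libwebp X.Y.Z` entry and flags anything older than 1.3.2, the libwebp release that fixes CVE-2023-4863. The function names are hypothetical, and the sketch assumes the banner keeps the `libwebp X.Y.Z` format shown above.

```python
import re
import subprocess

# libwebp 1.3.2 is the first release that fixes CVE-2023-4863.
FIXED_LIBWEBP = (1, 3, 2)


def tesseract_banner(executable: str = "tesseract") -> str:
    """Run `tesseract -v` and return its combined output.

    Both streams are captured because some builds may print the banner to stderr.
    """
    result = subprocess.run([executable, "-v"], capture_output=True, text=True)
    return result.stdout + result.stderr


def libwebp_version(banner: str):
    """Return the libwebp version from a `tesseract -v` banner as a tuple, or None."""
    match = re.search(r"libwebp (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def bundles_vulnerable_libwebp(banner: str) -> bool:
    """True if the banner reports a libwebp release older than FIXED_LIBWEBP."""
    version = libwebp_version(banner)
    return version is not None and version < FIXED_LIBWEBP
```

Applied to the two version strings above, the check flags both builds, since libwebp 1.3.0 and 0.6.1 both predate 1.3.2. On a machine with Tesseract on the PATH, `bundles_vulnerable_libwebp(tesseract_banner())` runs the same check against the installed build.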

Operating System

Windows 11

Other Operating System

No response

uname -a

No response

Compiler

N/A, using your pre-built installers

CPU

N/A

Virtualization / Containers

n/a

Other Information

Instructions for compiling our own build with a fixed libwebp would also solve the problem, particularly for the version we currently use (5.0.0-alpha.20190708).

stweil commented 1 year ago

Replacing the relevant DLL with a newer, compatible one (from MSYS2) should work.

In addition, the affected code is only used for WebP images, which are still very rare. You can avoid or minimize the risk by either not processing such images at all or only processing WebP images from trusted sources.
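One way to apply the second suggestion is to screen input files before handing them to Tesseract. The Python sketch below is an illustration, not a Tesseract feature: it recognizes WebP by its file signature (a RIFF container whose form type at byte offset 8 is `WEBP`) rather than by extension, since a file can be renamed, and it skips such files unless the caller marks the source as trusted. All helper names are hypothetical.

```python
from pathlib import Path

WEBP_HEADER_LENGTH = 12


def read_header(path) -> bytes:
    """Read the first bytes of a file, enough to identify a WebP signature."""
    with Path(path).open("rb") as f:
        return f.read(WEBP_HEADER_LENGTH)


def is_webp(header: bytes) -> bool:
    """WebP files start with 'RIFF', a 4-byte little-endian size, then 'WEBP'."""
    return (
        len(header) >= WEBP_HEADER_LENGTH
        and header[0:4] == b"RIFF"
        and header[8:12] == b"WEBP"
    )


def should_ocr(header: bytes, trusted: bool = False) -> bool:
    """Hypothetical policy: skip WebP input unless its source is trusted."""
    return trusted or not is_webp(header)
```

A caller would evaluate `should_ocr(read_header(path))` and only invoke Tesseract on the file when it returns True.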

stweil commented 1 year ago

The build script make-installer.sh is part of the sources, and all installer versions are tagged (for example, release v5.0.0-alpha.20190708). You still need a working Debian build environment, which takes some work to set up. Use the cross-build GitHub Action as a starting point.

dhouse13 commented 1 year ago

Unfortunately, we tried this and it broke our tooling, which is why we are looking at other solutions.

dhouse13 commented 1 year ago

We found out why replacing the libwebp-7 DLL did not work: in the newest version (1.3.2), the libwebp-7 DLL depends on an additional DLL, libsharpyuv-0.dll. We are still testing, but adding that DLL as well does seem to solve our problem. We used the files from here: https://packages.msys2.org/package/mingw-w64-x86_64-libwebp
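Since the 1.3.2 libwebp-7 DLL needs libsharpyuv-0.dll next to it, a quick pre-flight check can confirm both files are present in the Tesseract installation folder before running OCR. This is a minimal Python sketch with hypothetical function names; the exact file name `libwebp-7.dll` is assumed from the DLL name used in this thread. It only checks that the files are present (compared case-insensitively, as Windows does), not that their own dependencies resolve.

```python
from pathlib import Path

# Assumed DLL names, based on this thread: the libwebp 1.3.2 DLL from MSYS2
# also needs libsharpyuv-0.dll in the same folder.
REQUIRED_WEBP_DLLS = ("libwebp-7.dll", "libsharpyuv-0.dll")


def dll_names_in(install_dir) -> set:
    """Names of the DLL files in the Tesseract installation folder."""
    return {p.name for p in Path(install_dir).iterdir() if p.suffix.lower() == ".dll"}


def missing_dlls(present, required=REQUIRED_WEBP_DLLS) -> list:
    """Return required DLL names absent from `present`, ignoring case as Windows does."""
    present_lower = {name.lower() for name in present}
    return [name for name in required if name.lower() not in present_lower]
```

For example, `missing_dlls(dll_names_in(r"C:\Program Files\Tesseract-OCR"))` returns an empty list when both DLLs are in place. The installation path here is an assumption and should be adjusted to wherever tesseract.exe actually lives.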

stweil commented 1 year ago

I think this issue was fixed by the installer for Tesseract 5.3.3. Please reopen if it still exists.