DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

Error in pixReadMemBmp #607

Closed walter-weinmann closed 2 years ago

walter-weinmann commented 2 years ago

I'm running the latest binary version of Tesseract OCR on Ubuntu:

$ tesseract --version
tesseract 5.1.0
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
 Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

With this BMP file I get the following error messages:

Error in pixReadMemBmp: cannot read compressed BMP files
Error in pixReadStream: bmp: no pix returned
Error in pixRead: pix not read
Error during processing.

DanBloomberg commented 2 years ago

Leptonica does not support compressed bmp files, in either reading or writing. There are at least three reasons for this:

(1) Compression and decompression of the image data would probably require using a bmp library, and we are always working to avoid dependencies on new libraries. All code for bmp reading and writing is in bmpio.c.

(2) With uncompressed data, we know exactly how big the data is, and this is used to guard against invalid or dangerous bmp files.

(3) Most bmp files are made without compression, and if you want lossless compression, there are good alternatives with png and tiff libraries.
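
For readers who want to check a file before handing it to Tesseract, here is a minimal sketch in Python of the two ideas above: reading the compression field from the BMP header, and computing the pixel data size that an uncompressed image must have. It follows the standard BMP layout (14-byte file header followed by a BITMAPINFOHEADER-style info header) and is not taken from bmpio.c. The function names are hypothetical, and which nonzero compression values Leptonica actually rejects is not stated in this thread.

```python
import struct

BI_RGB = 0  # compression code for uncompressed pixel data


def bmp_compression(data: bytes) -> int:
    """Return the compression code stored in a BMP header (0 = uncompressed).

    Hypothetical helper: assumes the standard layout, with the info header
    starting at byte 14 and its compression field at byte 30.
    """
    if len(data) < 34 or data[:2] != b"BM":
        raise ValueError("not a BMP file, or header too short")
    (info_header_size,) = struct.unpack_from("<I", data, 14)
    if info_header_size < 40:
        # The old 12-byte core header has no compression field at all.
        return BI_RGB
    (compression,) = struct.unpack_from("<I", data, 30)
    return compression


def expected_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of uncompressed pixel data: each row is padded to a multiple of 4 bytes."""
    stride = ((width * bits_per_pixel + 31) // 32) * 4
    return stride * abs(height)  # height is negative for top-down bitmaps
```

A nonzero result from `bmp_compression(open(path, "rb").read())` suggests the file may trigger the "cannot read compressed BMP files" error and should be re-saved without compression or converted to png or tiff. The second function illustrates why the uncompressed size is predictable from the header alone, which is what makes a sanity check against the actual file size possible.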

walter-weinmann commented 2 years ago

Thank you very much for the detailed explanations. I now use Microsoft Paint so that the bmp files are uncompressed.