DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

Leptonica reports IO errors #606

Closed CSBVision closed 2 years ago

CSBVision commented 2 years ago

Hello,

we are using Leptonica in combination with Tesseract as part of OpenCV. We compiled Leptonica and Tesseract as static libraries using Visual Studio and linked them into OpenCV, so that there is no explicit dependency on either Leptonica or Tesseract. In general everything works fine; however, Leptonica reports the following errors:

Error in pixReadMemTiff: function not present
Error in pixReadMem: function not present
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made

To my understanding, these errors occur because Leptonica was compiled without libtiff, libpng and libjpeg, which it needs to support the respective image formats. Still, Leptonica and Tesseract work fine. I suspect the third-party libraries are only necessary for file IO through Leptonica, which is not needed here because OpenCV handles it directly. Leptonica and Tesseract are used only as libraries inside OpenCV, not by any other application that relies on Leptonica for image IO.

If this is correct, these errors are irritating and misleading. So is there any option to disable them?

Checking Leptonica's source code leads to environ.h, where defining NO_CONSOLE_IO should, to our understanding, disable these errors. Is this the best option, or will it also disable errors that should be kept?

Thanks for any support!

stweil commented 2 years ago

Please use the correct name "Tesseract". I suggest building Leptonica with libtiff. That should fix the error messages.

CSBVision commented 2 years ago

Thanks for pointing that out (the typo was repeated by auto-completion and is now fixed, see edit). But for the reasons explained, I do not want any TIFF/IO functionality in Leptonica. Or is there any use of libtiff in Leptonica besides IO of TIFF files? If not, and Leptonica is used only as a sub-library of OpenCV, which offers all IO functionality (including libtiff as a static lib), linking it against libtiff makes no real sense to me: it only complicates the compilation (i.e. configuring the dependencies) and increases the file size without any real benefit.

stweil commented 2 years ago

If you already have libtiff as a static lib for OpenCV, using it for Leptonica too would not add many bytes to your binary. The error messages indicate that the related code is called, so either provide the necessary library or change the code for your special needs (remove the message or remove the caller).

CSBVision commented 2 years ago

Unfortunately, it's not that simple. OpenCV builds libtiff as a static library if it does not find a system-wide installation. This is usually the case on Windows systems (on Linux systems with system-wide installations of dependencies, it's a completely different story; there, Leptonica and Tesseract can also be used as system-wide shared libraries). To configure OpenCV with Tesseract support, both Leptonica and Tesseract have to be compiled first. Once they are built, you can point OpenCV's CMake configuration at the Leptonica and Tesseract headers and libraries. Only after that is OpenCV compiled, which is what triggers the compilation of libtiff.

One solution would be to compile libtiff manually and configure it for Leptonica first. Still, I don't see any real benefit if Leptonica is not used for any image IO. But since you mention that the related code might be called: do you have any idea which libtiff code might be executed or required even though no image IO takes place?

stweil commented 2 years ago

No, but it should be easy to run your code in a debugger with a breakpoint on the error message. That should answer the question.

CSBVision commented 2 years ago

OK, debugging is not that easy either, as it requires compiling OpenCV debug binaries. So maybe an easier solution is to suppress the errors?

DanBloomberg commented 2 years ago

You ask if there is any use of libtiff in leptonica besides IO of tiff files. Tesseract uses libtiff and leptonica to generate pdf files.

CSBVision commented 2 years ago

Thanks for this information! So if Leptonica and Tesseract are only used as libraries without IO, these should not be required at all. To the best of my knowledge, OpenCV only receives the detected text, but no image or PDF files are written. So is it the best option to set the define NO_CONSOLE_IO during compilation (i.e. adding /DNO_CONSOLE_IO to CMAKE_CXX_FLAGS)?

By the way: Why does Leptonica even report these errors when no IO takes place?
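
To make the trade-off in the question above concrete, here is a minimal C sketch of how a compile-time switch of this kind typically behaves. It is a model under stated assumptions, not Leptonica's source: `MODEL_NO_CONSOLE_IO`, `MODEL_ERROR_PTR`, `model_report` and the two example functions are hypothetical names, and whether NO_CONSOLE_IO in environ.h works exactly this way should be checked against the header itself. The point it illustrates is that a switch applied at the macro level silences every message routed through that macro, not only the "function not present" ones.

```c
#include <stdio.h>

/* Hypothetical model of a build-wide "no console output" switch.
 * Defining it here stands in for passing -DMODEL_NO_CONSOLE_IO
 * (or /DMODEL_NO_CONSOLE_IO with MSVC) to the C compiler. */
#define MODEL_NO_CONSOLE_IO 1

static int g_messages = 0;   /* counts messages that were actually emitted */

/* Prints "Error in <proc>: <msg>" and hands back pval so callers can return it. */
void *model_report(const char *msg, const char *proc, void *pval)
{
    ++g_messages;
    fprintf(stderr, "Error in %s: %s\n", proc, msg);
    return pval;
}

#ifdef MODEL_NO_CONSOLE_IO
  /* Every call site collapses to the bare return value: no message at all. */
  #define MODEL_ERROR_PTR(msg, proc, pval)  ((void *)(pval))
#else
  #define MODEL_ERROR_PTR(msg, proc, pval)  model_report((msg), (proc), (void *)(pval))
#endif

/* A harmless message: an optional I/O backend was not compiled in. */
void *model_missing_reader(void)
{
    return MODEL_ERROR_PTR("function not present", "model_missing_reader", NULL);
}

/* A message one would probably want to keep: a real usage error. */
void *model_bad_argument(void)
{
    return MODEL_ERROR_PTR("pix not defined", "model_bad_argument", NULL);
}

int model_message_count(void) { return g_messages; }
```

In this model both functions still return NULL, but neither prints anything, so a build-wide switch cannot tell a harmless message from one worth keeping.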

DanBloomberg commented 2 years ago

leptonica is instrumented to let you know, for many functions, if the function fails in some way. If you don't supply libtiff and a call is made that would require the library, unless you forbid output on error, an error message will be produced. For I/O, if a library is missing, stubs for the functions are compiled instead, and these stubs generate an error message to stderr (unless redirected).

As you saw, leptonica gives you many ways to control the output to stderr, including redirecting the messages using custom error handlers. If you don't want to see those messages, you will need to use one of those methods. See environ.h and utils1.c.
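
The stub-plus-redirect mechanism described above can be sketched in a few lines of C. This is a simplified model rather than Leptonica's implementation: all `model_` names are hypothetical, and the actual entry points for installing a handler are the ones documented in environ.h and utils1.c. Only the message format, `Error in <function>: <message>`, is taken from the output quoted at the top of this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not Leptonica's source code. */
typedef void (*model_handler_t)(const char *formatted_msg);

static model_handler_t g_handler = NULL;   /* NULL means: write to stderr */

/* Install a custom handler; pass NULL to restore the stderr default. */
void model_set_handler(model_handler_t h) { g_handler = h; }

/* Format the message, deliver it, and return pval so the caller can
 * write `return model_return_error_ptr(...)` in one line. */
void *model_return_error_ptr(const char *msg, const char *procname, void *pval)
{
    char buf[256];
    assert(msg != NULL && procname != NULL);
    snprintf(buf, sizeof buf, "Error in %s: %s\n", procname, msg);
    if (g_handler)
        g_handler(buf);
    else
        fputs(buf, stderr);
    return pval;
}

/* Stub compiled in place of a real TIFF reader when the TIFF library
 * is not configured: it ignores its input and only reports. */
void *model_read_mem_tiff(const unsigned char *data, size_t size)
{
    (void)data;
    (void)size;
    return model_return_error_ptr("function not present",
                                  "model_read_mem_tiff", NULL);
}

/* Example handler: keep the last message in memory instead of printing. */
static char g_last[256];

void model_capture(const char *formatted_msg)
{
    strncpy(g_last, formatted_msg, sizeof g_last - 1);
    g_last[sizeof g_last - 1] = '\0';
}

const char *model_last_message(void) { return g_last; }
```

After `model_set_handler(model_capture)`, a call to `model_read_mem_tiff` still returns NULL, but the text ends up in the buffer instead of on stderr. An application could log it, filter it, or drop it there, which is less drastic than compiling all output away.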

CSBVision commented 2 years ago

Thanks for this clarification, then I will simply set this flag as mentioned before. But can you think of any reason why these IO errors appear even without any IO (only API calls from OpenCV)? I don't really understand this... but as long as everything works, I'm fine with that.

DanBloomberg commented 2 years ago

You are calling a function in Tesseract that requires libtiff. If libtiff is not available, you will get the "function not present" message.

CSBVision commented 2 years ago

Yes, of course, I agree with that. Still, the errors already appear when Tesseract is initialized. Initializing Tesseract through OpenCV calls TessBaseAPI::Init() (see the OpenCV initialization code). That code should only initialize the Tesseract library; I don't see any IO happening there (see the Tesseract initialization code). By inspecting the code, it's rather hard to find out where a libtiff function is used. I suspect it is just an initialization step that prepares libtiff, even though the respective parts are not used by OpenCV, as OCR works fine afterwards. If you know whether this is true, or where this libtiff call might come from, I will have a closer look at it. If not, I'm fine with simply setting the NO_CONSOLE_IO flag as mentioned before.

DanBloomberg commented 2 years ago

I'm fine with your work-around. Thank you for bringing this up and investigating it.

Dan

CSBVision commented 2 years ago

Alright, thanks for your time 👍

stweil commented 2 years ago

@CSBVision, you could try building Tesseract with TESSERACT_DISABLE_DEBUG_FONTS defined. Then the code no longer calls bmfCreate, which triggers the TIFF-related warnings.

(gdb) i s
#0  __GI___libc_write (fd=2, buf=buf@entry=0xffffffffd710, nbytes=nbytes@entry=46) at ../sysdeps/unix/sysv/linux/write.c:25
#1  0x0000fffff76f4aac in _IO_new_file_write (f=0xfffff77f7468 <_IO_2_1_stderr_>, data=0xffffffffd710, n=46) at fileops.c:1181
#2  0x0000fffff76f3e6c in new_do_write (fp=fp@entry=0xfffff77f7468 <_IO_2_1_stderr_>, data=data@entry=0xffffffffd710 "Error in pixReadMemTiff: function not present\n", 
    to_do=to_do@entry=46) at libioP.h:948
#3  0x0000fffff76f51bc in _IO_new_file_xsputn (n=46, data=<optimized out>, f=0xfffff77f7468 <_IO_2_1_stderr_>) at fileops.c:1255
#4  _IO_new_file_xsputn (f=0xfffff77f7468 <_IO_2_1_stderr_>, data=<optimized out>, n=46) at fileops.c:1197
#5  0x0000fffff76e8c54 in __GI__IO_fputs (str=0xffffffffd710 "Error in pixReadMemTiff: function not present\n", fp=0xfffff77f7468 <_IO_2_1_stderr_>) at libioP.h:948
#6  0x0000fffff79d3c68 in lept_stderr (fmt=fmt@entry=0xfffff7a32248 "Error in %s: %s\n") at ../../../../src/utils1.c:317
#7  0x0000fffff79d3d00 in returnErrorPtr (msg=msg@entry=0xfffff7a15228 "function not present", procname=procname@entry=0xfffff7a32198 "pixReadMemTiff", pval=pval@entry=0x0)
    at ../../../../src/utils1.c:233
#8  0x0000fffff79d39f0 in pixReadMemTiff (cdata=cdata@entry=0xaaaaaaafc1d0 "II*", size=size@entry=3680, n=n@entry=0) at ../../../../src/tiffiostub.c:196
#9  0x0000fffff799131c in pixReadMem (data=data@entry=0xaaaaaaafc1d0 "II*", size=3680) at ../../../../src/readfile.c:885
#10 0x0000fffff7852630 in pixaGenerateFontFromString (fontsize=14, pbl0=0xaaaaaaafc188, pbl1=0xaaaaaaafc18c, pbl2=0xaaaaaaafc190) at ../../../../src/bmf.c:528
#11 0x0000fffff78530d0 in bmfCreate (dir=0x0, fontsize=14) at ../../../../src/bmf.c:132
#12 0x0000fffff7d00c70 in tesseract::DebugPixa::DebugPixa (this=0xfffff6554d00) at ../../../src/ccstruct/debugpixa.h:20
#13 tesseract::Tesseract::Tesseract (this=this@entry=0xfffff6531010) at ../../../src/ccmain/tesseractclass.cpp:461
#14 0x0000fffff7ca5a74 in tesseract::TessBaseAPI::Init (this=this@entry=0xfffffffff028, data=<optimized out>, data@entry=0x0, data_size=data_size@entry=0, 
    language=0xfffff7e80848 "", language@entry=0x0, oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0xfffffffff4b8, configs_size=configs_size@entry=0, 
    vars_vec=vars_vec@entry=0xffffffffeff8, vars_values=vars_values@entry=0xfffffffff010, set_only_non_debug_params=set_only_non_debug_params@entry=false, reader=reader@entry=0x0)
    at ../../../src/api/baseapi.cpp:407
#15 0x0000fffff7ca5f94 in tesseract::TessBaseAPI::Init (this=this@entry=0xfffffffff028, datapath=datapath@entry=0x0, language=language@entry=0x0, 
    oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0xfffffffff4b8, configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0xffffffffeff8, 
    vars_values=vars_values@entry=0xfffffffff010, set_only_non_debug_params=set_only_non_debug_params@entry=false) at ../../../src/api/baseapi.cpp:371
#16 0x0000aaaaaaaa2a3c in main (argc=2, argv=0xfffffffff4a8) at ../../../src/tesseract.cpp:698
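
The call chain in the trace (constructor, then bmfCreate, then pixaGenerateFontFromString, then pixReadMem, then the TIFF stub) suggests why a compile-time guard around the font creation removes these messages while leaving error reporting in general untouched. A minimal C sketch of that idea follows. Every `model_` name and `MODEL_DISABLE_DEBUG_FONTS` are hypothetical stand-ins, and the real guard in Tesseract may be written differently; the font size 14 is taken from frame #11.

```c
#include <stddef.h>

/* Hypothetical model of a "disable debug fonts" build option.
 * Defining it here stands in for passing -DMODEL_DISABLE_DEBUG_FONTS. */
#define MODEL_DISABLE_DEBUG_FONTS 1

typedef struct { int size; } model_font;
typedef struct { model_font *fonts; } model_debug_pixa;

static int g_font_calls = 0;   /* how often font creation was attempted */

/* Stand-in for a font loader that needs TIFF decoding. In a build
 * without TIFF support it fails and (in the real library) prints errors. */
model_font *model_font_create(int size)
{
    (void)size;
    ++g_font_calls;
    return NULL;
}

/* Stand-in for the constructor seen in frame #12 of the trace. */
void model_debug_pixa_init(model_debug_pixa *d)
{
#ifdef MODEL_DISABLE_DEBUG_FONTS
    d->fonts = NULL;                    /* loader is never called */
#else
    d->fonts = model_font_create(14);   /* would reach the TIFF stub */
#endif
}

int model_font_call_count(void) { return g_font_calls; }
```

With the macro defined, initialization never reaches the font loader, so the stub is never hit and nothing is printed. Messages from other code paths are unaffected, which is the difference from a global "no output" switch.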

CSBVision commented 2 years ago

Thanks for pointing that out! I think this is exactly what I was looking for, since error reporting in general can stay enabled 👍