UB-Mannheim / tesseract

Tesseract Open Source OCR Engine (main repository)
Apache License 2.0

Windows binaries very big when I compile myself #43

Closed by dmigowski 3 years ago

dmigowski commented 3 years ago

Hello,

I am trying to compile my own tesseract to be able to test some code. I am doing this:

apt-get install git curl
git clone https://github.com/UB-Mannheim/tesseract.git --branch windows
cd tesseract/
.github/workflows/build.sh x86_64

Binaries seem to be created at ... ./bin/ndebug/x86_64-w64-mingw32-/usr/x86_64-w64-mingw32/bin ...

However, when I compare them to the binaries built by you, I find that all of them are roughly 50% bigger:

Binary                My size    Your size
libtesseract-5.dll    94.3 MB    66.8 MB
tesseract.exe         1.03 MB    0.65 MB

My System:

root@devbox:~# uname -a
Linux www 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
root@devbox:~# apt-cache show mingw-w64 | grep Version
Version: 6.0.0-3

Any ideas?

mzettwitz commented 3 years ago

You seem to have a debug build, which is larger by nature.

stweil commented 3 years ago

The script .github/workflows/build.sh runs a release build, but all binaries include debug information (important for debugging and profiling).

Use x86_64-w64-mingw32-strip to remove that debug information.
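
As a rough sketch of that step, the helper below strips a copy of a binary and reports the sizes before and after. The function names (file_size, percent_smaller, strip_copy) and the STRIP variable are made up for illustration; only the x86_64-w64-mingw32-strip tool itself comes from this thread, and it is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the tesseract build scripts.
# STRIP defaults to the MinGW-w64 strip tool named in this thread.
STRIP="${STRIP:-x86_64-w64-mingw32-strip}"

# file_size FILE: print the size of FILE in bytes (wc is portable).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# percent_smaller BEFORE AFTER: integer percentage by which AFTER is smaller.
percent_smaller() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# strip_copy FILE: leave FILE untouched, write FILE.stripped, print both sizes.
strip_copy() {
    cp "$1" "$1.stripped" || return 1
    "$STRIP" "$1.stripped" || return 1
    before=$(file_size "$1")
    after=$(file_size "$1.stripped")
    echo "$1: $before -> $after bytes ($(percent_smaller "$before" "$after")% smaller)"
}
```

After sourcing the file, something like `strip_copy libtesseract-5.dll` should leave the original DLL in place for debugging and produce a stripped copy next to it.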

stweil commented 3 years ago

Here you can see the effect of x86_64-w64-mingw32-strip libtesseract-5.dll:

# size with debug information
-rwxr-xr-x 1 debian debian 110669963  6. Mai 12:35 libtesseract-5.dll
# size without debug information
-rwxr-xr-x 1 debian debian  3172864  6. Mai 12:48 libtesseract-5.dll

stweil commented 3 years ago

Different compilers (gcc, clang) and compiler versions also produce different sizes (more or fewer optimizations, more or less debug information). Because the debug information includes path names, the names of your local source and build directories contribute to its size. They also affect the binary code itself, because the code includes assertions with the full name of the source file.
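
One way to get a feeling for how much the local paths contribute is to count how often a directory prefix appears in a binary. This is only a sketch: count_embedded_paths is a made-up helper name, the prefix in the usage line is a placeholder for your own build directory, and debug formats may store directory and file names separately, so the count is a rough indicator and not an exact measure.

```shell
#!/bin/sh
# Hypothetical helper, not part of the tesseract build scripts.
# count_embedded_paths FILE PREFIX: count occurrences of the literal string
# PREFIX in FILE. -a treats the binary as text, -F matches PREFIX literally,
# -o prints every match on its own line so wc -l can count them.
count_embedded_paths() {
    grep -a -o -F -e "$2" -- "$1" | wc -l | tr -d ' '
}
```

For example, `count_embedded_paths libtesseract-5.dll /home/user/tesseract` could be run before and after stripping to compare the two counts.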

stweil commented 3 years ago

@dmigowski, can I close this issue or do you need more information?

dmigowski commented 3 years ago

I wasn't able to test this sooner, but now it works, and the binaries are even smaller than the VC binaries. Thank you!