Sicos1977 / TesseractOCR

A .net library to work with Google's Tesseract

How to build Tesseract & Leptonica for testing target .net 8.0 on Windows. #59

Closed zydjohnHotmail closed 4 months ago

zydjohnHotmail commented 5 months ago

Hello: I found your repo, and it looks better than most of the other C# Tesseract wrappers, so I want to use it to run OCR on some pictures. Since .NET 8.0 is out and has long-term support, I decided to change your repo to target only .NET 8.0 on Windows, for x64 CPUs only (not x86). I created a class library targeting .NET 8.0 on x64, copied all the C# code into the matching folders, and compiled it. The build worked and produced TesseractOCR8.dll (172 KB). I then followed your instructions to build the Tesseract and Leptonica libraries. Here are the major steps:

  1. For Tesseract with the latest version (5.3.4):

git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
git checkout -b 5.3.4 5.3.4

mkdir vs22-x64 & cd vs22-x64

cmake .. -G "Visual Studio 17 2022" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=....\build\x64
cmake --build . --config Release --target install

The commands above finished successfully. I saw some minor warnings, but the DLL file was built:

D:\RaceOCR\HLSVideo\TesseractOCR8\TesseractOCR8\x64>dir tesseract53.dll /w
 Directory of D:\RaceOCR\HLSVideo\TesseractOCR8\TesseractOCR8\x64
tesseract53.dll
 1 File(s) 2,741,248 bytes

D:\RaceOCR\HLSVideo\TesseractOCR8\TesseractOCR8\x64>

  2. For Leptonica with the latest version (1.84.1):

vcpkg install giflib:x64-windows-static libjpeg-turbo:x64-windows-static liblzma:x64-windows-static libpng:x64-windows-static tiff:x64-windows-static zlib:x64-windows-static

git clone https://github.com/DanBloomberg/leptonica.git & cd leptonica
git checkout -b 1.84.1 1.84.1

mkdir vs22-x64 & cd vs22-x64

cmake .. -G "Visual Studio 17 2022" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=....\build\x64
cmake --build . --config Release --target install

The commands above finished successfully. I saw some minor warnings, but the DLL file was built:

D:\RaceOCR\HLSVideo\leptonica\vs22-x64>dir *.dll /s /w
 Directory of D:\RaceOCR\HLSVideo\leptonica\vs22-x64\bin\Release
leptonica-1.84.1.dll
 1 File(s) 4,404,224 bytes

D:\RaceOCR\HLSVideo\leptonica\vs22-x64>

Then I wrote a WinForms project to test the class library. Adding a project reference to TesseractOCR8.dll worked. But when I tried to add a reference to the newly built tesseract53.dll, I got the error: "The reference is invalid or unsupported".

You can see the error in the screenshot. Please let me know how I can fix this issue. The following is my environment:

OS: Windows 10 (Version 22H2, OS Build 19045.4291)
IDE: Visual Studio 2022 (Version 17.9.6, the latest version as of today, April 16, 2024)
I also installed:
cmake (cmake version 3.28.0-msvc1)
vcpkg (vcpkg package management program version 2024-03-14-7d353e869753e5609a1f1a057df3db8fd356e49d)

If you need other information, please let me know. By the way, how do you load the tessdata for the language file (English)? I can't see this in the repo. Please advise. Thanks.

[Screenshot: AddTesseractOCR53Dll_NOK]

Sicos1977 commented 4 months ago

You can't add tesseract53.dll as a reference like that. It is a native C++ library that is called from TesseractOCR.
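
To illustrate the point above: the Add Reference dialog in Visual Studio expects managed assemblies, so a native C++ DLL such as tesseract53.dll is rejected there. Native DLLs are instead deployed alongside the application and loaded at run time. The following is only a minimal C# sketch for checking that the freshly built DLLs can be loaded by a .NET 8 x64 process. It assumes both DLLs were copied to an x64 folder under the application's output directory; that folder layout and the class name NativeProbe are assumptions for illustration and are not taken from the TesseractOCR source.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

internal static class NativeProbe
{
    private static void Main()
    {
        // Assumption: the native DLLs sit in an "x64" folder next to the executable.
        string nativeDir = Path.Combine(AppContext.BaseDirectory, "x64");

        // File names as reported by the build output in this thread.
        // Leptonica is probed first because Tesseract depends on it.
        string[] dllNames = { "leptonica-1.84.1.dll", "tesseract53.dll" };

        foreach (string name in dllNames)
        {
            string path = Path.Combine(nativeDir, name);

            // NativeLibrary.TryLoad returns false instead of throwing
            // when the file is missing or cannot be loaded.
            if (NativeLibrary.TryLoad(path, out IntPtr handle))
            {
                Console.WriteLine($"Loaded {name} (handle 0x{handle.ToInt64():X})");
            }
            else
            {
                Console.WriteLine($"Could not load {path}");
            }
        }
        // Handles are intentionally not freed in this short probe;
        // they are released when the process exits.
    }
}
```

If a DLL fails to load here, a common cause is a bitness mismatch (an x86 process loading an x64 DLL) or a missing dependent DLL, so it may be worth confirming that the test project really targets x64.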