Shreeshrii / tessdata_shreetest

finetuned traineddata files for tesseract 4.0.0 for testing

Size of the dictionary file is too big #8

Closed: nkhdiscovery closed this issue 5 years ago

nkhdiscovery commented 5 years ago

Hi, I am using these files and they are great. I tested them with real-world Persian (Farsi) numeric data and they work very well. Many thanks for your work and for sharing it.

The problem is that the traineddata file for fas in the main repository is almost 500 KB, while these traineddata files are much larger (4 MB to 10 MB). Why is that, and how can I reduce the size? It has become a bottleneck in my Android app, which uses Tesseract 4.

Shreeshrii commented 5 years ago
  1. These are float models. They can be compressed to integer format; see combine_tessdata -c.

  2. They may also contain large dictionary (DAWG) files. You can remove them and check the impact on accuracy: use combine_tessdata -u to unpack the traineddata file, delete the dawg components, and then run combine_tessdata to combine the remaining components again.
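
The two suggestions above can be sketched as a small Python wrapper around combine_tessdata. This is only a rough sketch, not a tested recipe: it assumes combine_tessdata is on your PATH, that the language code is fas, and that the dictionary components use the usual lstm-*-dawg suffixes (run combine_tessdata -d on your file to check what it actually contains). The helper names and the paths in the usage note are made up for illustration. As far as I know, -c rewrites the file in place, so work on a copy.

```python
import os
import subprocess

# Assumed dictionary component suffixes in an LSTM traineddata file;
# confirm them with `combine_tessdata -d <file>` before relying on this list.
DAWG_SUFFIXES = ("lstm-word-dawg", "lstm-punc-dawg", "lstm-number-dawg")


def compact_cmd(traineddata):
    """Command that converts the LSTM model from float to integer."""
    return ["combine_tessdata", "-c", traineddata]


def unpack_cmd(traineddata, prefix):
    """Command that unpacks all components to files named <prefix><component>."""
    return ["combine_tessdata", "-u", traineddata, prefix]


def combine_cmd(prefix):
    """Command that combines <prefix>* components into <prefix>traineddata."""
    return ["combine_tessdata", prefix]


def strip_dawgs(traineddata, workdir, lang="fas"):
    """Unpack, delete the dictionary components, and recombine.

    Returns the path of the new traineddata file inside workdir.
    """
    prefix = os.path.join(workdir, lang + ".")  # trailing dot is required
    subprocess.run(unpack_cmd(traineddata, prefix), check=True)
    for suffix in DAWG_SUFFIXES:
        path = prefix + suffix
        if os.path.exists(path):
            os.remove(path)
    subprocess.run(combine_cmd(prefix), check=True)
    return prefix + "traineddata"
```

For example, strip_dawgs("fas.traineddata", "/tmp/fas_work") followed by subprocess.run(compact_cmd("/tmp/fas_work/fas.traineddata"), check=True) would apply both steps, assuming /tmp/fas_work already exists. Compare the recognition accuracy on your own data before and after, since removing the dictionaries may hurt it.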