amitdo opened this issue 8 years ago (status: Open)
I needed to build Leptonica and also install the development packages for Pango, Cairo, etc., with their C++ bindings.
The dependencies are the same as Tesseract's training tools (Tesseract itself is not needed).
When you post output, please select the output block with the mouse or keyboard and then press the 'insert code' button above the comment's text editing area.
Thanks, Amit, for the tip regarding 'insert code'.
There is one error.
In file included from /usr/include/stdlib.h:11:0,
from ./training/pango_font_info.cpp:30:
/usr/include/string.h:76:7: error: conflicting declaration of ‘char* strcasestr(const char*, const char*)’ with ‘C’ linkage
char *_EXFUN(strcasestr,(const char *, const char *));
^
It compiled OK.
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif
Text file missing!
!FLAGS_text.empty():Error:Assert failed:in file ./training/text2image.cpp, line 427
Segmentation fault (core dumped)
I pushed a new commit, please check that it did not break anything.
It compiled OK.
Please see Issue https://github.com/amitdo/text2tif/issues/5
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --list_available_fonts
(process:8744): Pango-CRITICAL **: pango_font_description_set_size: assertion 'size >= 0' failed
0: 8514fix
(process:8744): Pango-CRITICAL **: pango_font_description_set_size: assertion 'size >= 0' failed
1: 8514fix Bold
It does list the fonts, but the Pango messages are interleaved with the font names.
1) Did these messages appear with the previous commit? 2) Do these messages appear with Tesseract?
I think I had also installed the Pango debug info on Cygwin; possibly that is producing the extra messages.
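Until the cause of the Pango messages is known, the font list can be cleaned up after the fact. The following is only a sketch: `strip_pango_noise` is a hypothetical helper, not part of text2tif, and it assumes the diagnostics always start with `(process:PID): Pango-` as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of text2tif): read a stream on stdin and
# drop lines that look like "(process:8744): Pango-CRITICAL **: ...",
# plus any blank lines, so only the "N: Font Name" entries remain.
# Assumption: the diagnostics always begin with "(process:<pid>): Pango-".
strip_pango_noise() {
    grep -v -e '^(process:[0-9]*): Pango-' -e '^[[:space:]]*$'
}
```

Usage would be something like `./text2tif --fonts_dir= --list_available_fonts 2>&1 | strip_pango_noise`. The `2>&1` is there so the filter works whether the messages arrive on stdout or stderr.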
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --list_available_fonts
FcInitiReinitialize failed!!
Segmentation fault (core dumped)
$ ./text2tif --list_available_fonts --fonts_dir=
Because these messages also appear when you run Tesseract, retesting the previous commit is not needed.
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/ara/ara.training_text --font FreeSerif --outputbase ara.FreeSerif.exp0
Could not find font named FreeSerif. Pango suggested font DejaVu Serif
Please correct --font arg.:Error:Assert failed:in file ./training/text2image.cpp, line 437
Segmentation fault (core dumped)
It works if all the arguments are given correctly.
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --font Kokila --outputbase san.Kokila.exp0
Rendered page 0 to file san.Kokila.exp0.tif
Rendered page 1 to file san.Kokila.exp0.tif
Rendered page 2 to file san.Kokila.exp0.tif
Rendered page 3 to file san.Kokila.exp0.tif
Rendered page 4 to file san.Kokila.exp0.tif
Rendered page 5 to file san.Kokila.exp0.tif
Rendered page 6 to file san.Kokila.exp0.tif
Rendered page 7 to file san.Kokila.exp0.tif
Rendered page 8 to file san.Kokila.exp0.tif
Rendered page 9 to file san.Kokila.exp0.tif
Rtl = 0 ,vertical=0
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/eng/eng.training_text --font Arial --outputbase eng.Arial.exp0
Rendered page 0 to file eng.Arial.exp0.tif
Rendered page 1 to file eng.Arial.exp0.tif
Rtl = 0 ,vertical=0
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/ara/ara.training_text --font Arial --outputbase ara.Arial.exp0
Rendered page 0 to file ara.Arial.exp0.tif
Rendered page 1 to file ara.Arial.exp0.tif
Rtl = 1 ,vertical=0
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fontconfig_refresh_cache
Text file missing!
!FLAGS_text.empty():Error:Assert failed:in file ./training/text2image.cpp, line 427
Segmentation fault (core dumped)
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --fontconfig_refresh_cache
Output file missing!
!FLAGS_outputbase.empty():Error:Assert failed:in file ./training/text2image.cpp, line 428
Segmentation fault (core dumped)
Please see Issue https://github.com/amitdo/text2tif/issues/6
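As a stopgap while the assert-then-segfault behaviour stands, a caller can check the two required arguments before invoking the tool. This is a minimal sketch; `check_render_args` is a hypothetical name, and it only mirrors the two assertions visible in the logs above (a non-empty `--text` and a non-empty `--outputbase`).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of text2tif).
# $1 = path intended for --text, $2 = value intended for --outputbase.
# Returns 0 if both look usable, 1 otherwise, with a message on stderr.
check_render_args() {
    if [ -z "$1" ] || [ ! -r "$1" ]; then
        echo "text file missing or unreadable: '$1'" >&2
        return 1
    fi
    if [ -z "$2" ]; then
        echo "output base missing" >&2
        return 1
    fi
    return 0
}
```

For example: `check_render_args "$text" "$base" && ./text2tif --fonts_dir= --text "$text" --outputbase "$base"`, where `text` and `base` are shell variables set by the caller.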
ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --fontconfig_refresh_cache --outputbase san.Kokila.exp0
Stripped 2226 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 0 to file san.Kokila.exp0.tif
Stripped 2148 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 1 to file san.Kokila.exp0.tif
Stripped 2173 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 2 to file san.Kokila.exp0.tif
Stripped 1844 unrenderable words
Rendered page 3 to file san.Kokila.exp0.tif
Stripped 2603 unrenderable words
Rendered page 4 to file san.Kokila.exp0.tif
Stripped 1760 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 5 to file san.Kokila.exp0.tif
Rtl = 0 ,vertical=0
If no font is specified, the default font, Arial, is used. If Arial does not cover the script, the resulting tif file will be blank.
To find the available fonts for a particular script or language (e.g. ta for Tamil):
$ fc-list :lang=ta -f "%{file}\n%{family}\n%{style}\n\n"
/usr/share/fonts/win-fonts/Nirmala.ttf
Nirmala UI
Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
/usr/share/fonts/unifont/unifont.ttf
Unifont
Medium
/usr/share/fonts/win-fonts/NirmalaS.ttf
Nirmala UI,Nirmala UI Semilight
Semilight,Normal,obyčejné,Standard,Κανονικά,Regular,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
/usr/share/fonts/win-fonts/NirmalaB.ttf
Nirmala UI
Bold,Negreta,tučné,fed,Fett,Έντονα,Negrita,Lihavoitu,Gras,Félkövér,Grassetto,Vet,Halvfet,Pogrubiony,Negrito,Полужирный,Fet,Kalın,Krepko,Lodia
/usr/share/fonts/lohit-tamil/Lohit-Tamil.ttf
Lohit Tamil
Regular
/usr/share/fonts/lohit-tamil-classical/Lohit-Tamil-Classical.ttf
Lohit Tamil Classical
Regular
$ fc-list :lang=en -f "%{family[0]} %{style[0]}\n" | sort -u > en-fonts-list
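The same kind of fc-list query can be used to check, before rendering, whether a chosen font claims coverage for a language, which would avoid the blank-tif case. A rough sketch follows; `family_in_list` is a hypothetical helper that expects one `%{family}` value per line on stdin. Since a single value may hold several comma-separated names (as with "Nirmala UI,Nirmala UI Semilight" above), it splits on commas first.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the family name given as $1
# appears, as an exact whole name, in the family list read from stdin.
# Each input line may contain several comma-separated family names.
family_in_list() {
    tr ',' '\n' | grep -Fxq -- "$1"
}
```

For example: `fc-list :lang=ta -f '%{family}\n' | family_in_list "Lohit Tamil" || echo "no Tamil coverage reported"`. Note that this only tests what fontconfig reports; it does not show that the font renders the script correctly.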
We cannot use all of the fonts listed for a particular language, as some of them may not render correctly, especially for Devanagari and similar scripts.
However, such a list can be useful for editing the language-specific.sh file so that it lists only fonts that are actually available.
Sample of incorrect rendering for Devanagari:
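One possible way to use such a list, sketched under assumptions: keep the desired fonts in a plain file, one "Family Style" name per line (the same shape as the en-fonts-list output), and print only those that are installed. `available_subset` and the `wanted-fonts` file are hypothetical; this does not parse language-specific.sh itself.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the "wanted" file ($1) that
# also occur, as exact whole lines, in the "available" file ($2).
# Both files are assumed to hold one "Family Style" name per line.
available_subset() {
    grep -Fx -f "$2" -- "$1"
}
```

For example, `available_subset wanted-fonts en-fonts-list` would print the wanted fonts that fc-list reported. The result still needs a visual check, since a font being installed says nothing about whether it renders correctly.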
Someone needs to test it...