amitdo / text2tif-2016

A fork of Tesseract's text2image program

Training Process on CYGWIN - for reference #4

Closed by Shreeshrii 8 years ago

Shreeshrii commented 8 years ago

English training using tesstrain.sh with three fonts (Arial, Cambria, Calibri) at all three exposure levels (-1, 0, 1):

ra@Shree ~/tesseract-ocr/tesseract/training
$ ./tesstrain.sh --lang eng --langdata_dir ../../langdata --tessdata_dir .. --fontlist Arial Cambria Calibri --fonts_dir /usr/share/fonts --exposures "-1 0 1" --overwrite

=== Starting training for language 'eng'
[Fri, Mar 25, 2016 3:06:25 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.Jix4iCDJgQ/sample_text.txt --text=/tmp/font_tmp.Jix4iCDJgQ/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ
Rendered page 0 to file /tmp/font_tmp.Jix4iCDJgQ/sample_text.txt.tif
Rtl = 0 ,vertical=0

=== Phase I: Generating training images ===
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:06:28 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 --font=Calibri --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:28 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 --font=Arial --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:28 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 --font=Cambria --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif
Rtl = 0 ,vertical=0
Extracting font properties of Arial
Extracting font properties of Cambria
Extracting font properties of Calibri
[Fri, Mar 25, 2016 3:06:31 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:06:31 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:06:31 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties only
Extracting font properties only
Done!
Done!
Done!
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:06:32 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 --font=Arial --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:32 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 --font=Calibri --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:32 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 --font=Cambria --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Extracting font properties of Arial
Extracting font properties of Cambria
Extracting font properties of Calibri
[Fri, Mar 25, 2016 3:06:34 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:06:34 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:06:34 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties only
Extracting font properties only
Done!
Done!
Done!
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:06:36 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 --font=Cambria --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:36 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1 --font=Calibri --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:06:36 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 --font=Arial --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif
Rtl = 0 ,vertical=0
Extracting font properties of Cambria
[Fri, Mar 25, 2016 3:06:39 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tif
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Extracting font properties only
Extracting font properties of Calibri
Extracting font properties of Arial
[Fri, Mar 25, 2016 3:06:39 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Done!
[Fri, Mar 25, 2016 3:06:39 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Jix4iCDJgQ --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties only
Done!
Done!
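The Phase I output above comes from one `text2image` call per font and exposure; the identical timestamps and interleaved lines suggest that the three font jobs for each exposure run concurrently. A minimal dry-run sketch of that fan-out, assuming only the flags visible in the log (the function name `print_phase_i_commands` and the `/tmp/train/eng` output directory are hypothetical, and `tesstrain.sh`'s actual loop may differ), could look like:

```shell
#!/usr/bin/env bash
# Dry-run sketch (hypothetical helper, not part of tesstrain.sh): print the
# text2image command Phase I runs for each exposure/font pair, using flags
# that appear in the log. The fontconfig-related flags are omitted here.
set -euo pipefail

print_phase_i_commands() {
  local langdata=$1 outdir=$2 lang=$3 exposures=$4
  shift 4
  local exposure font
  # Exposure is the outer loop, matching the "Rendering using ..." blocks.
  for exposure in $exposures; do
    for font in "$@"; do
      printf '%s ' text2image \
        --fonts_dir=/usr/share/fonts \
        --strip_unrenderable_words \
        --leading=32 \
        --char_spacing=0.0 \
        "--exposure=${exposure}" \
        "--outputbase=${outdir}/${lang}.${font}.exp${exposure}" \
        "--font=${font}"
      printf '%s\n' "--text=${langdata}/${lang}/${lang}.training_text"
    done
  done
}

print_phase_i_commands ../../langdata /tmp/train/eng eng "-1 0 1" Arial Cambria Calibri
```

With three fonts and three exposures this prints nine commands, which matches the nine `.tif`/`.box` pairs that later phases consume.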

=== Phase UP: Generating unicharset and unichar properties files ===
[Fri, Mar 25, 2016 3:06:40 PM] /usr/bin/unicharset_extractor -D /tmp/tmp.xzVHiaORBx/eng/ /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.box
Wrote unicharset file /tmp/tmp.xzVHiaORBx/eng//unicharset.
[Fri, Mar 25, 2016 3:06:40 PM] /usr/bin/set_unicharset_properties -U /tmp/tmp.xzVHiaORBx/eng/eng.unicharset -O /tmp/tmp.xzVHiaORBx/eng/eng.unicharset -X /tmp/tmp.xzVHiaORBx/eng/eng.xheights --script_dir=../../langdata
Loaded unicharset of size 118 from file /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Other case FF of ff is not in unicharset
Other case TI of ti is not in unicharset
Other case FI of fi is not in unicharset
Other case FT of ft is not in unicharset
Other case Ffi of ffi is not in unicharset
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/tmp.xzVHiaORBx/eng/eng.unicharset

=== Phase D: Generating Dawg files ===
Generating word Dawg
[Fri, Mar 25, 2016 3:06:41 PM] /usr/bin/wordlist2dawg -r 1 ../../langdata/eng/eng.wordlist /tmp/tmp.xzVHiaORBx/eng/eng.word-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.wordlist'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.word-dawg'
Generating frequent-word Dawg
[Fri, Mar 25, 2016 3:06:47 PM] /usr/bin/wordlist2dawg -r 1 /tmp/tmp.xzVHiaORBx/eng/eng.wordlist.clean.freq /tmp/tmp.xzVHiaORBx/eng/eng.freq-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '/tmp/tmp.xzVHiaORBx/eng/eng.wordlist.clean.freq'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.freq-dawg'
[Fri, Mar 25, 2016 3:06:47 PM] /usr/bin/wordlist2dawg -r 0 ../../langdata/eng/eng.punc /tmp/tmp.xzVHiaORBx/eng/eng.punc-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.punc'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.punc-dawg'
[Fri, Mar 25, 2016 3:06:47 PM] /usr/bin/wordlist2dawg -r 0 ../../langdata/eng/eng.numbers /tmp/tmp.xzVHiaORBx/eng/eng.number-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.numbers'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.number-dawg'
[Fri, Mar 25, 2016 3:06:47 PM] /usr/bin/wordlist2dawg -r 1 ../../langdata/eng/eng.word.bigrams /tmp/tmp.xzVHiaORBx/eng/eng.bigram-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.word.bigrams'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.bigram-dawg'
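In Phase D each list is compiled with `wordlist2dawg`, and the `-r` value differs by list: the word, frequent-word and bigram DAWGs get `-r 1` (logged as `RRP_REVERSE_IF_HAS_RTL`), while the punctuation and number DAWGs get `-r 0` (logged as `RRP_DO_NO_REVERSE`). A small sketch that records this mapping as read from this log (the function name `dawg_reverse_flag` is hypothetical and is not something `tesstrain.sh` defines):

```shell
#!/usr/bin/env bash
# Hypothetical helper: the wordlist2dawg -r value used for each DAWG type in
# the Phase D log. The mapping is read off this log, not from tesstrain.sh.
set -euo pipefail

dawg_reverse_flag() {
  case "$1" in
    word | freq | bigram) echo 1 ;;  # logged as RRP_REVERSE_IF_HAS_RTL
    punc | number) echo 0 ;;         # logged as RRP_DO_NO_REVERSE
    *)
      echo "unknown DAWG type: $1" >&2
      return 1
      ;;
  esac
}

for dawg in word freq punc number bigram; do
  printf 'eng.%s-dawg: wordlist2dawg -r %s\n' "$dawg" "$(dawg_reverse_flag "$dawg")"
done
```

For an all left-to-right language such as English, the "reverse if has RTL" policy should leave every word unreversed, so the two settings are expected to behave the same here.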

=== Phase E: Extracting features ===
Using TESSDATA_PREFIX=..
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 box.train
[Fri, Mar 25, 2016 3:07:10 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1 box.train
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Page 1
FAIL!
FAIL!
APPLY_BOXES: boxfile line 74/_ ((137,4565),(166,4569)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 74/_ ((129,4563),(148,4566)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 75/_ ((148,4563),(167,4566)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 74/_ ((137,4565),(166,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES:
FAIL!
APPLY_BOXES: boxfile line 1172/- ((2029,3345),(2043,3349)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES:
APPLY_BOXES: boxfile line 1645/i ((668,2724),(673,2760)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 75/_ ((148,4563),(167,4566)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 75/_ ((165,4564),(194,4569)): FAILURE! Couldn't find a matching blob
   Boxes read from boxfile:    3800
   Boxes failed resegmentation:       2
79),(160,4583)): FAILURE! Couldn't find a matching blob
FAIL!
FAIL!
APPLY_BOXES: boxfile line 1096/_ ((1759,3372),(1778,3375)): FAILURE! Couldn't find a matching blob
   Found 3798 good blobs.
APPLY_BOXES: boxfile line 1128/r ((778,3292),(796,3316)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 2927/” ((1047,1250),(1059,1262)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1160/- ((1837,3442),(1850,3446)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
APPLY_BOXES: boxfile line 3225/g ((714,866),(737,903)): FAILURE! Couldn't find a matching blob
   Boxes read from boxfile:    3800
APPLY_BOXES:
   Boxes read from boxfile:    4131
   Boxes failed resegmentation:       4
   Boxes read from boxfile:    3871
FAIL!
APPLY_BOXES:
   Boxes read from boxfile:    3871
   Boxes read from boxfile:    4131
   Found 3796 good blobs.
URE! Couldn't find a matching blob
   Found 3868 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
506),(1688,3510)): FAILURE! Couldn't find a matching blob
   Boxes failed resegmentation:       3
   Boxes failed resegmentation:       6
   Leaving 1 unlabelled blobs in 0 words.
   Boxes failed resegmentation:       2
APPLY_BOXES: boxfile line 74/_ ((137,4565),(166,4569)): FAILURE! Couldn't find a matching blob
   Found 3865 good blobs.
   Found 4129 good blobs.
APPLY_BOXES: boxfile line 75/_ ((165,4564),(194,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1096/_ ((1832,3413),(1861,3417)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3871
   Boxes failed resegmentation:       4
   Found 3867 good blobs.
Generated training data for 947 words
Page 2
Generated training data for 947 words
Page 2
Generated training data for 1018 words
Generated training data for 920 words
Generated training data for 929 words
Page 2
Generated training data for 1016 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile:    1496
   Found 1496 good blobs.
APPLY_BOXES:
   Boxes read from boxfile:    1496
APPLY_BOXES:
   Boxes read from boxfile:    1189
APPLY_BOXES:
APPLY_BOXES: boxfile line 89/r ((585,4576),(603,4600)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    1566
   Found 1566 good blobs.
   Boxes read from boxfile:    1566
   Boxes failed resegmentation:       1
   Found 1189 good blobs.
   Found 1565 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
   Boxes read from boxfile:    1189
Generated training data for 946 words
Page 2
   Found 1189 good blobs.
Generated training data for 1016 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile:    1496
   Found 1496 good blobs.
APPLY_BOXES:
   Boxes read from boxfile:    1189
   Found 1189 good blobs.
Generated training data for 308 words
Generated training data for 381 words
Generated training data for 308 words
Generated training data for 378 words
Generated training data for 400 words
Generated training data for 393 words
Generated training data for 308 words
Generated training data for 378 words
[Fri, Mar 25, 2016 3:08:06 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 box.train
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
FAIL!
APPLY_BOXES: boxfile line 74/_ ((129,4563),(148,4566)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1096/_ ((1759,3372),(1778,3375)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1128/r ((778,3292),(796,3316)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3800
   Boxes failed resegmentation:       3
   Found 3797 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
Generated training data for 927 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile:    1566
   Found 1566 good blobs.
Generated training data for 400 words
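Interleaved output from the concurrent `box.train` runs makes the `APPLY_BOXES` summaries hard to read, but the Cambria exp1 rerun just above is clean. A minimal `awk`-based sketch for totalling those summary lines from a single, non-interleaved run (the function name `summarize_apply_boxes` is hypothetical, and it assumes the summary line formats shown in this log):

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the APPLY_BOXES summary lines printed by one
# `tesseract ... box.train` run. Assumes the line formats seen in this log;
# interleaved output from parallel runs will not parse reliably.
set -euo pipefail

summarize_apply_boxes() {
  awk '
    /Boxes read from boxfile:/     { nread += $NF }   # last field is the count
    /Boxes failed resegmentation:/ { nfailed += $NF }
    /^ *Found [0-9]+ good blobs\./ { ngood += $2 }    # "Found N good blobs."
    END { printf "read=%d failed=%d good=%d\n", nread, nfailed, ngood }
  '
}

# Summary lines of the Cambria exp1 rerun (pages 1 and 2) from the log above.
summarize_apply_boxes <<'EOF'
APPLY_BOXES:
   Boxes read from boxfile:    3800
   Boxes failed resegmentation:       3
   Found 3797 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
APPLY_BOXES:
   Boxes read from boxfile:    1566
   Found 1566 good blobs.
EOF
# prints: read=5366 failed=3 good=5363
```

A quick sanity check is that read minus failed equals good for each page, which holds here (3800 - 3 = 3797 on page 1, and 1566 with no failures on page 2).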

=== Phase C: Clustering feature prototypes (cnTraining) ===
[Fri, Mar 25, 2016 3:08:19 PM] /usr/bin/cntraining -D /tmp/tmp.xzVHiaORBx/eng/ /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr ...
Clustering ...

Writing /tmp/tmp.xzVHiaORBx/eng//normproto ...
./tesstrain.sh: line 65: fic.sh: command not found
[Fri, Mar 25, 2016 3:08:36 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.iPNkm7FgJu/sample_text.txt --text=/tmp/font_tmp.iPNkm7FgJu/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu
Rendered page 0 to file /tmp/font_tmp.iPNkm7FgJu/sample_text.txt.tif
Rtl = 0 ,vertical=0

=== Phase I: Generating training images ===
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:08:40 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 --font=Cambria --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:40 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 --font=Arial --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:40 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 --font=Calibri --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif
Extracting font properties of Arial
Extracting font properties of Cambria
Extracting font properties of Calibri
[Fri, Mar 25, 2016 3:08:42 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:08:42 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:08:42 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties only
Extracting font properties only
Done!
Done!
Done!
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:08:43 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 --font=Arial --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:44 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 --font=Cambria --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:44 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 --font=Calibri --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif
Rtl = 0 ,vertical=0
Extracting font properties of Arial
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif
Rtl = 0 ,vertical=0
[Fri, Mar 25, 2016 3:08:46 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties of Cambria
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif
Rtl = 0 ,vertical=0
[Fri, Mar 25, 2016 3:08:46 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties of Calibri
Extracting font properties only
Done!
[Fri, Mar 25, 2016 3:08:46 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Done!
Extracting font properties only
Done!
Rendering using Arial
Rendering using Cambria
Rendering using Calibri
[Fri, Mar 25, 2016 3:08:47 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 --font=Arial --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:47 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1 --font=Calibri --text=../../langdata/eng/eng.training_text
[Fri, Mar 25, 2016 3:08:47 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 --font=Cambria --text=../../langdata/eng/eng.training_text
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif
Rendered page 0 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tif
Rtl = 0 ,vertical=0
Rtl = 0 ,vertical=0
Rendered page 1 to file /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif
Rtl = 0 ,vertical=0
Extracting font properties of Arial
Extracting font properties of Calibri
Extracting font properties of Cambria
[Fri, Mar 25, 2016 3:08:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 --font=Cambria --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:08:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1 --font=Calibri --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
[Fri, Mar 25, 2016 3:08:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.iPNkm7FgJu --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=1 --outputbase=/tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 --font=Arial --ligatures=false --text=../../langdata/eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32
Extracting font properties only
Extracting font properties only
Extracting font properties only
Done!
Done!
Done!

=== Phase UP: Generating unicharset and unichar properties files ===
[Fri, Mar 25, 2016 3:08:52 PM] /usr/bin/unicharset_extractor -D /tmp/tmp.xzVHiaORBx/eng/ /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.box /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.box
Extracting unicharset from /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.box
Wrote unicharset file /tmp/tmp.xzVHiaORBx/eng//unicharset.
[Fri, Mar 25, 2016 3:08:52 PM] /usr/bin/set_unicharset_properties -U /tmp/tmp.xzVHiaORBx/eng/eng.unicharset -O /tmp/tmp.xzVHiaORBx/eng/eng.unicharset -X /tmp/tmp.xzVHiaORBx/eng/eng.xheights --script_dir=../../langdata
Loaded unicharset of size 118 from file /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Other case FF of ff is not in unicharset
Other case TI of ti is not in unicharset
Other case FI of fi is not in unicharset
Other case FT of ft is not in unicharset
Other case Ffi of ffi is not in unicharset
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/tmp.xzVHiaORBx/eng/eng.unicharset

=== Phase D: Generating Dawg files ===
Generating word Dawg
[Fri, Mar 25, 2016 3:08:53 PM] /usr/bin/wordlist2dawg -r 1 ../../langdata/eng/eng.wordlist /tmp/tmp.xzVHiaORBx/eng/eng.word-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.wordlist'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.word-dawg'
Generating frequent-word Dawg
[Fri, Mar 25, 2016 3:08:58 PM] /usr/bin/wordlist2dawg -r 1 /tmp/tmp.xzVHiaORBx/eng/eng.wordlist.clean.freq /tmp/tmp.xzVHiaORBx/eng/eng.freq-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '/tmp/tmp.xzVHiaORBx/eng/eng.wordlist.clean.freq'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.freq-dawg'
[Fri, Mar 25, 2016 3:08:59 PM] /usr/bin/wordlist2dawg -r 0 ../../langdata/eng/eng.punc /tmp/tmp.xzVHiaORBx/eng/eng.punc-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.punc'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.punc-dawg'
[Fri, Mar 25, 2016 3:08:59 PM] /usr/bin/wordlist2dawg -r 0 ../../langdata/eng/eng.numbers /tmp/tmp.xzVHiaORBx/eng/eng.number-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.numbers'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.number-dawg'
[Fri, Mar 25, 2016 3:08:59 PM] /usr/bin/wordlist2dawg -r 1 ../../langdata/eng/eng.word.bigrams /tmp/tmp.xzVHiaORBx/eng/eng.bigram-dawg /tmp/tmp.xzVHiaORBx/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tmp.xzVHiaORBx/eng/eng.unicharset'
Reading word list from '../../langdata/eng/eng.word.bigrams'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tmp.xzVHiaORBx/eng/eng.bigram-dawg'

=== Phase E: Extracting features ===
Using TESSDATA_PREFIX=..
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1 box.train
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bi
[Fri, Mar 25, 2016 3:09:21 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tif /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0 box.train
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Page 1
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
Page 1
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
FAIL!
APPLY_BOXES: boxfile line 74/_ ((137,4565),(166,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 75/_ ((165,4564),(194,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1096/_ ((1832,3413),(1861,3417)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 74/_ ((137,4565),(166,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 75/_ ((165,4564),(194,4569)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES:
APPLY_BOXES:
APPLY_BOXES: boxfile line 74/_ ((129,4563),(148,4566)): FAILURE! Couldn't find a matching blob
FAIL!
xes failed resegmentation:       3
APPLY_BOXES:
d from boxfile:    3871
APPLY_BOXES: boxfile line 74/_ ((129,4563),(148,4566)): FAILURE! Couldn't find a matching blob
FAIL!
xes read from boxfile:    4131
   Found 4131 good blobs.
FAIL!
APPLY_BOXES: boxfile line 72/_ ((134,4579),(160,4583)): FAILURE! Couldn't find a matching blob
FAIL!
FAIL!
xes failed resegmentation:       4
APPLY_BOXES: boxfile line 75/_ ((148,4563),(167,4566)): FAILURE! Couldn't find a matching blob
FAIL!
   Found 3867 good blobs.
/_ ((1662,3506),(1688,3510)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 75/_ ((148,4563),(167,4566)): FAILURE! Couldn't
APPLY_BOXES: boxfile line 1096/_ ((1759,3372),(1778,3375)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1160/- ((1837,3442),(1850,3446)): FAILURE! Coul
   Boxes read from boxfile:    4131
S: boxfile line 831/— ((641,3703),(691,3707)): FAILURE! Couldn't find a matching blob
   Boxes failed resegmentation:       2    3800
APPLY_BOXES: boxfile line 1128/r ((778,3292),(796,3316)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1172/- ((2029,3345),(2043,3349)): FAILURE! Couldn
APPLY_BOXES:
tching blob
   Found 3798 good blobs.
ation:       2
FAIL!
   Boxes read from boxfile:    3800
   Found 4129 good blobs.
   Boxes failed resegmentation:       4
APPLY_BOXES: boxfile line 1645/i ((668,2724),(673,2760)): FAILURE! Couldn't find a matching blob
   Found 3796 good blobs.
lobs in 0 words.
   Leaving 2 unlabelled blobs in 0 words.
APPLY_BOXES: boxfile line 2927/” ((1047,1250),(1059,1262)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3225/g ((714,866),(737,903)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3871
   Boxes failed resegmentation:       6
   Found 3865 good blobs.
Generated training data for 947 words
Page 2
Generated training data for 946 words
Page 2
Generated training data for 947 words
Generated training data for 1018 words
Page 2
Generated training data for 920 words
Page 2
Generated training data for 929 words
Page 2
Generated training data for 1016 words
Page 2
ted training data for 1016 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile:    1496
   Found 1496 good blobs.
APPLY_BOXES:
APPLY_BOXES:
APPLY_BOXES:
   Boxes read from boxfile:    1189
   Found 1496 good blobs.
   Boxes read from boxfile:    1496
   Found 1496 good blobs.
APPLY_BOXES:
APPLY_BOXES: boxfile line 89/r ((585,4576),(603,4600)): FAILURE! Couldn't find a matching blob
   Found 1566 good blobs.
APPLY_BOXES:
   Found 1189 good blobs.
   Boxes read from boxfile:    1566
   Boxes failed resegmentation:       1
   Found 1565 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
Generated training data for 308 words
Generated training data for 381 words
Generated training data for 308 words
Generated training data for 308 words
Generated training data for 378 words
Generated training data for 378 words
Generated training data for 393 words
Generated training data for 400 words
[Fri, Mar 25, 2016 3:10:19 PM] /usr/local/bin/tesseract /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tif /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1 box.train
Tesseract Open Source OCR Engine v3.05.00dev-292-g66f37f0 with Leptonica
Page 1
FAIL!
APPLY_BOXES: boxfile line 74/_ ((129,4563),(148,4566)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1096/_ ((1759,3372),(1778,3375)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1128/r ((778,3292),(796,3316)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3800
   Boxes failed resegmentation:       3
   Found 3797 good blobs.
   Leaving 2 unlabelled blobs in 0 words.
Generated training data for 927 words
Page 2
APPLY_BOXES:
   Boxes read from boxfile:    1566
   Found 1566 good blobs.
Generated training data for 400 words

=== Phase C: Clustering feature prototypes (cnTraining) ===
[Fri, Mar 25, 2016 3:10:33 PM] /usr/bin/cntraining -D /tmp/tmp.xzVHiaORBx/eng/ /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr ...
Clustering ...

Writing /tmp/tmp.xzVHiaORBx/eng//normproto ...

=== Phase M : Clustering microfeatures (mfTraining) ===
[Fri, Mar 25, 2016 3:10:49 PM] /usr/bin/mftraining -D /tmp/tmp.xzVHiaORBx/eng/ -U /tmp/tmp.xzVHiaORBx/eng/eng.unicharset -O /tmp/tmp.xzVHiaORBx/eng/eng.mfunicharset -F ../../langdata/font_properties -X /tmp/tmp.xzVHiaORBx/eng/eng.xheights /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr
Warning: No shape table file present: /tmp/tmp.xzVHiaORBx/eng//shapetable
fontinfo table is of size 6164
Reading x-heights from /tmp/tmp.xzVHiaORBx/eng/eng.xheights ...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.tr ...
Reading spacing from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp0.fontinfo for font 275...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.tr ...
Reading spacing from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp1.fontinfo for font 275...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.tr ...
Reading spacing from /tmp/tmp.xzVHiaORBx/eng/eng.Arial.exp-1.fontinfo for font 275...
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp0.fontinfo
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp1.fontinfo
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Calibri.exp-1.fontinfo
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp0.fontinfo
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp1.fontinfo
Reading /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.tr ...
No font found matching fontinfo filename /tmp/tmp.xzVHiaORBx/eng/eng.Cambria.exp-1.fontinfo
Flat shape table summary: Number of shapes = 225 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!

=== Phase B : ambiguities training ===
Found file ../../langdata/eng/eng.unicharambigs

=== Making final traineddata file ===
Copying ../../langdata/eng/eng.cube-unicharset to /tmp/tmp.xzVHiaORBx/eng
Copying ../../langdata/eng/eng.cube-word-dawg to /tmp/tmp.xzVHiaORBx/eng
[Fri, Mar 25, 2016 3:12:02 PM] /usr/bin/combine_tessdata /tmp/tmp.xzVHiaORBx/eng/eng.
TessdataManager combined tesseract data files.
Offset for type  0 (/tmp/tmp.xzVHiaORBx/eng/eng.config                ) is -1
Offset for type  1 (/tmp/tmp.xzVHiaORBx/eng/eng.unicharset            ) is 140
Offset for type  2 (/tmp/tmp.xzVHiaORBx/eng/eng.unicharambigs         ) is 7972
Offset for type  3 (/tmp/tmp.xzVHiaORBx/eng/eng.inttemp               ) is 9030
Offset for type  4 (/tmp/tmp.xzVHiaORBx/eng/eng.pffmtable             ) is 931802
Offset for type  5 (/tmp/tmp.xzVHiaORBx/eng/eng.normproto             ) is 932685
Offset for type  6 (/tmp/tmp.xzVHiaORBx/eng/eng.punc-dawg             ) is 946700
Offset for type  7 (/tmp/tmp.xzVHiaORBx/eng/eng.word-dawg             ) is 951022
Offset for type  8 (/tmp/tmp.xzVHiaORBx/eng/eng.number-dawg           ) is 5162208
Offset for type  9 (/tmp/tmp.xzVHiaORBx/eng/eng.freq-dawg             ) is 5165714
Offset for type 10 (/tmp/tmp.xzVHiaORBx/eng/eng.fixed-length-dawgs    ) is -1
Offset for type 11 (/tmp/tmp.xzVHiaORBx/eng/eng.cube-unicharset       ) is 5167068
Offset for type 12 (/tmp/tmp.xzVHiaORBx/eng/eng.cube-word-dawg        ) is 5168579
Offset for type 13 (/tmp/tmp.xzVHiaORBx/eng/eng.shapetable            ) is 6230685
Offset for type 14 (/tmp/tmp.xzVHiaORBx/eng/eng.bigram-dawg           ) is 6234739
Offset for type 15 (/tmp/tmp.xzVHiaORBx/eng/eng.unambig-dawg          ) is -1
Offset for type 16 (/tmp/tmp.xzVHiaORBx/eng/eng.params-model          ) is -1
Combining tessdata files
Output /tmp/tmp.xzVHiaORBx/eng/eng.traineddata created successfully.
Moving /tmp/tmp.xzVHiaORBx/eng/eng.traineddata to /tmp/tesstrain/tessdata

Completed training for language 'eng'

ra@Shree ~/tesseract-ocr/tesseract/training
$
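Because tesstrain.sh runs several text2image and tesseract jobs in parallel, their messages interleave in the output above, which makes the APPLY_BOXES failures hard to tally by eye. A minimal sketch, assuming the output was saved with something like `./tesstrain.sh --lang eng ... 2>&1 | tee train.log`; the function names `count_box_failures` and `count_fontinfo_misses` are hypothetical helpers, not part of Tesseract. It counts occurrences of each message rather than matching lines, since one physical line can hold several messages. Messages whose text was overwritten before the matched phrase are still missed, so treat the numbers as approximate.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for tallying messages in a saved tesstrain.sh log.
# Parallel jobs can write several messages onto one physical line, so these
# count occurrences with grep -o instead of counting matching lines.

# Number of "FAILURE!" messages (in this log, only APPLY_BOXES prints them).
# A message overwritten before the word "FAILURE!" is not counted.
count_box_failures() {
  grep -o 'FAILURE!' "$1" | wc -l | tr -d ' '
}

# Number of .fontinfo files for which mftraining reported no matching font.
count_fontinfo_misses() {
  grep -o 'No font found matching fontinfo filename' "$1" | wc -l | tr -d ' '
}

# Print a short summary when run directly with a log file argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  echo "APPLY_BOXES failures:      $(count_box_failures "$1")"
  echo "Unmatched .fontinfo files: $(count_fontinfo_misses "$1")"
fi
```

Saved as, say, `summarize_log.sh` (a hypothetical file name), it would be run as `bash summarize_log.sh train.log`.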
amitdo commented 8 years ago

Thanks. It's good to know that the training process works on Windows with Cygwin.
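One detail worth checking in the log above: in Phase M, mftraining read spacing information for Arial (font 275) but printed `No font found matching fontinfo filename` for every Calibri and Cambria file. This suggests, though the thread does not confirm it, that `../../langdata/font_properties` had no entries for those two fonts. The sketch below uses a hypothetical function, `missing_font_properties`, to list fonts whose name does not appear as the first field of any font_properties line. It uses an exact match on that field, which is a simplification and may not mirror how mftraining itself matches names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each given font name that has no line in a
# font_properties file. Lines are assumed to follow the documented layout
#   <fontname> <italic> <bold> <fixed> <serif> <fraktur>
# and only the first field is compared (exact match, a simplification).
missing_font_properties() {
  local props="$1"
  shift
  local font
  for font in "$@"; do
    # awk exits 0 if some line's first field equals the font name, else 1.
    if ! awk -v f="$font" '$1 == f { found = 1 } END { exit !found }' "$props"; then
      echo "$font"
    fi
  done
}
```

For example, `missing_font_properties ../../langdata/font_properties Arial Cambria Calibri`. If it reports Calibri and Cambria, lines in that layout such as `Calibri 0 0 0 0 0` and `Cambria 0 0 0 1 0` (Cambria being a serif face) could be appended before re-running tesstrain.sh.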