Closed by GoogleCodeExporter 9 years ago
The message is clear: No shape table file present: shapetable
So shapeclustering did not create the needed file.
You uploaded the file newlang.jbcgulliver.exp0.tr, but you wrote that you ran the command:
shapeclustering -F font_properties -U unicharset datamouse.jbcgulliver.exp0.tr
"Font id = -1/0" indicates that your font_properties file is not correct. Check
Requirements_for_text_input_files [1] once again.
[1] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Requirements_for_text_input_files
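For reference, the training wiki describes each font_properties line as a font name followed by five 0/1 flags (italic, bold, fixed, serif, fraktur). A minimal sketch of creating such a file for the font in this thread follows. The all-zero flag values are an assumption, since the real properties of jbcgulliver are not stated here.

```shell
# Sketch only: the five flags (italic bold fixed serif fraktur) are assumed
# to be 0; set them to match the actual font.
# Note: this overwrites ./font_properties in the current directory.
printf '%s\n' 'jbcgulliver 0 0 0 0 0' > font_properties
cat font_properties
```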
Original comment by zde...@gmail.com
on 13 Feb 2014 at 9:18
I repeated the process again, double-checking the file requirements and my
command inputs, but I am still receiving the same error.
1) tesseract datamouse.jbcgulliver.exp0.png datamouse.jbcgulliver.exp0 batch.nochop makebox
2) tesseract datamouse.jbcgulliver.exp0.png datamouse.jbcgulliver.exp0 box.train
output:
APPLY_BOXES:
Boxes read from boxfile: 2694
Found 2694 good blobs.
TRAINING ... Font name = jbcgulliver
Generated training data for 639 words
3) unicharset_extractor datamouse.jbcgulliver.exp0.box
output:
Wrote unicharset file ./unicharset.
4) shapeclustering -F font_properties -U unicharset datamouse.jbcgulliver.exp0.tr
output:
Reading datamouse.jbcgulliver.exp0.tr ...
Font id = -1/0, class id = 1/76 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file trainingsampleset.cpp, line 622
Abort trap: 6
Please see attached files
Original comment by jstak...@gmail.com
on 13 Feb 2014 at 5:25
I just discovered the issue. There appears to be an error in the documentation?
For me to get the files working the names MUST be of the format
[lang].[fontname].exp[num].tr
and the name in the font_properties file MUST be JUST the font name.
I was confused because the documentation states that the name of the .tr file
may be either fontname.tr or [lang].[fontname].exp[num].tr, and fontname.tr did
not appear to work for me.
When I switched the format back to [lang].[fontname].exp[num].tr, I followed
the line in the documentation that says "each .tr filename must match an entry
in the font_properties file", so in this case I put [lang].[fontname].exp[num]
in font_properties, but that did not work either.
In summary, the only formatting that worked for me was:
[lang].[fontname].exp[num].tr (not [fontname].tr !)
fontname (not filename!) in font_properties
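The rule summarized above can be checked before running shapeclustering. The following is a minimal sketch, not part of Tesseract: the helper names font_from_tr and check_font_properties are hypothetical, and it assumes the [lang].[fontname].exp[num].tr naming and a font_properties file whose first whitespace-separated field on each line is the font name.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not Tesseract tools.

# Print the font name embedded in a [lang].[fontname].exp[num].tr file name.
font_from_tr() {
    base=${1##*/}        # drop any directory part
    base=${base%.tr}     # drop the .tr extension    -> lang.fontname.expN
    base=${base%.exp*}   # drop the .exp[num] suffix -> lang.fontname
    printf '%s\n' "${base#*.}"   # drop the [lang]. prefix -> fontname
}

# Succeed only if the font name taken from the .tr file name ($1) is the
# first field of some line in the font_properties file ($2).
check_font_properties() {
    font=$(font_from_tr "$1")
    awk -v f="$font" '$1 == f { found = 1 } END { exit !found }' "$2"
}

# Example use (assumes font_properties exists in the current directory):
#   check_font_properties datamouse.jbcgulliver.exp0.tr font_properties \
#       || echo "font name not listed in font_properties" >&2
```

With the file names from this thread, font_from_tr yields jbcgulliver, so an entry written as datamouse.jbcgulliver.exp0 in font_properties would fail the check.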
Original comment by jstak...@gmail.com
on 13 Feb 2014 at 5:38
Original issue reported on code.google.com by
jstak...@gmail.com
on 13 Feb 2014 at 2:58