Closed chrox closed 10 years ago
Nice work! I think this patch could be further improved by publishing Tesseract variables as JavaScript attributes (with automatic type conversion and enumeration of all variables in Tesseract objects), like this:
tesseract.tessedit_make_boxes_from_boxes = 1;
// or using some magic:
tesseract.tesseditMakeBoxesFromBoxes = true;
tesseract.findText('hocr', 0);
Do you like this idea?
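To make the "magic" camelCase form concrete, here is a minimal sketch of one way it could work, assuming a JavaScript `Proxy` is acceptable. The backend object and its `setVariable`/`getVariable` methods are hypothetical stand-ins, not the addon's actual API.

```javascript
// Hypothetical stand-in for the native binding: it stores variables under
// their Tesseract-style snake_case names. This is not the real addon API.
function createFakeBackend() {
  const vars = new Map();
  return {
    setVariable(name, value) { vars.set(name, String(value)); },
    getVariable(name) { return vars.get(name); },
  };
}

// Convert "tesseditMakeBoxesFromBoxes" to "tessedit_make_boxes_from_boxes".
// Names that are already snake_case pass through unchanged.
function toSnakeCase(name) {
  return name.replace(/[A-Z]/g, (ch) => '_' + ch.toLowerCase());
}

// Wrap a backend so that reads and writes of unknown properties are
// forwarded to the variable store.
function wrapWithVariables(backend) {
  return new Proxy(backend, {
    set(target, prop, value) {
      if (typeof prop === 'string' && !(prop in target)) {
        // Booleans become 1/0, as in the snake_case form of the example.
        const converted = typeof value === 'boolean' ? (value ? 1 : 0) : value;
        target.setVariable(toSnakeCase(prop), converted);
        return true;
      }
      target[prop] = value;
      return true;
    },
    get(target, prop) {
      if (typeof prop === 'string' && !(prop in target)) {
        return target.getVariable(toSnakeCase(prop));
      }
      return target[prop];
    },
  });
}

const tesseract = wrapWithVariables(createFakeBackend());
tesseract.tesseditMakeBoxesFromBoxes = true;
```

With this wrapper both spellings reach the same underlying variable. A typo in a property name would silently create a new variable, though, so a real implementation would probably want to validate names against the known list.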
It would be really cool to enumerate Tesseract variables and convert them automatically to JavaScript attributes. The only way I can find to list the available variables is to call the Tesseract::PrintVariables API, which requires a file stream (a C FILE pointer) to receive the variable dump. I'm wondering if it's OK to stream the dump to a temporary file, then read it back and parse the variables.
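For reference, the read-back-and-parse step might look roughly like the sketch below. It assumes the dump has one variable per line in the form name, tab, value, tab, description; that layout and the sample text are assumptions for illustration, not checked against the Tesseract source.

```javascript
// Parse a variable dump into { name: { value, description } }.
// Assumed line format: name<TAB>value<TAB>description.
function parseVariableDump(text) {
  const variables = {};
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    const [name, value = '', ...rest] = line.split('\t');
    // Values stay strings: the dump itself carries no type information.
    variables[name] = { value, description: rest.join('\t') };
  }
  return variables;
}

// Made-up sample standing in for the contents of the temporary file.
const sampleDump =
  'tessedit_make_boxes_from_boxes\t0\texample description\n' +
  'tessedit_pageseg_mode\t6\tanother example\n';

const parsed = parseVariableDump(sampleDump);
```

Because every value comes back as a string, automatic type conversion would need type information from somewhere else.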
Streaming and parsing is way too complicated. GlobalParams() can be used for this (see baseapi.cpp:155).
Now all Tesseract variables, including global variables and member variables of the Tesseract class, are automatically converted to attributes of the JavaScript object.
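On the JavaScript side, the effect can be pictured with the sketch below. It assumes the native code hands over a list of typed parameters; the `exampleParams` table, its values, and the `defineVariableAttributes` helper are hypothetical and only illustrate enumerable attributes with type conversion, not the patch's actual implementation.

```javascript
// Hypothetical parameter table standing in for what the native side could
// collect from Tesseract's parameter lists. The values are examples only.
const exampleParams = [
  { name: 'tessedit_make_boxes_from_boxes', type: 'bool', value: false },
  { name: 'tessedit_pageseg_mode', type: 'int', value: 6 },
  { name: 'tessedit_char_whitelist', type: 'string', value: '' },
];

// One converter per parameter type; truthy values become true for bools.
const converters = {
  bool: (v) => Boolean(v),
  int: (v) => Math.trunc(Number(v)),
  double: (v) => Number(v),
  string: (v) => String(v),
};

// Define one enumerable accessor per parameter, converting on assignment.
function defineVariableAttributes(target, params) {
  for (const param of params) {
    const convert = converters[param.type];
    let current = convert(param.value);
    Object.defineProperty(target, param.name, {
      enumerable: true,
      get() { return current; },
      set(v) { current = convert(v); },
    });
  }
  return target;
}

const ocr = defineVariableAttributes({}, exampleParams);
ocr.tessedit_make_boxes_from_boxes = 1; // stored as the boolean true
```

Since the accessors are enumerable, `Object.keys(ocr)` lists every variable name, which covers the enumeration part of the original suggestion.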
Thanks a lot! I've applied it with some changes for MSVC.
This way we can generate an hOCR file without recognition.