desmondmorris / node-tesseract

A simple wrapper for the Tesseract OCR package

Add format selection (HOCR/TSV) #33

Open rtrvrtg opened 8 years ago

rtrvrtg commented 8 years ago

I noticed that there doesn't seem to be a way to tell node-tesseract to generate its output as HOCR or TSV. This PR adds a new option, format, which can be set to "hocr" or "tsv" to select that output format. If the option is absent, plain text is output as usual.
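For illustration, here is a minimal sketch of how a format option like this could be turned into arguments for the tesseract command line, which selects HOCR or TSV output when `hocr` or `tsv` is passed as a trailing config name. The helper name `buildTesseractArgs` and the `FORMATS` list are hypothetical and are not taken from this PR's diff.

```javascript
// Hypothetical helper (not the PR's actual code): builds the argument list
// for the tesseract CLI from an options object with an optional `format`.
const FORMATS = ['hocr', 'tsv'];

function buildTesseractArgs(image, outputBase, options = {}) {
  const args = [image, outputBase];
  if (options.format !== undefined) {
    if (!FORMATS.includes(options.format)) {
      throw new Error('Unsupported format: ' + options.format);
    }
    // tesseract treats trailing arguments as config names; "hocr" and "tsv"
    // switch the output format.
    args.push(options.format);
  }
  // With no format set, tesseract writes plain text.
  return args;
}

console.log(buildTesseractArgs('page.png', 'out', { format: 'hocr' }).join(' '));
// prints "page.png out hocr"
```

Rejecting unknown values up front, rather than passing them through, keeps a typo such as "hcr" from being silently treated as a config file name.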

Let me know if you'd prefer another way to determine the output format, such as inferring it from the output file suffix.
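As a rough sketch of that alternative, the format could be inferred from the output file's extension. The function name `inferFormat` is hypothetical, and treating any other suffix as plain text is an assumption, not something this PR implements.

```javascript
const path = require('path');

// Hypothetical alternative: derive the format from the output file suffix.
// Returns undefined for anything else, meaning "plain text" (an assumption).
function inferFormat(outputPath) {
  const ext = path.extname(outputPath).toLowerCase();
  if (ext === '.hocr') return 'hocr';
  if (ext === '.tsv') return 'tsv';
  return undefined;
}

console.log(inferFormat('result.hocr')); // prints "hocr"
console.log(inferFormat('scan.TSV'));    // prints "tsv"
console.log(inferFormat('notes.txt'));   // prints "undefined"
```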