I noticed that node-tesseract doesn't seem to have a way to request hOCR or TSV output. This PR adds a new option, `format`, which can be set to "hocr" or "tsv" to select that output format. If it's absent, plain text is output as normal.
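For illustration, here is a minimal sketch of how such an option could translate into arguments for the tesseract command line, which accepts a config name such as `hocr` or `tsv` after the output base. `buildTesseractArgs` is a hypothetical helper and not the code in this PR; rejecting unknown values is also an assumption.

```javascript
// Hypothetical helper (not part of node-tesseract): maps a `format`
// option to the argument list for the tesseract command line.
function buildTesseractArgs(inputPath, outputBase, options = {}) {
  const args = [inputPath, outputBase];
  const supported = ["hocr", "tsv"];

  if (options.format !== undefined) {
    // Assumption: unknown formats are rejected rather than ignored.
    if (!supported.includes(options.format)) {
      throw new Error(`Unsupported format: ${options.format}`);
    }
    // tesseract takes config names ("hocr", "tsv") after the output base.
    args.push(options.format);
  }

  // With no format set, tesseract writes plain text.
  return args;
}
```

With this sketch, `buildTesseractArgs("scan.png", "out", { format: "tsv" })` yields the arguments for `tesseract scan.png out tsv`, and omitting `format` leaves the plain-text invocation unchanged.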
Let me know if you'd prefer another way to determine the output format, such as inferring it from the output file suffix.
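As a rough sketch of that alternative, suffix-based inference could look like the following. `inferFormat` is a hypothetical name, and the `.hocr` and `.tsv` suffixes are assumptions about what an output path would use.

```javascript
const path = require("node:path");

// Hypothetical helper: guess the output format from the file suffix.
// Returns undefined for anything else, meaning plain text.
function inferFormat(outputPath) {
  const ext = path.extname(outputPath).toLowerCase();
  if (ext === ".hocr") return "hocr";
  if (ext === ".tsv") return "tsv";
  return undefined;
}
```

The trade-off is that inference needs no extra option but ties the format to the file name, while an explicit option keeps the two independent.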