Open vdende opened 2 years ago
Hi @dadoonet, can someone follow up on this? We need to send the OCR text to Elasticsearch and also store the 'hocr' output. According to the Tesseract documentation, this is possible by adding 'hocr' at the end of the command: https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html
I answered in https://github.com/dadoonet/fscrawler/discussions/1594
Let me know :)
I answered there as well 😀
Hi @dadoonet. Any news on this topic? Your last remark was:
"I think we need to see how Tika supports this option and if something is needed in FSCrawler to enable this."
I have set the output_type to 'hocr'. But where can I find it? I would expect the output to be stored somewhere. I read in the Tesseract documentation that it is possible.
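For comparison, here is a minimal sketch of what the plain Tesseract command line does when 'hocr' is appended, wrapped in Python. It only illustrates Tesseract's own behaviour (it writes a file next to the output base name) and says nothing about where FSCrawler or Tika put the result. The function names and the file names "scan.png" / "scan" are made up for illustration, and it assumes the tesseract executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_hocr_command(image_path: str, output_base: str) -> list[str]:
    """Build the Tesseract CLI call; the trailing 'hocr' selects hOCR output."""
    return ["tesseract", image_path, output_base, "hocr"]


def run_hocr(image_path: str, output_base: str) -> Path:
    """Run Tesseract and return the path of the hOCR file it writes."""
    # Assumption: tesseract is installed and reachable on the PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_hocr_command(image_path, output_base), check=True)
    # Tesseract appends the extension itself: <output_base>.hocr
    return Path(output_base + ".hocr")
```

Calling run_hocr("scan.png", "scan") should leave a scan.hocr file in the working directory, which is the kind of output I would expect to find somewhere after setting output_type to 'hocr'.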