Open hakankaraoguz opened 4 months ago
Hi @hakankaraoguz
Can you please share the code you're trying and more details about the OCR metadata you want to get?
Hi @christinestraub
According to the documentation, if the `auto` strategy is used, there is no indicator in the element metadata when unstructured falls back to the OCR strategy. However, here I can see that the OCR confidence is extracted in `pytesseract`. I would like to have the OCR confidence information present, along with a strategy flag, in the element metadata so that I can filter out low-quality text after the parsing stage.
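To make the request concrete, here is a rough sketch of the kind of score I have in mind. It is not how unstructured computes anything internally; it only assumes a dict shaped like the output of `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which has parallel `text` and `conf` lists. The helper name `mean_ocr_confidence` is made up for illustration.

```python
from statistics import mean


def mean_ocr_confidence(ocr_data):
    """Average word-level confidence from a pytesseract-style data dict.

    `ocr_data` is assumed to look like the result of
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT),
    i.e. parallel lists under the "text" and "conf" keys.
    Returns None when no recognised words are present.
    """
    confidences = []
    for text, conf in zip(ocr_data["text"], ocr_data["conf"]):
        conf = float(conf)  # conf may arrive as int, float, or string
        # Tesseract reports -1 for entries that are not words (blocks, lines);
        # whitespace-only entries are skipped as well.
        if conf >= 0 and str(text).strip():
            confidences.append(conf)
    return mean(confidences) if confidences else None


# Hand-written sample in the same shape, so no Tesseract install is needed here.
sample = {"text": ["", "Hello", "wrld", " "], "conf": ["-1", 96, 41.0, 95]}
page_confidence = mean_ocr_confidence(sample)
```

A value like this stored per element, next to a flag saying which strategy was actually used, would be enough for filtering downstream.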
Any updates on this?
@hakankaraoguz Did you try the `hi_res` strategy? Is the `detection_class_prob` metadata field not working for your case?
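For reference, a minimal sketch of filtering on that field, assuming each element exposes `metadata.detection_class_prob` and that the field may be missing or `None` for elements that did not come from layout detection. The helper name `filter_by_detection_prob` and the 0.5 threshold are illustrative only, and the stand-in objects below replace a real `partition_pdf(..., strategy="hi_res")` result.

```python
from types import SimpleNamespace


def filter_by_detection_prob(elements, min_prob=0.5):
    """Keep elements whose detection_class_prob is unset or >= min_prob."""
    kept = []
    for element in elements:
        prob = getattr(element.metadata, "detection_class_prob", None)
        # Elements without a probability are kept, since there is nothing
        # to judge them on.
        if prob is None or prob >= min_prob:
            kept.append(element)
    return kept


# Stand-in objects for illustration; real elements would come from partitioning.
elements = [
    SimpleNamespace(text="Title", metadata=SimpleNamespace(detection_class_prob=0.92)),
    SimpleNamespace(text="noise", metadata=SimpleNamespace(detection_class_prob=0.21)),
    SimpleNamespace(text="plain", metadata=SimpleNamespace()),
]
kept_texts = [element.text for element in filter_by_detection_prob(elements)]
```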
I will try it out, but according to this article, `detection_class_prob` is the class confidence of the extracted section (Table, Header, etc.) in the PDF. I am more interested in the OCR quality result when the algorithm falls back to OCR. Thank you @christinestraub
Hi,
When using `auto` partitioning to partition PDFs, is it possible to get OCR metadata (quality, whether OCR was used, etc.) when the PDF parser falls back to the OCR strategy?