Closed DiegoPino closed 3 weeks ago
@giancarlobi you might be interested in this one! @alliomeria shared that the offending PDF was built by Bluebeam, a program
used to transform architectural drawings into PDFs, so the PDF might not be standard.
Having a config solves the problem here, so closing as solved.
What?
This is a multi-issue issue. We found a PDF that, when processed through PDFALTO, did generate correct OCR but also threw thousands of PDF standard syntax errors. Because the output of PDFALTO goes directly to the console (terminal), the resulting XML could not be processed. But here is where the larger issue happened: when Hydroponics was set to 0 (meaning run until finished), the failure triggered an eternal re-enqueuing (I'm pretty sure I coded 3x max retries) and the job got stuck for days, trying over and over.
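To make the two failure modes above concrete, here is a minimal Python sketch. It is not the actual Archipelago/Hydroponics code. The function name `run_and_parse`, the `max_attempts` parameter, and the `DEMO_CMD` stand-in command are all hypothetical. It reads stdout and stderr as separate streams so diagnostics cannot corrupt the XML, and it stops after a fixed number of attempts instead of re-enqueuing forever.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real OCR command: it writes a syntax-error
# style message to stderr and a tiny XML document to stdout.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Syntax Error: demo\\n'); print('<alto><Page/></alto>')",
]


def run_and_parse(cmd, max_attempts=3):
    """Run cmd, parse only its stdout as XML, and give up after max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # stdout and stderr are captured separately, so error messages
        # never end up inside the XML payload.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        try:
            root = ET.fromstring(proc.stdout)
            return root, attempt
        except ET.ParseError as exc:
            last_error = exc
    # Bounded retries: fail loudly instead of re-enqueuing indefinitely.
    raise RuntimeError(f"no valid XML after {max_attempts} attempts: {last_error}")
```

Calling `run_and_parse(DEMO_CMD)` returns the parsed root element on the first attempt even though the command also writes to stderr. A command that never emits valid XML raises `RuntimeError` once the attempt cap is reached.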
For now, the fix is to pass -q as a PDFALTO argument to the OCR processor via the form, but this should be a standard argument, to be honest.

For reference, the command run manually threw this type of syntax error (a PDF standard non-compliance issue):