onixterry closed 3 years ago
Those warnings are indeed harmless for extracting the text but can be annoying for sure. To get rid of them, you can probably change the log level to ERROR.
Locate your log4j.properties file and add this line somewhere:
log4j.logger.org.apache.fontbox=ERROR
Or a bit broader:
log4j.logger.org.apache=ERROR
If those entries already exist, do not duplicate them; change the existing log level instead.
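For illustration, here is roughly how that could look in context. This is only a sketch that assumes a log4j 1.x style properties file; the root logger, appender name (CONSOLE), and pattern shown are hypothetical placeholders, and your file will likely differ. Only the last line is the actual change.

```properties
# Hypothetical existing configuration (placeholder values, not from your setup).
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d %-5p [%c] %m%n

# The change: only log ERROR and above from the FontBox font-parsing classes.
log4j.logger.org.apache.fontbox=ERROR
```

Logger names are hierarchical, so the broader org.apache setting would also quiet every other logger under that prefix, not just FontBox. The narrower one is probably the safer place to start.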
I have a collector set up to crawl an Intranet site with many PDFs. There are many cases of errors reading embedded fonts in the PDFs.
Is there anything that can be done to prevent the parsing process from throwing an exception here, e.g. a directive to ignore embedded fonts? Getting rid of these errors would make it easier to focus on any "real" errors in the logs.
The issue does not appear to cause any "real" problem for the collector.
Terry