allenai / pdffigures2

Given a scholarly PDF, extract figures, tables, captions, and section titles.
http://pdffigures2.allenai.org/
Apache License 2.0

Page 2 is an image and allow OCR is false, giving up #23

Open vikram-ce opened 5 years ago

vikram-ce commented 5 years ago

10:40:30.718 [main] DEBUG o.a.pdffigures2.GraphicsExtractor$ - Page 2 is an image and allow OCR is false, giving up
Exception in thread "main" org.allenai.pdffigures2.FigureExtractor$OcredPdfException: Page 2 is an image and allow OCR is turned off
    at org.allenai.pdffigures2.GraphicsExtractor$.extractRawGraphics(GraphicsExtractor.scala:66)
    at org.allenai.pdffigures2.GraphicsExtractor$.extractGraphics(GraphicsExtractor.scala:32)
    at org.allenai.pdffigures2.FigureExtractor$$anonfun$7.apply(FigureExtractor.scala:133)

Did anyone get this error?

luomancs commented 4 years ago

Have you solved this problem? If so, could you share your solution? I have the same issue. Thank you.

lucaslioli commented 4 years ago

You can try enabling the OCR option (allowOcr) in the file pdffigures2/src/main/resources/application.conf.

rdverse commented 2 years ago

I could not find the application.conf that lucaslioli mentioned in the resources folder. However, you can set allowOcr = true in /src/main/scala/org/allenai/pdffigures2/FigureExtractor.scala.

lucaslioli commented 2 years ago

The file application.conf was removed from the project in this merge (after my comment), and the OCR setting was moved to the file rdverse pointed out. Thanks for the update.
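
For anyone landing here later, below is a rough sketch of why the flag matters. It is not the pdffigures2 source: `Page`, `ExtractorSettings`, `OcrGateSketch`, and this local `OcredPdfException` are hypothetical stand-ins, and only the `allowOcr` name and the error message come from this thread. It assumes the extractor simply refuses scanned (image-only) pages unless OCR is allowed.

```scala
// Hypothetical stand-ins for illustration only; these are NOT the real pdffigures2 classes.
final case class OcredPdfException(message: String) extends RuntimeException(message)

// A page is modeled only by its number and whether it is a scanned image.
final case class Page(number: Int, isScannedImage: Boolean)

// Illustrative settings holder; only the allowOcr name is taken from the thread.
final case class ExtractorSettings(allowOcr: Boolean)

object OcrGateSketch {
  // Returns the page numbers that pass the check, or throws as soon as a
  // scanned page is encountered while OCR is disabled.
  def checkPages(pages: Seq[Page], settings: ExtractorSettings): Seq[Int] =
    pages.map { page =>
      if (page.isScannedImage && !settings.allowOcr)
        throw OcredPdfException(
          s"Page ${page.number} is an image and allow OCR is turned off"
        )
      page.number
    }
}
```

With `ExtractorSettings(allowOcr = false)`, a document whose second page is a scan fails with the same message as in the report above. With `allowOcr = true`, the check passes. The actual change is still the one rdverse describes: edit the value in FigureExtractor.scala and rebuild.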