allenai / pdffigures2

Given a scholarly PDF, extract figures, tables, captions, and section titles.
http://pdffigures2.allenai.org/
Apache License 2.0
611 stars 122 forks

Error Not a PDF file though my input is pdf file #48

Open ammaarahmad1999 opened 2 years ago

ammaarahmad1999 commented 2 years ago

[info] running org.allenai.pdffigures2.FigureExtractorVisualizationCli ./input/Dynamic_Memory_Network_Sochar_QA.pdf
Error: File ./input/Dynamic_Memory_Network_Sochar_QA.pdf is not a PDF file
figure-extractor-visualize

I am getting the error above. The entire console output is below:

[info] Loading settings for project pdffigures2-master-build from plugins.sbt ...
[info] Loading project definition from /Data/tanik/Multi-modalQA/pdffigures2-master/project
[info] Loading settings for project root from build.sbt ...
[info] Set current project to pdffigures2 (in build file:/Data/tanik/Multi-modalQA/pdffigures2-master/)
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] running org.allenai.pdffigures2.FigureExtractorVisualizationCli ./input/Dynamic_Memory_Network_Sochar_QA.pdf
Error: File ./input/Dynamic_Memory_Network_Sochar_QA.pdf is not a PDF file
figure-extractor-visualize
Usage: figure-extractor-visualize [options]

input PDF file
-s, --show-steps Show all intermediate steps
-g, --show-graphic-clustering 0s Show graphical elements found and how they were clustered
-x, --show-cleaned-figure-regions Shows figure regions after being post-processed using the rasterized PDF at the given DPI
-e, --show-extractions Show the bounding boxes of the text and graphics that were extracted
-r, --show-regions Show the different regions the PDF was broken into
-c, --show-captions Show the location of the captions
-t, --show-sections Show the location of sections and paragraphs
-d, --display-dpi DPI to display figures at (default 55)
-p, --pages Pages to extract from (defaults to all), 1 is the first page

Exception: sbt.TrapExitSecurityException thrown from the UncaughtExceptionHandler in thread "run-main-0"
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 1 s, completed 18-Mar-2022, 12:53:44 PM

Any idea how to resolve the issue?

ranok92 commented 1 year ago

Try passing the absolute path of the file instead.
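
One way to act on this suggestion is to resolve the path yourself and look at what actually sits at that location before handing it to sbt. The following is only a minimal sketch in Python and is not part of pdffigures2. The helper name diagnose_pdf_path is hypothetical, and the checks it runs (existence, regular file, .pdf extension, a %PDF- marker near the start of the file) are assumptions about what could plausibly trigger the message, not the tool's confirmed validation logic.

```python
from pathlib import Path


def diagnose_pdf_path(path_str):
    """Return (absolute_path, problems) for a candidate PDF path.

    Hypothetical helper: these checks are assumptions about what might
    make a tool reject a file, not pdffigures2's actual validation.
    """
    # Resolve relative to the current working directory, following symlinks.
    path = Path(path_str).expanduser().resolve()
    problems = []
    if not path.exists():
        problems.append("path does not exist")
    elif not path.is_file():
        problems.append("path is not a regular file")
    else:
        if path.suffix.lower() != ".pdf":
            problems.append("extension is not .pdf")
        with path.open("rb") as handle:
            header = handle.read(1024)
        # PDF files normally carry the "%PDF-" marker near the start.
        if b"%PDF-" not in header:
            problems.append("no %PDF- header in the first 1024 bytes")
    return path, problems
```

Calling diagnose_pdf_path("./input/Dynamic_Memory_Network_Sochar_QA.pdf") from the directory where sbt is started returns the absolute path, which can be passed in place of the relative one, together with a list of anything that looks wrong. An empty list means none of these assumed checks failed. It does not guarantee that the tool will accept the file.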

val2021-svg commented 1 year ago

From what I have seen in my case, the filename should contain neither "-" nor "_". You can try renaming the file. As @ranok92 said, you should also try passing the absolute path of the file.
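
If you want to test the renaming idea without touching the original file, a copy with a plain name is enough. This is a rough sketch in Python, again not part of pdffigures2. The function name copy_with_plain_name is made up for illustration, and the claim that "-" and "_" cause the error is only this commenter's observation, not something the thread confirms.

```python
import re
import shutil
from pathlib import Path


def copy_with_plain_name(src, dest_dir=None):
    """Copy src to a file whose stem has no '-' or '_'; return the new absolute path.

    Hypothetical helper for trying the renaming suggestion. An existing
    file with the same target name is overwritten.
    """
    source = Path(src).expanduser().resolve()
    target_dir = Path(dest_dir).resolve() if dest_dir else source.parent
    # Drop every hyphen and underscore; fall back to "input" if nothing is left.
    plain_stem = re.sub(r"[-_]", "", source.stem) or "input"
    target = target_dir / (plain_stem + source.suffix)
    if target != source:
        shutil.copyfile(source, target)
    return target
```

For the file in this issue, the copy would be named DynamicMemoryNetworkSocharQA.pdf in the same directory, and the returned value is already an absolute path, so both suggestions can be tried in one run.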