facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

Error in generating binary.jar file from PDF-FIGURES2 #172

Open hmhm1190 opened 10 months ago

hmhm1190 commented 10 months ago

I am trying to generate the dataset. I completed all three steps under dataset generation, but I get the following error:

You need to configure the path to the pdffigures2 executable in this file (nougat/dataset/pdffigures.py) or set the environment variable 'PDFFIGURES_PATH'.
Namespace(dpi=96, figure=PosixPath('/home/patidarritesh/Nougat/nougat/path/figures'), html=PosixPath('/home/patidarritesh/Nougat/nougat/scr_html'), markdown=None, out=PosixPath('/home/patidarritesh/Nougat/nougat/path/paired'), pdfs=PosixPath('/home/patidarritesh/Nougat/nougat/Pdfs'), recompute=False, tesseract=False, timeout=120, workers=48)
  0%|                                                                                                                        | 0/4 [00:00<?, ?it/s]INFO:root:2301.00001 is faulty
INFO:root:2301.00002 is faulty
 50%|████████████████████████████████████████████████████████                                                        | 2/4 [00:00<00:00, 16.73it/s]INFO:root:2301.00005 is faulty
INFO:root:2301.00003 is faulty
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 33.25it/s]
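
The first line of the output above says the pdffigures2 executable is located through the environment variable 'PDFFIGURES_PATH'. A minimal pre-flight check, assuming the variable should hold the path to an existing file such as the assembled jar, could look like the sketch below. The helper name `check_pdffigures_path` and its return values are hypothetical and not part of nougat.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_pdffigures_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Diagnose the PDFFIGURES_PATH setting (hypothetical helper, not nougat API).

    Returns "unset" if the variable is absent or empty, "missing" if it does
    not point to an existing file, and "ok" otherwise.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    value = env.get("PDFFIGURES_PATH")
    if not value:
        return "unset"
    if not Path(value).is_file():
        return "missing"
    return "ok"
```

Calling `check_pdffigures_path()` before starting the dataset script reports whether the variable is visible to the Python process at all. It does not verify that the jar itself is valid.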

I also ran into the following error while generating the binary.jar file, so the generated jar may be corrupted. On checking, every sbt command other than sbt assembly gives this error.

The command sbt "runMain org.allenai.pdffigures2.FigureExtractorVisualizationCli /path/to/pdf" (and every other command) fails, and I believe the binary.jar file I am using may be corrupted because of these errors:

[info] Updated file /home/husainmalwat/nougat/nougat/project/build.properties: set sbt.version to 1.4.9
[info] welcome to sbt 1.4.9 (Red Hat, Inc. Java 1.8.0_382)
[info] loading project definition from /home/husainmalwat/nougat/nougat/project
[info] set current project to nougat (in build file:/home/husainmalwat/nougat/nougat/)
[info] running org.allenai.pdffigures2.FigureExtractorVisualizationCli /home/husainmalwat/src_pdf
[error] (run-main-0) java.lang.ClassNotFoundException: org.allenai.pdffigures2.FigureExtractorVisualizationCli
[error] java.lang.ClassNotFoundException: org.allenai.pdffigures2.FigureExtractorVisualizationCli
[error]         at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[error] stack trace is suppressed; run last Compile / bgRunMain for the full output
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 1 s, completed 11 Nov, 2023 1:04:59 AM
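
The log above contains the line "set current project to nougat (in build file:/home/husainmalwat/nougat/nougat/)", which suggests that sbt was started from the nougat checkout rather than from a pdffigures2 checkout. That would be consistent with the ClassNotFoundException, but it is an inference from the log and is not confirmed anywhere in this thread. A small sketch for pulling the loaded project out of an sbt log follows; the helper name `loaded_sbt_project` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches sbt's "set current project to <name> (in build file:<path>)" line.
_PROJECT_LINE = re.compile(
    r"set current project to (\S+) \(in build file:([^)\s]+)\)"
)


def loaded_sbt_project(log: str) -> Optional[Tuple[str, str]]:
    """Return (project_name, build_dir) from sbt output, or None if absent."""
    match = _PROJECT_LINE.search(log)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

If the returned project name is not the one expected for the pdffigures2 build, sbt was probably launched from a different directory than intended.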

CHENG-EMMA1 commented 6 months ago

Hi, did you solve the problem? Also, does the markdown generated from your data contain tables and images? They all seem to be omitted.

hmhm1190 commented 6 months ago

Hi, thanks for replying. I had errors in my setup, so I was not getting tables.

Now it's working fine.

Regards and thanks, Husain Malwat


CHENG-EMMA1 commented 6 months ago

Thanks for your reply. When the data I convert contains tables, the resulting .mmd file doesn't seem to contain the table information. Do you have the same problem?