PRImA-Research-Lab / prima-page-viewer

Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR.
Apache License 2.0

install and use in WSL or Windows 10? #20

Closed SB2020-eye closed 3 years ago

SB2020-eye commented 3 years ago

Hi. I am new to coding in 2020 and would like to use prima-page-viewer to see .xml files, in order to view .xml results I have obtained using OCR-D.

I have OCR-D running on my Windows 10 laptop, utilizing WSL2 and the Ubuntu 18.04 app from the Windows Store.

Would someone be able to guide me on how to use prima-page-viewer to view my .xml files in this set-up? I tried downloading the repo and also PageViewer 1.4.07 into my Ubuntu root a couple of times (and, if I remember correctly, I followed some brief instructions in a README, like running a make and/or .sh file). I also added Java 11 to Ubuntu after some failures. After that, when I tried to open the file, I got:

Exception in thread "main" org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]
        at org.eclipse.swt.SWT.error(Unknown Source)
        at org.eclipse.swt.widgets.Display.createDisplay(Unknown Source)
        at org.eclipse.swt.widgets.Display.create(Unknown Source)
        at org.eclipse.swt.graphics.Device.<init>(Unknown Source)
        at org.eclipse.swt.widgets.Display.<init>(Unknown Source)
        at org.eclipse.swt.widgets.Display.<init>(Unknown Source)
        at org.primaresearch.page.viewer.PageViewer.main(PageViewer.java:63)

Someone on the OCR-D/Lobby in gitter said, "I don't think you can use the Gtk/SWT bridge from the WSL layer, because AFAIK that's non-X11. You have to install Java in Windows itself and run the .cmd script natively."

After looking around, I'm still not sure how to do this.

I would be very grateful if someone could please guide me (if indeed it's possible) to either:

  1. set page-viewer up in WSL/Ubuntu given my set-up, if perhaps it is, after all, possible. or
  2. set it up in Windows 10. (Then I believe, hopefully, I can just open the same .xml file on the Windows side.)

PS If on the Windows 10 side, I think I can handle installing Java, as far as that goes. :)
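For context on the gtk_init_check() failure quoted above: SWT's GTK backend needs a display server to connect to, and the error is what one would typically expect when none is reachable. The following is only a minimal sketch of such a check, assuming the usual DISPLAY and WAYLAND_DISPLAY environment variables are what matters here; the function name has_display is made up for illustration and is not part of PageViewer.

```python
import os


def has_display(env=None):
    """Return True if the environment advertises an X11 or Wayland display.

    Assumption: a GTK application can only open a window when DISPLAY
    (X11) or WAYLAND_DISPLAY (Wayland) is set to a non-empty value.
    A set variable does not guarantee that the server is reachable.
    """
    if env is None:
        env = os.environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("A display is advertised; a GUI may be able to start.")
    else:
        print("No display advertised; GTK initialisation is likely to fail.")
```

If this reports no display inside WSL, that would be consistent with the advice from the gitter channel to run the viewer natively on Windows instead.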

bertsky commented 3 years ago

Hi again,

> 2. set it up in Windows 10. (Then I believe, hopefully, I can just open the same .xml file on the Windows side.)

If you downloaded the release archive (zip file), there's a Windows subdirectory which contains Start Page Viewer.bat. Just double-click it in Explorer, then click the Open button and navigate to your PAGE-XML file. (You will then also have to navigate to the matching image file, because PageViewer expects a relative path in PAGE-XML's /Page/@imageFilename to be relative to the PAGE-XML file itself, whereas OCR-D uses a different convention in which the METS file is always the point of reference.)
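To illustrate the path mismatch described above, here is a small sketch that reads the imageFilename attribute from a PAGE-XML string and shows where each convention would look for the image. The workspace layout, file names, and both function names are hypothetical examples, not taken from PageViewer or OCR-D code.

```python
from pathlib import PurePosixPath
import xml.etree.ElementTree as ET


def read_image_filename(page_xml_text):
    """Return the imageFilename attribute of the first Page element, or None.

    The namespace is ignored on purpose, because the PAGE namespace URI
    differs between schema versions.
    """
    root = ET.fromstring(page_xml_text)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "Page":
            return element.get("imageFilename")
    return None


def candidate_image_paths(page_xml_path, image_filename, mets_dir):
    """Resolve a relative image path under both conventions.

    relative_to_page_xml: resolved against the PAGE-XML file's directory.
    relative_to_mets:     resolved against the directory holding the METS file.
    """
    page_dir = PurePosixPath(page_xml_path).parent
    return {
        "relative_to_page_xml": page_dir / image_filename,
        "relative_to_mets": PurePosixPath(mets_dir) / image_filename,
    }


# Hypothetical workspace: METS file in /data/ws, PAGE-XML in a subdirectory.
sample = (
    '<PcGts xmlns="http://example.org/page">'
    '<Page imageFilename="OCR-D-IMG/page_0001.tif"/>'
    "</PcGts>"
)
name = read_image_filename(sample)
for convention, path in candidate_image_paths(
    "/data/ws/OCR-D-OCR/page_0001.xml", name, "/data/ws"
).items():
    print(convention, "->", path)
```

The two results differ whenever the PAGE-XML file does not sit next to the METS file, which is why the viewer asks for the image location by hand.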

If you are on a Windows command line, you can also call that .bat file with the files to open as arguments, see Open example.bat.

(And yes, you will need to install Java 8 or higher natively on Windows.)
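If the OCR-D results live inside the WSL file system, the Windows-side viewer needs a Windows-style path to them. Inside WSL, the wslpath -w command does this translation; the sketch below only mimics the idea in plain Python. The function name is made up, and the distribution name "Ubuntu-18.04" is an assumption about how the Ubuntu 18.04 app is registered on a given machine.

```python
import re


def wsl_to_windows_path(posix_path, distro="Ubuntu-18.04"):
    """Translate an absolute WSL path into a Windows-style path.

    /mnt/<drive>/...  maps to  <DRIVE>:\\...
    anything else     maps to  \\\\wsl$\\<distro>\\...  (the WSL network share)

    The distro default is an assumption; adjust it to the local setup.
    """
    match = re.match(r"^/mnt/([a-zA-Z])(/.*)?$", posix_path)
    if match:
        drive = match.group(1).upper()
        rest = (match.group(2) or "/").replace("/", "\\")
        return f"{drive}:{rest}"
    return "\\\\wsl$\\" + distro + posix_path.replace("/", "\\")


if __name__ == "__main__":
    # Hypothetical locations of a PAGE-XML file.
    print(wsl_to_windows_path("/mnt/c/Users/me/page_0001.xml"))
    print(wsl_to_windows_path("/root/ws/OCR-D-OCR/page_0001.xml"))
```

The resulting path can then be typed into the Open dialog or passed to the .bat file as described above.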

SB2020-eye commented 3 years ago

Super-helpful, @bertsky , once again! I have it working.

Jim-Salmons commented 3 years ago

Hey hi @SB2020-eye & @bertsky! Fancy meeting you here... 🤗 Hi, too, to @chris1010010 & Apostolos! (Christian, let Apostolos know I have been giving you good folks a rest from my relentless mentoring inquiries to give him space to cope with Salford’s collegial political demands he mentioned a few years back. But the “game’s afoot” and I will be getting in touch with you both soon.)

Per this thread, @SB2020-eye, I will chime in as a Windows-based researcher: the best full-featured and well-supported XML IDE/editor is OxygenXML (https://www.oxygenxml.com), which has a dramatically cheap educator/researcher discount price that is a true bargain. I bit the bullet years ago for the SMP license, and its annual renewal is dirt cheap. It supports many OS platforms, including a native Windows version. I depend on it to generate the evolving schema for the MAGAZINEgts format.