OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

allow referencing by page and book in the browser (URI) #245

Open bertsky opened 3 years ago

bertsky commented 3 years ago

It would be extremely helpful if each loaded page could be navigated to directly (and thus bookmarked and shared) via a query string in the URL. That link could then be embedded as an anchor or button somewhere in the top toolbar or the left navigation pane. The latter already contains anchors, but only for the images themselves (images/books/...). The link could even be embedded into the saved PAGE XML (perhaps somewhere under Metadata).
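To make the request concrete, here is a minimal sketch of what such a link could look like and how it could be read back on page load. The parameter names `book` and `page`, the helper names, and the base URL are assumptions for illustration only; nothing here reflects LAREX's actual routes or code.

```typescript
// Hypothetical shape of a page reference; the field names are assumptions.
interface PageRef {
  book: number;
  page: number;
}

// Build a shareable link by setting the two query parameters on a base URL.
function buildPageLink(baseUrl: string, ref: PageRef): string {
  const url = new URL(baseUrl);
  url.searchParams.set("book", String(ref.book));
  url.searchParams.set("page", String(ref.page));
  return url.toString();
}

// Parse a link back into a page reference.
// Returns null when a parameter is missing or is not a non-negative integer.
function parsePageLink(link: string): PageRef | null {
  const params = new URL(link).searchParams;
  const book = params.get("book");
  const page = params.get("page");
  if (book === null || page === null) return null;
  if (!/^\d+$/.test(book) || !/^\d+$/.test(page)) return null;
  return { book: Number(book), page: Number(page) };
}

// Example with a made-up base URL:
const exampleLink = buildPageLink("http://localhost:8080/Larex/viewer", { book: 3, page: 12 });
console.log(exampleLink);
console.log(parsePageLink(exampleLink));
```

On load, the viewer could call something like `parsePageLink(window.location.href)` and jump to the referenced page, falling back to the first page when the result is `null`.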

Jim-Salmons commented 3 years ago

@bertsky, thank you for suggesting this enhancement. You may be interested in the #DATeCH2017 poster by Timlynn Babitsky and me, which addresses this issue within the #MAGAZINEgts ground truth storage format. At the Internet Archive, image IDs are called ‘leafs’, so we include a ‘leaf2ppg’ (leaf to print page number) map in the Metadata branch of this evolving format, which uses a metamodel subgraph design pattern to support an integrated model of complex document structure and content depiction. Such a mapping for digitized document collections is especially important for newspaper and magazine serial publications, as they are very intentionally nonlinear in their issue-wide layout design. A link to an example of this ground truth storage format can be found on the About page of the Softalk magazine collection at the Internet Archive: https://archive.org/details/softalkapple?tab=about.

[image: SalmonsBabitsky_FactMinersSoftalk_poster]

bertsky commented 3 years ago

Thanks @Jim-Salmons for sharing that related info on the wider image-vs-page representation in more complex documents! (I do think GT tools like LAREX are the place to start implementing meta-models, because otherwise it will remain a chicken-and-egg problem. Correct me if I'm wrong.)

But this issue only scratches the surface of the representation side. It is first of all about a UI feature: being able to come back to a specific page directly and to share hyperlinks to it.