bertsky opened 1 year ago
@bertsky in https://github.com/OCR-D/core/pull/966#pullrequestreview-1261544355 (posting here so it does not get lost when resolving that discussion):
Moreover, what about MODS queries? ATM it's only a minor use-case (`ocrd-segment-extract-lines` wants to know the `mods:recordIdentifier`). But IIUC this will be the only way processors can query meta-data (whether passed from manual input or previous processors). So IMO we must (at some point, not necessarily right now) provide some `OcrdMods` and wrap that object via HTTP as well, e.g. in `OcrdMets`:

```python
@property
def mods(self):
    return parsexml(...)
```
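To make the `recordIdentifier` use case concrete, here is a minimal sketch of the kind of lookup that `parsexml(...)` placeholder would have to enable. It assumes the MODS record is embedded as `mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods` and uses the standard library's `xml.etree.ElementTree`. The helper names `get_mods` and `get_record_identifier` are hypothetical, not part of OCR-D core.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

def get_mods(mets_xml: str) -> Optional[ET.Element]:
    """Hypothetical: return the first mods:mods embedded in a dmdSec, or None."""
    root = ET.fromstring(mets_xml)
    return root.find(".//mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods", NS)

def get_record_identifier(mets_xml: str) -> Optional[str]:
    """Hypothetical: the value ocrd-segment-extract-lines is interested in."""
    mods = get_mods(mets_xml)
    if mods is None:
        return None
    ident = mods.find("mods:recordInfo/mods:recordIdentifier", NS)
    return ident.text if ident is not None else None

# Usage with a made-up METS document
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:recordInfo>
            <mods:recordIdentifier source="example">rec-0001</mods:recordIdentifier>
          </mods:recordInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""
print(get_record_identifier(SAMPLE))  # rec-0001
```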
and then wrapping a `/mods` entry point in `OcrdMetsServer` and then in `ClientSideOcrdMets`:

```python
@property
def mods(self):
    r = self.session.request('GET', f'{self.url}/mods')
    return r.json()
```
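Since the client side calls `r.json()`, the server's `/mods` entry point has to turn the MODS XML into something JSON-serializable. A minimal sketch, assuming a flat dict of a few hand-picked fields is enough; the field selection and the name `mods_to_dict` are hypothetical, not an existing OCR-D API.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

def mods_to_dict(mods: ET.Element) -> dict:
    """Hypothetical: flatten selected MODS fields into JSON-compatible values."""
    def texts(path: str) -> list:
        return [el.text.strip() for el in mods.findall(path, MODS_NS) if el.text]
    ids = texts("mods:recordInfo/mods:recordIdentifier")
    return {
        "recordIdentifier": ids[0] if ids else None,
        "languages": texts("mods:language/mods:languageTerm"),
        "scripts": texts("mods:language/mods:scriptTerm"),
    }

mods = ET.fromstring("""<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:recordInfo><mods:recordIdentifier>rec-0001</mods:recordIdentifier></mods:recordInfo>
  <mods:language>
    <mods:languageTerm authority="iso639-2b" type="code">ger</mods:languageTerm>
    <mods:scriptTerm authority="iso15924" type="code">Latf</mods:scriptTerm>
  </mods:language>
</mods:mods>""")
payload = json.dumps(mods_to_dict(mods))  # what the server would send
print(json.loads(payload))                # what r.json() would return on the client
# {'recordIdentifier': 'rec-0001', 'languages': ['ger'], 'scripts': ['Latf']}
```

Sending the raw XML string would also work, but then the client could not simply call `r.json()` and would have to parse XML again. A flat dict, on the other hand, drops attributes such as `authority`, so a real mapping would likely need to be richer.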
Yes, and an OcrdMods would also be needed if we were to extend #698 (automatic inheritance in OcrdPage hierarchy) with the document-wide lang/script features.
However, this could also be achieved via a dedicated (specialised) processor (which merely fills page-level lang/script from the MODS)...
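The core of such a dedicated processor would be copying document-wide defaults onto each page unless the page already declares its own values. A minimal sketch on a PAGE-XML string, assuming the 2019-07-15 PAGE namespace and assuming the caller has already mapped the MODS codes (ISO 639-2 / ISO 15924) to PAGE's own enumerations for `primaryLanguage` and `primaryScript`. The function name `fill_page_lang_script` and the example values are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 schema namespace
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

def fill_page_lang_script(page_xml: str, language: str, script: str) -> str:
    """Hypothetical: set Page/@primaryLanguage and @primaryScript unless present."""
    ET.register_namespace("pc", PAGE_NS)  # keep the pc: prefix on output
    root = ET.fromstring(page_xml)
    page = root.find(f"{{{PAGE_NS}}}Page")
    if page is None:
        raise ValueError("no pc:Page element found")
    # Explicit page-level values take precedence over document-wide defaults
    page.attrib.setdefault("primaryLanguage", language)
    page.attrib.setdefault("primaryScript", script)
    return ET.tostring(root, encoding="unicode")

# Usage: values assumed to be already mapped from MODS "ger" / "Latf"
doc = f'<pc:PcGts xmlns:pc="{PAGE_NS}"><pc:Page imageFilename="p1.png" imageWidth="10" imageHeight="10"/></pc:PcGts>'
out = fill_page_lang_script(doc, "German", "Latf - Latin (Fraktur variant)")
```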
Valuable functionality that could be reused for OcrdMods can also be found in:
For processors consuming MODS metadata, it would help (as in: easier and more efficient code) to be able to use the Python object model. For example, querying `language` or `script` by XPath is painful. The interface could be something like `ocrd_mets.OcrdMets.dmdSec` (as a `dict` of IDs to `ocrd_mods.OcrdMods` instances).

Remotely related: #783
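A minimal sketch of the proposed `dmdSec` interface, where `language` and `script` become plain attribute accesses instead of XPath queries repeated in every processor. The `OcrdMods` class and its properties here are hypothetical, built on `xml.etree.ElementTree`, and the function `dmd_sections` stands in for the suggested `OcrdMets.dmdSec` property.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

class OcrdMods:
    """Hypothetical thin object model over one mods:mods element."""

    def __init__(self, el: ET.Element):
        self._el = el

    def _texts(self, path: str) -> List[str]:
        return [e.text.strip() for e in self._el.findall(path, NS) if e.text]

    @property
    def language(self) -> List[str]:
        # all mods:languageTerm values, e.g. ISO 639-2 codes
        return self._texts("mods:language/mods:languageTerm")

    @property
    def script(self) -> List[str]:
        # all mods:scriptTerm values, e.g. ISO 15924 codes
        return self._texts("mods:language/mods:scriptTerm")

def dmd_sections(mets_xml: str) -> Dict[str, OcrdMods]:
    """Stand-in for OcrdMets.dmdSec: map each dmdSec/@ID to its embedded MODS."""
    root = ET.fromstring(mets_xml)
    result = {}
    for dmd in root.iterfind("mets:dmdSec", NS):
        mods = dmd.find("mets:mdWrap/mets:xmlData/mods:mods", NS)
        if mods is not None:
            result[dmd.get("ID")] = OcrdMods(mods)
    return result

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData><mods:mods>
      <mods:language>
        <mods:languageTerm authority="iso639-2b" type="code">ger</mods:languageTerm>
        <mods:scriptTerm authority="iso15924" type="code">Latf</mods:scriptTerm>
      </mods:language>
    </mods:mods></mets:xmlData></mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""
dmd = dmd_sections(SAMPLE)
print(dmd["DMDLOG_0000"].language, dmd["DMDLOG_0000"].script)  # ['ger'] ['Latf']
```

Returning lists keeps multilingual records (several `mods:language` entries) representable. A real implementation would also have to decide how to expose the `authority` and `type` attributes.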