bertsky / workflow-configuration

a makefilization for OCR-D workflows, with configuration examples
Apache License 2.0

Produces mets:file/ID that are not a valid xs:ID with certain configurations #18

Closed (kba closed this issue 4 years ago)

kba commented 4 years ago

In particular, + and / are not allowed in mets:file/@ID.

bertsky commented 4 years ago

What part of this repo do you mean in particular? The workflows makefilization itself, or ocrd-import?

kba commented 4 years ago

The makefilization, which can lead to fileGrps that contain e.g. Fraktur+Latin or file IDs concatenating fileGrp and ID with /. For example, https://github.com/OCR-D/assets/tree/master/data/kant_aufklaerung_1784-complex/data
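
For illustration, the constraint can be checked mechanically. xs:ID is derived from NCName, so a value must start with a letter or underscore and may then contain only letters, digits, `.`, `-` and `_`. The sketch below is an ASCII-only approximation (the full NCName production also admits many non-ASCII letters), and both the function name `is_valid_xs_id` and the sample IDs are made up for this example rather than taken from the repository:

```python
import re

# ASCII-only approximation of the NCName production that xs:ID is based on.
# Assumption: non-ASCII name characters, which real NCNames allow, are rejected.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_xs_id(value: str) -> bool:
    """Return True if value looks like a valid (ASCII) xs:ID."""
    return _NCNAME_ASCII.fullmatch(value) is not None


# Hypothetical IDs modelled on the two problems described in this issue.
for candidate in (
    "OCR-D-OCR-TESS-Fraktur+Latin_0001",  # contains '+'
    "OCR-D-IMG/FILE_0001",                # contains '/'
    "OCR-D-IMG_FILE_0001",                # only allowed characters
):
    print(candidate, is_valid_xs_id(candidate))
```

Only the last of the three candidates passes the check.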

bertsky commented 4 years ago

> The makefilization, which can lead to fileGrps that contain e.g. Fraktur+Latin

ah, I see, you mean the configuration examples. Yes, now that I know + is forbidden, I should rename these target fileGrps. (But I don't think the makefilization itself should do anything to check output fileGrps, just as ocrd process or the single-CLI decorator don't.)
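
As a rough sketch of what such a rename could look like if done mechanically, the helper below replaces every character outside an ASCII-safe set with a hyphen. The name `sanitize_file_grp`, the choice of `-` as replacement, and the sample fileGrp name are all assumptions for illustration; the actual renaming in the configuration examples may well follow a different scheme.

```python
import re


def sanitize_file_grp(name: str, replacement: str = "-") -> str:
    """Map a fileGrp name onto characters that are safe in an xs:ID.

    Hypothetical helper: ASCII-only, not part of workflow-configuration.
    """
    # Replace anything that is not an ASCII letter, digit, '.', '_' or '-'.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", replacement, name)
    # An xs:ID must not start with a digit, '.' or '-', so prefix '_' if needed.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


# Hypothetical fileGrp name containing the forbidden '+'.
print(sanitize_file_grp("OCR-D-OCR-TESS-Fraktur+Latin"))
```

Note that this only rewrites names by hand or in a one-off script; it does not add any checking to the makefilization itself.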

> or file IDs concatenating fileGrp and ID with /.

oh, how is that? You surely mean there are certain processors which are behaving that way, not the makefilization, right?

kba commented 4 years ago

> oh, how is that? You surely mean there are certain processors which are behaving that way, not the makefilization, right?

tbh, I was just noting the issue with the "complex" sample while fixing it in OCR-D/assets. I'll check what exactly went awry with that particular workflow next week.

bertsky commented 4 years ago

> ah, I see, you mean the configuration examples. Yes, now that I know + is forbidden, I should rename these target fileGrps.

see https://github.com/bertsky/workflow-configuration/commit/7edcb9054778c418a484119854d3343066ef7f85