bertsky / ocrd_detectron2

OCR-D wrapper for detectron2 based segmentation models

Created PAGE is not correct #16

Closed by stefanCCS 1 year ago

stefanCCS commented 1 year ago

I tried out ocrd-detectron2-segment and got an invalid PAGE file.

Model used: "DocBank_X101"
Preset used: presets_DocBank_X101.json
PAGE file: OCR-D-DETECTRON2-DocBank_X101_00001.zip

XSD validation result using xml-validator-xsd:

    Cvc-datatype-valid.1.2.1: 'region0001_TextRegion:paragraph' Is Not A Valid Value For 'NCName'., Line '42', Column '78'.
    Cvc-attribute.3: The Value 'region0001_TextRegion:paragraph' Of Attribute 'id' On Element 'pc:TextRegion' Is Not Valid With Respect To Its Type, 'ID'., Line '42', Column '78'.
    Cvc-datatype-valid.1.2.1: 'region0002_TextRegion:paragraph' Is Not A Valid Value For 'NCName'., Line '45', Column '78'.
    Cvc-attribute.3: The Value 'region0002_TextRegion:paragraph' Of Attribute 'id' On Element 'pc:TextRegion' Is Not Valid With Respect To Its Type, 'ID'., Line '45', Column '78'.
    Cvc-datatype-valid.1.2.1: 'region0003_TextRegion:paragraph' Is Not A Valid Value For 'NCName'., Line '48', Column '78'.
    Cvc-attribute.3: The Value 'region0003_TextRegion:paragraph' Of Attribute 'id' On Element 'pc:TextRegion' Is Not Valid With Respect To Its Type, 'ID'., Line '48', Column '78'.
    Cvc-datatype-valid.1.2.1: 'region0004_TextRegion:paragraph' Is Not A Valid Value For 'NCName'., Line '51', Column '78'.
    Cvc-attribute.3: The Value 'region0004_TextRegion:paragraph' Of Attribute 'id' On Element 'pc:TextRegion' Is Not Valid With Respect To Its Type, 'ID'., Line '51', Column '78'.

stefanCCS commented 1 year ago

I looked into this a bit myself: it looks like the id attribute does not allow ':'. In my example I replaced it with '_', and the file then validates fine.
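
For anyone who wants to check this without running a full XSD validation, here is a minimal sketch. It assumes an ASCII-only approximation of the NCName rule (the real XML production also allows many non-ASCII characters), and the helper names is_ncname and make_valid_id are made up for illustration; they are not part of ocrd_detectron2.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production:
# a letter or underscore first, then letters, digits, '.', '-' or '_'.
# Assumption: non-ASCII name characters are ignored here for brevity.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(value: str) -> bool:
    """Return True if value matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(value) is not None


def make_valid_id(value: str) -> str:
    """Replace every disallowed character (such as ':') with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", value)
    # An NCName must not start with a digit, '.' or '-'.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


bad_id = "region0001_TextRegion:paragraph"
print(is_ncname(bad_id))
print(make_valid_id(bad_id))
```

This is the same manual replacement described above, just automated; it says nothing about how the project itself resolved the problem.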

kba commented 1 year ago

The categories mapping uses : to separate the region class from the region @type. That string is used as-is when generating the region_id, which is how the colon ends up in the id attribute. https://github.com/bertsky/ocrd_detectron2/pull/17 might fix that; I haven't tested it yet.
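
To illustrate the mechanism, here is a rough sketch of how a category string such as "TextRegion:paragraph" could be split into class and @type before building the ID. The functions split_category and region_id are hypothetical names, the ID layout is only inferred from the validator output in this thread, and this is not the code from #17 or from the repository.

```python
from typing import Optional, Tuple


def split_category(category: str) -> Tuple[str, Optional[str]]:
    """Split 'Class:type' into (class, type); type is None if absent."""
    region_class, _, region_type = category.partition(":")
    return region_class, region_type or None


def region_id(index: int, category: str) -> str:
    """Build an ID from the parts, joined with '_' instead of ':'."""
    region_class, region_type = split_category(category)
    parts = [f"region{index:04d}", region_class]
    if region_type:
        parts.append(region_type)
    return "_".join(parts)


print(split_category("TextRegion:paragraph"))
print(region_id(1, "TextRegion:paragraph"))
print(region_id(2, "TableRegion"))
```

Splitting first keeps the class and the @type available separately for the PAGE element, while the ID only ever sees characters that are valid in an NCName.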

stefanCCS commented 1 year ago

Checked #17: works fine!

bertsky commented 1 year ago

Sorry, I did not see this earlier. @kba was right, but #17 is not how I'd like to do it. The fix is on master.