altoxml / schema

ALTO XML schema - latest and all former versions

Restrict float attribute values where possible to allow for better xml-validation. #62

Open jukervin opened 5 years ago

jukervin commented 5 years ago

For example, the ALTO schema allows negative float values in attributes like WIDTH, HEIGHT, HPOS and VPOS, where values should be positive.

Validating against the schema does not catch documents in which software has produced nonsensical values.

xsd:float is used for attributes in the following schema components:

PageType

ParagraphStyle

BlockType

SPType

StringType

PageSpaceType

EllipseType

CircleType (a circle shape; HPOS and VPOS describe the center of the circle)

formattingAttributeGroup

HYP

TextLine

GlyphType
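
The restriction proposed above could be sketched in XSD 1.0 roughly as follows. This is only an illustration: the names `NonNegativeFloatType` and `ExampleBlockType` are hypothetical and not part of the ALTO schema, and the sketch assumes that zero should stay allowed (an HPOS or VPOS of 0 is a plausible coordinate).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical helper type: an xsd:float that may not be negative. -->
  <xsd:simpleType name="NonNegativeFloatType">
    <xsd:restriction base="xsd:float">
      <xsd:minInclusive value="0"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Illustrative stand-in for a type such as BlockType; not the real definition. -->
  <xsd:complexType name="ExampleBlockType">
    <xsd:attribute name="HPOS" type="NonNegativeFloatType"/>
    <xsd:attribute name="VPOS" type="NonNegativeFloatType"/>
    <xsd:attribute name="WIDTH" type="NonNegativeFloatType"/>
    <xsd:attribute name="HEIGHT" type="NonNegativeFloatType"/>
  </xsd:complexType>

</xsd:schema>
```

With a type like this, a value such as WIDTH="-12.5" would be rejected by an XSD 1.0 validator. Note that xsd:float also admits INF, which a minInclusive facet alone does not exclude.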

jukervin commented 4 years ago

Based on the discussion in the meeting of 2020-12-13, ROTATION can have valid negative values.

jukervin commented 4 years ago

Limiting values to positive will break backward compatibility, so this can be changed in the ALTO 5.0 release at the earliest.

Ra1phM commented 4 years ago

I agree that limiting WIDTH, HEIGHT, HPOS, VPOS to positive values would make sense. From my understanding, HPOS and VPOS are always in relation to the entire page, so values outside of the page's real dimensions (e.g. (-100, -100)) should not exist.

However, for the ParagraphStyle LEFT and RIGHT indents, I am not sure positive values should be enforced. The indent value is relative to the paragraph, and even if a negative indent is not good practice, it would still be valid, in the same way that negative values (for positions, margins and paddings) are accepted in HTML.

jukervin commented 1 year ago

Maybe XSD 1.1 assertions could be used to create validations that rely on other element values, e.g. PrintSpace can't be larger than Page.
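
As a rough sketch of the idea above, an XSD 1.1 `xsd:assert` on the page type could compare PrintSpace with Page. The type name `ExamplePageType` is hypothetical, the content model is cut down to the attributes involved, and no target namespace is used; the real ALTO schema is namespaced, so the XPath would need adapting (for example via `xpathDefaultNamespace`).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Requires an XSD 1.1 processor; XSD 1.0 processors do not support xsd:assert. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Illustrative stand-in for PageType; not the real definition. -->
  <xsd:complexType name="ExamplePageType">
    <xsd:sequence>
      <xsd:element name="PrintSpace" minOccurs="0">
        <xsd:complexType>
          <xsd:attribute name="WIDTH" type="xsd:float"/>
          <xsd:attribute name="HEIGHT" type="xsd:float"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="WIDTH" type="xsd:float"/>
    <xsd:attribute name="HEIGHT" type="xsd:float"/>

    <!-- Fails only if PrintSpace is wider or taller than the page.
         If any attribute is absent, the comparison is false and the assertion holds. -->
    <xsd:assert test="not(PrintSpace/@WIDTH &gt; @WIDTH) and not(PrintSpace/@HEIGHT &gt; @HEIGHT)"/>
  </xsd:complexType>

</xsd:schema>
```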

cipriandinu commented 10 months ago

A new branch (issue-62) has been added for changes on this topic. There are several things to discuss, since the only way to implement this completely is to use XSD 1.1 (not only to restrict some values to be positive, but also to express relative restrictions such as block height + VPOS < page height). On the new branch there are only a few restrictions at Page level so far, and much more remains to be done.

One topic to clarify is whether we want to go with this solution and require validation with XSD 1.1 processors, or implement only simple restrictions (positive values, no relative restrictions). If we go for the full solution, the switch to 5.0 would probably be a good moment.
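
For illustration, a relative restriction of the kind mentioned above (block height + VPOS within the page height) might look like this in XSD 1.1. This is a sketch under assumptions, not the content of the issue-62 branch: `ExamplePageType` is a hypothetical name, TextBlock is placed directly under the page for brevity, no target namespace is used, and the test uses `le` so that a block may touch the bottom edge of the page (`lt` would express the strict `<`).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Requires an XSD 1.1 processor; XSD 1.0 processors do not support xsd:assert. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Illustrative stand-in for PageType; not the real definition. -->
  <xsd:complexType name="ExamplePageType">
    <xsd:sequence>
      <xsd:element name="TextBlock" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="VPOS" type="xsd:float" use="required"/>
          <xsd:attribute name="HEIGHT" type="xsd:float" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="HEIGHT" type="xsd:float" use="required"/>

    <!-- Every block must end at or above the bottom edge of the page.
         Inside the quantified expression, @HEIGHT still refers to the page. -->
    <xsd:assert test="every $b in TextBlock satisfies ($b/@VPOS + $b/@HEIGHT) le @HEIGHT"/>
  </xsd:complexType>

</xsd:schema>
```

The simple option (positive values only) needs nothing beyond XSD 1.0 facets, whereas a rule like this one cannot be expressed without XSD 1.1 or an external check.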