OpenGridForum / DFDL

10 stars 1 forks source link

Simple type used as prefixLengthType with assert or discriminator #29

Closed mbeckerle closed 1 year ago

mbeckerle commented 2 years ago

We want to put facet constraints on the numeric value of a prefixLengthType.

This allows superior diagnostics on bad data where the value of a prefix length field is corrupted, or one is simply backtracking through alternatives, and the alternative fails due to the prefix length not making sense.

A diagnostic to the effect of "prefix length value is not facet valid with respect to maxInclusive 2000." is superior to something like "3928881818 bytes were needed but only 1229 were available".

So the idea is one could put an assert like this on a xs:simpleType that is used as a dfdl:prefixLengthType:

<dfdl:assert>{ $someControlingVar or dfdl:checkConstraints(.) }</dfdl:assert>

So that range checks would apply to a prefix length value before it is used as a length to retrieve/pull data for parsing.

Sub issues: what path expressions are allowed? Possibly only "." because anything else could be bound to a variable.

The argument for "." only is that the prefix is not an element attached to the infoset; hence ".." isn't necessarily meaningful.

Absolute paths could however be allowed, but don't need to be as they can be bound to a variable.

We don't have a use case for other than "." as the path expression. Where the "." refers to the value of the prefix length "quasi-element" itself. This prefix length value is the only thing you don't have a way to reference in an expression.

One alternative notion is a DFDL function like e.g, dfdl:prefixLength() that returns the value of the prefix length.

Another sub issue is whether the value of "." is before, or after the value is adjusted for the dfdl:prefixIncludesPrefixLength='yes' property. Since the schema author does not have access to the length of the simpleType used for the prefix itself, that suggests the adjustment subtracting away that length, so that "." is the post-adjustment value is the correct choice.

Another possible annotation on the simpleType that is the prefixLengthType would be setVariable: <dfdl:setVariable ref="ex:PrefixLen">{ . }</dfdl:setVariable> This would allow capturing the prefixLength's value for use elsewhere in the schema.

mbeckerle commented 2 years ago

I am creating a separate document in the working-drafts folder for iterative refinement of the suggested errata language for this issue. This will be a ".md" text file that will allow us to use normal code-review type comments to refine and review it.

mbeckerle commented 2 years ago

See PR https://github.com/OpenGridForum/DFDL/pull/33

mbeckerle commented 1 year ago

Attached is a zip of two files.

One is a schema with a simple prefix-length situation with pattern assert, setVariable, and regular assert annotations on the prefix-prefix length type, the prefix-length type, and the element's type.

I numbered them with <!-- Eval N --> to indicate evaluation order of the annotations.

The other file is a TDML test file showing what I think this schema does with a small piece of sample data designed to illustrate what all happens, what the variables capture, etc. PrefixedLengthsWithAnnotations.zip

smhdfdl commented 1 year ago

In passing, identified an undocumented corner case; it is a Processing Error if a prefix-prefix-length type evaluates to 0.

mbeckerle commented 1 year ago

The whole prefix-prefix thing comes from an ASN.1 use case. We should find out what ASN.1 does with prefix-prefix if the prefix-prefix value is 0.

smhdfdl commented 1 year ago

Agreed on 23-05-11 WG call to adopt a simpler approach whereby any facets on a simple type that is used as a prefix-length type get acted upon when parsing & unparsing and if violated cause a Processing Error. This becomes the normal behaviour, not under property control. Implementations may choose to provide their own switch if the change is problematic to their users. Noted that the DFDL schemas for ISO8583 use prefix-length types, but without facets so no behaviour change.

mbeckerle commented 1 year ago

The ASN.1 BER/DER rules for nested length work in such a way that there is no possible way that the length field itself can be zero length. The length is either 1 byte holding length 0 to 127, or if the MSB is set, then the remaining bits give the length of the length field itself. A byte of 0x80 would mean that the length field itself is length 0, but this value is instead reserved to indicate that a different indefinite length representation is being used.

So there is no way ASN.1 can have the situation where the length of a length is 0. At least as far as BER/DER are concerned.

I have not investigated whether ASN.1 ECN can express a different length-encoding scheme where the length of the length can end up being zero or what would happen in that case.

I think it is prudent for us just to make the conservative decision that when a prefixLength itself has a prefixLengthType, and the length of the length is 0, that it is a parse error, and the minimum length (of a length) when unparsing is 1, not zero.

mbeckerle commented 1 year ago

Suggested language changes for the spec:

Section 12.3.4 dfdl:lengthKind 'prefixed'

In the definition of dfdl:prefixLengthType, add this after the bullets about Schema Definition Errors.

If the xs:simpleType is constrained by facets (minInclusive, maxInclusive, minExclusive, maxExclusive) these constraints are always checked, both when parsing and unparsing. It is a Processing Error if the value of the xs:simpleType does not conform to the facet constraints. It is a Processing Error if the value of the xs:simpleType is less than zero.

Section 12.3.4.1 Nested Prefix Lengths

Insert the additional sentence

It is a Processing Error if the nested prefix length value is less than 1.

mbeckerle commented 1 year ago

@smhdfdl I have edited a new version of the DFDL spec with all the agreed language errata and all the minor/typo items to date.

This is the only remaining issue tracker that I believe we wanted to get into this next version. So if you can take a look at the proposed language in the comments above and provide any further suggestions, I can then prepare the revised spec document.

smhdfdl commented 1 year ago

The section 12.3.4 property changes look fine.

I don't think section 12.3.4.1 reads well at the moment. I think it should be refactored slightly, see what you think.

"It is possible for a prefix length, as specified by dfdl:prefixLengthType, to itself have a prefix length. That is, an element can have a prefix length, which defines a PrefixLength region (see Section 9.2 DFDL Data Syntax Grammar), and the PrefixLength region can itself have a type which specifies a prefix length, thereby defining a PrefixPrefixLength region.

It is a Schema Definition Error if the type associated with the PrefixPrefixLength is the same as the type associated with the PrefixLength. It is a Schema Definition Error if a PrefixPrefixLength region has a type which specifies a prefix length, that is, nesting of prefix lengths must not exceed a depth of 1. It is a Processing Error if a PrefixPrefixLength region has zero length. "

mbeckerle commented 1 year ago

Do we also need to take a position on whether statement annotations are expressed on prefixLengthType are evaluated, and when? I think since the question arises of what "." means in any statement annotation, we should just disallow any statement annotations that involve expressions certainly.

But assert/discriminator with testKind 'pattern' do not involve an expression.

I think we should simply disallow statement annotations on a simpleType that is used as a prefixLengthType.

This would require adding one additional bullet to the description of the prefixLengthType property in section 12.3.4 in the list of bullets saying what is NOT allowed on the simpleType used as a prefixLengthType.

mbeckerle commented 1 year ago

The assert, discriminator, or setVariable restriction also applies to any base type thereof.

mbeckerle commented 1 year ago

Fixed by https://github.com/OpenGridForum/DFDL/pull/45