TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

PureODD may need other forms of datatype restrictions besides regular expressions #1473

Closed raffazizzi closed 8 years ago

raffazizzi commented 8 years ago

References to datatypes with dataRef can be restricted with a regular expression provided with the attribute @restriction.
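For illustration, a regex-restricted reference might look like the minimal sketch below. The identifier example.year and the pattern are hypothetical, chosen only to show where @restriction sits; they are not taken from the TEI source.

```xml
<!-- Hypothetical datatype: a four-digit year expressed as a restricted token -->
<dataSpec ident="example.year">
  <desc>A four-digit year (illustrative only).</desc>
  <content>
    <!-- @name points at an XSD datatype; @restriction narrows it with a regex -->
    <dataRef name="token" restriction="[0-9]{4}"/>
  </content>
</dataSpec>
```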

There are cases in which regular expressions are not really adequate, however.

For example if one wanted to define a datatype for arc degrees:

<rng:data type="decimal">
   <rng:param name="maxInclusive">360.0</rng:param>
   <rng:param name="minInclusive">-360.0</rng:param>
</rng:data>

How could this be defined in PureODD?
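To show why a regex is a poor fit here, this is a sketch of a regex-only approximation of the same range using @restriction. The pattern is an assumption written for this illustration, and it only accepts plain forms such as 12, -359.75 or 360.0; it rejects other lexical forms that decimal itself allows, such as +12, 012 or .5.

```xml
<!-- Regex-only approximation of "decimal between -360.0 and 360.0" (illustrative) -->
<dataSpec ident="arcDegree">
  <content>
    <!-- Either 360 with an all-zero fraction, or 0 to 359 with any fraction;
         an optional leading minus sign covers the negative half of the range -->
    <dataRef name="decimal"
      restriction="-?(360(\.0+)?|([0-9]|[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9])(\.[0-9]+)?)"/>
  </content>
</dataSpec>
```

The numeric bounds are buried in digit classes, so changing the limit to, say, 180 means rewriting the whole pattern rather than editing one value.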

lb42 commented 8 years ago

A fair cop. I have no idea, though of course I think you can still define a TEI datatype with RNG content. We just don't like doing it.

hcayless commented 8 years ago

#1344 is an example where regexes won't help at all.

hcayless commented 8 years ago

Wouldn't the obvious thing be to define a <dataParam> with @name and @value attributes?

raffazizzi commented 8 years ago

@hcayless perhaps. Though we would need to invalidate @restriction when <dataParam> is included. Schematron + privileging <dataParam> when processing?

Also what parameters would we support? The ones that RNG does? I'm confused because https://www.w3.org/2001/XMLSchema-datatypes lists the datatypes, but not parameters.

hcayless commented 8 years ago

@raffazizzi I believe they're what XSD calls "facets".
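For comparison, this is how the same arc-degree restriction would be written with facets in XML Schema itself. It is a sketch for orientation only; the type name arcDegree is borrowed from the example in this thread. The XSD 1.0 constraining facets are length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits and fractionDigits, and which of them apply depends on the base datatype.

```xml
<!-- The RELAX NG params minInclusive/maxInclusive correspond to these XSD facets -->
<xs:simpleType name="arcDegree" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="-360.0"/>
    <xs:maxInclusive value="360.0"/>
  </xs:restriction>
</xs:simpleType>
```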

raffazizzi commented 8 years ago

@lou any thoughts on the relationship between @restriction and a possible implementation of <dataParam>?

raffazizzi commented 8 years ago

Ok I think the main issue here is that using something like <dataParam> only makes sense in the context of XML Schema datatypes. It would be nonsense for other datatype references. On the other hand, @restriction, being just a regex pattern, can be applied to any datatype since attribute values are always strings.

So I think the only way to solve this within <dataRef> is by doing the following:

How does this sound? Nod and I'll implement.

Without this, we'll need a separate element, unless someone else can come up with a better idea. Optimistically, I'll keep this in the current milestone.

raffazizzi commented 8 years ago

Also interesting to notice that we have dataNode currently specified in the TEI source, but it's unused. https://github.com/TEIC/TEI/blob/53ec5343f3eeb1b4214dd62d90efe5138cfa37e1/P5/Source/Specs/dataNode.xml

This looks like an early attempt to be able to represent more complex datatypes, but must have been discarded. In its current form it could not work together with dataRef, so implementing dataParam still makes the most sense.

lb42 commented 8 years ago

No, <dataNode> was an early version of what is now (more reasonably) called <dataSpec>. Not relevant. If we want to add this facility, we need to do it by modifying <dataRef> as you suggest. The least disruptive would be to allow it to contain <dataFacet> children (I really don't like dataParam as a name):

<dataSpec ident="arcDegree">
<dataRef name="decimal">
   <dataFacet name="maxInclusive">360.0</dataFacet>
   <dataFacet name="minInclusive">-360.0</dataFacet>
</dataRef>
</dataSpec>

Are the values for the @name attribute here enumerable? Presumably they are in the XSD spec somewhere?

The trouble with this is that it gives us two ways of doing the same thing: in some cases, the same constraint (e.g. an integer value less than 100) could be expressed either by a pair of facets or by a regexp. I suppose we have to live with that. Certainly we shouldn't allow specifications that use both mechanisms.
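A constraint of this kind could be checked with Schematron. The following is a minimal sketch under stated assumptions, not the rules from any TEI branch: it assumes the child element is called dataFacet, that @restriction and facets must not be combined, and that facets are only meaningful when the XSD datatype is referenced with @name.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Fires only for dataRef elements that carry at least one facet -->
    <sch:rule context="tei:dataRef[tei:dataFacet]">
      <!-- Forbid mixing the regex mechanism with the facet mechanism -->
      <sch:report test="@restriction">
        A dataRef may use either @restriction or dataFacet children, not both.
      </sch:report>
      <!-- Facets are assumed to apply only to XSD datatypes named with @name -->
      <sch:assert test="@name">
        dataFacet is only meaningful when dataRef points to an XSD datatype with @name.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```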

Of course, we could also (as I suggested earlier) do this by just using the RNG equivalent directly within the dataSpec.
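The RNG-in-dataSpec alternative could look like the sketch below. It reuses the RELAX NG fragment from the top of this thread; wrapping it in <content> and declaring the rng prefix inline are assumptions made to keep the fragment self-contained.

```xml
<!-- Same arcDegree datatype, with native RELAX NG instead of dataRef -->
<dataSpec ident="arcDegree">
  <content>
    <rng:data type="decimal" xmlns:rng="http://relaxng.org/ns/structure/1.0">
      <rng:param name="maxInclusive">360.0</rng:param>
      <rng:param name="minInclusive">-360.0</rng:param>
    </rng:data>
  </content>
</dataSpec>
```

This works without any change to dataRef, at the cost of putting non-TEI vocabulary into the ODD, which is what PureODD tries to avoid.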

hcayless commented 8 years ago

I suppose they are technically enumerable...different facets apply to different datatypes, so enumerating them only helps you so much, but I'm sure we could generate a list.

raffazizzi commented 8 years ago

The list is short. I'm working on a draft, more later.

raffazizzi commented 8 years ago

Here is a draft implementation, including schematron rules: https://github.com/TEIC/TEI/blob/dataParam/P5/Source/Specs/dataFacet.xml I tested it locally and it seems to do what it needs to. I opted for using @name + @value like @hcayless suggested instead of specifying the value as content. It seemed more ODD-like.
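With the @name + @value form described here, the arc-degree example from earlier in the thread would presumably be written as in the sketch below. This is an illustration of the attribute-based design only, not a copy of the linked draft.

```xml
<!-- arcDegree with facet values carried in @value rather than as element content -->
<dataSpec ident="arcDegree">
  <content>
    <dataRef name="decimal">
      <dataFacet name="maxInclusive" value="360.0"/>
      <dataFacet name="minInclusive" value="-360.0"/>
    </dataRef>
  </content>
</dataSpec>
```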

I also updated the XSLTs to process dataParam: https://github.com/TEIC/Stylesheets/blob/dataFacet/odds/odd2relax.xsl#L720 Tested with a mock ODD and got good results. I'm not 100% confident this is all that I need to change in the XSLTs, but I think so. Someone with more knowledge than me should double check.

I was looking for a place to update the guidelines, but I realized that they don't currently seem to discuss @restriction. Please tell me if I'm wrong.

lb42 commented 8 years ago

Nice work! A couple of typos (lenght???), and I think the examples should show a parent datatype, but otherwise good to go.

lb42 commented 8 years ago

I corrected the typos in your branch: then I noticed that there's no schematron rule to enforce the constraint mentioned in remarks. Is that on its way?

@restriction is only documented in the tagdoc I think. Your choice as to whether the description goes in http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html#STmacros or http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html#TD-datatypes : I think I'd go for the latter myself.

raffazizzi commented 8 years ago

Thanks for the corrections @lou! I put the schematron rules in dataRef, do you think they'd be better placed in dataFacet?