Closed averbraeck closed 11 months ago
In terms of parsing, Xerces-J 2.12.2 implements XML Schema 1.1 with XPath 2.0. This means that we can parse XML files according to XML Schema version 1.1. See https://xerces.apache.org/xerces2-j/releases.html. However, the following important ingredients are missing:
Uptake in other projects and languages is also slow or non-existent. .NET does not support XSD 1.1, see https://stackoverflow.com/questions/61293382/xsd-1-1-validation-for-both-java-and-net-c. C++ has no (free) support, only commercial external libraries, see https://stackoverflow.com/questions/13057249/c-implementation-of-xml-schema-xsd-1-1. In Python it has become possible since 2022: https://stackoverflow.com/questions/19809141/is-it-possible-to-validate-an-xml-file-against-xsd-1-1-in-python. So, uptake of this 2009 standard is extremely slow, with a few recent updates, and most libraries are commercial.
This points in the direction of NOT using XML Schema 1.1 at the moment.
Can we accomplish what we want using XML Schema (XSD) version 1.0? To a certain extent this is possible, but it would restrict the user a little bit (though, IMHO, not in a bad way). Let's break down very precisely what we try to accomplish in our XML files, for which XML Schema 1.1 might be a solution.
- Instead of 6.28 rad, we can now specify {2*PI() [rad]}. Using an xsd:union tag, this is easily solved within the XSD specification.
- {maxspeed} can be defined with a certain value, and {maxspeed} can subsequently be used as the value of a field, or within an expression such as {maxspeed + 10.0[km/h]}, or {1.1 * maxspeed}. (The djutils evaluator knows that if a name is followed by (, it is a function or constant; if not, it is a variable name.) The XSD does not define what an expression should look like; it only indicates that a value is either a string without curly braces, indicating a constant value rather than an expression, or a string starting and ending with curly braces and no curly braces inside, indicating an expression that needs to be evaluated.
- An expression in a field where an xsd:key is expected. This is the one causing the problem: the key cannot be validated against either the defined keys or the expressions, since the expressions are not in the key-list, and can contain any information.
Let's zoom in on the last case.
- No expressions are allowed where a key is referenced, only a single variable such as {startNodeId}.
- For a key value that is checked against definitions using a keyref, we need to explore the combination of two lists rather than one: the value of the field should either be in the defined keys (for the example, the list of defined nodes), or in the list of defined variables.
- Since an xsd:key can be defined from multiple XPath strings that are combined into one list, we can combine the names of the ids that can be chosen and the defined variables.
- Variables are defined in XML with their curly braces, so that the defined name matches the value in the keyref field.
Given these restrictions, (1) no expressions for keyrefs, which sounds totally reasonable, and (2) variables defined in XML using curly braces, which is, in my opinion, not an issue, the validation of key/keyref combinations with expressions can be addressed using XML Schema 1.0. Even better: the expressions are really checked, and a validation error is given when the defined variable between curly braces does not exist. As an extra benefit, pick lists will show both the defined keys (e.g., nodes) and the variables defined in the scenario.
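To make the "two lists combined into one" idea concrete, here is a minimal Java sketch of the lookup that the key/keyref pair is meant to enforce. The class and method names are hypothetical and only illustrate the rule; the actual check is done by the schema validator, not by application code.

```java
import java.util.HashSet;
import java.util.Set;

public class CombinedKeySketch {
    /**
     * Returns true when the field value is either a defined node id
     * or a defined variable (stored with its curly braces).
     */
    static boolean resolves(String value, Set<String> nodeIds, Set<String> variableIds) {
        // One combined list, analogous to a key built from two XPath branches.
        Set<String> combined = new HashSet<>(nodeIds);
        combined.addAll(variableIds);
        return combined.contains(value);
    }

    public static void main(String[] args) {
        Set<String> nodes = Set.of("TREC");
        Set<String> variables = Set.of("{startNodeId}");
        System.out.println(resolves("TREC", nodes, variables));
        System.out.println(resolves("{startNodeId}", nodes, variables));
        // The bare name is not in either list, because variables are stored with braces.
        System.out.println(resolves("startNodeId", nodes, variables));
    }
}
```

Because the variable is stored with its braces, a plain string comparison is enough; no expression parsing is needed to decide whether a reference resolves.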
Below, the corresponding XSD definitions will be shown.
Suppose we have a simple xsd-file as an example that defines two generic simple types:
- VariableType as a string that starts and ends with a curly brace; inside the curly braces it starts with a letter and has no further braces, curly braces, spaces, or other difficult characters inside;
- IdType that starts with a letter and has no further braces, curly braces, spaces, or other difficult characters inside.
<xsd:simpleType name="VariableType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\{[A-Za-z][A-Za-z0-9_\-\.%!@#\^]*\}"></xsd:pattern>
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="IdType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[A-Za-z][A-Za-z0-9_\-\.%!@#\^]+"></xsd:pattern>
  </xsd:restriction>
</xsd:simpleType>
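To see which strings the two patterns accept, they can be tried out with java.util.regex. This is only a sketch: the class and method names are made up, and it assumes that Java's whole-string matches() behaves like the implicitly anchored XSD pattern facet for these particular expressions.

```java
import java.util.regex.Pattern;

public class SimpleTypePatternSketch {
    // Character class copied from the XSD patterns above (backslashes doubled for Java).
    static final String CHARS = "[A-Za-z0-9_\\-\\.%!@#\\^]";
    static final Pattern ID_TYPE = Pattern.compile("[A-Za-z]" + CHARS + "+");
    static final Pattern VARIABLE_TYPE = Pattern.compile("\\{[A-Za-z]" + CHARS + "*\\}");

    static boolean isId(String s) {
        return ID_TYPE.matcher(s).matches();
    }

    static boolean isVariable(String s) {
        return VARIABLE_TYPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"TREC", "{special-node}", "{1.1 * maxspeed}", "A", "{a}"};
        for (String s : samples) {
            System.out.println(s + " id=" + isId(s) + " variable=" + isVariable(s));
        }
    }
}
```

Two things stand out when running this. A full expression such as {1.1 * maxspeed} matches neither type, which is exactly the "no expressions for keyrefs" restriction. And, as the patterns are written, IdType ends in + while VariableType ends in *, so a one-letter id such as A is rejected while a one-letter variable such as {a} is accepted; whether that asymmetry is intended is not stated here.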
Suppose we define a NodeType as an id or a variable:
<xsd:simpleType name="NodeType">
  <xsd:union memberTypes="test:IdType test:VariableType" />
</xsd:simpleType>
Now, we define a Variable, Node, and Link with references to a start node and end node:
<xsd:element name="Variable">
  <xsd:complexType>
    <xsd:attribute name="Id" type="test:VariableType" use="required" />
  </xsd:complexType>
</xsd:element>
<xsd:element name="Node">
  <xsd:complexType>
    <xsd:attribute name="Id" type="test:IdType" use="required" />
  </xsd:complexType>
</xsd:element>
<xsd:element name="Link">
  <xsd:complexType>
    <xsd:attribute name="Id" type="test:IdType" use="required" />
    <xsd:attribute name="NodeStart" type="test:NodeType" use="required" />
    <xsd:attribute name="NodeEnd" type="test:NodeType" use="required" />
  </xsd:complexType>
</xsd:element>
and finally a Network consisting of variables, nodes and links:
<xsd:element name="Network">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="test:Node" minOccurs="0" maxOccurs="1" />
        <xsd:element ref="test:Link" minOccurs="0" maxOccurs="1" />
        <xsd:element ref="test:Variable" minOccurs="0" maxOccurs="1" />
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The next post will show how to define the keys and keyrefs.
The xsd:key and xsd:keyref entries that enforce that the NodeStart and NodeEnd attributes of the Link contain either a node id (no curly braces) or a variable name between curly braces can be defined as follows:
<xsd:key name="nodeKey">
  <xsd:selector xpath=".//test:Network/test:Node|.//test:Network/test:Variable" />
  <xsd:field xpath="@Id" />
</xsd:key>
<xsd:keyref name="linkNodeStartNodeIdRef" refer="test:nodeKey">
  <xsd:selector xpath=".//test:Network/test:Link" />
  <xsd:field xpath="@NodeStart" />
</xsd:keyref>
<xsd:keyref name="linkNodeEndNodeIdRef" refer="test:nodeKey">
  <xsd:selector xpath=".//test:Network/test:Link" />
  <xsd:field xpath="@NodeEnd" />
</xsd:keyref>
The vertical bar (or-operator) indicates that the key for a node is either a node id, or a variable id.
This works nicely. When the variable special-node is defined:
<test:Variable Id="{special-node}" />
and regular nodes are defined, e.g.,
<test:Node Id="TREC" />
then we can validly define a link as follows:
<test:Link Id="ECSC" NodeStart="TREC" NodeEnd="{special-node}" />
This validates correctly. Any change in either the node name or the variable name will render the XML invalid.
This has been tested with the Eclipse editors, and it works flawlessly. Parsing with Xerces or JAXB/XJC will work fine as well.
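For anyone who wants to reproduce the check outside Eclipse, the following is a minimal sketch using the JDK's built-in javax.xml.validation API (XML Schema 1.0). Several things in it are assumptions rather than part of the definitions above: the namespace URI http://example.org/test, the class and helper names, and the placement of the key and keyrefs directly on the Network element with selectors relative to it. The selectors shown in this thread start with .//test:Network, so they are presumably declared on an enclosing element that is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class KeyrefValidationSketch {
    // Hypothetical namespace URI; the thread only shows the "test:" prefix.
    static final String NS = "http://example.org/test";
    static final String CHARS = "[A-Za-z0-9_\\-\\.%!@#\\^]";

    // Condensed copy of the schema; single-quoted attributes keep the Java string readable.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:test='" + NS + "'"
        + " targetNamespace='" + NS + "'>"
        + simpleType("VariableType", "\\{[A-Za-z]" + CHARS + "*\\}")
        + simpleType("IdType", "[A-Za-z]" + CHARS + "+")
        + "<xsd:simpleType name='NodeType'>"
        + "<xsd:union memberTypes='test:IdType test:VariableType'/></xsd:simpleType>"
        + "<xsd:element name='Variable'><xsd:complexType>" + attr("Id", "VariableType")
        + "</xsd:complexType></xsd:element>"
        + "<xsd:element name='Node'><xsd:complexType>" + attr("Id", "IdType")
        + "</xsd:complexType></xsd:element>"
        + "<xsd:element name='Link'><xsd:complexType>" + attr("Id", "IdType")
        + attr("NodeStart", "NodeType") + attr("NodeEnd", "NodeType")
        + "</xsd:complexType></xsd:element>"
        + "<xsd:element name='Network'><xsd:complexType>"
        + "<xsd:choice minOccurs='0' maxOccurs='unbounded'><xsd:element ref='test:Node'/>"
        + "<xsd:element ref='test:Link'/><xsd:element ref='test:Variable'/></xsd:choice>"
        + "</xsd:complexType>"
        // Assumption: constraints sit on Network itself, so the selectors are relative to it.
        + "<xsd:key name='nodeKey'><xsd:selector xpath='test:Node|test:Variable'/>"
        + "<xsd:field xpath='@Id'/></xsd:key>"
        + keyref("linkNodeStartNodeIdRef", "NodeStart")
        + keyref("linkNodeEndNodeIdRef", "NodeEnd")
        + "</xsd:element></xsd:schema>";

    static String simpleType(String name, String pattern) {
        return "<xsd:simpleType name='" + name + "'><xsd:restriction base='xsd:string'>"
            + "<xsd:pattern value='" + pattern + "'/></xsd:restriction></xsd:simpleType>";
    }

    static String attr(String name, String type) {
        return "<xsd:attribute name='" + name + "' type='test:" + type + "' use='required'/>";
    }

    static String keyref(String name, String attribute) {
        return "<xsd:keyref name='" + name + "' refer='test:nodeKey'>"
            + "<xsd:selector xpath='test:Link'/><xsd:field xpath='@" + attribute + "'/></xsd:keyref>";
    }

    /** Builds the example network from the thread with a configurable NodeEnd value. */
    static String network(String nodeEnd) {
        return "<test:Network xmlns:test='" + NS + "'><test:Variable Id='{special-node}'/>"
            + "<test:Node Id='TREC'/>"
            + "<test:Link Id='ECSC' NodeStart='TREC' NodeEnd='" + nodeEnd + "'/></test:Network>";
    }

    /** Returns true if the document validates; prints the validator message otherwise. */
    static boolean isValid(String xml) {
        Validator validator;
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself does not compile", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("invalid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(network("{special-node}"))); // defined variable
        System.out.println(isValid(network("{other-node}")));   // undefined variable
    }
}
```

The first call uses the link from the example above; the second refers to a variable that was never defined, which is the case the keyref is supposed to catch.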
Great, this seems like a solid solution. An open question is how we define the input parameter types. If "NAME" is some valid variable name, the input parameters that will be used as ID replacement will themselves have an ID of "{NAME}".
In both of these cases, the braces are not strictly required. If either is answered with yes, then "NAME" needs to be a valid variable name in the expression editor. For consistency and clarity I'd say 'yes' to both questions.
The above is topic of issue #83.
Currently, Eclipse does not seem to support XML Schema 1.1. We might need this to parse a string that can either be an expression, or a reference to an existing element or a constant. The xsd:keyref and xsd:key elements do not seem to be able to handle this diversity using XML Schema 1.0 -- but this is to be researched as well.
This leads to two solution paths: