Restrict Id names in OTS definitions

averbraeck / opentrafficsim

Open Source Multi-Level Traffic Simulator

BSD 3-Clause "New" or "Revised" License

Restrict Id names in OTS definitions #83

Closed averbraeck closed 11 months ago

averbraeck commented 12 months ago

Right now, a name for an Id field of, e.g., a Node can be {{{{ or {PI()} which will cause major problems later when a reference to this Id is made in, e.g., the definition of a Link. The current definition for any Id in the OTS XSD's is:

  <xsd:attribute name="Id" type="xsd:string" use="required" />

The same holds for a variable name that is used as an input name for an expression variable in a scenario.

We have to see what characters to include. I would like any variable to start with a letter, and avoid:

any type of parentheses in the name: parentheses are used in the expression editor to denote functions, so an Id like PI() would be very confusing;
any type of straight brackets: straight brackets are used to denote units, so an Id like [km/h] would be very confusing;
any type of curly braces: curly braces are used to denote variables and expressions, so an Id like {}}} would be very confusing and would lead immediately to parsing errors if used as a keyref;
spaces in the names; this can cause confusion when using the names in an expression;
special characters such as tabs, newlines, etc, which are all allowed right now;
zero-length strings for an Id, which are okay right now.

Definition of an IdType could be something like:

  <xsd:simpleType name="IdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Za-z][A-Za-z0-9_\-\.%!@#\^]*"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>

forcing the Id to start with a letter and have at least one character, and allowing for letters, digits and the special characters: _-.%!@#^. We could also allow *, &, :, ;, /, >, < and ?. We might not allow single and double quotes in a variable name.
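
A minimal sketch of how candidate Ids could be checked against this rule outside the schema. The class `IdTypeSketch` and the method `isValidId` are hypothetical and not part of OTS. The pattern is a Java regex rendering of the XSD pattern, with `*` as the quantifier so that a one-letter Id is accepted, in line with the stated intent of "at least one character". Java's `matches()` requires the whole string to match, which mirrors how an XSD pattern facet is applied.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of OTS: checks candidate Ids against the proposed IdType rule. */
final class IdTypeSketch
{
    /** Java rendering of the proposed pattern: a letter, then letters, digits or _-.%!@#^ */
    private static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_\\-\\.%!@#\\^]*");

    private IdTypeSketch()
    {
        // utility class
    }

    /** Returns true when the whole string satisfies the pattern. */
    static boolean isValidId(final String id)
    {
        return id != null && ID.matcher(id).matches();
    }

    public static void main(final String[] args)
    {
        // Candidates taken from the discussion, plus two plain Ids for contrast.
        String[] candidates = {"N1", "link.A-2", "PI()", "[km/h]", "{{{{", "my node", ""};
        for (String id : candidates)
        {
            System.out.println("'" + id + "' valid: " + isValidId(id));
        }
    }
}
```

With this rule, the problematic examples from the issue (PI(), [km/h], {{{{, names with spaces, and the empty string) are all rejected.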

averbraeck commented 12 months ago

Variable names in the scenario should have a restricted string type like the above, but start and end with { and }. Otherwise, a variable name could be FUNC() which would cause a major problem in the evaluator.

averbraeck commented 12 months ago

We could possibly allow Ids to start with a number, or just be a number, so you can make Ids 1, 2, 3, etc. We have to see how strict we want to make the restriction. There is nothing wrong with a number as an Id, whereas an Id like {{}} would cause real issues.

WJSchakel commented 12 months ago

There is no technical reason to be very strict with Id's (although we may be more strict than technically required). Let's work out the example of dynamic link nodes. First we have two nodes with confusing but technically functioning Id's:

Node[1].Id=[km/h]
Node[2].Id=PI()

We define a link that will dynamically start from either node:

Link.StartNode={my_var}

For this an input parameter needs to be specified in two scenarios:

Scenario[1].InputParameterString.Id={my_var}
Scenario[1].InputParameterString.Value=[km/h]
Scenario[2].InputParameterString.Id={my_var}
Scenario[2].InputParameterString.Value=PI()

The reason this works is that not all fields will be parsed as a type that will then evaluate an expression.

Field	XML type	Parser type	Remark
Node.Id	ots:IdType	String	String, so no expression is evaluated in the parser.
Link.StartNode	ots:string	StringType	With expression. xsd:keyref checks whether this is a Node.Id or InputParameterString.Id.
Scenario.InputParameterString.Id	ots:InputParameterIdType	String	InputParameterIdType forces { } but will be parsed as String; no expression is evaluated.
Scenario.InputParameterString.Value	xsd:string	String	This value results from an expression. When used to refer to an Id it should obey IdType, but input parameters may be used for other purposes.

We can see that [km/h] and PI() will never be evaluated as an expression. They only result from an expression through being the value in an input variable of the expression. Note that all the above will still work if the node Id's were {[km/h]} and {PI()} because these values are never evaluated as an expression.

Still, the only real technical restriction is that Id's should not be recognizable as an expression, i.e. not start with { and not end with }. This matters for a normal, direct Id reference. A simple example shows this does not work:

Node.Id={PI()}
Link.StartNode={PI()} <-- oops, recognized as an expression while directly referring to a node

So therefore, no curly braces:

Node.Id=[km/h]
Link.StartNode=[km/h]
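
The rule can be sketched as a small check. This is an illustration only: the class `ReferenceSketch` and its methods are hypothetical, and the assumption that a value counts as an expression exactly when it is wrapped in { and } is taken from the description above, not from the OTS parser code.

```java
/** Hypothetical illustration, not OTS code: when does a field value look like an expression? */
final class ReferenceSketch
{
    private ReferenceSketch()
    {
        // utility class
    }

    /** Assumption: a value is treated as an expression when it is wrapped in curly braces. */
    static boolean looksLikeExpression(final String value)
    {
        return value.length() >= 2 && value.startsWith("{") && value.endsWith("}");
    }

    /** An Id can be referred to directly only if it cannot be mistaken for an expression. */
    static boolean isSafeForDirectReference(final String id)
    {
        return !id.isEmpty() && !looksLikeExpression(id);
    }

    public static void main(final String[] args)
    {
        for (String id : new String[] {"[km/h]", "PI()", "{PI()}", "{my_var}"})
        {
            System.out.println(id + " safe as direct reference: " + isSafeForDirectReference(id));
        }
    }
}
```

Under this assumption, [km/h] and PI() remain usable as node Ids, while {PI()} is not, because a field such as Link.StartNode would evaluate it instead of looking it up.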

WJSchakel commented 12 months ago

Still, to avoid confusion, I would be in favor of not allowing any sort of brackets. Numbers for Id's seem very logical and may be very helpful when parsing from an external network format. In understanding or communicating between formats/programs, having the same node Id's is helpful. The pattern would then become:

<xsd:pattern value="[A-Za-z0-9_\-\.%!@#\^*&amp;:;\\/>&lt;?]+"></xsd:pattern>

Note that < and & are not allowed other than as &lt; and &amp; as that will not work in XML.

averbraeck commented 12 months ago

The characters to escape are, by the way, according to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#cite_ref-semicolon_2-0, and https://www.ibm.com/docs/en/was-liberty/base?topic=SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_xml_escape.htm:

Original character	Escaped character
"	&quot;
'	&apos;
<	&lt;
>	&gt;
&	&amp;
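
A minimal sketch of escaping these five characters with the predefined XML entities, for example before writing a pattern into an XSD attribute value. The class `XmlEscapeSketch` and the method `escapeXml` are hypothetical names used for illustration only.

```java
/** Hypothetical helper, not part of OTS: escapes the five characters that XML reserves. */
final class XmlEscapeSketch
{
    private XmlEscapeSketch()
    {
        // utility class
    }

    /** Replaces ", ', <, > and & by their predefined XML entity references. */
    static String escapeXml(final String text)
    {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            switch (c)
            {
                case '"':
                    sb.append("&quot;");
                    break;
                case '\'':
                    sb.append("&apos;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '&':
                    sb.append("&amp;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args)
    {
        // The character class discussed in the thread, before escaping.
        System.out.println(escapeXml("[A-Za-z0-9_*&:;/><?]+"));
    }
}
```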

averbraeck commented 12 months ago

I agree that the Id fields do not pose a problem, and we can allow almost anything, where starting and ending with curly braces is the only clash in the parsing.

What about the name of a scenario parameter that will be used as part of an expression that will be evaluated? Suppose I call such a scenario parameter FUN() (possibly defined with its curly braces as {FUN()} depending on whether you want to separate key/keyref parameters and numerical parameters), and I use an expression for a numerical entry: {2 * FUN()}. The evaluator will try to call function FUN() and not use the value of the parameter with that name. It becomes even more clear when you use PI() as the scenario parameter name with a value of -1 in a scenario. How should the evaluator know whether to use your PI() value, or use the built-in PI() function?

WJSchakel commented 12 months ago

Good point. We have to look at how the expression evaluator recognizes variables. Or more to the point, how it recognizes where a variable name begins and ends. Based on org.djutils.eval.Eval.evalLhs() the first character needs to be such that Character.isLetter(char) holds. Based on org.djutils.eval.Eval.handleFunctionOrVariableOrNamedConstant() the variable name continues while Character.isLetterOrDigit(c) || '.' == c || '_' == c. This would bring the pattern down to:

  <xsd:simpleType name="InputParameterIdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\{[A-Za-z][A-Za-z0-9_\.]*\}"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>
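
The scanning rule described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the djutils code: the class `VariableNameSketch` and the method `scanName` are assumed names, and only the two character tests are taken from the description.

```java
/** Hypothetical re-implementation of the described scanning rule; not the djutils code. */
final class VariableNameSketch
{
    private VariableNameSketch()
    {
        // utility class
    }

    /**
     * Returns the index just past the variable name that starts at position start,
     * or -1 when no variable name can start there.
     */
    static int scanName(final String expression, final int start)
    {
        // A name must begin with a letter.
        if (start >= expression.length() || !Character.isLetter(expression.charAt(start)))
        {
            return -1;
        }
        int i = start + 1;
        // The name continues while the character is a letter, a digit, '.' or '_'.
        while (i < expression.length())
        {
            char c = expression.charAt(i);
            if (!(Character.isLetterOrDigit(c) || '.' == c || '_' == c))
            {
                break;
            }
            i++;
        }
        return i;
    }

    public static void main(final String[] args)
    {
        String expression = "my_var.x+1";
        int end = scanName(expression, 0);
        System.out.println("name: " + expression.substring(0, end));
    }
}
```

Note that Character.isLetter also accepts non-ASCII letters, so the ASCII-only class [A-Za-z] in the pattern is a stricter subset of what this scanning rule would accept.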

Perhaps the expression evaluator could allow more, such as @ and #, both as the first character and anywhere within the name. It could also allow _ as a first character. The first character can never be ., as a number would then be recognized. If the expression evaluator allows more, we could also allow more. Note however that the following characters are also not allowed given their usage in org.djutils.eval.Eval.evalRhs(): ), ^, *, /, +, -, &, |, <, >, =, !, ?, : and ,. Lastly, ( is not allowed as it will be recognized as a function rather than a variable. If the expression evaluator were to allow more, the full pattern would become:

  <xsd:simpleType name="InputParameterIdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\{[A-Za-z_@#%][A-Za-z0-9_@#%\.]*\}"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>

averbraeck commented 12 months ago

Agree. Indeed, if a variable name were /k or val*, the evaluator could not tell whether the '/' or '*' is part of the variable name or of the formula. A variable with id a+b, where variables with id a and b also exist, would be ambiguous and should therefore not be allowed. For me, the above pattern works. A couple of challenging unit tests should check that no ambiguities remain. @OTSim can for sure think of a few!
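
As a starting point for such tests, here is a minimal sketch that checks a few awkward names against the extended pattern. The class `InputParameterIdSketch` and the method `isValid` are hypothetical, and the pattern is a Java regex rendering of the extended XSD pattern proposed above, so small differences between the two regex dialects are possible.

```java
import java.util.regex.Pattern;

/** Hypothetical test helper, not OTS code: checks names against the extended pattern. */
final class InputParameterIdSketch
{
    /** Java rendering of the extended pattern: braces around a restricted name. */
    private static final Pattern EXTENDED =
            Pattern.compile("\\{[A-Za-z_@#%][A-Za-z0-9_@#%\\.]*\\}");

    private InputParameterIdSketch()
    {
        // utility class
    }

    /** Returns true when the whole string satisfies the extended pattern. */
    static boolean isValid(final String id)
    {
        return id != null && EXTENDED.matcher(id).matches();
    }

    public static void main(final String[] args)
    {
        String[] candidates = {"{a}", "{_x}", "{a.b_1}", "{a+b}", "{/k}", "{val*}", "{PI()}", "{.5}"};
        for (String id : candidates)
        {
            System.out.println(id + " valid: " + isValid(id));
        }
    }
}
```

The names that contain an operator or parentheses ({a+b}, {/k}, {val*}, {PI()}) are rejected, so they can never compete with the operators and functions of the evaluator.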

WJSchakel commented 11 months ago

Most Id attributes are now of ots:IdType, which prohibits the use of { and }.
The Ids of input parameters are of type ots:InputParameterIdType with pattern \{[A-Za-z][A-Za-z0-9_\.]*\}.
The KeyValidator class of the editor no longer accepts all expressions.
The xsd:selector in all xsd:key elements is amended with |.//ots:DefaultInputParameters/ots:String, which means they may also refer to a String input parameter. This was done by replacing expression (<xsd:key name="[^"]+">\s*<xsd:selector xpath="[^"]+)("\s*\/>\s*<xsd:field xpath="@Id"\s*\/>) with $1|.//ots:DefaultInputParameters/ots:String$2.

Note that dynamically referring to an input parameter will only work for keys that are on the field Id, as that same field is also referred to in an input parameter.
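
A minimal sketch of that search-and-replace applied to a single key definition. The regular expression and the replacement are the ones quoted above; the class `SelectorAmendSketch`, the method `amend`, and the sample key named nodeKey are hypothetical and only serve as illustration.

```java
/** Hypothetical illustration of the described search-and-replace; not the actual tooling used. */
final class SelectorAmendSketch
{
    /** Matches an xsd:key whose field is @Id; group 1 ends just before the closing quote of xpath. */
    private static final String REGEX = "(<xsd:key name=\"[^\"]+\">\\s*<xsd:selector xpath=\"[^\"]+)"
            + "(\"\\s*\\/>\\s*<xsd:field xpath=\"@Id\"\\s*\\/>)";

    /** Appends the String input parameter path to the selector xpath. */
    private static final String REPLACEMENT = "$1|.//ots:DefaultInputParameters/ots:String$2";

    private SelectorAmendSketch()
    {
        // utility class
    }

    /** Returns the schema text with every matching selector amended. */
    static String amend(final String xsd)
    {
        return xsd.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(final String[] args)
    {
        // Sample key definition; the key name and selector path are made up for this sketch.
        String key = "<xsd:key name=\"nodeKey\">\n"
                + "  <xsd:selector xpath=\".//ots:Node\" />\n"
                + "  <xsd:field xpath=\"@Id\" />\n"
                + "</xsd:key>";
        System.out.println(amend(key));
    }
}
```

A key whose xsd:field is not @Id does not match the expression and is left unchanged, which is consistent with the note that dynamic references only work for keys on the field Id.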