averbraeck / opentrafficsim

Open Source Multi-Level Traffic Simulator
BSD 3-Clause "New" or "Revised" License

Restrict Id names in OTS definitions #83

Closed averbraeck closed 11 months ago

averbraeck commented 12 months ago

Right now, the Id of, e.g., a Node can be {{{{ or {PI()}, which will cause major problems later when this Id is referenced in, e.g., the definition of a Link. The current definition of every Id in the OTS XSDs is:

  <xsd:attribute name="Id" type="xsd:string" use="required" />

The same holds for a variable name that is used as an input name for an expression variable in a scenario.

We have to see what characters to include. I would like any variable to start with a letter, and avoid:

Definition of an IdType could be something like:

  <xsd:simpleType name="IdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Za-z][A-Za-z0-9_\-\.%!@#\^]+"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>

forcing the Id to start with a letter followed by at least one more character (so at least two characters in total; replacing + with * would also allow single-letter Ids), and allowing for letters, digits and the special characters: _-.%!@#^. We could also allow *, &, :, ;, /, >, < and ?. We might not allow single and double quotes in a variable name.
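
As a quick check of the proposed pattern, the sketch below emulates the XSD restriction in Python. XSD patterns are implicitly anchored, so `re.fullmatch` is used. The helper name `is_valid_id`, the `*` variant and the sample Ids are assumptions for illustration only; they are not part of OTS.

```python
import re

# Proposed IdType pattern, copied from the XSD snippet above.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_\-\.%!@#\^]+")

# Hypothetical variant with '*' instead of '+', which also accepts a single letter.
ID_PATTERN_STAR = re.compile(r"[A-Za-z][A-Za-z0-9_\-\.%!@#\^]*")


def is_valid_id(value: str, pattern: re.Pattern = ID_PATTERN) -> bool:
    """Return True if the whole string matches (XSD patterns are anchored)."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    # Print the verdict of both variants for a few sample Ids.
    for candidate in ["N1", "node-1.a", "A", "1", "{{{{", "{PI()}"]:
        print(candidate, is_valid_id(candidate), is_valid_id(candidate, ID_PATTERN_STAR))
```

With the `+` quantifier a single letter such as `A` is rejected, while the `*` variant accepts it. Both variants reject `{{{{` and `{PI()}`.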

averbraeck commented 12 months ago

Variable names in the scenario should have a restricted string type like the above, but start and end with { and }. Otherwise, a variable name could be FUNC() which would cause a major problem in the evaluator.

averbraeck commented 12 months ago

We could possibly allow Ids to start with a number, or just be a number, so you can make Ids 1, 2, 3, etc. We have to see how strict we want to make the restriction. There is nothing wrong with a number as an Id, whereas an Id like {{}} would cause real issues.

WJSchakel commented 12 months ago

There is no technical reason to be very strict with Ids (although we may choose to be stricter than technically required). Let's work through the example of dynamic link nodes. First we have two nodes with confusing but technically functioning Ids:

  Node[1].Id=[km/h]
  Node[2].Id=PI()

We define a link that will dynamically start from either node:

Link.StartNode={my_var}

For this an input parameter needs to be specified in two scenarios:

  Scenario[1].InputParameterString.Id={my_var}
  Scenario[1].InputParameterString.Value=[km/h]
  Scenario[2].InputParameterString.Id={my_var}
  Scenario[2].InputParameterString.Value=PI()

The reason this works is that not all fields will be parsed as a type that will then evaluate an expression.

| Field | XML type | Parser type | Remark |
|---|---|---|---|
| Node.Id | ots:IdType | String | String, so no expression is evaluated in the parser. |
| Link.StartNode | ots:string | StringType | With expression. xsd:keyref checks whether this is a Node.Id or InputParameterString.Id. |
| Scenario.InputParameterString.Id | ots:InputParameterIdType | String | InputParameterIdType forces { } but will be parsed as String; no expression is evaluated. |
| Scenario.InputParameterString.Value | xsd:String | String | This value results from an expression. When used to ref to an Id it should obey IdType, but input parameters may be used for other purposes. |

We can see that [km/h] and PI() will never be evaluated as an expression. They only result from an expression through being the value in an input variable of the expression. Note that all the above will still work if the node Id's were {[km/h]} and {PI()} because these values are never evaluated as an expression.
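
The lookup described above can be sketched as follows. This is a simplified, hypothetical model, not the OTS parser: the function `resolve_start_node` and the two dictionaries are assumptions for illustration. A value wrapped in curly braces is looked up as an input parameter, and the resulting string is used as is, without any further evaluation.

```python
# Node Ids are plain strings; nothing in them is ever evaluated.
NODE_IDS = {"[km/h]", "PI()"}

# Input parameters per scenario, keyed by the parameter Id including braces.
SCENARIOS = {
    1: {"{my_var}": "[km/h]"},
    2: {"{my_var}": "PI()"},
}


def resolve_start_node(value: str, input_parameters: dict) -> str:
    """Resolve Link.StartNode to a node Id for one scenario (illustrative only)."""
    if value.startswith("{") and value.endswith("}"):
        # Treated as an expression: here simply an input parameter lookup.
        node_id = input_parameters[value]
    else:
        # Direct reference to a node Id.
        node_id = value
    if node_id not in NODE_IDS:
        raise KeyError(f"Unknown node Id: {node_id}")
    return node_id


if __name__ == "__main__":
    for number, parameters in SCENARIOS.items():
        print(number, resolve_start_node("{my_var}", parameters))
```

In this model the strings `[km/h]` and `PI()` only ever appear as the result of the lookup, which is why they never reach an evaluator.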

Still, the only real technical restriction is that Ids should not be recognizable as an expression, i.e. not start with { and not end with }. This matters for a normal, direct Id reference. A simple example shows that this does not work:

  Node.Id={PI()}
  Link.StartNode={PI()} <-- oops, recognized as an expression while directly referring to a node

So therefore, no curly braces:

  Node.Id=[km/h]
  Link.StartNode=[km/h]
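
A minimal sketch of why the braces clash, assuming (as a simplification) that a reference is treated as an expression exactly when it starts with { and ends with }. The helper names `is_expression` and `safe_as_node_id` are hypothetical.

```python
def is_expression(value: str) -> bool:
    """Assumed rule: a value wrapped in curly braces is parsed as an expression."""
    return value.startswith("{") and value.endswith("}")


def safe_as_node_id(node_id: str) -> bool:
    """A node Id is safe if a direct reference to it is not mistaken for an expression."""
    return not is_expression(node_id)


if __name__ == "__main__":
    for node_id in ["{PI()}", "[km/h]", "PI()", "1"]:
        print(node_id, safe_as_node_id(node_id))
```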

WJSchakel commented 12 months ago

Still, to avoid confusion, I would be in favor of not allowing any sort of brackets. Numbers for Id's seem very logical and may be very helpful when parsing from an external network format. In understanding or communicating between formats/programs, having the same node Id's is helpful. The pattern would then become:

<xsd:pattern value="[A-Za-z0-9_\-\.%!@#\^*&amp;:;\\/>&lt;?]+"></xsd:pattern>

Note that < and & can only appear in the pattern as &lt; and &amp;, because the raw characters are not allowed inside an XML attribute value.
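
To try this pattern outside an XSD validator, the escaped attribute value first has to be unescaped, which is what an XML parser does before the regex engine sees it. The sketch below does this with `xml.sax.saxutils.unescape` from the Python standard library. The helper `is_valid_id` and the sample Ids are illustrative assumptions.

```python
import re
from xml.sax.saxutils import unescape

# Pattern value exactly as written in the XSD attribute (XML-escaped).
XSD_VALUE = r"[A-Za-z0-9_\-\.%!@#\^*&amp;:;\\/>&lt;?]+"

# An XML parser hands the unescaped text to the regex engine.
ID_PATTERN = re.compile(unescape(XSD_VALUE))


def is_valid_id(value: str) -> bool:
    """Return True if the whole string matches (XSD patterns are anchored)."""
    return ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["1", "km/h", "a<b&c", "[km/h]", "PI()", "{x}"]:
        print(candidate, is_valid_id(candidate))
```

Purely numeric Ids such as `1` pass, while anything containing square brackets, parentheses or curly braces is rejected.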

averbraeck commented 12 months ago

The characters to escape are, by the way, according to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#cite_ref-semicolon_2-0, and https://www.ibm.com/docs/en/was-liberty/base?topic=SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_xml_escape.htm:

| Original character | Escaped character |
|---|---|
| " | &quot; |
| ' | &apos; |
| < | &lt; |
| > | &gt; |
| & | &amp; |

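
In XML itself, these five characters are escaped with the predefined entities &quot;, &apos;, &lt;, &gt; and &amp;. As a small illustration, the Python standard library function `xml.sax.saxutils.escape` handles &, < and > by default and accepts extra replacements for the quotes. The wrapper name `escape_xml` is an assumption for this sketch.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quotes are passed as extra entities.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(text: str) -> str:
    """Escape all five characters that have a predefined XML entity."""
    return escape(text, QUOTES)


if __name__ == "__main__":
    print(escape_xml('a<b & "c"'))
```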
averbraeck commented 12 months ago

I agree that the Id fields do not pose a problem, and we can allow almost anything, where starting and ending with curly braces is the only clash in the parsing.

What about the name of a scenario parameter that will be used as part of an expression that will be evaluated? Suppose I call such a scenario parameter FUN() (possibly defined with its curly braces as {FUN()} depending on whether you want to separate key/keyref parameters and numerical parameters), and I use an expression for a numerical entry: {2 * FUN()}. The evaluator will try to call function FUN() and not use the value of the parameter with that name. It becomes even more clear when you use PI() as the scenario parameter name with a value of -1 in a scenario. How should the evaluator know whether to use your PI() value, or use the built-in PI() function?

WJSchakel commented 12 months ago

Good point. We have to look at how the expression evaluator recognizes variables or, more to the point, how it determines where a variable name starts and ends. Based on org.djutils.eval.Eval.evalLhs(), the first character must satisfy Character.isLetter(char). Based on org.djutils.eval.Eval.handleFunctionOrVariableOrNamedConstant(), the variable name continues while Character.isLetterOrDigit(c) || '.' == c || '_' == c holds. This would bring the pattern down to:

  <xsd:simpleType name="InputParameterIdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\{[A-Za-z][A-Za-z0-9_\.]*\}"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>
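
The scanning rule described above can be approximated in a few lines. This is a sketch under stated assumptions, not the djutils implementation: `scan_name` is a hypothetical helper, and Python's `str.isalpha` and `str.isalnum` only roughly correspond to Java's `Character.isLetter` and `Character.isLetterOrDigit`.

```python
import re

# InputParameterIdType pattern, copied from the XSD snippet above.
PARAMETER_ID = re.compile(r"\{[A-Za-z][A-Za-z0-9_\.]*\}")


def scan_name(expression: str, start: int) -> str:
    """Scan a name from `start`: a letter, then letters, digits, '.' or '_'."""
    if start >= len(expression) or not expression[start].isalpha():
        return ""
    end = start + 1
    while end < len(expression) and (
        expression[end].isalnum() or expression[end] in "._"
    ):
        end += 1
    return expression[start:end]


if __name__ == "__main__":
    # The scan stops at '(', so FUN() is seen as a function call named FUN.
    print(scan_name("2 * FUN()", 4))
    print(PARAMETER_ID.fullmatch("{FUN()}") is not None)
```

Because the scan stops at `(`, a parameter called `FUN()` could never be read back as a variable, which is why the pattern excludes it.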

Perhaps the expression evaluator could allow more, such as @ and #, both as the first character and anywhere within the name. It could also allow _ as a first character. The first character can never be ., as a number would then be recognized. If the expression evaluator allows more, we could also allow more. Note, however, that the following characters cannot be allowed given their usage in org.djutils.eval.Eval.evalRhs(): ), ^, *, /, +, -, &, |, <, >, =, !, ?, : and ,. Lastly, ( is not allowed, as the name would then be recognized as a function rather than a variable. If the expression evaluator were to allow more, the full pattern would become:

  <xsd:simpleType name="InputParameterIdType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\{[A-Za-z_@#%][A-Za-z0-9_@#%\.]*\}"></xsd:pattern>
    </xsd:restriction>
  </xsd:simpleType>
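
A quick way to confirm that the extended pattern stays clear of the reserved characters is to test each of them inside an otherwise valid name. The sketch below does so in Python. The names `is_valid_parameter_id` and `reserved_characters_accepted` are hypothetical, and the reserved list is copied from the comment above.

```python
import re

# Extended InputParameterIdType pattern, copied from the XSD snippet above.
EXTENDED_PARAMETER_ID = re.compile(r"\{[A-Za-z_@#%][A-Za-z0-9_@#%\.]*\}")

# Characters listed above as having a meaning in the evaluator.
RESERVED = ")^*/+-&|<>=!?:,("


def is_valid_parameter_id(value: str) -> bool:
    """Return True if the whole string matches (XSD patterns are anchored)."""
    return EXTENDED_PARAMETER_ID.fullmatch(value) is not None


def reserved_characters_accepted() -> list:
    """Return the reserved characters that the pattern would wrongly accept."""
    return [c for c in RESERVED if is_valid_parameter_id("{a" + c + "b}")]


if __name__ == "__main__":
    print(reserved_characters_accepted())
    for candidate in ["{@x}", "{_x}", "{a.b}", "{.x}", "{1x}"]:
        print(candidate, is_valid_parameter_id(candidate))
```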
averbraeck commented 12 months ago

Agree. And indeed, if a variable name were /k or val*, the evaluator could not tell whether the '/' or '*' is part of the variable name or of the formula. A variable with id a+b, where variables with id a and b also exist, would be ambiguous and should therefore not be allowed. For me, the above pattern works. A couple of challenging unit tests should check that no ambiguities remain. @OTSim can for sure think of a few!
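
The a+b ambiguity can be made concrete with a toy tokeniser that lists every way to split an expression into known variable names and single-character operators. This is only an illustration of the ambiguity, not how the evaluator works: the function `tokenisations` is hypothetical, it ignores grammar, and it assumes all names are non-empty.

```python
def tokenisations(expr: str, names: list, operators: str = "+-*/") -> list:
    """Return every split of expr into known names and single-character operators."""
    if expr == "":
        return [[]]
    results = []
    # Option 1: the expression starts with one of the known variable names.
    for name in names:
        if expr.startswith(name):
            for rest in tokenisations(expr[len(name):], names, operators):
                results.append([name] + rest)
    # Option 2: the expression starts with an operator character.
    if expr[0] in operators:
        for rest in tokenisations(expr[1:], names, operators):
            results.append([expr[0]] + rest)
    return results


if __name__ == "__main__":
    # With a variable named "a+b" there are two readings of the same text.
    print(tokenisations("a+b", ["a", "b", "a+b"]))
    # Without it, only one reading remains.
    print(tokenisations("a+b", ["a", "b"]))
```

A unit test along these lines could assert that, for names accepted by the pattern, no expression ever has more than one reading.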

WJSchakel commented 11 months ago

Note that dynamically referring to an input parameter will only work for keys that are on the field Id, as that same field is also referred to in an input parameter.