cellml / libcellml

Repository for libCellML development.
https://libcellml.org
Apache License 2.0
17 stars 21 forks source link

Validating ID attributes #451

Open nickerso opened 4 years ago

nickerso commented 4 years ago

As per #438, we are not currently validating id attributes. These should likely be checked by the validator, both to make sure they meet the requirements for ID type attributes (e.g., http://books.xmlschemata.org/relaxng/ch19-77215.html) and also that they are unique in the model being validated.

kerimoyle commented 4 years ago

The spec says:

Any element information item in the CellML namespace MAY contain an unprefixed attribute information item with local name id. This attribute information item SHALL be treated as having attribute type ID, as defined in section 3.3.1 of XML 1.1.

... is that still current? We don't for example have default ids?

Validity constraint: ID Attribute Default An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

nickerso commented 4 years ago

Yes, pretty sure that is still current. I can't quite wrap my mind about what the XML spec is saying about default IDs. Will try again when I get a chance to look at the spec - but maybe @jonc125 or @MichaelClerx could help clarify?

MichaelClerx commented 4 years ago

What would a default ID be? It's a unique identifier, right? So how could you have a default for that?

nickerso commented 4 years ago

yeah, not sure. More just trying to interpret what it means to have an IMPLIED value as we definitely don't want a REQUIRED value. Or is that just that if the attribute is provided it must have a value?

MichaelClerx commented 4 years ago

The naming is funny but I'm fairly sure that REQUIRED means "must be user specified", while IMPLIED means "the application will find a value if no value is given", which includes "null" or "None" or "undefined" or anything of the sort! So it covers both "has a sensible default value" and "may or may not be specified". Will double-check in a book tomorrow morning, unless Jonathan beats me to it :D

kerimoyle commented 4 years ago

Is this issue still current? Running this model below doesn't generate an error about ids from the validator, and I can't figure out if it should or not? (see model id). Reading this https://www.w3.org/TR/xml11/#sec-attribute-types hasn't made it any clearer ... what kind of attribute is an id?

<?xml version="1.0" encoding="UTF-8"?>
<model xmlns="http://www.cellml.org/cellml/2.0#"
  xmlns:cellml="http://www.cellml.org/cellml/2.0#" name="tutorial_2_model" id="tutorial 2 id has spaces">
  <units name="i_am_a_units_item"/>
  <component name="i_am_a_component" id="my_component_id">
    <variable units="dimensionless" name="1st" interface="public_and_private"/>
    <variable name="b" interface="public_and_private" />
    <variable units="dimensionless" name="c" initial_value="this_variable_doesnt_exist" interface="public_and_private" />
    <variable units="i_dont_exist" name="d" interface="public_and_private"/>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <eq/>
        <ci>a</ci>
        <apply>
          <plus/>
          <ci>b</ci>
          <ci>c</ci>
        </apply>
      </apply>
    </math>
  </component>
</model>
The validator has found 5 issues!
  Validator issue[0]:
     Description: Variable '1st' in component 'i_am_a_component' does not have a valid name attribute. CellML identifiers must not begin with a European numeric character [0-9].
     Type of item stored: VARIABLE
     URL: https://cellml-specification.readthedocs.io/en/latest/reference/formal_and_informative/specB08.html?issue=2.8.1.1
    See section 2.8.1.1 in the CellML specification.
  Validator issue[1]:
     Description: Variable 'b' in component 'i_am_a_component' does not have any units specified.
     Type of item stored: VARIABLE
     URL: https://cellml-specification.readthedocs.io/en/latest/reference/formal_and_informative/specB08.html?issue=2.8.1.2
    See section 2.8.1.2 in the CellML specification.
  Validator issue[2]:
     Description: Variable 'c' in component 'i_am_a_component' has an invalid initial value 'this_variable_doesnt_exist'. Initial values must be a real number string or a variable reference.
     Type of item stored: VARIABLE
     URL: https://cellml-specification.readthedocs.io/en/latest/reference/formal_and_informative/specB08.html?issue=2.8.2.2
    See section 2.8.2.2 in the CellML specification.
  Validator issue[3]:
     Description: Variable 'd' in component 'i_am_a_component' has a units reference 'i_dont_exist' which is neither standard nor defined in the parent model.
     Type of item stored: VARIABLE
     URL: https://cellml-specification.readthedocs.io/en/latest/reference/formal_and_informative/specB08.html?issue=2.8.1.2
    See section 2.8.1.2 in the CellML specification.
  Validator issue[4]:
     Description: MathML ci element has the child text 'a' which does not correspond with any variable names present in component 'i_am_a_component'.
     Type of item stored: MATH
     URL: https://cellml-specification.readthedocs.io/en/latest/reference/formal_and_informative/specB12.html?issue=2.12.3
    See section 2.12.3 in the CellML specification.
hsorby commented 3 years ago

For me the 'id' should only raise an error if it fails this test:


[4] | NameStartChar | ::= | ":" \| [A-Z] \| "_" \| [a-z] \| [#xC0-#xD6] \| [#xD8-#xF6] \| [#xF8-#x2FF] \| [#x370-#x37D] \| [#x37F-#x1FFF] \| [#x200C-#x200D] \| [#x2070-#x218F] \| [#x2C00-#x2FEF] \| [#x3001-#xD7FF] \| [#xF900-#xFDCF] \| [#xFDF0-#xFFFD] \| [#x10000-#xEFFFF]
-- | -- | -- | --
[4a] | NameChar | ::= | NameStartChar \| "-" \| "." \| [0-9] \| #xB7 \| [#x0300-#x036F] \| [#x203F-#x2040]
[5] | Name | ::= | NameStartChar (NameChar)*

I don't know what characters are contained within those ranges but I believe whitespace is in there somewhere. The use of whitespace is discouraged but not disallowed so the model id is a valid ID.

nickerso commented 3 years ago

maybe just a counter suggestion that there are no whitespace characters in those unicode ranges. see https://stackoverflow.com/a/46637274. The use of whitespace is not allowed in ID's and the model ID in the above example is invalid.

hsorby commented 3 years ago

That resolves that then clearly, whitespace is not allowed in an ID attribute value and the ID in the above example is indeed invalid and should be reported as such.