Open dladd opened 8 years ago
I'm very much in favour of (A). For consistency doing the same for serialization would make sense, but I wouldn't make this a high priority!
Regarding clearing errors, I don't really see any benefit to doing something more complex than (1). Surely the most common case is that you want to report errors straight after validating a model? If you do want to validate several models and keep all errors, you can create multiple Validator instances to do so.
I'm in favour of (A), separate enablers for the model. For me it keeps the model classes clean and encapsulated better, allows for more consistency if the parser is definitely an external object, and other people can use the same form if they need to add anything bespoke.
I am in favour of (A) and (1). (Agreed with Jonathan regarding (1).)
A and 1 for me too.
I am in favour of option 2. I wouldn't want the errors to be cleared until I said so.
Depends what you mean by "said so" - for me, asking to validate a different model means you no longer care about errors from a previous model!
Yes in the main I would agree but we would be restricting the ways in which the library can be used. By requiring a developer to explicitly clear the errors and leaving errors to accumulate you are not restricting them to single passes. Also, a lot of other libraries work this way where errors are not reset until instructed.
While I still believe errors should be cleared whenever validate()
is called, perhaps a better example for the alternate is not model-level validation but thinking of the case in the future where a user might want to validate a series of components manually?
But I still can't see why you'd want to have the added overhead of tracking errors from multiple sources in the same validator instance?
So then let's just have a call to clear errors at the start of the validate method and we are in a fine position going forward. Sorted.
In the case of the Parser
, we decided to only parse from the model level and save parsing from other entities for later (if ever). However, am I correct in thinking we don't necessarily want to do the same for the Validator
? I would think it would be quite useful for a user to, for instance, check if a component they just modified or constructed is valid before adding it to a model.
For me that is a second level priority. Let's not concern ourselves with that at this time, that functionality can easily be added once the validator can validate models.
I agree, getting validation of a model is the top priority.
Fair enough- I suppose it is difficult to define what a 'valid' state is for an entity outside of its model context anyway. For instance, there is no way to determine if a variable initial_value
as a variable reference is valid without knowing what the other variables in the parent component are.
How far do we want to go with MathML validation at this stage? I think we should probably try to validate the math string by parsing it with libxml2 + the W3C MathML DTD. That will also give us a something to test the MathML printer/parser with when we move on to integrating a CAS.
yep - that seems reasonable. It is also needed to be able to easily iterate over ci
elements and cellml:units
attributes to check them.
In a quick discussion at the ABI last week, it was brought up that it would be useful for the Validator
to be able to point to the particular Specification rule a given validation error violates. As an example, let's consider an unnamed model, which violates rule 4.2.1:
Every model element MUST contain a name attribute in the CellML namespace. The value of the name attribute MUST be a valid CellML identifier, and SHALL be interpreted as a unique identifier for the CellML infoset.
I think it makes sense to add another attribute to our Error
class- say mSpecificationRule
. This would allow us to specify "why" the error occurred with mSpecificationRule
pointing to the rule above, "what" went wrong with mDescription
i.e.,
Model does not have a valid name attribute.
and "where" the error occurred by allowing the the user to do error->getModel()
. mSpecificationRule
could then be:
SpecificationRule::ModelName
, which could then be mapped to a string for reporting.Personally, I don't think this needs to be more complicated than (A) unless we still expect the Spec rule ordering to shift substantially.
(A) would indeed be nice, but I would find it even more useful if libCellML could also provide us with the URL to the full specification rule (unless we could safely determine that URL from mSpecificationRule
?).
I would find it even more useful if libCellML could also provide us with the URL
Just had a quick chat with @nickerso and it sounds like a unique identifier (B) might be the way to go. Then we can have separate methods to convert that identifier to the section title, a URL, or the full rule with a map/dictionary.
Before we start adding URLs, we probably want to wait until the Specification's resting place is agreed upon (there is apparently some discussion ongoing on whether to keep it as a GoogleDoc for ease of contribution). But we can start with the mSpecificationRule
and a method to get the section title. Other maps and methods can be added later.
Another minor issue in terms of implementing our Validator
(and eventually the Printer
) as a separate class is that we don't currently have a Unit
object in the library. We just have add/remove/countUnit()
methods in the Units
class.
I can see why this was done- we don't want to confuse a user with the differences between Unit
and Units
and the manipulations we need to do to a given Unit
are relatively limited. But we need to be able to getUnit
if we want to validate/print outside of the Units
class.
If we want to keep Unit
private but still do all the validation/printing in the separate enablers, we could take a similar approach to how we deal with private XML classes in the Parser
:
Unit
class, which we #include
in validator.cpp. Methods that use Unit
will go in the private implementation of the Validator/Printer
. Other possibilities are:
Unit
objectUnit
in the Units
class.I would personally be fine with (3) as the most expedient option but (1) is probably the most "correct" way to sort this out.
(3) is not only the most expedient, but also the "correct" way in my view of libCellML.
Ah OK maybe I was overthinking this then. Is just a case of the more straightforward approach being the correct one?
Just had a quick chat with @nickerso and it sounds like a unique identifier (B) might be the way to go. Then we can have separate methods to convert that identifier to the section title, a URL, or the full rule with a map/dictionary.
I actually agree with that. My bad for originally suggesting an 'improved' version of (A).
Otherwise, agreed on (3) too.
Building on the decision to go with (3)(validation of each Unit
from within the Units
class), we'll need to get access to the unit-validating method (Units::getUnitValidationErrors()
) from the Validator
. But we also probably don't want to publicly expose that method- would this be a reasonable situation to use friend
?
When I originally saw the friend
keyword, I immediately thought "no way" and decided to re-read the latest messages and... I get the feeling that option (3) (i.e. do validation/printing of each Unit
in the Units
class) goes against what we had originally agreed and what has already been implemented in the Validator
class so far.
Indeed, from what I remember and see in the Validator
class, we do all the validation at the Validator
class level and not at the Model
, Component
, etc. class levels. So, why should we be doing things differently for units? In other words, I would expect units validation to be done in the Validator
class and not in the Units
class.
... or am I missing something?...
That was my original thinking behind suggesting (1):
- (1) Make a private
Unit
class, which we#include
in validator.cpp. Methods that useUnit
will go in the private implementation of theValidator/Printer
.
I was trying to make it so we could have all our validation operations within the Validator
. The issue was that we need to get access to each Unit
in a given Units
during validation but we don't have a Unit
object- just a struct Unit
in Units
that contains the necessary attributes (prefix, exponent, etc.).
My thinking with (3) is that (1) might overcomplicate things for what is ultimately one or two methods. But for (3) we also need to get access to a Units
method from the Validator
so friend
is a workaround to avoid publicly exposing an internal method.
Of course, this is also open to any further suggestions :-)
Well, (3) is not an option for me anymore since it effectively breaks our agreed approach with regards to the Validator
class.
So, maybe the solution consists of making the Validator
class a friend (!!) of the Units
class, and have in the Units
class some private methods that will allow the Validator
class to retrieve anything it needs about a given Unit
?
Just thinking about this, I think this is the same issue that is going to crop up when we want to actually do anything with Units
so it might be worth getting a "better" solution than befriending classes. But not going quite so far as adding a Unit
class to the public API.
How about adding a method like this:
Units::getUnit(int index, std::string& name, double& prefix, double& exponent, double& multiplier, double& offset) const;
or maybe:
const std::string& Units::getUnitName(int index);
double Units::getUnitPrefix(int index);
double Units::getUnitExponent(int index);
...
or both to match the addUnit
methods.
@nickerso, I indeed had something like that in mind, but... then the question is whether we want libCellML users to be able to access those methods? I thought the answer was no and that it was the reason the Unit
structure is declared in the units.cpp
and not in units.h
.
So, if we don't mind libCellML users to have access to those methods, then your suggestion looks fine to me. On the other hand, if we don't want libCellML users to have access to them, then we would have no other choice but to make Validator
a friend of Units
.
I think @nickerso's getUnit()
and getUnitAttribute()
suggestions are the most logical so far. And they might actually be useful for a user wanting to, for instance, check the units reference in order to remove a specific unit.
I think I personally have a slight preference for the getUnitAttribute()
method as I see that being more useful in practice but getUnit()
matches up better with addUnit()
so I am happy either way.
I'm now leaning toward users have access to that information, as I think this is the "cleanest" way to expose this information. And if we want to keep the math handling external to libCellML, then we'll likely need access to that information without being able to use friend
.
+1 for Units::getUnit()
being public
.
The current plan is to use the MathML DTD to separately validate our opaque math string. However, when parsing a valid CellML model, we may end up with a situation where the cellml
namespace has been declared in a parent element (model, component). Then the math string is not self-contained and we would get validation errors for any cellml:units
attributes.
I had a chat with @nickerso and a possible solution might be to just say that we expect a valid math string to be independently valid. During parsing, We would copy any namespaces declared in parent elements into <math>
.
In addition to helping out with validation, this "math as a valid subdocument" requirement will probably also be useful if we want to hang on to units attributes when parsing our math string into a CAS.
I had a chat with @nickerso and a possible solution might be to just say that we expect a valid math string to be independently valid. During parsing, We would copy any namespaces declared in parent elements into
<math>
.
I am confused, are we not removing non-MathML elements from the math string anymore (see here)?
If not, then copying the namespaces into <math
> should work and should indeed be useful for anything that needs access to units attributes in the future.
In addition to helping out with validation, this "math as a valid subdocument" requirement will probably also be useful if we want to hang on to units attributes when parsing our math string into a CAS.
I clearly missed it the first time round, but what does CAS stand for exactly?
The plan is still to remove non-MathML element to validate the MathML itself, but we also need to validate the units. But to validate the units we need to keep track of the namespace of the units attributes - information that is lost if not contained in the math string.
CAS = computational algebra system.
Edit: this comment is essentially a simultaneous duplicate of @nickerso 's above
are we not removing non-MathML elements from the math string anymore?
That's still the plan but in order to get to our cellml:units
attributes, we parse the document as normal XML and have the cellml
namespace declared. Otherwise, libxml throws an undefined namespace error and ignores those attributes.
what does CAS stand for exactly?
Sorry that's a Computer Algebra System (see #95) for symbolic manipulation. Current thinking is still leaning toward SymEngine.
Ok, thanks for the clarification and the meaning of CAS.
In a meeting at the ABI on Tuesday, there was some discussion about how we might implement validation as the next step in the use-cases doc.
The team here was in favour of (A) setting up a
Validator
object, similar to how we deal with theParser
in the changeset under review in #114 . Consistent with this, it was also suggested that we should probably move our serialisation routines into a separateSerialiser
object (we might want to call thisPrinter
or something else instead), rather than doing serialisation from the objects themselves.For example, a user wishing to check if a model is valid, output it if so or output errors if not might do:
An alternative (B) might be to do validation on entities as we currently do for serialisation, for example:
While B is a bit more concise, the thinking is that it is worth the extra line or two to make our reading, writing, and validation error logging consistent. And
Parser
may need to pick up errors that do not necessarily translate into an Entity (e.g. unrecognised or mismatched XML tags).With the validator, there is also the question of when/if to automatically clear errors. For instance, consider if we call the following validation on two very simple models:
validator
might now contain the following errors regarding the requirement for valid models to have names:validate()
is called.m1
andm2
if we allow validation errors to persist until the user callsvalidator.clearErrors()
m2
only if we remove any previous errors pertaining to the object being validated from thevalidator
and add back any new errors from the updated object.I don't have a strong preference but would probably go with 3.