Closed jonc125 closed 4 years ago
Ideas / initial libraries?
You mentioned https://pint.readthedocs.io/en/0.7.2/ at one stage?
Yes, we're using pint for several other projects, and it's working nicely there!
Sympy has its own units system; @tamuri will investigate this first, as it may integrate better with the rest of Sympy.
@tamuri I've added more info in the description about the kinds of units quirks & conversions our code will need to handle.
CellML does have an 'offset' field to support e.g. Celsius & Fahrenheit. But no models use this and the semantics are broken, so we'll say it's not allowed!
Just wanted to +1 this, and mention that we removed it in CellML 2.0
(Also "e.g." here means "these 2 and these 2 alone". Apart from maybe shoe sizes)
Shoe size in the United Kingdom, Ireland, India, and South Africa is based on the length of the last used to make the shoes, measured in barleycorn (1⁄3 inch) starting from the smallest size deemed practical, which is called size zero. It is not formally standardised. Note that the last is typically longer than the foot heel to toe length by about 1⁄2 to 2⁄3 inch (13 to 17 mm).
:thinking:
@tamuri I've got a list of common/less common SI units if that helps any :-)
https://github.com/MichaelClerx/myokit/blob/master/myokit/units.py
Sympy uses gram (g) as the fundamental unit for mass (!), which leads to oddities. We'll define our own Sympy quantities.
Sympy uses gram (g) as the fundamental unit for mass
I've done the same in Myokit! It makes everything much easier, because none of the other base units have a prefix-multiplier. Didn't encounter any oddities!
You end up with volt having a multiplier of 1000, because it's defined in terms of kg...
But who looks at those? Surely a Volt is built from the fundamental units, whatever way you define them (like a linear algebra basis), and once it's defined it's fine? Why make the developers life difficult by choosing a basis where 1 of the elements has special prefixing rules?
To be honest, I've not worked it out as a system of units built on SI units directly:
Instead, I've defined units as things like kg = [1, 0, 0, 0, 0, 0, 0], 3
where the 1
means a 1 in the gram direction, and the 3 is the base-10 log of the unit's multiplier.
I then define a bunch of other units the same way, and end up defining Volt as V = J / C
(I added operator overloading for the unit type)
The difficulty with this implementation comes when creating user representations for units. There is some redundancy there so it's an intractable problem, btw (1-to-many-mapping). In my implementation that means (1) a dict lookup to see if there's a preferred representation (e.g. microampere per farad) or (2) generating a systematic (but not necessarily readable) name programmatically
Curious to hear about other approaches!
I think the Sympy implementation is pretty good but I don't agree (at least at first glance) with the comment on the linked issue "The scale factor is (temporarily) an arbitrary value." or that it is an implementation detail internal to Sympy. You can see the consequence in one of the Sympy tests which defines a unit joule-second (Js = m**2*kg*s**-1
) and then divides by the [internal implementation detail...?] kg scaling factor for the assertion. It would be more natural if kg=1
Js = Quantity("Js")
Js.set_dimension(action)
Js.set_scale_factor(S.One)
mksa = UnitSystem((m, kg, s, A), (Js,))
assert mksa.print_unit_base(Js) == m**2*kg*s**-1/1000
FYI https://github.com/sympy/sympy/pull/14737
I don't understand the comment about deprecated code. There's no alternative to configuring the units as far as I can tell.
I don't understand the comment about deprecated code. There's no alternative to configuring the units as far as I can tell.
Yeah I don't see that either
Argh - the SI unit is the kilogram not the gram! Surely if you start changing that all the universal constants that have numbers we recognise will need redefining in funny units?
I don't see the issue with using gram over kilogram. Then you define 1 kg = 1000 g Then you continue just the same as you would have if kg was a base unit. I agree with the sympy crowd that the only difference is in the internal multiplier you store, and this is hidden from users so why not
It only really matters when presenting units to users, if you convert to base units. CellML says kg
is the base unit, so having it be g
instead will surprise people. But from the reply to my question on https://github.com/sympy/sympy/pull/14737 it sounds like the sympy people are moving to allow multiple unit systems anyway, which is what we want.
I suppose from a coding / computer science point of view it is easier to check you have done things right if you are wanting nanograms and you can see a 1e-9 somewhere. But in terms of humanity having settled on a set of units where you get a 1:1:1:1 for things like Amps, Volts, Joules, Coulombs, Newtons, metres, seconds, kg, that is a lot easier to understand than some "10^-9 of this goes with 10^3 of this to give 10^-3 of this"...
this is hidden from users so why not
I think you need to know the unit system and what are the base units, because it does leak (see below).
1 joule-second (a measure of action) by definition = kg . m^2 / s^2 * s = kg . m^2 / s
>>> from sympy import *
>>> from sympy.physics.units import *
>>> # Make a joules-second quantity
... # dimension of action = m**2 * kg / s (because I assume kg is the base unit)
... Js = Quantity('Js', action, 1)
>>> # Get Js in terms of the base units m, kg & s (
... convert_to(Js, [m, kg, s])
kilogram*meter**2/(1000*second)
>>>
@mirams All that is fine! The relationships stay the same. Think of it as a linear algebra space, with base units as the vectors spanning the basis! It doesn't matter what length they are, you can still specify every point uniquely, and 2*2 stays the same etc.
@tamuri I think it only leaks when, as @jonc125 says, you ask to get a unit in terms of the base units! In my experience nobody is very interested in that :-)
I mean, the whole point of having a unit system is to get away from the question of what a unit is in terms of its base units. That's what the system is supposed to do
Noting odd Sympy prefix * unit behaviour:
>>> from sympy import *
>>> from sympy.physics.units import *
>>> milli * kilogram
1
>>> milli * kilometer
1
>>> mega * microsecond
1
>>> type(mega * microsecond)
<class 'int'>
The following units are part of the CellML (1.0) spec but not defined by default in Sympy. We need to add them to our QuantityStore explicitly (i.e. as we did for dimensionless):
A note about model versus component level units:
From the CellML 1.1 spec:
The value of the name attribute must not equal one of the names defined in the standard dictionary of units in Table 2. [ Model authors may not redefine the standard units. ]
i.e. The predefined units can't be redeclared
For units defined in a
element or declared in an element, the value of the name attribute must be unique across all elements within the and all elements. For units defined in a element, the value of the name attribute must be unique within the given element. [ Two elements in the and elements may not have the same name attribute value, although a element in a element may share the same name as a element in the element or elements. In this case, the units definition in the element supersedes the model-wide definition when referenced inside that component.
i.e. components may define local units with names that overlap the model units
I don't think any real models have unit definitions in components, and this is removed in CellML 2, so I'd be inclined to prohibit it!
Definitions for the built-in CellML derived units:
# Convenience derived units
gram = make('gram', [{'units': 'kilogram', 'multiplier': '0.001'}])
litre = make('litre', [{'multiplier': '1000', 'prefix': 'centi',
'units': 'metre', 'exponent': '3'}])
# SI derived units
radian = make('radian', [{'units': 'metre'},
{'units': 'metre', 'exponent': '-1'}])
steradian = make('steradian', [{'units': 'metre', 'exponent': '2'},
{'units': 'metre', 'exponent': '-2'}])
hertz = make('hertz', [{'units': 'second', 'exponent': '-1'}])
newton = make('newton', [{'units': 'metre'},
{'units': 'kilogram'},
{'units': 'second', 'exponent': '-2'}])
pascal = make('pascal', [{'units': 'newton'},
{'units': 'metre', 'exponent': '-2'}])
joule = make('joule', [{'units': 'newton'},
{'units': 'metre'}])
watt = make('watt', [{'units': 'joule'},
{'units': 'second', 'exponent': '-1'}])
coulomb = make('coulomb', [{'units': 'second'},
{'units': 'ampere'}])
volt = make('volt', [{'units': 'watt'},
{'units': 'ampere', 'exponent': '-1'}])
farad = make('farad', [{'units': 'coulomb'},
{'units': 'volt', 'exponent': '-1'}])
ohm = make('ohm', [{'units': 'volt'},
{'units': 'ampere', 'exponent': '-1'}])
siemens = make('siemens', [{'units': 'ampere'},
{'units': 'volt', 'exponent': '-1'}])
weber = make('weber', [{'units': 'volt'},
{'units': 'second'}])
tesla = make('tesla', [{'units': 'weber'},
{'units': 'metre', 'exponent': '-2'}])
henry = make('henry', [{'units': 'weber'},
{'units': 'ampere', 'exponent': '-1'}])
celsius = make('celsius', [{'units': 'kelvin', 'offset': '-273.15'}])
lumen = make('lumen', [{'units': 'candela'},
{'units': 'steradian'}])
lux = make('lux', [{'units': 'lumen'},
{'units': 'metre', 'exponent': '-2'}])
becquerel = make('becquerel', [{'units': 'second', 'exponent': '-1'}])
gray = make('gray', [{'units': 'joule'},
{'units': 'kilogram', 'exponent': '-1'}])
sievert = make('sievert', [{'units': 'joule'},
{'units': 'kilogram', 'exponent': '-1'}])
katal = make('katal', [{'units': 'second', 'exponent': '-1'},
{'units': 'mole'}])
for units in [becquerel, celsius, coulomb, farad, gram, gray, henry,
hertz, joule, katal, litre, lumen, lux, newton, ohm,
pascal, radian, siemens, sievert, steradian, tesla,
volt, watt, weber]:
std_units[units.name] = units
# American spellings
std_units[u'meter'] = std_units[u'metre']
std_units[u'liter'] = std_units[u'litre']
I don't think any real models have unit definitions in components, and this is removed in CellML 2, so I'd be inclined to prohibit it!
I think it's still in there in CellML 2!
- Where the parent of the
units
element is amodel
element, the value of the name attribute MUST NOT be identical to the name attribute of any otherunits
element child of thatmodel
element, or of anyimport units
element in the CellML infoset.- Where the parent of the
units
element is acomponent
element, the value of the name attribute MUST NOT be identical to the name attribute of any otherunits
element child of thatcomponent
element.- In any case, the value of the name attribute MUST NOT be equal to a cell in the name column of the Built-in units table.
Don't quite remember previous discussions about this, but I think component level units are essential for modularity
Bother!
Need all numbers in equations to be quantities with units
Hmm, looks like Sympy doesn't like it when number is 0 (zero).
In [71]: from sympy.physics.units import *
In [72]: x = symbols('x')
In [73]: x * meter
Out[73]: meter*x
In [74]: 2 * meter
Out[74]: 2*meter
In [75]: 0 * meter
Out[75]: 0
I guess that can be an exception? What does it do for 1 [meter] + 0
?
In [83]: 1 * meter + 0
Out[83]: meter
Sympy has a singleton representing zero but that doesn't help
In [89]: from sympy import S
In [90]: S.Zero * meter
Out[90]: 0
I guess zero meters/kilograms etc. really is zero, how about 0*celsius or something?
Is it a problem though? It breaks the goal of having a unit for every number, but I think it doesn't mess with the meaning or validity of any of the equations
@MichaelClerx - where are you seeing those CellML 2 units rules? As @jonc125 points out, CellML 2.0 does remove component-level units and a quick look at the spec draft doesn't seem to mention those rules...
Woops. I was looking at a 2015 draft. Why do I even have that :laughing: Thanks @nickerso that saves us a bit of effort!
@tamuri if you haven't done it already, I still think yesterday's version of the code should be updated to (1) only handled units defined at the model level and (2) raise some sort of warning or maybe even fail if there are
(1) only handled units defined at the model level and
Yeah, this is the current behaviour
(2) raise some sort of warning or maybe even fail if there are tags in any other places
Hopefully schema validation will pick up this up.
Re zero, it doesn't necessarily have to be handled specially if units consistency checks & conversions don't complain about it. If they do then we might want special cases there treating zero as 'any unit'. Though an explicit zero shouldn't occur that often in real models - just if some subset of equations are being treated as 'disabled' by the modeller.
(1) only handled units defined at the model level and
Yeah, this is the current behaviour
Thanks! I thought we were just allowing
(2) raise some sort of warning or maybe even fail if there are tags in any other places
Hopefully schema validation will pick up this up.
I think it will if we do CellML2 validation. But it raises an interesting question for how we're going to approach this! I remember @jonc125 suggesting an xsl transform to CellML2 before we parse and/or validate something, and this would definitely work, only fail in quite an ugly way if the user does stuff that's OK in 1.0/1.1 but not in 2.0
This is what I'm imaging, @jonc125 does this make sense:
CellML1.0 model > xslt > CellML 2.0 model > validation > parsing > ...
If we go down this route, then a model with e.g. component units (or other removed features) would get an error either at the transform level or at the 2.0 parsing level. So you might get a message like "Unexpected tag
At present we're parsing 1.0 models directly, so technically could get features coming through that our parser can't handle. We can easily change the 1.0 RELAX NG schema to disallow these though and validate against that before parsing.
For the migration to 2.0, then the XSLT would not translate features not present in 2.0, so you wouldn't get any component-level units definitions for instance. At present I think it drops them silently (which might lead to surprising later errors. Instead we could define a template matching these which raises an informative error for the user.
Re zero, it doesn't necessarily have to be handled specially if units consistency checks & conversions don't complain about it. If they do then we might want special cases there treating zero as 'any unit'. Though an explicit zero shouldn't occur that often in real models - just if some subset of equations are being treated as 'disabled' by the modeller.
I'm checking units(eq.lhs) == units(eq.rhs)
. Came across this error because there's a <piecewise>....<otherwise>0</otherwise></piecewise>
in one of the equations (the code checks all the possible RHSs in a piecewise). So can either keep zero as a dummy, which would check the the cellml:units attribute on the zero (i.e. <cn cellml:units="xxx">0</cn>
) or allow 0 on the RHSs to match to any unit on LHS.
OK, so especially since a piecewise could be nested anywhere within an expression, keeping zero as a dummy might actually be the easiest approach here!
I think the Dummy approach would also mean we can tell that things like 0 [mV] + 0[A]
could not have come from a valid CellML expression (allowing any units for zero would not)
Noting odd Sympy prefix * unit behaviour:
>>> from sympy import * >>> from sympy.physics.units import * >>> milli * kilogram 1 >>> milli * kilometer 1 >>> mega * microsecond 1 >>> type(mega * microsecond) <class 'int'>
Still need to figure out how to handle this. Slightly more realistic example:
>>> import sympy.physics.units as u
>>> u.milli * u.joule
1
Only one issue remaining: checking booleans / numbers are used in the correct places. Making a separate issue for this.
true
andfalse
. There could be other ways of handling this :)exp(V)
whereV
is in volts, not dimensionless (occasionally also in volt^2, millimolar, ...)tanh(V)
whereV
is in volts, not dimensionlesseq
,geq
,gt
,minus
,plus
,piecewise
. Currently we only warn about these when generating code, as otherwise we'd need to fix lots of CellML files!