ModellingWebLab / cellmlmanip

CellML loading and model equation manipulation
Other
3 stars 1 forks source link

Handle physical units #8

Closed jonc125 closed 4 years ago

jonc125 commented 6 years ago
MichaelClerx commented 6 years ago

Ideas / initial libraries?

You mentioned https://pint.readthedocs.io/en/0.7.2/ at one stage?

jonc125 commented 6 years ago

Yes, we're using pint for several other projects, and it's working nicely there!

jonc125 commented 6 years ago

Sympy has its own units system; @tamuri will investigate this first, as it may integrate better with the rest of Sympy.

jonc125 commented 6 years ago

@tamuri I've added more info in the description about the kinds of units quirks & conversions our code will need to handle.

MichaelClerx commented 6 years ago

CellML does have an 'offset' field to support e.g. Celsius & Fahrenheit. But no models use this and the semantics are broken, so we'll say it's not allowed!

Just wanted to +1 this, and mention that we removed it in CellML 2.0

(Also "e.g." here means "these 2 and these 2 alone". Apart from maybe shoe sizes)

Shoe size in the United Kingdom, Ireland, India, and South Africa is based on the length of the last used to make the shoes, measured in barleycorn (​1⁄3 inch) starting from the smallest size deemed practical, which is called size zero. It is not formally standardised. Note that the last is typically longer than the foot heel to toe length by about 1⁄2 to 2⁄3 inch (13 to 17 mm).

:thinking:

MichaelClerx commented 6 years ago

@tamuri I've got a list of common/less common SI units if that helps any :-)

https://github.com/MichaelClerx/myokit/blob/master/myokit/units.py

tamuri commented 6 years ago

Sympy uses gram (g) as the fundamental unit for mass (!), which leads to oddities. We'll define our own Sympy quantities.

See https://github.com/sympy/sympy/issues/14734

MichaelClerx commented 6 years ago

Sympy uses gram (g) as the fundamental unit for mass

I've done the same in Myokit! It makes everything much easier, because none of the other base units have a prefix-multiplier. Didn't encounter any oddities!

jonc125 commented 6 years ago

You end up with volt having a multiplier of 1000, because it's defined in terms of kg...

MichaelClerx commented 6 years ago

But who looks at those? Surely a Volt is built from the fundamental units, whatever way you define them (like a linear algebra basis), and once it's defined it's fine? Why make the developers life difficult by choosing a basis where 1 of the elements has special prefixing rules?

MichaelClerx commented 6 years ago

To be honest, I've not worked it out as a system of units built on SI units directly:

Instead, I've defined units as things like kg = [1, 0, 0, 0, 0, 0, 0], 3 where the 1 means a 1 in the gram direction, and the 3 is the base-10 log of the unit's multiplier. I then define a bunch of other units the same way, and end up defining Volt as V = J / C (I added operator overloading for the unit type)

The difficulty with this implementation comes when creating user representations for units. There is some redundancy there so it's an intractable problem, btw (1-to-many-mapping). In my implementation that means (1) a dict lookup to see if there's a preferred representation (e.g. microampere per farad) or (2) generating a systematic (but not necessarily readable) name programmatically

Curious to hear about other approaches!

tamuri commented 6 years ago

I think the Sympy implementation is pretty good but I don't agree (at least at first glance) with the comment on the linked issue "The scale factor is (temporarily) an arbitrary value." or that it is an implementation detail internal to Sympy. You can see the consequence in one of the Sympy tests which defines a unit joule-second (Js = m**2*kg*s**-1) and then divides by the [internal implementation detail...?] kg scaling factor for the assertion. It would be more natural if kg=1

Js = Quantity("Js")
Js.set_dimension(action)
Js.set_scale_factor(S.One)

mksa = UnitSystem((m, kg, s, A), (Js,))
assert mksa.print_unit_base(Js) == m**2*kg*s**-1/1000
tamuri commented 6 years ago

FYI https://github.com/sympy/sympy/pull/14737

I don't understand the comment about deprecated code. There's no alternative to configuring the units as far as I can tell.

MichaelClerx commented 6 years ago

I don't understand the comment about deprecated code. There's no alternative to configuring the units as far as I can tell.

Yeah I don't see that either

mirams commented 6 years ago

Argh - the SI unit is the kilogram not the gram! Surely if you start changing that all the universal constants that have numbers we recognise will need redefining in funny units?

MichaelClerx commented 6 years ago

I don't see the issue with using gram over kilogram. Then you define 1 kg = 1000 g Then you continue just the same as you would have if kg was a base unit. I agree with the sympy crowd that the only difference is in the internal multiplier you store, and this is hidden from users so why not

jonc125 commented 6 years ago

It only really matters when presenting units to users, if you convert to base units. CellML says kg is the base unit, so having it be g instead will surprise people. But from the reply to my question on https://github.com/sympy/sympy/pull/14737 it sounds like the sympy people are moving to allow multiple unit systems anyway, which is what we want.

mirams commented 6 years ago

I suppose from a coding / computer science point of view it is easier to check you have done things right if you are wanting nanograms and you can see a 1e-9 somewhere. But in terms of humanity having settled on a set of units where you get a 1:1:1:1 for things like Amps, Volts, Joules, Coulombs, Newtons, metres, seconds, kg, that is a lot easier to understand than some "10^-9 of this goes with 10^3 of this to give 10^-3 of this"...

tamuri commented 6 years ago

this is hidden from users so why not

I think you need to know the unit system and what are the base units, because it does leak (see below).

1 joule-second (a measure of action) by definition = kg . m^2 / s^2 * s = kg . m^2 / s

>>> from sympy import *
>>> from sympy.physics.units import *
>>> # Make a joules-second quantity
... # dimension of action = m**2 * kg / s (because I assume kg is the base unit)
... Js = Quantity('Js', action, 1)
>>> # Get Js in terms of the base units m, kg & s (
... convert_to(Js, [m, kg, s])
kilogram*meter**2/(1000*second)
>>> 
MichaelClerx commented 6 years ago

@mirams All that is fine! The relationships stay the same. Think of it as a linear algebra space, with base units as the vectors spanning the basis! It doesn't matter what length they are, you can still specify every point uniquely, and 2*2 stays the same etc.

@tamuri I think it only leaks when, as @jonc125 says, you ask to get a unit in terms of the base units! In my experience nobody is very interested in that :-)

MichaelClerx commented 6 years ago

I mean, the whole point of having a unit system is to get away from the question of what a unit is in terms of its base units. That's what the system is supposed to do

tamuri commented 6 years ago

Noting odd Sympy prefix * unit behaviour:

>>> from sympy import *
>>> from sympy.physics.units import *
>>> milli * kilogram
1
>>> milli * kilometer
1
>>> mega * microsecond
1
>>> type(mega * microsecond)
<class 'int'>
tamuri commented 6 years ago

The following units are part of the CellML (1.0) spec but not defined by default in Sympy. We need to add them to our QuantityStore explicitly (i.e. as we did for dimensionless):

MichaelClerx commented 5 years ago

A note about model versus component level units:

From the CellML 1.1 spec:

The value of the name attribute must not equal one of the names defined in the standard dictionary of units in Table 2. [ Model authors may not redefine the standard units. ]

i.e. The predefined units can't be redeclared

For units defined in a element or declared in an element, the value of the name attribute must be unique across all elements within the and all elements. For units defined in a element, the value of the name attribute must be unique within the given element. [ Two elements in the and elements may not have the same name attribute value, although a element in a element may share the same name as a element in the element or elements. In this case, the units definition in the element supersedes the model-wide definition when referenced inside that component.

i.e. components may define local units with names that overlap the model units

jonc125 commented 5 years ago

I don't think any real models have unit definitions in components, and this is removed in CellML 2, so I'd be inclined to prohibit it!

jonc125 commented 5 years ago

Definitions for the built-in CellML derived units:

        # Convenience derived units
        gram = make('gram', [{'units': 'kilogram', 'multiplier': '0.001'}])
        litre = make('litre', [{'multiplier': '1000', 'prefix': 'centi',
                                'units': 'metre', 'exponent': '3'}])
        # SI derived units
        radian = make('radian', [{'units': 'metre'},
                                 {'units': 'metre', 'exponent': '-1'}])
        steradian = make('steradian', [{'units': 'metre', 'exponent': '2'},
                                       {'units': 'metre', 'exponent': '-2'}])
        hertz = make('hertz', [{'units': 'second', 'exponent': '-1'}])
        newton = make('newton', [{'units': 'metre'},
                                 {'units': 'kilogram'},
                                 {'units': 'second', 'exponent': '-2'}])
        pascal = make('pascal', [{'units': 'newton'},
                                 {'units': 'metre', 'exponent': '-2'}])
        joule = make('joule', [{'units': 'newton'},
                               {'units': 'metre'}])
        watt = make('watt', [{'units': 'joule'},
                             {'units': 'second', 'exponent': '-1'}])
        coulomb = make('coulomb', [{'units': 'second'},
                                   {'units': 'ampere'}])
        volt = make('volt', [{'units': 'watt'},
                             {'units': 'ampere', 'exponent': '-1'}])
        farad = make('farad', [{'units': 'coulomb'},
                               {'units': 'volt', 'exponent': '-1'}])
        ohm = make('ohm', [{'units': 'volt'},
                           {'units': 'ampere', 'exponent': '-1'}])
        siemens = make('siemens', [{'units': 'ampere'},
                                   {'units': 'volt', 'exponent': '-1'}])
        weber = make('weber', [{'units': 'volt'},
                               {'units': 'second'}])
        tesla = make('tesla', [{'units': 'weber'},
                               {'units': 'metre', 'exponent': '-2'}])
        henry = make('henry', [{'units': 'weber'},
                               {'units': 'ampere', 'exponent': '-1'}])
        celsius = make('celsius', [{'units': 'kelvin', 'offset': '-273.15'}])
        lumen = make('lumen', [{'units': 'candela'},
                               {'units': 'steradian'}])
        lux = make('lux', [{'units': 'lumen'},
                           {'units': 'metre', 'exponent': '-2'}])
        becquerel = make('becquerel', [{'units': 'second', 'exponent': '-1'}])
        gray = make('gray', [{'units': 'joule'},
                             {'units': 'kilogram', 'exponent': '-1'}])
        sievert = make('sievert', [{'units': 'joule'},
                                   {'units': 'kilogram', 'exponent': '-1'}])
        katal = make('katal', [{'units': 'second', 'exponent': '-1'},
                               {'units': 'mole'}])
        for units in [becquerel, celsius, coulomb, farad, gram, gray, henry,
                      hertz, joule, katal, litre, lumen, lux, newton, ohm,
                      pascal, radian, siemens, sievert, steradian, tesla,
                      volt, watt, weber]:
            std_units[units.name] = units
        # American spellings
        std_units[u'meter'] = std_units[u'metre']
        std_units[u'liter'] = std_units[u'litre']
MichaelClerx commented 5 years ago

I don't think any real models have unit definitions in components, and this is removed in CellML 2, so I'd be inclined to prohibit it!

I think it's still in there in CellML 2!

  1. Where the parent of the units ​element is a model element, the value of the name attribute MUST NOT be identical to the name attribute of any other units element child of that model element, or of any import units element in the CellML infoset.
  2. Where the parent of the units element is a component element, the value of the name attribute MUST NOT be identical to the name attribute of any other units element child of that component element.
  3. In any case, the value of the name attribute MUST NOT be equal to a cell in the name column of the Built-in units table.

Don't quite remember previous discussions about this, but I think component level units are essential for modularity

jonc125 commented 5 years ago

Bother!

tamuri commented 5 years ago

Need all numbers in equations to be quantities with units

Hmm, looks like Sympy doesn't like it when number is 0 (zero).

In [71]: from sympy.physics.units import *

In [72]: x = symbols('x')

In [73]: x * meter
Out[73]: meter*x

In [74]: 2 * meter
Out[74]: 2*meter

In [75]: 0 * meter
Out[75]: 0
MichaelClerx commented 5 years ago

I guess that can be an exception? What does it do for 1 [meter] + 0 ?

tamuri commented 5 years ago
In [83]: 1 * meter + 0
Out[83]: meter

Sympy has a singleton representing zero but that doesn't help

In [89]: from sympy import S

In [90]: S.Zero * meter
Out[90]: 0
mirams commented 5 years ago

I guess zero meters/kilograms etc. really is zero, how about 0*celsius or something?

MichaelClerx commented 5 years ago

Is it a problem though? It breaks the goal of having a unit for every number, but I think it doesn't mess with the meaning or validity of any of the equations

nickerso commented 5 years ago

@MichaelClerx - where are you seeing those CellML 2 units rules? As @jonc125 points out, CellML 2.0 does remove component-level units and a quick look at the spec draft doesn't seem to mention those rules...

MichaelClerx commented 5 years ago

Woops. I was looking at a 2015 draft. Why do I even have that :laughing: Thanks @nickerso that saves us a bit of effort!

@tamuri if you haven't done it already, I still think yesterday's version of the code should be updated to (1) only handled units defined at the model level and (2) raise some sort of warning or maybe even fail if there are tags in any other places

tamuri commented 5 years ago

(1) only handled units defined at the model level and

Yeah, this is the current behaviour

(2) raise some sort of warning or maybe even fail if there are tags in any other places

Hopefully schema validation will pick up this up.

jonc125 commented 5 years ago

Re zero, it doesn't necessarily have to be handled specially if units consistency checks & conversions don't complain about it. If they do then we might want special cases there treating zero as 'any unit'. Though an explicit zero shouldn't occur that often in real models - just if some subset of equations are being treated as 'disabled' by the modeller.

MichaelClerx commented 5 years ago

(1) only handled units defined at the model level and

Yeah, this is the current behaviour

Thanks! I thought we were just allowing anywhere. Will have another look!

(2) raise some sort of warning or maybe even fail if there are tags in any other places

Hopefully schema validation will pick up this up.

I think it will if we do CellML2 validation. But it raises an interesting question for how we're going to approach this! I remember @jonc125 suggesting an xsl transform to CellML2 before we parse and/or validate something, and this would definitely work, only fail in quite an ugly way if the user does stuff that's OK in 1.0/1.1 but not in 2.0

This is what I'm imaging, @jonc125 does this make sense:

CellML1.0 model > xslt > CellML 2.0 model > validation > parsing > ...

If we go down this route, then a model with e.g. component units (or other removed features) would get an error either at the transform level or at the 2.0 parsing level. So you might get a message like "Unexpected tag in " (or something more abstract), when in fact your model is fine?

jonc125 commented 5 years ago

At present we're parsing 1.0 models directly, so technically could get features coming through that our parser can't handle. We can easily change the 1.0 RELAX NG schema to disallow these though and validate against that before parsing.

For the migration to 2.0, then the XSLT would not translate features not present in 2.0, so you wouldn't get any component-level units definitions for instance. At present I think it drops them silently (which might lead to surprising later errors. Instead we could define a template matching these which raises an informative error for the user.

tamuri commented 5 years ago

Re zero, it doesn't necessarily have to be handled specially if units consistency checks & conversions don't complain about it. If they do then we might want special cases there treating zero as 'any unit'. Though an explicit zero shouldn't occur that often in real models - just if some subset of equations are being treated as 'disabled' by the modeller.

I'm checking units(eq.lhs) == units(eq.rhs). Came across this error because there's a <piecewise>....<otherwise>0</otherwise></piecewise> in one of the equations (the code checks all the possible RHSs in a piecewise). So can either keep zero as a dummy, which would check the the cellml:units attribute on the zero (i.e. <cn cellml:units="xxx">0</cn>) or allow 0 on the RHSs to match to any unit on LHS.

jonc125 commented 5 years ago

OK, so especially since a piecewise could be nested anywhere within an expression, keeping zero as a dummy might actually be the easiest approach here!

MichaelClerx commented 5 years ago

I think the Dummy approach would also mean we can tell that things like 0 [mV] + 0[A] could not have come from a valid CellML expression (allowing any units for zero would not)

tamuri commented 5 years ago

Noting odd Sympy prefix * unit behaviour:

>>> from sympy import *
>>> from sympy.physics.units import *
>>> milli * kilogram
1
>>> milli * kilometer
1
>>> mega * microsecond
1
>>> type(mega * microsecond)
<class 'int'>

Still need to figure out how to handle this. Slightly more realistic example:

>>> import sympy.physics.units as u
>>> u.milli * u.joule
1
MichaelClerx commented 4 years ago

Only one issue remaining: checking booleans / numbers are used in the correct places. Making a separate issue for this.