bjornbm / dimensional

Dimensional library variant built on Data Kinds, Closed Type Families, TypeNats (GHC 7.8+).
BSD 3-Clause "New" or "Revised" License
109 stars 17 forks source link

named-units raison d'être? #46

Closed bjornbm closed 9 years ago

bjornbm commented 10 years ago

@dmcclean, I've reviewed your named-units-variants branch and lite this iteration better. However, as I asked in #7 regarding the named-units branch:

Just to make sure we're on the same page: exactly what is the problem we are trying to solve (here)? To provide the UnitMap functionality? Any other goals of the named-units branch?

Can you provide a concrete use case (perhaps pseudo-code) of what named-units will enable that cannot be handled by other means (e.g. a specialised show function taking a map of desired units)? Thanks

dmcclean commented 10 years ago

Sorry for not answering that in #7, I must have missed it somehow.

For me, having showIn :: Unit d v -> Quantity d v -> String is the most essential problem I am trying to solve, along with being able to compose units. AnyUnit and UnitMap are nice, but are probably a distraction in this conversation.

All of the Signal types in my autopilot come with a way to show the signal, which is awesome for monitoring what is going on. I use this bit of machinery to allow doing that even when abstracting over the type of the signal's content.

Right now the rest of that file is written against non-named-units dimensional-dk:

instance (Show v, Fractional v, Typeable d, Typeable v) => Representable (Quantity d v) where
type RepresentationDescriptor (Quantity d v) = Unit d v
represent unit value = show $ value /~ unit -- TODO: use the showIn function
defaultRepresentation _ = siUnit

If it was written against named-units dimensional-dk:

instance (Show v, Fractional v, Typeable d, Typeable v) => Representable (Quantity d v) where
type RepresentationDescriptor (Quantity d v) = Unit d v
represent = showIn
defaultRepresentation _ = siUnit

Now instead of just getting a bare "74.961", I would get "74.961 kt", which is huge.

This is without using UnitMap, because it doesn't work by keeping a map from dimension to unit, instead it works by tracking the preferred unit when the signal is constructed.

For PID gains, I can use "redundant" units like degree / (degree / second), and the name of that unit can be computed and displayed. I can even form this unit compositionally in the definition of the PID loop signal function, as the preferred unit of the output divided by the preferred unit of the input, which an alternative scheme based on separate names would struggle to do.

I'd like to add to dimensional-experimental a function that takes Unit d v -> Quantity d v -> Text.LaTeX.LaTeX (probably using the siunitx CTAN package to avoid reinventing too many wheels). We could add a function that takes Unit d v to the UCUM name for the unit.

You could use other means to create a specialized show function taking a map of desired units, or even just taking a particular desired unit, but to do so you'd have to figure out how to represent units. I think that the representation in the branch is the one that makes the most sense and has the most benefits, because it encodes what a unit really is (a named dimensional quantity), and because it allows you to compose units using the group action on unit names and the group action on unit values together.

AFAICT, neither the NIST Guide nor the BIPM's The International System of Units "brochure" (not sure why they call it a brochure) directly defines a "unit". The BIPM document does point to another of their documents, though, the International Vocabulary of Basic and General Terms in Metrology, which in section 1.9, Note 1, does say that "Measurement units are designated by conventionally assigned names and symbols." Indeed, it seems to me that they could hardly not be, or they would be no different than quantities. How could something be, in their words, "defined and adopted by convention", without naming it? I suppose we could just point at the prototype kilogram and say "that one", but we could also do that with any other quantity.

(Unrelated: In that document, also check out section 1.12, which defines a "coherent derived unit" to be what we implemented as siUnit, the "derived unit that, for a given system of quantities and for a chosen set of base units, is a product of powers of base units with no other proportionality factor than one". I added a note to this effect in the documentation.)

dmcclean commented 10 years ago

I should add that, as currently incarnated, it has a more general type for (*) which is where some of the type complexity is coming from.

master dimensional-dk

let x = _3 :: Dimensionless Double
let y = x * meter -- doesn't compile because x is a quantity and meter is a unit
let y' = x * (1 *~ meter) -- one way to write it
let y'' = (x /~ one) *~ meter -- the other way to write it

On the branch the y example works.

This decision is orthogonal to the question of whether to have named units, and is probably responsible for a large percentage of the parts in the branch that (I think?) you find distasteful.

I'm not at all as committed to this one as to the named-ness of the units. It has some clear cons.

I'm fine with () :: (KnownVariant var, Num v) => Dimensional var d1 v -> Dimensional var d2 v -> Dimensional var (d1 \ d2) v`

which is almost identical to master's: (the only difference is an always-satisfied-because-the-only-types-in-the-kind-all-satisfy-it constraint on the variant type variable) () :: Num a => Dimensional v d a -> Dimensional v d' a -> Dimensional v (d \ d') a

instead of the current () :: (KnownVariant v1, KnownVariant v2, KnownVariant p, p ~ (VariantCombination v1 v2), Num v) => Dimensional v1 d1 v -> Dimensional v2 d2 v -> Dimensional p (d1 \ d2) v

Adopting that change (and the one to (/)) would allow entirely killing the VariantCombination type family. We'd still need the one that drops the prefixability phantom type when exponenting a unit (iff we want to keep statically tracked prefixability, which master also doesn't have.)

dmcclean commented 10 years ago

Actually it is still a bit noisier than that, I tried it. You end up needing to drop prefixability when multiplying or dividing, too, so you need type family noise for that.

bjornbm commented 10 years ago

Thanks for elaborating. I really do want to give you some sort of closure regarding this branch, as I feel you deserve it and that it is unfair to both of us to stall on this question. Part of my difficulty with coming to conclusion regarding this branch is that:

  1. There is a lack of implementation documentation/narrative, so I see e.g. a type family but it is not immediately clear what the purpose or meaning is and I have to infer this from the context in which it is used. The makes reviewing the branch require quite high mental effort, quite a bit of guessing and back-tracking. On the other hand I must say that you are excellent at writing detailed issues. :)
  2. It is a moving target. Yesterday I printed that UnitNames and DK modules and reviewed in the evening on paper, and today when I went to try some ideas out some areas in the code have changed dramatically. I don't mean to criticise this; the iterating is for the better, but it means that I have a hard time wrapping my mind around the branch and the full implications of merging it.

Those points aside …

Regarding stand-alone units

On the branch the y example works.

This decision is orthogonal to the question of whether to have named units, and is probably responsible for a large percentage of the parts in the branch that (I think?) you find distasteful.

I'm not a huge fan of the _3 * meter functionality that you mentioned (and subsequently tried to remove), as it also allows e.g. meter / (3 *~ gram). It effectively equates a unit with a quantity (except for units having names) which arguably isn't sound. See NIST SP811 section 7.1 which says:

The value of a quantity is its magnitude expressed as the product of a number and a unit, and the number multiplying the unit is the numerical value of the quantity expressed in that unit.

Which I interpret to mean that one meter should be expressed as “1 meter” or “1 m” as opposed to just “meter”. Obviously multiplication by juxtaposition is replaced by *~ in dimensional in the case of a number and unit. (And /~ is division as used in 7.1.)

(I know you are aware of the above, just putting it here for completeness.)

Perhaps it is unfortunate that _3 is distinguished from 3, but it is not clear cut for me. See 7.10 “Values of quantities expressed simply as numbers: the unit one, symbol 1”. Even if the such quantities are expressed simply as number, that expression is arguable just a matter of convention over the more verbose 3 x 1. And I think to be precise such quantities are not actually numbers, but their units just so happen to be ratios such that the dimensions cancel.

Equating units and quantities is admittedly what I proposed in #40, but there it was for the purpose of simplifying types and code, while here it seems to cause the opposite! And having re-reviewed that NIST guide I am inclined to agree with my original design and withdraw #40.

Unless you strongly disagree I wouldn't mind seeing a branch with this feature removed per your last comment.

Regarding showIn-like functionality

OK, now it is clear what is the essential problem. It isn't as clear to me that the solution must be so tightly integrated with the core of dimensional.

How important is it that the same value level unit is used to construct quantities as to show quantities?

How would you handle UI internationalisation when the unit name is tightly integrated with the unit? Wouldn't you be forced to solve the problem with an additional layer anyway if you want a general solution?

Just brainstorming a little about the options available to get the functionality you are looking for. Feel free to comment on their (un)suitability and suggest add others.

Option 1

Take the named-units-variants branch.

Option 2

Couple the units with the names separately from the unit itself …

meter' = (atomic ("meter", "m"), meter)

… with an appropriate showIn. And use specific operators for combining units …

(n1, u1) *. (n2, u2) = (product n1 n2, u1 * u2)

If it is important to use the same value level “unit” to construct and show the unit:

x *.~ (_, u) = x *~ u

This could all go in a separate module or library.

Option 3

While it is nice to reuse the same operators (*, /, ^) for units and quantities, I would be willing to consider using separate combinators for building units if that could buy us increased flexibility and/or better compiler error messages or something. I haven't thought at all about this, and is a more general philosophical question than named-unit. But it would allow the unit stuff to go into a separate module and allow more flexibility in swapping out a vanilla units module to e.g. units with names. Is this an interesting direction at all?

dmcclean commented 10 years ago

Agreed on the entire question of _3 * meter. I will update the branch to eliminate that. I thought it made sense in my first draft, but it really isn't the greatest and has just stuck around since then.

Sorry for the moving target. I have a very non-linear working style for things that are just experimental, and that's not the best for communication. My bad.

I will refine the "why" documentation (as opposed to the surface API documentation) to address your point 1. I think also that fixing the _3 * meter thing will make it much more clear, since the primary closed type family that analyzes variants will entirely disappear.

Option 2 is certainly possible. It seems less than desirable to me, because both quantities and units (whether named or not) share essentially similar multiplicative group, they are used together, and there isn't really a good story in ordinary-Num-prelude haskell for semantically diverse syntactic overloading of the operator names that you really want ((*) ...), even moreso at the type level.

I think once I remove this type family we will have an entirely identical type API to the one in master, except for the tracking of unit prefixability. We should decide separately whether to statically track that or not. master allows kilo (kilo meter), etc.

Option 3 might be OK, but if we think master's types are at an acceptable level of confusion we can have Option 1 "for free".

Responding non-linearly to topics raised in your email which are germane to this strategic discussion:

On the units library: It seems reasonably interesting. I think it will be even moreso once we have some notion of open data kinds, until then the EmptyDataDecls kind * but-secretly-we-never-use-them-to-classify-values approach gives me a bad feeling, although there really is no reason to fear it.

It also provides type-level representation for units. We don't have that But, I also don't think I want it. The only reason I can see to have type-level for units (as opposed to dimensions) is to use fixed point arithmetic. This is a compelling use case, especially for embedded things and interop with existing libraries, but I don't think it will really be mature until we have solid type-level rationals*, which seems like it will be a while. It's also something that only benefits power users.

I'm all for borrowing the quasiquoter if we decide to build some unit name parsing. Both of those could live in -extras, though, and that would save the TH dependency from the main library.

We should definitely team up with any Haskell-using planetary scientist who wants to build related cool things, for sure. I have a bunch of astronomer friends, but they all live more in radio astronomy land and primarily use Python or IDL or Fortran.

I haven't had the chance to watch your CUFP talk video yet, I will reply on that by email.

Edward is in the room right now for Boston Haskell Users Group, I will pick his brain about his plans after the session. I was teaming with him earlier in the year with changes to his intervals library that made it more useful for use cases like ours.

Skype or phone meeting is a good idea, but I am very busy until the week after next. Here in Boston we're on UTC-4 until November.

dmcclean commented 10 years ago

Oh, and my * footnote, which is really parenthetical.

What we really need for nice type-level units isn't just type-level rationals, it's type level pairs of integers and rationals where (z, q) is interpreted as pi^z * q. But that should be very easy to implement once you have reasonable type level rationals.

See also https://github.com/adamgundry/uom-prototype (no idea if this is still alive or not) and https://ghc.haskell.org/trac/ghc/wiki/Plugins/TypeChecker.

dmcclean commented 10 years ago

One more: on the question of non-SI base dimensions: meh. I can see why it would be valuable, but it isn't in practice valuable to me. If I had it I would probably just shoot myself in the foot by trying to go on a crusade to separate angles from dimensionless things... ;)

If we had type level maps (see that same GHC trac wiki entry), I suppose we could have arbitrary base dimensions in a very principled way, because there is an obvious multiplicative group for type level maps from TheKindThatIndexesBaseDimensions to integers. That would even allow counting dimensions like "cars" and "geese", not just CGS.

dmcclean commented 10 years ago

Uh oh. I think maybe you can't do the simplification I was planning without also getting rid of statically tracked prefixability, because if you change (*) from having different variants in its two operands to having the same variant on both sides, not only do you lose the Quantity * Unit case, which we were intending to lose, you also lose the Prefixable Unit * NotPrefixable Unit case, which we of course need.

I need to let this one stew for a while, see if there is a simple solution.

dmcclean commented 10 years ago

OK, that was mind-boggling, as conversations with Edward can be.

As I understand it, which may not be very well, he is aiming to represent units (rather than dimensions) at the type level, and with a more direct encoding of what units mean that doesn't rely on decomposing them into base dimensions to perform multiplication. Utilizing (at least ideas from?) his hask library. I understood approximately 13% of his talk on that library a couple of months ago.

So, I'm not in a position to judge precisely, but it seems that it might be pragmatic to let him go his own way and continue to build something that has utility for engineering applications, and just keep an eye open for whatever staggeringly brilliant things he comes up with.

bjornbm commented 10 years ago

OK, ping me when it is a good time to take another look at named-units!

SI is all I need too, and my problems are of such scale that SI doesn't give me numerical problems.

I haven't looked closely at units yet, but I imagine type level units are handy when you use different systems. Completely made up example:

data Meter
class Unit sys dim unit where magnitude :: Num a => a
instance Unit SI (SIDim Pos1 Zero Zero Zero Zero Zero Zero) Meter where magnitude = 1
instance Unit CGS (CGSDim Pos1 Zero Zero) Meter where magnitude = 1

Agreed re Edward, let's wait and see what comes out of it.

bjornbm commented 10 years ago

Maybe I am misreading you but it seems you are less in favour of option 2 than option 3? Or do your objections with 2 applies equally to 3? API-wise, option 2 and option 3 would/could be very similar, eg:

(2 *~ second) * (5 *~ newton .* meter ./ second .^ pos2)

The difference from current master would be the different operators for units versus for quantities.

I realise though that I may not have been very clear on how I was thinking either option would be put together. Maybe I will whip up toy examples of both.

bjornbm commented 10 years ago

Oh, and also: did you have any thoughts on handling internationalisation?

dmcclean commented 10 years ago

I will let you know. Should be late next week or early the one after.

I need to think more about options 2 and 3. One question I have is why do (*~) and (*) have the same precedence in master? (I haven't changed it in the branch.) It means that you can't write 3.6 *~ newton * meter, you have to write 3.6 *~ (newton * meter). Is this desirable? Or would it be better if it was one precedence looser?

I forgot to comment on internationalisation. I think that it might be best served by going from UnitName to UCUM and allowing people to internationalize by coming back out again? Direct structural recursion on UnitName is another possibility, but our name and abbreviation choices for non-SI units are arguably less standard.

If you want source code internationalization that can be dealt with by aliases, as the metre vs meter divide is.

An approach not based on standard names would cause the expression problem, as far as I can see. And not one that could be solved by the final encoding trick, either, I don't think.

bjornbm commented 10 years ago

Re fixity: These are the same as the prelude:

infixr 8  ^, ^/, **
infixl 7  *, /
infixl 6  +, -

These are the “new” operators:

infixl 7  *~, /~, ~*, ~/
infixl 7  *~~, /~~

If I had picked infix 6 for these 3.6 *~ newton * meter would have been good, but not 12 *~ meter + 3 *~ meter. :(

Unfortunately the fixity isn't more fine-grained than that. I suppose we could e.g. demote + and - to infixl 5, but I worry that you would get other confusing behaviour/interaction with other operators. I figured the best alternative was just to accept the annoying parentheses.

If you feel strongly about it you are welcome to look into what else has fixity 5 and 6 and asses the impact. Something like 2 Prelude.+ x /~ meter (not a best practice) would break.

I think there was a proposal in the late 00s to allow relative fixity in addition to the course scale. E.g. to say that *~ has fixity 7, but also binds less tightly than *. To my knowledge the proposal didn't go anywhere.

dmcclean commented 10 years ago

Annoying parentheses it is! :+1:

dmcclean commented 9 years ago

OK, I think this is ready to be declared no longer a moving target.

If you have time, review the design that's in the named-units-2 branch in my fork.

It has everything we had mentioned wanting or is extensible without breaking changes to having everything.

The missing things being:

  1. localization for unit names
  2. a parser for unit names
  3. fixed point types with statically tracked unit values

The first two are just about legwork, all the necessary machinery is in place.

The third one is held up by not having type rationals and by it creating a requirement to use the PartialTypeSignatures language extension for prettty much anyone who declares a Unit. However it is essentially compatible with this design, because it can be implemented by creating a third Variant for fixed point quantities. You need to add an extra type parameter to the DUnit variant though and to the Unit type synonym.

dmcclean commented 9 years ago

One drawback to this, which is in my opinion fairly minor, is that we don't have https://ghc.haskell.org/trac/ghc/ticket/8177 yet and may not ever.

The role signature of Dimensional should ideally be nominal phantom representational (and in master it is), but without 8177 it gets nominal nominal nominal in the named-units-2 branch.

dmcclean commented 9 years ago

Another note: I haven't merged the angles branch into the named-units-2 branch because its fate is also up in the air, but there isn't any problem in principle with doing so, AFAICS.

dmcclean commented 9 years ago

Made a change (ec98de7f0d1c3095dd237047632f73cdc943434d) to remove existentials from the UnitName GADT. I think there was a discussion around here somewhere about whether tracking metric-ness of unit names was a step too far. I was leaning towards thinking it was, but actually most of the trickiness I was running into came from the existential types of the constructors and not from the phantom itself. Realized you don't need them because you can just use the Weaken constructor (through the weaken helper function) in a few more spots and voila they disappear.