golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.72k stars 17.5k forks source link

proposal: Go 2: Add dimensionality to enrich type system and strengthen compile-type checking #57973

Closed czetie closed 1 year ago

czetie commented 1 year ago

Author background

Related proposals

Proposal

type distance float64
type time float64
type velocity float64

var v velocity
var d distance = 100
var t time = 2

v = d / t                                                  //will not compile, incompatible types
v = velocity(float64(d) / float64(t))        //will compile, but throws away type checking of d and t

A small enhancement to the rules for type mixing in expressions would allow useful constructs as mentioned above such as distance = speed * elapsedTime, as well as other more complex calculations, while still preventing mistakes such as distance = speed + elapsedTime. The characteristic of these calculations is that they mix values of different "dimension", in this case velocity, which in turn can be defined as having dimension Length/Time; and Time. And the resulting value has dimension different from either, i.e. Length in this case. While adding values of different dimension makes no sense, multiplying or dividing them can be perfectly meaningful and extremely useful.

This proposal would allow the developer to specify, by using the type system, dimensions and valid combinations of dimensions together with the new dimension that they result in.

It could also help programmers in commercial applications where dimension might be defined more abstractly than in physics. For example, to estimate project cost we multiply "number of FTEs" by "fully loaded FTE cost" to get a result in dollars (or appropriate currency). We should not be able to use any old number defined in dollars in this calculation, but only one defined as an FTE cost. Or we might estimate project duration by dividing "number of story points" by "story points per FTE per day" and multiplying by "number of FTEs" to get a result with dimension measured in Days. Other examples of common classes of commercial mistake that can be caught with appropriate definitions include mixing quantities in different currencies; or multiplying by a fraction (0.0 to 1.0) instead of a percentage (0.0 to 100.0).

The more complex the calculation, the more valuable the checking of dimension. And to reiterate, this checking is lost today even if the programmer defines types, because in order to write calculations that mix types, they must be reduced to the underlying type, e.g. float64.

It would also MODIFY the restriction on mixing types in arithmetic expressions: it would still be forbidden to mix types in addition and subtraction, but it would be permitted to mix types in multiplication and division provide they have compatible underlying types.

type distance float64
type time float64
type mass float64
type velocity (distance / time)
type acceleration (velocity / time) 
type force (mass * acceleration)
type energy (mass * velocity * velocity)

Such type definitions could be compounded to any (reasonable) degree necessary for the domain of the application.

var v velocity var d distance = 100 var t time = 2

v = d / t //will not compile


The compiler will reject the assignment because the quotient of `d` and `t` has a different type from `v`, even though they have the same underlying types. Specifically, `d / t` has type (or _dimension_) `distance / time`, which is not the same as `velocity`- at least as far as the compiler knows! We could make it compile by the following explicit type conversions:

`v = velocity(float64(d) / float64(t))`

However, now we have sacrificed the type checking that we had hoped for by creating types for `distance` and `time`; `d` and `t` could mistakenly be variables of any type whose underlying type is `float64`.

A better approach which allows us to retain the type checking is to define the type `velocity` not as `float64` but as the quotient of types `distance` and `time`:

type distance float64 type time float64 type velocity (distance / time)

var v velocity var d distance = 100 var t time = 2

v = d / t //will compile, retains type checking


Of course, this example is very simple; real scientific and engineering formulas will often have many more variables and types. 

- **Is this change backward compatible?**
Yes. This does not remove or change any existing syntax; nor change the semantics of any existing compilable code. It extends the allowed forms for calculations (type mixing) and for specifying types in terms of other types.

- **Orthogonality: how does this change interact or overlap with existing features?**
The concept of _dimension_ is complementary to existing syntax for types, but as far as other language features are concerned, e.g. parameter passing, the end result behaves just like existing types.

- **Is the goal of this change a performance improvement?**
No.

### Costs

- **Would this change make Go easier or harder to learn, and why?**
Probably minimal change. One small addition to the syntax for types is added, and a potential confusion about mixing types is removed. 

- **What is the cost of this proposal? (Every language change has a cost).**
Modest. Existing code for determining the underlying types of variables and expressions and for converting between types, particularly to and from arithmetic types, will be relevant and can probably be reused. Additional code will be required in the code for parsing arithmetic expressions (see below under proposed implementation).

- **How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?**
Compiler only. 

- **What is the compile time cost?**
Modest. Additional checking for type compatibility in multiplications and divisions involving defined types whereas today those expressions are quickly rejected as errors.

- **What is the run time cost?**
Nil. After compile-time checking, the code reduces to the underlying types like any other calculation.

- **Can you describe a possible implementation?**
The compiler builds a dependency tree of types defined in terms of other types all the way down to underlying types. When it encounters an expression that multiplies or divides types it traverses the tree down to _one level above_ underlying types, and verifies that dimension matches on both sides. For example, with the definitions

type distance float64 type time float64 type mass float64 type velocity (distance / time) type acceleration (velocity / time) type force (mass acceleration) type energy (mass velocity * velocity)


a variable of type `energy `has dimension `(mass * velocity * velocity)` which expands to `(mass * (distance / time) * (distance / time))`. The compiler will also expand the types of the variables to ensure that they have the same base dimensions, e.g. `(force * distance)` expands to `(mass * acceleration * distance)` and finally to `(mass * distance / time / time * distance)`. Note that in order to see that these two are the same _dimension_, they must be rearranged into a common canonical form. However, this can be as simple as a list of base dimensions and for each a list of their indexes, e.g.
mass^1
distance^2
time^-2

Therefore, verifying compatibility requires only traversing the dependency tree to the bottom and accumulating a count of the index of each base dimension.

- **Do you have a prototype? (This is not required.)**
No.

Long prose description of proposal:
[Enhancement request_ Dimensionality of Types.docx](https://github.com/golang/go/files/10490663/Enhancement.request_.Dimensionality.of.Types.docx)
ianlancetaylor commented 1 year ago

Historically Go has encouraged programming by writing code rather than coding by designing types. This change seems to make the type system more complicated in order to simplify the actual code. That's not a path that Go usually takes.

I'm also not completely clear on how this works. If velocity is distance divided by time, then distance is time multiplied by velocity. If I write type distance float64; type time float64; type velocity (distance / time) then will the compiler permit distance to be divided by time but will not permit velocity multiplied by time?

czetie commented 1 year ago
  1. The complication of the type system is minimal, and can be completely ignored by those who don't want to use it.

  2. In terms of the Go philosophy I don't see this as any less idiomatic than types in general; and certainly less so than, say, the introduction of generics; or the protections afforded by interfaces. If this is the philosophy, one might as well argue against simple types (i.e. "aliasing" of underlying types) entirely, since it creates more problems than value. In any case, I would argue that there is always value in indicating intentions to the compiler and letting it help avert mistakes, especially when (a) there is no down side as using the construct is entirely optional and (b) there is no runtime cost. I am a big proponent of simple language features that remove mental burden from the developer allowing them to focus on logic rather than, say, details of the type conversion rules.

  3. What is the programmatic alternative here? As far as I can see, the only way to do this without this enhancement is to cast everything down to float64, which eliminates the value of creating types in the first place; significantly complicates the code; obfuscates the programmer's intention for the reader (and isn't it a general principle that code is read many more times than it is written?). Am I missing something? If there were a way to achieve the same thing purely in code that I'm missing, I would be much more sympathetic to the argument that this is an unnecessary complication.

Perhaps the clarity and usefulness of this would be more clear in a more complicated, real-world example involving more variables and dimensions?

  1. I am a huge fan of the type system in Golang, which in combination with interfaces and the direction generics are going in is IMO extremely powerful while avoiding the complexity traps other languages have fallen into. And I think it can be leveraged very powerfully as a complement to coding rather than an alternative.

  2. Yes, you could also write d = v * t (with appropriate var declarations, obviously). The compiler would determine that indexes of the dimensions of v are distance 1 time -1 and of t it is time 1 leaving just distance, which matches d. And the same would apply to more complex calculations with more dimensions, e.g. energy or power.

DeedleFake commented 1 year ago

I'm unclear how often something like would be useful. Velocity, time, and distance are pretty obvious, but in a program, how often can types be related like this? Can this handle more complicated relationships that map to more programming-related problems, rather than physics? In other words, can you give an example of something that is more likely to come up in an actual project?

golightlyb commented 1 year ago

I would encapsulate the type conversions in a function like so:

func calcVelocity(d distance, t time) velocity {
   return velocity(float64(d) / float64(t))
}

Is that insufficient?

czetie commented 1 year ago

First, thank you for your engagement on this and testing whether the idea is robust.

That's as close as I could get too. (I tried doing the equivalent with an anonymous function, but for some reason couldn't get the typing to pass the compiler...).

The main difference here is that the compiler is no longer checking dimensions for us, because we have not told it explicitly that there is a relationship between velocity, distance, and time. So if we made a coding error in the function (unlikely in this simple example, more likely in realistic examples), such as mistakenly writing

return velocity(float64(t) / float64(d))

the compiler would not be able to detect it.

So I think I would say that this is considerably more verbose than being able to write v = d / t, and a lot less clear to the reader than having to jump off to read a function definition, and loses the dimension checking. It also requires a separate function for every distinct formula we want to write this way. Finally, it adds the overhead of a function call (I have no idea how efficient that is in Golang?) - unless the compiler is smart enough to inline the function, which it might do for sufficiently simple functions.

So one way to think of this is that we are in a sense "inlining" the type checking that function parameters perform and putting it into the type checking of the variables directly in the formula; and in addition we gain the dimensional analysis that guards against incorrect formulas.

Would it help if I were to show some more extended examples?

czetie commented 1 year ago

@DeedleFake I have spent the last decade primarily working with people doing scientific and engineering projects so I admit I have a kind of population bias. However, with those people dimensional analysis is something they do all the time as verification of their formulas; so it would be very valuable to that population for the compiler to verify their formulas.

However, I do think lots of other domains have the potential to benefit although in less obvious ways than physics dimensions - although it might take a bit of a mental shift to realize that their variables have dimensions. For example, I have also been working with people in distributed storage-defined software and here they have many integers being juggled in calculation such as buffer size (bytes); bytes per block; blocks per network packet; overhead (bytes); queue lengths; etc. Consequently, it is easy and far from unknown to mix up what different integers represent when calculating buffer sizes, overhead allowances, etc. So for one simple example, if we want to compute how many bytes fit in a network packet, we calculate bytes per block * blocks per network packet, which very naturally has units of bytes per network packet.

Another example comes from own experience in a factory automation system, one of the gnarliest I dealt with as field engineer. At a canned food factory there was a production line subsystem that counted cans off the line, and a warehouse management subsystem that counted them into the warehouse. However, the latter subsystem was crashing periodically with an integer overflow. Eventually I found the problem at the interface between the two subsystems: the warehouse subsystem was expecting to receive a count of the number of pallets, which it then multiplied by 16 to get the number of cases of product in stock. However, the programmer of the production line system was not passing the number of pallets, but rather the number of cases, which when multiplied by 16 caused the crash. Judicious use of definitions (dimensions) such as countCansPerCase, countCasesPerPallet, countCases, and countPallets could have prevented the confusion. Lots of errors of this class happen at the interface between subsystems (e.g. the Mars mission that was lost because of a confusion between Metric and Customary units).

For another domain example, consider project tracking/planning software. You might have lots of integer values such as number of FTEs; story points for each story; story points per day for each programmer; burn down rates; etc. As integers they can be combined in many ways, most of them meaningless. By defining dimensions such as FTEs, story points, days, etc. we can ensure that we are only combining them in meaningful ways.

czetie commented 1 year ago

@DeedleFake P.S. if you have an existing codebase I could look at and see how it might work with this proposal, I am willing to do so. I think that is a fairer test than my making up an arbitrary or hypothetical example.

OwlSaver commented 1 year ago

I really like this idea. It brings something that we naturally do every day and makes it a part of the language. In banking and insurance domains there are many dimensions that would only help to ensure the calculations are correct. It seems to me that this is a mechanical check that is actually very hard for humans to do. But it would be easy for a machine to do.

Make it a basic feature of function parameters. Then it can easily be applied to the special case of operators.

atdiar commented 1 year ago

Dimension checking is something that gets drilled into us when dabbling in physics but from a comp.sci point of view, how would someone implement this?

All I can think of seems to rely on operator overloading at its basis (which is not allowed in Go for also good reasons) . Is this what this proposal could be reduced to? Anyone has an idea?

Edit: more specifically, a subset of operation overloading restricted to subtypes. Basically, being able to further specify the operation + as a monomorphizable quasi-method plus(u, v Number) Number for instance. plus would be using a switch case and would have to be extendable for the various relations between defined subtypes of Number... Don't know if it makes sense or if it is really possible, especially when wanting to avoid whole-program analysis.

apparentlymart commented 1 year ago

I am familiar with this sort of dimensional-analysis-driven programming in some high-level languages, but I don't think any general-purpose language I've used has included it as a first-class feature.

Instead, I've typically seen it implemented by starting in a language that offers operator overloading and then using a library which allows constructing quantity types which include suitable overloads to make the arithmetic operators work correctly with them while also propagating the dimensions (and sometimes also converting units).

For example:

Of course I don't claim that the languages I'm familiar with are the only languages worth considering, but I would consider the three languages I used for the examples above to all have some intended use-cases in common with Go. (If you are thinking of some specific other languages that do have dimensions and quantities as first-class language primitives rather than as libraries, I'd love to learn about them!)

Assuming that it were valuable to model dimensions and quantities in Go, might it be more appropriate to follow the lead of these other languages and implement operator overloading as the language-level primitive, and then leave it to libraries to deal with the specific domain of engineering and scientific applications that would benefit most from dimensions and quantities?

ianlancetaylor commented 1 year ago

The proposal here is not for operator overloading. It is to extend the type system to permit certain type combinations that are currently not permitted. Right now, the language does not permit dividing a variable of type distance by a variable of type time. With this proposal, it would permit dividing variables of those types, and the result would be type velocity.

apparentlymart commented 1 year ago

I understand the proposal as allowing to define relationships specifically between types whose underlying types support the / and * operators, and nothing else.

Go today does allow multiplying two values of a single named type whose underlying type supports multiplication. For example:

type velocity float64
v1 := velocity(1.0)
v2 := velocity(2.0)
result := v1 * v2
fmt.Printf("Result is %#v, a %T", v1*v2, result)
Result is 2, a main.velocity

In a dimensional analysis system I think this is an incorrect result, because the result should be (velocity * velocity), assuming such a thing were possible to represent in Go.

I'm not sure if this one hole where the types don't work out correctly invalidates benefits of making new expressions valid, but it does give me pause.

Particularly when considering one of the examples in the proposal which already includes a "velocity squared":

type energy (mass * velocity * velocity)

This seems to suggest that products of the same type are considered for dimensional analysis when combined with another type, but what about this one?

type velocitySquared (velocity * velocity)

Would a declaration like the above change the result of my example above to be Result is 2, a main.velocitySquared?

apparentlymart commented 1 year ago

The last example in my previous comment -- where defining velocitySquared changed the meaning of velocity * velocity -- also makes me realize that as currently proposed this seems to require global analysis rather than package-scoped analysis.

By that I mean that if type Velocity were in one package and type VelocitySquared were in another package, the Go compiler might need to take into account the definition of VelocitySquared whenever it is dealing with operands of type Velocity.

Although it was the self-product case that made me think of this, I think the same seems to be true for the other examples. If type Distance and type Time are in a different package than type Velocity then the same problem would arise.

It's also possible that two different packages might define different types involving (Distance / Time), making it unclear which definition applies to a particular expression and therefore what the result type of an expression ought to be.

Perhaps it would be reasonable to limit this only to relationships between types in the same package, but I don't think anything else in Go behaves like that.


This also reminds me a little of a feature of the Rust programming language that some developers find frustrating (based on the number of grumpy questions I can find about it online): in order to call a method of a trait on a type that implements that trait, you must import the trait itself into scope even though the expression relying on it doesn't include the trait name anywhere in it.

Rust traits also deal with the problem of potentially-conflicting definitions using the so-called "orphan rules", which (roughly) require that the relationship between a type and a trait must be defined in either the crate that defined the type xor the crate that defined the trait. That means that the compiler only needs to consider two possible locations for a definition (so no global analysis) and the program is invalid if the two locations conflict with one another.

Of course this dimensional analysis problem is far less general than Rust traits, but it does seem to still have the same question of which behaviors are in scope in which locations, and of potentially-conflicting definitions in different packages affecting the same type. I'm not sure if Rust's solutions to this problem really make sense for Go.

fardream commented 1 year ago

Not specific to go, I don't quite like this kind of types in general.

1) The type is missing the unit - use the example of velocity/distance/time, what is the unit of those values. What does var v Velocity = 1 mean? Do we need to go a step further to do thing like type DistanceInMeter float64 etc? 2) Very often in numerical simulations that I worked on, I want to know what the underlying type is (for example, to call either float32 or float64 version of some library code). 3) I often reuse the memory of the same underlying type - for example, a float64 array may be used to store data for length, then updated for speed.

Because of 1) and 3), I actually have never used this kind of types for numerical simulation.

Also, not to nitpick, velocity generally also includes the direction information - and that's 3 values for a 3D velocity, plus the coordinate system. I am unsure how to properly design a system to account for those.

golightlyb commented 1 year ago

@apparentlymart I believe the result would have to be explicitly typed:

 type velocity float64
type velocitySquared float64
var result velocitySquared
v1 := velocity(1.0)
v2 := velocity(2.0)
result = v1 * v2
velocityDoubled := velocity(2.0) * v1 // type velocity
result2 := velocitySquared(v1 * v2)

@czetie thanks. I can see the benefit of the compiler checking the formula is correct.

For non-trivial formulas I would cover it by a test, but I can see how it makes a library package easier to use correctly.

I wonder if there could be some proof of concept linter that could parse a comment to start with and validate a program for correctness.

type velocity float64 // Dimension: distance / time

This could help work out things like what was raised by @fardream before commiting to a language change

czetie commented 1 year ago

Dimension checking is something that gets drilled into us when dabbling in physics but from a comp.sci point of view, how would someone implement this?

No overloading required (other than what already exists for * and /). In its simplest form it just requires lifting the restriction against mixing types in multiplication and division provided that their underlying types are numeric (while retaining it for + and - because it strongly makes sense there).

czetie commented 1 year ago

@fardream

velocity generally also includes the direction information

Actually I agree with you, but I did not want to complicate the simple examples.

  1. Do we need to go a step further to do thing like type DistanceInMeter float64 etc?

Again, yes, and real world dimensional analysis typically does this, but I didn't want to complicate the examples here. (Remember the infamous lost Mars Lander due to a Newtons vs. customary units mixup?).

Regarding the float32/float64 versions of libraries, again this makes sense. If I were designing a language from scratch I would allow function parameter overloading so that functions could be written with the same names and different parameters (32 and 64 bit versions) and the compiler would distinguish them as distinct functions, calling the matching function based on the types of the parameters. In other words, functions are distinguished not merely by name but their entire signature: name, return type, and parameter type. I suspect that may not be very Go-like, but as is probably clear by now, I like making the compiler do the work!

For the memory reuse, not to sound flippant, but that sounds more like Fortran than Go :-) which of course is what a lot of people in your kind of domain are using. I would suggest that some kind of union type would be appropriate?

czetie commented 1 year ago

Perhaps it would be reasonable to limit this only to relationships between types in the same package, but I don't think anything else in Go behaves like that.

Very good point. I also suspect that for some applications development teams will want to have a common, shared set of definitions for common dimensions and relationships (much like good old header files in C). Perhaps something that mirrors the Public/private distinction for functions would be appropriate, so that any given developer can make use of team-shared dimensionality as well as private definitions of their own? FWIW I like the idea of reusing the existing mechanism for the Public/private distinction.

czetie commented 1 year ago

@golightlyb

I wonder if there could be some proof of concept linter that could parse a comment to start with and validate a program for correctness.

type velocity float64 // Dimension: distance / time

This could help work out things like what was raised by @fardream before commiting to a language change

This is a very interesting idea. To make this work you would still need to lift the restriction on mixing different types in division or multiplication, or equivalently implicitly casting down [am I saying that right?] to the underlying types to allow it to compile, but obviously that is a much smaller work effort than implementing the analyzer.

czetie commented 1 year ago

@golightlyb

For non-trivial formulas I would cover it by a test, but I can see how it makes a library package easier to use correctly.

Yes, and I suspect you would still want tests anyway to validate the calculations. My former customers in the scientific realm used to run massive amounts of tests whenever we changed our software to ensure that previous results still came out the same, only faster.

One way to think of this is that it is essentially "folding in" a certain amount of test - which is a huge help to people who, unlike you, are less than diligent in creating or maintaining tests. (I remember seeing some astonishingly high numbers for the amount of tests that are dead code, obsolete, or trivial "return true" stubs because people see it as a burden to write.)

bcmills commented 1 year ago

20757 is closely related.

bcmills commented 1 year ago

@ianlancetaylor, I think the type-inference problem gets simpler (in some sense) if we allow variables to carry arbitrary combinations of units, and only collapse down to individual types when assigning to a variable of that type: an expression of type distance / time can stand on its own, and only needs to be collapsed to velocity when assigned to a variable of that type.

bcmills commented 1 year ago

That said, I think the velocity example is something of a category error. “Distance” and “time” are dimensions, not units — even in a system of natural units the meaning of dimensional numbers depends on exactly which system of units is in use. (For example, the existing time.Duration as a unit is properly understood as “nanoseconds”, not “abstract duration”.)

A better motivating example might be the SI base units and their derived units: Seconds, not Time; Meters, not Distance. In such a system, some meaningful units don't have widely-used names: we measure velocity in SI as “meters per second”, not simply “velocity units”.

In such a system, I think it would make more sense to separate the notion of “units” from “numeric representation”: it is more meaningful to speak of “an int64 variable storing a quantity of meters” than of “a variable of type meters”. (To me, units are more like a special kind of type parameter than like types in their own right.)

czetie commented 1 year ago

@bcmills Interestingly, in my early drafts of this idea I started from the same kind of description that you have here. And yes, strictly speaking dimension is orthogonal to type in the usual sense. I even thought about having a keyword of dimension or dim, so you might write something like:

dim meters
var poolLength, poolWidth int32 meters
var poolArea int64 meters*meters

poolArea = poolLength * poolWidth

I also had thoughts about, as mentioned upthread, about the distinction between dimensions (length, time, etc.) and units (meters, seconds, etc.). After noodling with various designs for the syntax for a while I decided that for strictly practical purposes it was less disruptive to the language (and therefore less to learn) to collapse everything into the type concept. The question would be, is it likely that in a given project we would want to have more than one variable type (int32, int64) to have type meters? If so they should be separate. However, this is very much something where arguments can be made on both sides and may the best argument win.

I also had in mind the kind of non-scientific realms where dimensions are largely "counts" rather than metrics. For example in my hypothetical project planning tool we might have something like

dim FTE                             //Full Time Equivalent workers; could be employees or freelancers
dim durationDay             //Duration measured in days
var headCount int32 FTE  //count of the number of people on the project
var projectDuration int32 durationDay
var projectCostFTEs int 64 durationDay * FTE
...
projectCostFTEs = projectDuration * headCount
atdiar commented 1 year ago

@ianlancetaylor @czetie Ah, yes it seems that it is mostly a typing concern. I initially thought about operator overloading because for the most part, current operators work for values of the same type. That would allow to have operators available between defined types. It's true that the implementation of the operation would remain the same however.

@apparentlymart Regarding the restriction to * and /, that seems wise and easier. And explicitly deals with dimensions.

I still can't help but wonder if there isn't a general treatment somewhere to deal with time and temperature (and other) where deltas can be added but not the values themselves in most cases (unless we want to compute an average but a generic average function could deal with that easily).

Seems that restricting related definitions to a same package could perhaps work. Would have to check.

@bcmills Oh that's right :) I think that's something to really think about. Often people add distances, but it doesn't make sense to add time, only durations could be added to times and return another time. Or... https://physics.stackexchange.com/questions/132720/how-do-you-add-temperatures

bcmills commented 1 year ago

I still can't help but wonder if there isn't a general treatment somewhere to deal with time and temperature (and other) where deltas can be added but not the values themselves

The general treatment is to not provide arithmetic operators for such types. (time.Time provides an Add method, not a + operator).

Note that most of the problems with time.Duration detailed in #20757 arise from interacting with unitless values (multiplying, dividing, and converting), not interacting with time.Time itself.

atdiar commented 1 year ago

I see. I think that's something that should be taken into account.

The current proposal defines time as a float64 and hence, has access to the regular operators. When would that make sense?

Ack. re. #20757, there are also similar problematics that occur when dealing with currencies in financial software (if one is not careful about number representation and precision)

apparentlymart commented 1 year ago

Returning to this again today with a fresher brain, it does kinda feel to me that this proposal reduces to being a special syntax for defining certain kinds of functions involving named types whose underlying types support multiplication and division.

(Note: I'm using "function" here only in a conceptual sense; I don't necessarily mean literally func statements in the existing language.)

The notable differences from normal function declarations are:

Of these two differences, the second one could in theory be handled by code generation: an input file describing just the dimensional relationships could be transformed into a set of functions covering all of the possible operations and the types they ought to produce. The content of those functions would not be checked by the compiler (presumably they'd use conversions to and from the underlying type) but the calls to them would be checked, so assuming that the code generator is correct it would create a type-safe abstraction.

The goal of using built in operators rather than function call syntax can only be met by changing the language though, and so for me this seems to be the key question: is this a broad enough need to warrant growing the language to support it, rather than relying on third party tools and/or libraries?

(I do see that calling generated functions with function call syntax would not be nearly as ergonomic as using operators, but there are lots of things that would be more ergonomic with language support that we accept as being libraries instead because they are not broad enough to justify new language complexity.)

czetie commented 1 year ago

@atdiar timein this context should be duration. It makes sense to multiply durationby speed but not time(as properly defined) by speed.

czetie commented 1 year ago

@apparentlymart That's certainly one way of looking at it. Many years ago I worked a lot with code generators when they were popular, and finally a wise man told me: "Anything that can be generated can be generalized. And anything that can be generalized should be pushed down into the framework" (in this case, the "framework" being the language itself). It might have been Rod Johnson, creator of the Spring IOC framework, but this was a very long time ago!

I do think your parenthetical here is very important: is this one of the things that is useful enough to justify new language constructs? Obviously, I think so or I wouldn't have proposed it, but that is always the balancing act and honestly I think a lot of languages in the Java/C#/C++ world (and even Python) have strayed too far from simplicity.

So I would note in passing that there is a minimalist version of this proposal that could test the usefulness with zero language change and that is simply to loosen the existing restriction that numeric types cannot be mixed in arithmetic expressions: keep the restriction on + and - because it rarely makes sense to add numbers of different types; but allow for example expressions such as speed * duration. Without the formal dimensional analysis the onus is on the programmer to get the units and dimensions correct, but the use of naming and types can still be used to prevent other type errors such as passing numerics of the wrong domain into expressions or as parameters (e.g. passing float64 speed when float64 distance was expected). Today that kind of exploitation of typing is all but impossible because of the constraint on mixing types in arithmetic.

Note that this could even be done non-disruptively in Go 1 since it doesn't affect current correct code at all, as a testbed for whether people do even want to use types this way and whether it is worth then investing in the full dimensional analysis in Go 2.

golightlyb commented 1 year ago

Unless im misunderstanding, I feel like dropping that restriction would make errors more likely.

I think the value would be in the compiler or a linter being able to automatically catch mistakes like return velocity(float64(t) / float64(d)) and thats (to me!) the important bit independent of the other discusions around changing how operators work.

apparentlymart commented 1 year ago

I would agree with @golightlyb's reaction.

With the current rules in place it forces a developer to either explicitly lower to naked float64 or to write a wrapper function to hide that behind a safe abstraction (a function with named types as its arguments and return value) that is harder to use incorrectly.

Weakening the rules would reduce the effectiveness of the safe abstraction because the most obvious code to write -- value / otherValue -- would compile and work without errors, and so a busy author not paying close attention to the library docs may not even be prompted to think about whether they are using the named types consistently or whether there might be a named function that encapsulates the calculation they were trying to make.

I think I prefer the current state where some explicit action is required, because it at least gives the developer a signal to pause and think a bit more deeply about what they are doing. If the goal were to just make the / and * operators work without any additional constraint then you could achieve that today by just using float64 directly as your type.


Separate reaction to a different part of @czetie's response:

I would agree that code generation is rarely my first choice for solving a problem, but in the Go community in particular it is established as one of the available options when solving certain kinds of problems, to the extent that the toolchain has built-in support to help with it. For example, it's the idiomatic answer to generating fmt.Stringer implementations for types that are behaving as enumerations.

I will acknowledge that the earlier problem of parameterized types was originally solved via code generation tools but did later become a built-in language feature, I imagine in part due to the inherent limitations of code generation as a solution to that particular problem. This new problem might have some similar challenges for code generation, because it does seem to be in a similar problem space as generics.

But it seems to me that Go in particular doesn't subscribe uncritically to that wisdom about generating and generalizing; for some problems that is true, but for other problems code generation is viable and pragmatic to lower overall language complexity. It's important to justify this particular need not being a good fit for code generation, rather than to just rule out code generation as a solution for all problems.

czetie commented 1 year ago

@apparentlymart If you just use float64, you lose the protection that using types gives you against mixing types in addition and subtraction. At it's most fundamental, the observation here is that (assuming appropriate type declarations)

distance + duration does not make sense, and it's helpful for it to be caught by the compiler;

distance \ duration can make sense if the programmer explicitly says so, and it's helpful for it to be allowed by the compiler.

With Go's rules as currently defined you can either use float64 and lose the former; or use explicit types and jump through hoops to achieve the latter. Hence my comment that this loosening of the limitations ought to be considered even without dimensional analysis of the resulting formula as it makes typing of numerical values much more useful.

Or to put it another, allowing this extension to the semantics of types is a way of putting the encapsulation inline, where the formula is explicitly visible, which may well be preferable in many cases. Bearing in mind that code is read many more times than it is written, and often by strangers, (and in many cases we are strangers to our own code after the passage of time), my opinion is that this very small change to semantics - as mentioned, desirable in its own right anyway - is much cleaner and more readable than a proliferation of small functions, whether hand-written or generated, and easier to maintain than obscuring the formula behind a function. Of course, sometimes encapsulating the formula in a black box is indeed the right thing to do, and there is nothing in this proposal to prevent that.

OwlSaver commented 1 year ago

In my career I have worked on the development of one assembler and three different compilers. It always amazed me how we would focus on language constructs without really caring about the real world implications. I spent months arguing that we should include at least an option for array bounds checking. We never added it.

It seems to me that the semantics of types as we know them today is really just renaming. In the real world we have types like Transaction Amount, Social Security Number, Velocity, and such that have deep semantics that the language should understand. By that I mean it is either built in or any programmer addition should work as though it were built in. Not an afterthought.

We have a long way to go.

apparentlymart commented 1 year ago

It sounds like the revised simpler proposal is something like:

The * and / operators can be applied to any pair of named types whose underlying types support those operators and whose underlying types are the same.

I think that leaves one thing unstated: what is the type of the result?

type speed float64
type duration float64

s := speed(1.0)
d := duration(2.0)
v := s * d

Is the type of v now float64? Is it better to silently reduce the result to float64 rather than making it explicit as today? float64(s) * float64(d) -> float64.

We could also define it such that v is either speed or duration but I can't really give a justification for that other than "arbitrarily choose the type of the first/second operand".

Does your intended goal here suggest one of these answers as being more appropriate than the others?

atdiar commented 1 year ago

Is there the option to keep it disabled unless the product is specifically allowed too?

Such as:


type speed float64
type duration float64
type distance = speed * duration // an alias because why not? It should be convertible to float64

s := speed(1.0)
d := duration(2.0)
v := s * d

s := speed(1.0)
d := duration(2.0)
v := s * d // compile error

var dist distance
dist = s * d // ok since it is defined above

?

Because I don't see why the type of the return value should be float64.. (but then that's a language change, yes) As you mentioned earlier, there is still an issue with e.g. speed x speed. Might not be a huge issue perhaps, since speed values should not be assignable to any potential type speedsquared = speed * speed variables.

Basically, as a constraint, such types would assert that they are the result of operations between specifically defined float64 values. It's more than a mere float64 hence type conversion would not be two-way anymore. (can see type distance as a form of product type, a float64 wouldn't be convertible into a value of such a product type)

Also thinking about commutativity. If combining multiple such constructed types, don't know whether there'd be a problem or not.

fardream commented 1 year ago

Speaking of constraints on the operations - is calculation of quantities like Reynolds number allowed? That will be basically:

(mass / distance / distance / distance) (speed) distance / (weight / speed) / (distance * distance / time)

czetie commented 1 year ago

@fardream I don't see why not - and in fact, something that complex is an excellent example of when automated dimensional analysis would be particularly valuable. The same principle applies: the compiler just has to go through the equation counting up the indices of each base type (it can exploit the same parse tree that it builds to evaluate expressions).

czetie commented 1 year ago

@apparentlymart > Does your intended goal here suggest one of these answers as being more appropriate than the others?

My preference is definitely for the complete solution. However, if it helps to address some of the concerns raised, this could be done in two stages.

Stage one is to remove the restriction on mixing types when the operators are * and / so that people can see how much more useful that makes type aliases for numeric types (and the existing restriction for + and - operators would remain, because it still doesn't make sense to write for example speed + time). Because stage one does not require any syntax changes, only a semantic change to type mixing rule, it could readily be done in Go v1. This would also provide a platform if somebody wanted to build a proof of concept for the dimensional analysis e.g. as a linter.

Stage two is to add the dimensional analysis to the compiler, and since this involves a language change (albeit small), i.e. compound type aliasing, it is best reserved for Go v2.

Does that make sense?

beoran commented 1 year ago

This long article seems interesting here: https://gmpreussner.com/research/dimensional-analysis-in-programming-languages

Now that we have generics perhaps they can be used for this looking at the examples from other languages?

OwlSaver commented 1 year ago

I just finished that article and was coming here to share it. The article is a good survey of what has been done in other languages. It is from 2018. The end is good where the author presets some alternatives and their pros and cons.

https://gmpreussner.com/research/dimensional-analysis-in-programming-languages

atdiar commented 1 year ago

@czetie just wondering. Isn't there still the question of what the return type should be if we just relax rules around operations as you imply? (@apparentlymart original question I believe)

@beoran @OwlSaver nice find! I wonder if parametricity could be useful to account for units. If we could write for instance:

type distance[M unit] float64
type duration[S unit] float64
type speed[T, U unit] = distance[T] / duration[U]

type speedSI = speed[meter, second] 

// where meter and second are types that satisfy the unit interface

Don't know if it would make sense, purely brainstorming.

czetie commented 1 year ago

just wondering. Isn't there still the question of what the return type should be if we just relax rules around operations as you imply? (@apparentlymart original question I believe)

I'm not I'm understanding the question, but I'll take a shot and if I have misunderstood please let me know

I have been writing code like this:

type distance float64
type time float64
type mass float64
type speed (distance / time)
type force (mass * acceleration)
type energy (mass * speed * speed)

var e energy
var m mass 
var s speed

e = m * s * s

I believe you suggesting that the last line here should be explicitly type-converted to the type of the result:

e = energy(m * s * s)

If so, I think you are right. The proposal is essentially the same, except that what we are now saying is instead of "the expression must have the same dimension as the variable to be assigned to", we are saying "the variable must have the correct dimensions for the type conversion (which in this case it does).

So in effect this becomes an extension of the type conversion rules:

Does that help?

atdiar commented 1 year ago

Yes, overall the idea is there.

I'm still unsure about whether that would just be a change in conversion rules. Perhaps it really has to be checked during assignment rather. The idea being that otherwise, it might look like it is possible to convert meters into square meters for instance. Or perhaps there is something else that would need to change I'm not sure.

But other than that, yes :)

apparentlymart commented 1 year ago

The general idea of most of my questions so far has been to try to make the proposal more concrete/complete by describing it as precisely as I can and then noting aspects that seem to not yet be fully specified.

In a few cases I also made some observations about what I might expect based on similar decisions elsewhere in the language, but some of the things I've asked just seem like tricky design edge cases where the answer isn't clear to me, so I'm asking in an attempt to understand what you'd expect to see (as someone who has more experience with programming with dimensional analysis in other languages) and fit that with what seems possible to achieve and what interacts well or less well with existing language features.

With all of that in mind, I'm now thinking about this idea of implementing in two steps:

  1. First: Allow multiplying or dividing any combination of named types whose underlying types naturally support those operators.
  2. Later: Allow declaring relationships between those types so that the result type can be derived from the types of the operands using dimensional analysis.

It's important to define for any operation what its return type should be, so splitting this into two steps requires step 1 to answer the question I previously posed about what type a multiplication or division of two different named types should return. Concretely, the question is what the following program should print, and why:

type A float64
type B float64
a := A(1.2)
b := B(2.5)
fmt.Printf("%T", a * b)

It seems like in this two-step proposal the second step would give an answer to this question if there were a type defined as the product of A and B, but for this to be implemented in two steps the first step must itself be fully specified.

I think some earlier questions still apply for step 2:

I think doing this in two steps also adds at least one additional question:

I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer. I don't know the answers to these questions myself. I'm curious about your thought process leading to the answers at least as much as the answers themselves, because that can help to understand the shape of the problem and the scope of the proposed solutions.

czetie commented 1 year ago

@apparentlymart > I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer.

Absolutely clear, and I'm happy that you're as invested in this proposal as you are, and these are exactly the right questions to ask if this concept is to become an implementable design. Many of them are simply things I had not thought of myself.

If step one makes it valid to multiply or divide arbitrary numeric types without explicit conversions, does that remain valid in step 2?

This in particular is an important question, and not one I have worked through because it wasn't originally part of my thought process. If the answer is No then to me that implies that Step 1 should be done in Go v1, and step 2 in Go v2, because it will existing code no longer compile (although the fix is probably automatable). If the answer is Yes then it leaves a hanging question about the type of the result and what it can be validly assigned to, especially as the Go philosophy is averse to implicit type conversions.

gophun commented 1 year ago

Step 1 should be done in Go v1, and step 2 in Go v2

@czetie I just want to point out that it's highly unlikely there will ever be a v2 of Go. "Go 2" was a conceptual name under which major new features such as modules, error inspection and generics were introduced. But after it turned out that they could be introduced without major breakage, that name is no longer to be taken literally. The idea of ​​introducing a new major version of Go with breaking changes has now been discarded as far as I know. In fact, the plan is to take backward compatibility even more seriously.

czetie commented 1 year ago

@gophun Thanks for the heads up. I saw a lot of references to it and had no idea it had gone away.

That makes it even more important to work through in design the implications of a two stage approach, if indeed that is considered desirable.

ianlancetaylor commented 1 year ago

Thanks for the proposal.

This is an interesting idea. However, it might be more appropriate for an analysis tool, not in the language proper. The analysis tool could interpret special comments describing the type relationships, and then verify that all conversions were applied appropriately. That might be even better than this proposal because the tool could also verify units (acre foot vs. cubic meters, or whatever).

In general Go's type system is quite simple, and this makes it more complex without providing a clear enough gain. This new functionality is also not generalized across the type system, in that it only works for scalar types.

Also the emoji voting is not in favor.

Therefore, this is a likely decline. Leaving it open for four weeks for final comments.

czetie commented 1 year ago

@ianlancetaylor Thank you for your thoughts. A couple of final thoughts for your consideration before final close: