dart-lang / language

Design of the Dart language
Other
2.62k stars 200 forks source link

The 'Word problem' for the semantics of Dart. #2243

Closed modulovalue closed 2 years ago

modulovalue commented 2 years ago

Edit: See the following comment for a more useful description.


There is a common misconception around algebraic data types that I would like to point out.

Algebraic data types are not just about sum types. They are called that way to make the duality between traditional algebra, (one could call that an interface) and product and sum types (an implementation of that interface) more obvious.

Here are some examples that should hopefully make it clear why that is hugely important:

Consider these two representations:

product Product {
  int get i;
  int get j;
  Sum get sum;
}

sum Sum {
  case CaseA {
    int get a;
    int get b;
  }
  case CaseB {
    int get c;
    int get d;
  }
  case Case B {
    int get e;
    int get f;
  }
}

And

i * j * ((a * b) + (c * d) + (e * f))

The constituents of a product type, and each case type, are interspersed with a multiplication operator. The cases of a sum type are interspersed with an addition operator. Cases are parenthesized.

We can observe very important facts that follow the intuition that most people acquire in Grade 2/3/4 for integers:

Consider this example:

product Product {
  int get i;
  int get j;
  Sum get sum;
}

sum Sum {
  case CaseA {
    int get a;
  }
}

// = i * j * (a)

It is equivalent to:

product Product {
  int get i;
  int get j;
  int get a;
}

// = i * j * a

=> Product types can be simulated with sum types.

Consider this example:

product Product {
  int get i;
  Sum get sum;
}

sum Sum {
  case CaseA {
    int get a;
  }
  case CaseB {
    int get b;
  }
}

// = i * (a + b)

It is equivalent to:

sum Sum {
  case CaseA {
    int get i;
    int get a;
  }
  case CaseB {
    int get i;
    int get b;
  }
}

// = (i * a) + (i * b)

=> Rules for dealing with mathematical expressions, learned at school and taught by non-programmers apply directly to the act of modelling domains within programming languages.

There are many more examples that are obvious once this connection has been made.

Most people already know what algebraic data types are, they just don't know it yet! I hope that Dart could build on that intuition/duality and make it obvious to people.

munificent commented 2 years ago

Yes, and function types are exponentials. The "algebraic" part of algebraic datatypes is a very cool property of functional languages.

However, I don't think that correspondence is very meaningful for Dart. Dart is an object-oriented-leaning multi-paradigm language. It's heavily based around nominal types and subtyping. It's never going to be Hope or Miranda, and I think it's unlikely that pushing for an algebraic interpretation of its types would actually help many users learn it.

modulovalue commented 2 years ago

@munificent Can you at least give me a chance to explain what I was thinking about? If I have failed to make my point clear then I'm sorry, but closing this issue like that is not very welcoming.

The "algebraic" part of algebraic datatypes is a very cool property of functional languages.

That's one of the misconceptions that I was referring to. It's not about functional languages, It's about that these properties apply to those structures independent of the language. They could already be useful, it's just that specifying any user visible feature around e.g. visitors is very hard.

It's never going to be Hope or Miranda

Could you perhaps point out what ideas you are referring to within those languages? I'm not familiar with them so I can't make any connections here.

munificent commented 2 years ago

@munificent Can you at least give me a chance to explain what I was thinking about? If I have failed to make my point clear then I'm sorry, but closing this issue like that is not very welcoming.

I'm sorry, it wasn't my intent to shut you down. I closed the issue because I didn't see anything concretely actionable in it. It's totally fine to open issues to spawn discussions.

The "algebraic" part of algebraic datatypes is a very cool property of functional languages.

That's one of the misconceptions that I was referring to. It's not about functional languages, It's about that these properties apply to those structures independent of the language. They could already be useful, it's just that specifying any user visible feature around e.g. visitors is very hard.

I'm sorry but I didn't follow this. My understanding is that modeling (or at least thinking of) types in terms of an algebra is a property of a couple of statically typed functional languages.

It's never going to be Hope or Miranda

Could you perhaps point out what ideas you are referring to within those languages? I'm not familiar with them so I can't make any connections here.

Maybe it will be easier to turn this around. Can describe what concrete changes to Dart you think would be required to support what you have in mind?

leafpetersen commented 2 years ago

The "algebraic" part of algebraic datatypes is a very cool property of functional languages.

That's one of the misconceptions that I was referring to. It's not about functional languages, It's about that these properties apply to those structures independent of the language. They could already be useful, it's just that specifying any user visible feature around e.g. visitors is very hard.

I'm sorry but I didn't follow this. My understanding is that modeling (or at least thinking of) types in terms of an algebra is a property of a couple of statically typed functional languages.

I also don't really see where you're going with this @modulovalue . It is true that the types you list in your issue can be shown to be isomorphic given a reasonable equational theory, but that doesn't mean that any practical programming language that I'm aware of equates them. In languages like ML, Haskell, etc those are all represented differently (a compiler for those languages may attempt to take advantage of those isomorphisms to make an user invisible representation change, but even there it must be cognizant of changes in the allocation lifetime behavior in a way that a purely equational system would not).

Maybe it will be easier to turn this around. Can describe what concrete changes to Dart you think would be required to support what you have in mind?

Yes, this would be helpful, or alternatively, what problem are you trying to solve?

modulovalue commented 2 years ago

I agree with both of you. I should have provided concrete use-cases and I can see how the issue description can have sent the wrong signal. I'd like to apologize.


There's an implicit assumption that I have observed when a discussion is about Dart: If the context around that discussion is about soundness, then any discussion about Dart is only about the subset of Dart that is maximally sound with respect to certain runtime errors that the type system can eliminate.

Therefore, I am making the assumption, that when the context is not about soundness but functional programming, that those dicussions are about the subset of Dart that is maximally 'functional'. Therefore, properties that can only be had within a purely functional language, such as referential transparency, are also assumed to hold for Dart. And that's why it seems to me that talking about properties of functional programming languages can be relevant to Dart.

What is this issue about

This issue is about the idea that first class ADTs could increase the expressivity of the functional subset of Dart and that this new power could be used to increase developer productivity in other ways that do not include changes to the Dart language.


I've been using simulated sum and product types within Dart for some time now, and when working on updates to a model of a domain, most of my time is spent bringing the rest of the codebase in sync with the model through manual and repetitive transformations. The dart analyzer makes that very easy and safe by reporting errors, but its extremely repetitive and time consuming. I believe others will experience this as a bottleneck as well. I hope that the design of proper first class ADTs could intentionally pave the way for automating such transformations by for example limiting them just the right way, even if it upsets some users*.

* Dart went against convention, and made nullable types much less powerful than for example in Swift. I initially did not like this decision, but I've realized that that decision captured the essence of nullable types and was the right decision to make.

Note: I understand that what comes below is a heavily underspecified problem. So far, I haven't found any solution that is general enough to cover a significant set of the use-cases that I can think of.

How does this relate to the language team?

I've noticed a pattern where analysis improvements provided by the language team have the requirement to be sound, while analysis improvements provided by the linter team can be complete e.g. the invariant_booleans lint. It seems to me that defining a useful and sound subset of such transformations over the Dart language is not trivial as possibly certain properties would have to hold to guarantee soundness.

Below are some examples for a tiny subset of valuable transformations:

Examples

Consider the following example of a model for a Human.

product Human {
  int get age;
  String get name;
  String get occupation;
}

Use-case 1

We need to also track a gender. We have to apply the following transformations:

  1. Manually add String? get gender.
  2. Check that all instantiations of Human have been provided a gender.
  3. Verify that all uses of the Human type take gender into consideration (either by ignoring it or making use of it).

Step 2 can be automated. Instead of requiring from a developer to manually add ... gender: null, ... to all instantiations of Human (which is trivial, but nonetheless, could introduce a bug), a user could have the ability to invoke a refactoring action that does that for him. Only a type, a name and a default expression would be needed. It seems to me that in a codebase that takes advantage of the soundness properties of Dart, such a transformation could also be guaranteed to be sound.


Use-case 2

The requirements for a gender are more clear now. We have been told that we have to support two known genders, an other option and a default value for backwards compatibility.

We have to do the following:

  1. Manually add a Gender sum type:
sum Gender {
  male,
  female,
  other {
    String get gender;
  },
  unknown,
}
  1. Replace the String type of Human.gender with Gender
product Human {
  int get age;
  String get name;
  String get occupation;
  Gender get gender;
}
  1. Replace all null genders with GenderUnknown(), replace all "Male" genders with GenderMale() and so on. This, again, is a highly automatable process assuming that our refactoring action accepts a rich enough specification.

Use-case 3

We don't need to be backwards compatible anymore. We can remove Gener.unknown

  1. Remove the unknown case in Gender
  2. Remove all code paths that explicitly match over Human.gender and handle Gender.unknown..

Step 2 appears to be very automatable as well.


What mistakes could hypothetical first class ADTs make?

A decision that I made, within my generated ADTs, that I believe now was a mistake, is the following: I should have respected the commutative property of ADT's and not have depended on a fixed order of its constituents: (age * name * occupation) should be the same as for example (name * occupation * age). An automatic toString() method, or a constuctor that takes unnamed parameters would already violate that. If we keep commutativity, our set of possible sound transformations could possibly be significantly more richer.

I believe that Dart users should not have to think about any of this and that they should be allowed and encouraged to apply their intuition, e.g. that multiplication or addition is commutative, to ADTs.

leafpetersen commented 2 years ago

. I believe others will experience this as a bottleneck as well. *I hope that the design of proper first class ADTs could intentionally pave the way for automating such transformations by for example limiting them just the right way, even if it upsets some users.**

I think I understand a bit better where you're coming from, thanks for the examples. I'm not entirely sure I understand what you would like to be different about the language feature though. That is, all of the refactors that you describe above seem to me to be equally applicable to the language feature as currently specified as to a more traditional "functional" style ADT specification. It is true that Dart supports some runtime check (dynamic) features that make refactors not 100% reliable, but I think you are positing by assumption that the programmer is willing to restrict themselves to the statically checked fragment. So this feels to me like mostly a tooling request: that is, you would like the IDEs to support refactors such as what you describe? Is this fair? Or is there something concrete about the current specification which you believes makes it harder to specify such refactors?

munificent commented 2 years ago
  1. Manually add a Gender sum type:
sum Gender {
  male,
  female,
  other {
    String get gender;
  },
  unknown,
}

You can idiomatically model this today in Dart using subtyping:

abstract class Gender {}

class Male extends Gender {}

class Female extends Gender {}

class Other extends Gender {
  final String gender;
}

class Unknown extends Gender {}

The remaining steps are then the same.

The only thing (as far as I know) that a family of subtypes lacks compared to a sum type is exhaustiveness checking. If you want to attach different behavior to each type of Gender you can define it as an abstract method on Gender with overrides in each subclass. If you fail to implement it in a subclass, the compiler will tell you. But that requires you to be able to modify those classes and spreads the behavior across all of them.

If you want to define all of that behavior outside of those classes and in one place, you want something like pattern matching over the types. The pattern matching proposal supports exactly this. It also supports exhaustiveness checking so that if new subtypes are added, the compiler tells you you're missing a case.

Is that what you're looking for or is there something more fundamental that I'm missing?

modulovalue commented 2 years ago

I try to not use the word refactoring because it seems to generally be accepted that a refactoring can alter the runtime behavior of a program. A transformation, such as an optimization pass of a compiler, can't have any observable effects on the runtime behavior of a program unless explicitly given permission to do that.


Response to leaf:

Or is there something concrete about the current specification which you believes makes it harder to specify such refactors?

If you are referring to the macro proposal, Yes. Macros applied on classes would change the semantics of Dart in way that would remove the commutativity guarantees of the members of a class because macros expose the order of class members.

It seems to me, that to maintain commutativity, Lists like that would need to be an unlinked Set with an explicit comment that the order can, and will change, but this would be somewhat inconvenient for the JSON serialization use-case of macros so its a tradeoff.

Or is there something concrete about the current specification which you believes makes it harder to specify such refactors?

If you are referring to this specifiction Patterns Feature Specification No, but I believe that it contains some false assumptions: (I'm not implying that the author isn't aware of what I'm pointing out, it's just that there's no mention of that and no discussion about the consequences):

Proving algebraic properties, such as commutativity (looked at from an ADT perspective), for any given instance of a Product Type, I believe, can be undecidable because Dart is turing complete. So having the semantics of Dart guarantee properties such as commutativity, at all times, if possible, is the only way to guarantee that transformations that require commutativity preserve the semantics of a program and terminate.

I think you are positing by assumption that the programmer is willing to restrict themselves [...]

A more accurate representation of what I'm implying would be:

A user has no choice, but to restrict themselves if they want such transformations to preserve semantics, and be expressive and terminate.

It's a completely different question whether paying attention to preserving such algebraic properties would help Dart as a language with respect to e.g. adoption (resource are not infinite, sadly opportunity cost is real). I believe it would help Dart, but that's an opinion and I can't prove that.


Response to bob:

Everything you said, I fully agree with. That's almost exactly what my code generator does today and sadly not enough people are aware of what you said.

My point is the following:

One of the problems that I'd like to see to be solvable by a semantics preserving transformation is to automatically migrate the program from one Category (using Category Theory terminology) or from one Point (using Homotopy Type Theory terminology) to a different Category or Point using a user specified Arrow (using Category Theory terminology) or a user specified Path (using Homotopy Type Theory terminology).

This hypothetical user supplied specification for an Arrow or a Path between String? and Gender could look like this: (Not a proposal, just a hypothetical example:)

TypeAndValueTuple migrate<T extends String?>() {
  match (T) {
    case Null: return (GenderUnknown, GenderUnknown());
    case "Male": return (GenderMale, GenderMale());
    case "Female": return (GenderFemale, GenderFemale());
    default value: return (GenderUnspecified, GenderUnspecified(value)); 
  }
}

After applying this hypothetical transformation to Human.gender the migration would be done and proven to be structure preserving with respect to the semantics of Dart.

But to safely automate such a migration from String? to Gender on Human.gender from a specification like that above, without having the user to inspect any code, we would need a proof of certain properties. It would be best if the semantics of Dart could imply a rich set of (algebraic) properties that we could assume to automatically hold for all programs.

munificent commented 2 years ago

I think I somewhat follow you, but I'm not very familiar with category theory and not at all familiar with homotopy type theory.

At a high level, yes, we consider automated code rewriting tools to be an important part of the Dart user experience. We ship a bunch of "quick fixes" like that with our tooling. It's definitely important for those automated fixes to be correct.

However, I don't think we have considered it a concrete requirement that language features must provably enable those kinds of transformations. I'm not sure what it would look like to apply a requirement like that. Instead, we just try to design a language whose semantics are sensible and amenable to local reasoning. When designing new quick fixes, we try to think very hard about edge cases where it could break down.

I don't think we have considered it a priority to enable users to easily author automated fixes. If that was a priority, it might change how we think about evolving the language. But even so, I'm not sure how.

modulovalue commented 2 years ago

I did more research and it seems to me that what I have tried to describe in this issue can be very succinctly expressed by making a connection to a problem from Computational Mathematics:

_Can the semantics of Dart be optimized to maximize the amount of useful Dart programs for which solutions to the Word problem are decidable?_

Quote:

[...] A deep result of computational theory is that answering this question [the Word problem] is in many important cases undecidable. [...] in (untyped) lambda calculus: given two distinct lambda expressions, there is no algorithm which can discern whether they are equivalent or not; equivalence is undecidable. For several typed variants of the lambda calculus, equivalence is decidable by comparison of normal forms.

First class ADTs would probably make that a lot easier. However, at this time I can only see heuristics (i.e. ADTs should maintain as many algebraic properties such as commutativity as possible) and not clear requirements.

munificent commented 2 years ago

However, at this time I can only see heuristics (i.e. ADTs should maintain as many algebraic properties such as commutativity as possible) and not clear requirements.

Yes, I think the way we generalize these heuristics is under the broader umbrella of "it should be possible to reason locally and statically about the semantics of code". The easier it is to correctly and precisely understand what a piece of code means and the less surrounding context required to do so, the easier it is to automatically transform that code into something equivalent.

Given that Dart has no notion of purity (outside of the very limited const stuff), is heavily imperative and mutation-oriented, and uses non-ideal machine primitive datatypes (fixed sized numbers, UTF-16 strings, etc.), I don't think we're likely to ever get to many formally provable properties about which transformations are valid.

Heuristics around local reasoning—which also make the language easier for humans understanding and transforming code!—are probably all we'll get but also seem to be sufficient.

modulovalue commented 2 years ago

I did some more research:

The reason why the word problem is decidable for the simply typed lambda calculus is because the simply typed lambda calculus is not turing complete and is strongly normalizable. To solve the word problem for Dart, it would not be enough to look at a functional subset of Dart, but it seems that we would need to look at a turing incomplete subset of Dart. And because the simply typed lambda calculus does not support parametric polymorphism we would most likely need to give that away as well.

This approach for making better decisions around the design of ADTs (or designing semantics preserving transformations) sounds very impractical and is probably not reasonable at all to ask the language team to deal with or investigate further so I'm going to close this issue.