dinfuehr / dora

Dora VM

struct syntax consolidation #239

Closed soc closed 2 years ago

soc commented 4 years ago

(This is not about supporting structs in the runtime, but purely about consolidating some of the syntax that solely exists for structs in favor of existing syntax for classes.)

The intention is to make it straightforward to migrate between class-ness and struct-ness by making struct definitions a subset of class definitions. The restrictions are:

Now:

// definition
struct Foo {
    a: Int32,
}
// use
Foo { a: 1 }

Proposed:

// definition
struct Foo(let a: Int32) {
  // fun definitions only, no let or var
}
// use
Foo(1)

soc commented 4 years ago

@dinfuehr Are you fine with this?

soc commented 4 years ago

I'd probably even parse it with the same code as a class¹ and then check the restrictions afterward, to have the best possible error reporting, i.e.:

match self.token.kind {
  ...
  TokenKind::Class  => { ...; parse_template(Class, ...); ... }
  TokenKind::Struct => { ...; parse_template(Struct, ...); ... }
  ...
}

¹ E.g. by generalizing Class to Template in the parser, with a bit that stores "class or struct" and decides whether a Template will be turned into a vm:: class or struct.
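A minimal Rust sketch of the "parse once, check restrictions afterward" idea from the comment above. It assumes the shared parsing step has already produced a `Template`; `TemplateKind`, `Member`, `Template`, and `check_restrictions` are hypothetical names for illustration and are not taken from Dora's parser.

```rust
// Hypothetical sketch: none of these names come from Dora's actual parser.

/// The "bit" from the footnote: is this template a class or a struct?
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TemplateKind {
    Class,
    Struct,
}

/// Members that may appear in a template body.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Member {
    Fun(String),
    Let(String),
    Var(String),
}

/// Output of the shared parsing step (assumed to have run already).
#[derive(Debug, Clone)]
pub struct Template {
    pub kind: TemplateKind,
    pub name: String,
    pub members: Vec<Member>,
}

/// Applies the struct-only restriction after parsing: a struct body may
/// contain `fun` definitions only. Classes pass without errors.
pub fn check_restrictions(template: &Template) -> Vec<String> {
    if template.kind == TemplateKind::Class {
        return Vec::new();
    }
    template
        .members
        .iter()
        .filter_map(|member| match member {
            Member::Fun(_) => None,
            Member::Let(field) => Some(format!(
                "struct `{}`: `let {}` is not allowed in a struct body",
                template.name, field
            )),
            Member::Var(field) => Some(format!(
                "struct `{}`: `var {}` is not allowed in a struct body",
                template.name, field
            )),
        })
        .collect()
}
```

Because the check runs on an already parsed template, every offending member can be reported in one pass instead of stopping at the first syntax error.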

dinfuehr commented 4 years ago

I am on-board with replacing Foo { a: 1 } by Foo(1). I wouldn't change syntax for defining structs though for now. While I agree we should keep syntax for classes and structs similar, for non-trivial structs/classes it will always be more involved because value types are immutable and don't have identity. The problem won't be syntax when switching between them.

IMHO class syntax right now isn't ideal and should be changed eventually. There are also enums which should be similar. Right now I don't think we guarantee that all class fields are initialized when we allow access to self, which is unsafe because some fields might still be null which breaks assumptions/guarantees. Replacing the constructor with a static method could guarantee this without any additional checks.

dinfuehr commented 4 years ago

I am leaning towards making classes more like structs actually:

class/struct SomeTypeA(a: Int32, b: Int32)
class/struct SomeTypeB { a: Int32, b: Int32 } // equivalent to SomeTypeA
enum SomeTypeC { A, B, C(Int32) } // enums would look familiar

// object creation would work like this:
SomeTypeA(1, 10)
SomeTypeB(a = 1, b = 10) // named arguments could be used in the future

// if some more code is needed, this needs to be solved with function/static method:
fun newSomeTypeA(): SomeTypeA {
  let x = getRandomValue();
  SomeTypeA(x, 10)
}

We could have T::new as a convention, similar to Rust. This is already encouraged, as we don't allow overloading the constructor (or functions/methods), which results in the Array class having multiple "constructor" methods (e.g. empty, new, fill).
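For readers unfamiliar with the Rust convention referred to here, a short sketch: without overloading, each way of creating a value gets its own named associated function. `IntArray` is a hypothetical type used only for illustration; it is not Dora's Array class, and the parameter lists are assumptions.

```rust
// Hypothetical type for illustration; this is not Dora's Array class.
#[derive(Debug, Clone, PartialEq)]
pub struct IntArray {
    data: Vec<i32>,
}

impl IntArray {
    /// Conventional primary "constructor": takes the elements directly.
    pub fn new(elements: &[i32]) -> IntArray {
        IntArray { data: elements.to_vec() }
    }

    /// An array with no elements.
    pub fn empty() -> IntArray {
        IntArray { data: Vec::new() }
    }

    /// An array holding `len` copies of `value`.
    pub fn fill(len: usize, value: i32) -> IntArray {
        IntArray { data: vec![value; len] }
    }

    /// Number of stored elements.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

Call sites then read `IntArray::new(&[1, 2, 3])`, `IntArray::empty()`, or `IntArray::fill(2, 7)`, so the name says which construction path was intended.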

soc commented 4 years ago

> I am on-board with replacing Foo { a: 1 } by Foo(1). I wouldn't change syntax for defining structs though for now.

I wouldn't recommend this – having syntax where the definition and the use look similar is a benefit that should not be underestimated. Also, if the {} syntax were used, how would the class/struct body be defined?

Perhaps I'm not seeing the idea behind having two different syntax variations that do the same thing – could you expand?

> Right now I don't think we guarantee that all class fields are initialized when we allow access to self, which is unsafe because some fields might still be null which breaks assumptions/guarantees.

I agree with this, but I feel throwing away constructors because of this is like throwing out the baby with the bathwater. Yes, checking this tends to be iffy, but I think we should give it a try before giving up on it.

With Dora being final-by-default, we could for instance tighten some rules like "do not call overridable methods in constructors" to simplify the checks.

> Replacing the constructor with a static method could guarantee this without any additional checks.

Maybe I don't understand that, but where is the difference? At the moment I guess that constructors are safe iff the class body does not define fields in addition to the ones in the constructor parameters. So in what sense would static methods be safer? At some point the class needs to be initialized.

> There are also enums which should be similar.

To be honest, I think enums in class-based languages are a mistake. The benefits they provide don't seem to be worth the unavoidable complexity and trouble they bring. I can totally live with Option being an enum to allow getting rid of nils while not having structs yet, but I'd hope we can get rid of enums as soon as possible after structs are implemented.

> T::new

I think this is a topic orthogonal to the rest, but I think being able to use T(...) (which is prime real-estate in language terms) for the "most common" operation is a good thing. I'd hate having to write Array::new(1, 2, 3) instead of Array(1, 2, 3).

(This of course does not detract from the possibility of treating the constructor more like a static method conceptually.)

dinfuehr commented 4 years ago

> > I am on-board with replacing Foo { a: 1 } by Foo(1). I wouldn't change syntax for defining structs though for now.

> I wouldn't recommend this – having syntax where the definition and the use look similar is a benefit that should not be underestimated. Also, if the {} syntax were used, how would the class/struct body be defined?

With impl.

> Perhaps I'm not seeing the idea behind having two different syntax variations that do the same thing – could you expand?

> > Right now I don't think we guarantee that all class fields are initialized when we allow access to self, which is unsafe because some fields might still be null which breaks assumptions/guarantees.

> I agree with this, but I feel throwing away constructors because of this is like throwing out the baby with the bathwater. Yes, checking this tends to be iffy, but I think we should give it a try before giving up on it.

> With Dora being final-by-default, we could for instance tighten some rules like "do not call overridable methods in constructors" to simplify the checks.

> > Replacing the constructor with a static method could guarantee this without any additional checks.

> Maybe I don't understand that, but where is the difference? At the moment I guess that constructors are safe iff the class body does not define fields in addition to the ones in the constructor parameters. So in what sense would static methods be safer? At some point the class needs to be initialized.

In the constructor function the object isn't fully initialized in the beginning. At that point we evaluate the expressions the class fields are initialized with. One of those expressions could use self already (I don't think we restrict this right now). Fields which are initialized afterwards hold 0 or null at that point, so we can't guarantee that references are non-null. If we create an object only with SomeClass(<field1>, ..., <field_n>), we get values for all fields as arguments and can initialize all of them at once.
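The difference described here can be modeled with a small Rust sketch. Rust itself refuses to read uninitialized fields, so "not yet initialized" is represented explicitly with `Option`. All names (`InProgress`, `Complete`, `construct_in_steps`, `construct_at_once`) are hypothetical, and this is a model of the argument, not of Dora's runtime.

```rust
// Model only: `None` stands in for the 0/null a not-yet-initialized field holds.
// All names are hypothetical.

#[derive(Debug, Default)]
pub struct InProgress {
    pub first: Option<String>,
    pub second: Option<String>,
}

/// Constructor-style initialization: fields are filled one by one, and each
/// initializer expression may inspect the partially built object ("self").
pub fn construct_in_steps(
    init_first: impl Fn(&InProgress) -> String,
    init_second: impl Fn(&InProgress) -> String,
) -> InProgress {
    let mut obj = InProgress::default();
    // While this initializer runs, `second` has not been set yet.
    obj.first = Some(init_first(&obj));
    obj.second = Some(init_second(&obj));
    obj
}

#[derive(Debug, PartialEq)]
pub struct Complete {
    pub first: String,
    pub second: String,
}

/// All-at-once creation: every field value exists before the object does,
/// so no code can observe a half-initialized object.
pub fn construct_at_once(first: String, second: String) -> Complete {
    Complete { first, second }
}
```

In the step-by-step variant, the initializer for `first` sees `second` as `None`; in the all-at-once variant there is no point at which such an observation is possible.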

> > There are also enums which should be similar.

> To be honest, I think enums in class-based languages are a mistake. The benefits they provide don't seem to be worth the unavoidable complexity and trouble they bring. I can totally live with Option being an enum to allow getting rid of nils while not having structs yet, but I'd hope we can get rid of enums as soon as possible after structs are implemented.

I don't really plan to remove enums. How could we even use struct for implementing Option?

> > T::new

> I think this is a topic orthogonal to the rest, but I think being able to use T(...) (which is prime real-estate in language terms) for the "most common" operation is a good thing. I'd hate having to write Array::new(1, 2, 3) instead of Array(1, 2, 3).

IMHO that's totally acceptable, Array::new(1, 2, 3) isn't too bad and for many data types T(..) can still be used. I think the real problem with the approach I described above is how to initialize/create objects in class hierarchies.

> (This of course does not detract from the possibility of treating the constructor more like a static method conceptually.)

dinfuehr commented 4 years ago

What's the goal of this issue after reaching agreement? The problem is that structs aren't implemented yet; until they are, we can't do much.

soc commented 4 years ago

> With impl.

Oh ... I was rather aiming to minimize its usage as much as possible.

Depending on where you can define it, it makes the problem of typeclass coherence much more pressing (because you don't even have a typeclass to distinguish multiple impls).

Having to search for impls is also one of my main annoyances when navigating Rust code. What's your intention behind it?

> In the constructor function the object isn't fully initialized in the beginning. At that point we evaluate the expressions the class fields are initialized with. One of those expressions could use self already (I don't think we restrict this right now). Fields which are initialized afterwards hold 0 or null at that point and that way we can't guarantee that references are non-null. If we create an object only with SomeClass(<field1>, ..., <field_n>) we get values for all fields as argument and can initialize all of them at once.

But isn't that what I said? I. e. class Foo(let foo: Int32) { fun ... } is fine, class Bar(let bar: Int32) { let baz = ... } can be an issue?

Wouldn't flat-out disallowing instance method calls (everything that passes this) and only allowing module calls deal with this problem even though it might be too restrictive?

> I don't really plan to remove enums. How could we even use struct for implementing Option?

Enums regularly get used for at least three distinct purposes:

  1. a finite, ordered list of values
  2. full-blown GADTs
  3. a very limited amount of control-flow structures (Option, Result)

and because they have different, and sometimes contradictory, requirements, you end up with ballooning implementation complexity while not making anyone happy.

My approach would be to simply use the existing facilities:

  1. a finite, ordered list of values

The only thing that's interesting here is having some convenience methods that a) list all defined values and b) allow converting from string to enum member. So let's do this:

@enum
module Planets {
  module Mercury
  module Venus
  module Earth
  ...
}

--> simple nesting, plus @enum to trigger the creation of some compiler-synthesized methods.

  2. full-blown GADTs

@abstract @sealed class Pet(let name: String)
module Pets {
  class Cat(name: String, let age: Int32) extends Pet(name)
  class Dog(name: String, let size: Size) extends Pet(name)
  ...
}

--> As above, use nesting of existing structures, just that we use classes instead of modules. No annotation, because we don't need any compiler magic at all here.

  3. a very limited amount of control-flow structures (Option, Result)

@abstract @sealed struct Option[T] {
  fun isDefined(): Bool
  fun getOrPanic(): T
  ...
}
struct Some[T](let value: T) extends Option[T]
struct None[T]() extends Option[T]

--> Reusing existing facilities, as in the examples above. (Though I think the specifics are not important, as Option (and perhaps Result too) will receive enough special casing in the runtime to make any kind of source representation a convenient lie.)

Note that there is no surrounding module, because we don't want to keep writing Option::Some everywhere, so unlike with enums, we simply structure our code that way.
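As a rough illustration of the compiler-synthesized convenience methods mentioned for the first case (listing all values and converting from a string), here is a Rust sketch. It uses a Rust enum merely as a stand-in for the nested modules, covers only the three members named in the example, and `VALUES`, `name`, and `from_name` are assumed names rather than anything the proposal specifies.

```rust
// Hypothetical sketch of what an `@enum` annotation might synthesize.
// A Rust enum stands in for the nested `module` declarations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Planet {
    Mercury,
    Venus,
    Earth,
}

impl Planet {
    /// (a) Every defined value, in declaration order.
    pub const VALUES: [Planet; 3] = [Planet::Mercury, Planet::Venus, Planet::Earth];

    /// The member's name as written in the source.
    pub fn name(self) -> &'static str {
        match self {
            Planet::Mercury => "Mercury",
            Planet::Venus => "Venus",
            Planet::Earth => "Earth",
        }
    }

    /// (b) Conversion from a string to a member; `None` for unknown names.
    pub fn from_name(name: &str) -> Option<Planet> {
        Planet::VALUES
            .iter()
            .copied()
            .find(|planet| planet.name() == name)
    }
}
```

Everything in the `impl` block is mechanical and derivable from the member list alone, which is the part an annotation could plausibly generate.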

> IMHO that's totally acceptable, Array::new(1, 2, 3) isn't too bad and for many data types T(..) can still be used. I think the real problem with the approach I described above is how to initialize/create objects in class hierarchies.

I'm mainly seeing dangerous incentives here – if not validating inputs gets you the "good" syntax, and validating the inputs gets you the "less good" one ... I fear things aren't going to be validated much.

> What's the goal of this issue after reaching agreement? The problem is structs aren't implemented yet, until it is we can't do much.

The idea of the issue was to deal with the superficial syntax issues first, which does not require any working implementation.

But I'm not really seeing what the agreement is going to be, to be honest. I think our visions of how things should work have little in common.