Semantics of Constructors in GOOL

B-rando1 commented 6 months ago

I've been working on thinking about how to translate GOOL's classes into struct libraries in Julia. Discussing some aspects of this with @balacij has raised a few questions about the semantics of class constructors in GOOL. (For more context on how I'm trying to translate classes to structs, see the wiki page)

The Problem

My question boils down to: should constructors in GOOL be required to give a value for all of a class's properties? The answer currently appears to be no, but we need to revisit this for a couple of reasons:

As I understand it, it is considered good form not to leave any class members as undefined after initialization, as this requires us to account for that possibility later down the line.
Some target languages require that all class members be defined when initialized, and this currently leads to differences in behaviour. The language currently implemented that has this requirement is Swift. To get around this constraint, the generator gives default values to each variable, based on its type. This differs from the other languages, which do not currently give default values:
```
public class Observer {
public int x;

public int getX() {
    return this.x;
}
}
```
```
class Observer {
var x: Int = 0

func getX() -> Int {
    return self.x
}
}
```

In Java, calling o.getX() (for o :: Observer) would return null (If I remember correctly 😄), whereas in Swift it would return 0.

Potential solutions

We could continue assuming that constructors don't always need to define all members of the class, but fix the inconsistency between targets by providing default values in all targets. It should be easy to do this, but a naive implementation would lead to a lot of likely redundant default values. We could look into keeping track of which variables are defined by the constructor, and only give defaults to the ones that aren't.
We could choose to enforce the constraint that a class's constructor(s) must define all of its members. This should also be easy enough to implement. This might require propagating these changes up through the rest of Drasil, but we can look into that and decide how much work it would be as well.

This is assuming there are no legitimate reasons for leaving class members undefined. Given that Swift doesn't even allow it, the assumption seems warranted, but let me know if you can think of any exceptions.

What are your thoughts?

Also, feel free to add anything from our discussion I forgot to mention @balacij 😅.

B-rando1 commented 6 months ago

I took a look at our current examples, and it looks like there are currently several places where constructors do not define all of a class's members. In particular, in GlassBR the InputParameters class has an empty constructor and relies on another class (not called from the constructor) to define all of its members, and in Projectile, the InputParameters class constructor calls another method which sets the instance's members. While the GlassBR example is clearly not defining the class's members, the Projectile example is less clear, since it does so via another method. If we decided to allow the Projectile class's behaviour but not that of GlassBR, it would be less easy to check for that. I'll need to look deeper into drasil-code to know how hard it would be to prevent generating GOOL code such as that of Projectile, but that could add more difficulty.

All that to say, Option 2 is likely still feasible, but it would be more work than I initially hoped.

smiths commented 6 months ago

@B-rando1 what you have observed is a consequence of the secrets (likely changes) we used in our design. We typically have one module that knows the secret of the data structure for the required inputs (InputParameters) and another module that knows the secret of the format of the input file. We also had a third module that new the secret of what constitutes a valid input. We realized that three modules wasn't really necessary, since they are never used independently of one another. We can essentially combine the modules into one whose secret is "how to provide valid input parameters".

It looks like GlassBR kept a separate module for the input format hiding secret, while Projectile made the input format hiding a "sub-module" of the input parameters module.

We can discuss further during today's meeting.

B-rando1 commented 6 months ago

Just to summarize what we discussed in our meeting today:

We want to avoid adding default values (at least when there is no sensible default given to GOOL), since that's no better than leaving them undefined.
The generated examples seem to have some poor design choices with how they read in parameters from the file, so we need to decide how we should do it instead. To get a better sense of how we should do it, I'll be hand-writing that part of the code, taking inspiration from tutorials, forums, etc. to get a sense of how it's "supposed to be done" in a few different languages. Swift in particular should be a good language to look at, as it has more constraints than the other targets.
Once we have a better handle on the type of constraints we want to have, rather than checking code once it gets to GOOL, we will be making sure that drasil-code generates code that meets the constraints. At the GOOL stage, we will assume that the code already meets the constraints and doesn't need to be checked.

JacquesCarette commented 5 months ago

On the last bullet above: if it is easy to do so, we should enforce the constraints through Haskell's type system. But we shouldn't do 'fancy gymnastics' to get there.

B-rando1 commented 5 months ago

I did an analysis of how file input currently works in GOOL and suggested some alternate designs here. Code illustrating my notes can be found here.

There's a lot more in that link, but my proposal for class constructors is this:

Do not allow a class's constructor to call its own methods, until all of its members have been defined. This means that a constructor cannot call a parse method which would set its members based on a file, but once it has all of its members defined, it could call a validate function which makes sure that all of its members satisfy a set of constraints.
Rather than leaving any 'hard work' (e.g. parsing) necessary to find values for each of a class's members, create a function that does this work of getting the values, calls the class's constructor on the values, and returns the instance of the class. This keeps the constructor clean, and decouples the class from the input format.

What are your thoughts?

B-rando1 commented 5 months ago

As a side note, I'm also going to try separating the storage of constructors and methods in GOOL. The reason for this is that there are a lot of differences between constructors and methods in procedural languages (e.g. in Julia the constructors can/should be inside the struct, but the methods are functions that need to be outside the struct). I'm hoping it'll be a small change, but if it gets too big I'll bring it up again.

smiths commented 5 months ago

I'll add discussion of these different alternatives to our next agenda. In our discussion, I think we should first focus on what we want to generate. How this should be done in GOOL should be the second thing we discuss. The wiki page @B-rando1 create is great, but it does intermix the two conversations.

@B-rando1 can you do some investigation to see what is idiomatic in the languages for reading inputs from a file to populate a struct? We probably don't have to look at every language, but Julia, Java and Swift would make sense. @JacquesCarette suggested in our last meeting that ChatGPT (or a similar tool) could help with this.

@B-rando1, I think it would be worthwhile for each of your options to think about the secret of each module. If you haven't run across Parnas's definition of information hiding, the secret of a module is the thing that it knows that is hidden from other modules. Typical secrets are data structures, input formats, hardware design, algorithms, requirements that can change, etc.

@B-rando1 in your designs where parsing of the input file is done external to the class, you are finding all of the inputs and then passing them all at once to the constructor, correct? I like this idea, since otherwise, we need default values, at least temporarily for the constructor. Passing all of the inputs at once is necessary if you want internal validation. Although many of our validation rules are done independently for each input (like length > 0), there are validation rules that depend on the combination of inputs, like a constraint on the aspect ratio of a sheet of glass in GlassBR.

B-rando1 commented 5 months ago

Thanks for your feedback @smiths. I guess I was viewing the problem in terms of both constraints (idiomatic and feasible to implement), but I can update the wiki page to try separating them.

Regarding what you said about investigating what is idiomatic, the designs I brought up were influenced by a conversation thread I had with the WhatsApp AI. It had some good ideas, and in particular the idea of parsing the file in an external function and passing the 'clean' inputs to the class's constructor was its idea. One problem I had with it though, is that like most LLMs, it's indecisive and overly affirming. It mostly avoided making objective statements about what design was 'best', and it would like pretty much every suggestion I made (even bad ones). Still, it was good for brainstorming different decisions we could make.

I also looked at a few tutorials and forums, but I found that the tutorials were a bit too simple ('how to do file IO in Java', not 'how to read a file into a struct in Java'). There were some good StackOverflow discussions, but the answers didn't seem to give much consensus. For example, in this discussion, the top answer puts the parser inside the constructor; the second answer creates an empty object, then parses the input and sets the instance's variables from the main function; and the third and fourth answers parse the input in the main function and then pass the values to a simple constructor. That leaves what is truly the 'best' way to do it in Java a bit unclear. There might be more places to look though, so I can do more research and give updates if I find anything.

I'll also make sure to look up information hiding and secrets, and incorporate them into the discussion on the wiki page.

smiths commented 5 months ago

@B-rando1 knowing the GOOL constraints is definitely important. I just want to guard against letting the current implementation of GOOL push us to the solution that is easy, instead of the one that is right. I'm probably being more careful than necessary. The other part of my motive is simply that I find things easier to understand when there is a separation of concerns. :smile:

That is a great summary of how you came up with your different design options using your brain, WhatsApp AI and forums. Knowing the process used for generating ideas helps others build confidence in the completeness of the ideas.

Thank you for the sample Java code. I agree that there isn't really a convergence on the best approach. None of the examples put any effort into justifying their design choices. I think we should try to use Parnas's information-hiding principle for our design. The things that are likely to change independently should be secrets. We also have the principle that we don't want to ever have a constructor that leaves the fields of an object uninitialized.

B-rando1 commented 5 months ago

I made the changes discussed to the wiki page.

I moved the discussion about implementation in GOOL to its own spot.
I made a few comments on differences in idioms between languages in this section.
I added a discussion of Secrets here.

To summarize the discussion of secrets:

Putting the file parsing in its own module (function or method) hides the information of the type of input and the file format.
- We could take this a step further with the Factory Pattern, which would split up the secrets of type of input and file format.
Putting validation in its own module (function or method) hides the information of the constraints on the input parameters.
- The rest of the program likely relies on these constraints staying constant no matter what we do, but I think it's still good practice.
Using getter methods instead of directly accessing the class's members would hide the secret of how the class is storing the parameters.
- This is unlikely to change though, so it might be unnecessary.

smiths commented 5 months ago

Great work @B-rando1. The factory pattern is particularly interesting. If we wanted to allow for different file formats to exist as a runtime variability, this would make sense. We would use the object creation that matched the file format that the user selected. The factory pattern would also make sense if we used different source information to build objects that matched the same interface. For instance, if the interface requires dimensions a and b, we could have one creator that uses inputs a and b, and another that uses a and the aspect ratio (Ar).

Allowing for specific decisions to be made by the user at run-time would be more flexible, but I think it goes against our current thinking. If the user changes the fileformat, then they should generate a new program. They can still generate a program that matches the previous spec by using the previous generator input. I think our philosophy is to generate a program that is specialized for its specific purpose, rather than generate a general purpose program for many purposes.

The exception to this is the input parameters; we expect those to be set at run-time.

B-rando1 commented 5 months ago

This might be a useful way of visualizing what I've found:

Location \ Functionality	Deriving Member Values	Validation
Constructor	+ Class members set inside constructor - Large constructor - Input type tied to class	+ Validation tied to class - Large constructor
Internal Method (called by constructor)	+ Simple constructor - Need default values - Input type tied to class	+ Simple constructor + Validation tied to class
External Function (called pre- or post-constructor)	+ Simple constructor + Input type separate from class	+ Simple constructor - Validation not tied to class

B-rando1 commented 5 months ago

A few points from our weekly meeting:

Many of these options have merit, at least in specific situations. Because of this, we want to allow the user to make the choice.
- It could even be a cool idea to let them generate 'all the choices', then choose which one looks best for their application.
As a rough model of the design space, we're currently looking at these dimensions:
- Data provenance (that is, where are the values determined, e.g. parsing)
- Data validation (that is, constraint-checking)
  - Both location and degree of (some constraints we could 'assume it's valid and hope for the best')
- Degree of bundling
  - Currently we have Unbundled and Bundled. We talked about potentially doing SubBundled which would have multiple smaller structs rather than one struct or many variables. That likely needs more design, and I think it's probably smart to revisit it once we know if we're going in the right direction.
There may be other types of decisions we want available, but we'll leave them for now and focus on getting something implemented.

smiths commented 5 months ago

Thank you for the summary @B-rando1. Sounds good.

B-rando1 commented 5 months ago

It seems like the focus of this issue has shifted, and it's covering a few too many aspects of the design. It's probably better if we close it and split it into separate issues as needed.

smiths commented 5 months ago

@B-rando1, I'm all for closing issues, but we also don't want to lose any good ideas in the process. Since you mentioned that the issue could be split into separate issues, can you create those issues now? If more discussion is required, you could instead create a discussion post or posts.

JacquesCarette / Drasil

Semantics of Constructors in GOOL #3752

The Problem

Potential solutions