Redesigning `Choices` to enable new input patterns

B-rando1 commented 3 weeks ago

As discussed in #3752, we want to add new input patterns to Drasil. From looking at drasil-code it seems that we need to change Choices in order to allow users to express the input patterns they would like. I have a design, but it has flaws so I would like some advice.

Changes to `Choices`

Choices already has a couple of relevant properties:

architecture :: Architecture contains information about whether the parsing and validation functions are in the same class or in separate classes.
dataInfo :: DataInfo contains information about whether the input is bundled in a class or contained in separate variables.

I created a new datatype data LogicLoc = InCons | InMthd | InFunc. I then added two fields of this type to Choices: parseLoc and validateLoc. (I'm not too sure about any of these names, feel free to suggest better ones 😄). The idea is that we can specify whether parsing and validation are done directly in the constructor, in a method to be called by the constructor, or in an external function.

Notes

We still don't have the ability to specify "I want parsing and validation in the same function/method, with each 'piece' of validation being performed as early in the parsing process as possible.
If input parameters are Unbundled, then only the InFunc option for parsing and validation makes sense.
So there are some patterns that we want to express but can't, and some patterns that we can express but shouldn't be able to.

It's possible I'm overanalyzing this, but it just seems like a better design of Choices would lead to fewer edge-cases to deal with.

B-rando1 commented 3 weeks ago

@smiths had a good idea in related discussion #3780. Rather than giving the user full control over every aspect of the code, we could create a curated list of decisions that fit well together, and just offer those. This would simplify the structure of Choices, and would also remove the chance of having a lot of edge cases.

B-rando1 commented 3 weeks ago

One change to Choices that I think makes sense is getting rid of the part of Architecture that deals with separate vs combined input methods. @smiths mentioned in #3752 that parsing and validation were originally separated in order to separate the 'secrets' of input format and input constraints. He went on to say that they later decided that since parsing and validation are only ever done together, they might as well be in the same module with the meta-secret of "how to provide valid input parameters". I agree with this, and I also think there's an argument that Parnas' idea of a "Module" doesn't have to refer to files - it could refer to functions or methods that hold a secret.

The other reason is that the currently available design choices that involve Separate (e.g. GlassBR) generally have more issues with them.

We should probably wait to make changes until we have an overall direction, but I just thought I should mention this piece here.

JacquesCarette commented 3 weeks ago

So one way to handle "dependent choices" is to have the data-structures direct them.

Let me take Architecture as the example (even though it might go away); Unbundled would stay as is (and imply InFunc) while Bundled would contain a LogicLoc field that further specifies that variability. (This needs to be adjusted for the actual design, but hopefully illustrates the idea.)

B-rando1 commented 3 weeks ago

@JacquesCarette I've been thinking a bit about your comment, and this is currently the set of choices that makes sense to me:

Dependent choices hierarchy

Remove InputModule from Modularity
Create new field for location of a piece of logic:
- data LogicLoc = InCstr | InMthd | InFunc
Split Structure into InputStructure and ConstStoreStructure:
- data ConstStoreStructure = UnbundledConsts | BundledConsts
- data InputStructure = UnbundledIns Bool | BundledIns LogicLoc (Maybe LogicLoc)
  - Unbundled:
    - The Bool is for interleaving: if True, then interleave parsing and validation in one function; if False, then put parsing and validation into separate functions.
  - Bundled:
    - The first LogicLoc (required) is location of parsing logic.
    - The second LogicLoc (optional) is location of validation logic.
      - If Nothing, then it's interleaved together with parsing.
        Pre-defined semantics
        
        There's still some flexibility that we can't express, so here are the semantics that I would choose to stick with:
If the parser is in a function, then the function is called and returns an object (it's not called by the constructor).
If the parser and validator are not together, then the validator is called by the constructor if inputs are Bundled, or by the Main function if inputs are `Unbundled (not called by the parser).

Pros
Can't express anything it shouldn't.
Can express a lot of stuff.
- Parsing and validation can be separate in any of the 3 locations, or interleaved together.
- Inputs can be bundled or unbundled.
Getting rid of InputModule removes the need to keep track of which module the input logic is in, since all of it will always be in the same file.

Cons
Interleaved is denoted by True for UnbundledIns, but by Nothing for BundledIns. I don't like the inconsistency, but I'm also not sure how to improve clarity without making the encoding scheme less clean.
There's still some pre-defined semantics of function call flow.

I think that if we want to allow the full set of choices, this is (hopefully) on the right track. I need to think some more if there's a good way of preventing complexity (#3780), but if this is the way we want to go then I think we might be able to figure that part out.

B-rando1 commented 3 weeks ago

A few notes from yesterday's meeting:

The Bool under Unbundled should be replaced with a custom data type to improve clarity.
We shouldn't use Maybe in Bundled to express interleaved, there's two options:
- Replace Maybe with a custom data type (as with the Bool above) to have the same structure with better clarity.
- Split Bundled into BundledInterleaved LogicLoc and BundledSeparate LogicLoc LogicLoc.
  - In some ways I like this, but overall I think the first option is better. This method has the first value of InputStructure tell a bit about input structure and a bit about parsing structure, instead of just about input structure. This would also make pattern-matching to find the input structure less clean, as we would have to check for two patterns instead of one.
Dr. Carette suggested we try thinking about Choices in terms of coordination rather than the set of low-level choices we currently have.
- He described coordination as 'who does what, and when; and if something goes wrong who tells whom?'
- I'm a bit confused by what this means, so I will have to do some research and potentially ask some questions.
- Dr. Carette also said it might not be worth it to change it up, but it's worth investigating.

smiths commented 3 weeks ago

Thank you for the summary @B-rando1. That is a great habit!

B-rando1 commented 1 week ago

I'm having some technical difficulties with getting Interleaved to work. A couple weeks ago I got something in the right direction, where if Unbundled Interleaved is chosen, it puts all of the input-related code inside get_input. Here is the function I added to do that: https://github.com/JacquesCarette/Drasil/blob/8ae2f82dfae96af251df6c309d486a524f551e6e/code/drasil-code/lib/Language/Drasil/Code/Imperative/Modules.hs#L410-L435

The issue is that, as you might expect, interleaving the parsing and validation is not as straightforward.

From reverse-engineering the code a bit, formatBod, derivedBod, and constraintsBod are all of type GenState [MSBlock], where each element of the list seems to refer to one specific action, e.g. opening the file, defining a single variable from a file, or checking a constraint on a set of variables.

Based on that, what makes the most sense to me is to change the types of these variables form GenState [MSBlock] to GenState [(Set Label, MSBlock)], where the Set is the set of all variables involved in that MSBlock.

If we could get things to that point, then we could build up a Set of variables that have been defined, and check after each input block if there are any input constraints that can now be added to the code.

Does this sound like a smart way of doing it?

I'm hoping that this way is doable, but there are a lot of parts to the code that I don't understand yet. I'll have to become friends with the quantVar, mkVar, and readData functions in particular it looks like.

B-rando1 commented 1 week ago

I may have found a problem with interleaved constraints. As I described in my last comment, we need some way of keeping track of which variables have been defined, and which variables each input constraint depends on. The first one seems just about possible (I got it working for derived inputs, and in theory parsed inputs should be similar).

The problem is that, as far as I can tell, each constraint is tied to a single variable, even if it depends on multiple variables. Take, for example, this example from Unitals.hs in PD Controller:

Because ipSimTime is put into the constructor for Bounded in this way, the constraint loses sight of it and just sees it as 'some expression' (no different than the frac 1 1000 for the lower bound). The constraint knows it's 'owned' by ipStepTime, but that's all it knows.

If we want to have interleaved parsing and constraint-checking, we'll need a better system of keeping track of all variables involved with constraints. It might be possible to reason it out under this current system, but I do wonder it would be better to make constraints their own thing (rather than a part of ConstrConcept as they currently are). I'm feeling a little out of my depth out here in drasil-lang, so I'd appreciate any ideas @balacij @JacquesCarette.

JacquesCarette commented 1 week ago

Having constraints as a separate thing that explicitly lists the variables that it depends on is an excellent idea.

Then we could do a topological sort and output things in the dependency order.

B-rando1 commented 1 week ago

I was talking with @balacij and @NoahCardoso about this, and we found a function eNamesRI that finds all UIDs involved in a RealInterval. We should be able to use this to get the necessary information to sort the code expressions.

JacquesCarette / Drasil