JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License
141 stars 26 forks source link

Generate “Functional Requirement” for “Output Values” #3259

Open balacij opened 1 year ago

balacij commented 1 year ago

TODO

Background

Take https://github.com/JacquesCarette/Drasil/pull/3250 for example. I wrote this PR to add another output in the SRS & Code generated for the Projectile case study. At https://github.com/JacquesCarette/Drasil/pull/3250/commits/1ad12416b5b2a1c5f5354f585131def56a7ebd79, we designated a single output value in the internal SRS representation, and as a result, a ton of code was generated (great!). However, as smiths, it wasn't reflected in the SRS because the Functional Requirement (FR) isn't generated.

Analysis

Each of the case studies focus on the same kind of software (input-output software). In the “back” (our SRS abstraction), we have representations for what values are intended to be “outputted”:

https://github.com/JacquesCarette/Drasil/blob/e4b3354f1d8586ff25fec845ea9618af4bff25fe/code/drasil-sysinfo/lib/SysInfo/Drasil/SystemInformation.hs#L59

The input information is used to generate the code, but not the related functional requirements of the SRS. For example, (aside: I dislike posting images on tickets, but there's no way to create a permalink to HTML without external services), take DblPendulum's functional requirement as an example:

image

Then take GlassBR's:

image

Then take Projectile's:

image

And PD Controller's:

image

And NoPCM's:

image

etc.

Well, each of these statements is sufficiently similar such that we can automate the generation of these functional requirements uniformly.

Goals

  1. Brand the existing case studies with information about them being "input-output" case studies.
  2. In cases of SRS documents brandished with the "input-output" case study branding, we should provide the user with an off-the-shelf means of generating the Output-Values functional requirement.
  3. The off-the-shelf requirement generated can be done in at least 2 flavours: single sentence roll-ups (for small lists), table references (for larger lists).
JacquesCarette commented 1 year ago

Agreed! Now you're really thinking like a Drasil developer. Note that we could do each of those 3 goals separately, and each on its own would be a worthwhile step forward.

smiths commented 1 year ago

I definitely like the ideas expressed above. Having the output requirements and the code both generated from one piece of Drasil code would be a long-anticipated step forward. All of our current examples fit with the input-output pattern. As the current batch of 741 assignments show, not every scientific project fits with this pattern, but a great many of them do.

samm82 commented 1 year ago

We had some commented-out code already, and I'm not sure why it was abandoned (EDIT: apparently I had written this code and left it due to time constraints over one of my summers! 😅). It outputs the following output requirement (with associated table) for Projectile: image

Which looks like this currently: image

Filling in the Source column (i.e., which IM the value comes from) will likely take some design; the easiest solution would be to change the _outputs field of SystemInformation to take a list of tuples, where the first value is the output and the second is its source, or we could modify the output type (currently it is QuantityDict in Projectile, but Quantity i, MayHaveUnit i generally) to also include a source field (which I think is the better option). Do you have any feedback @JacquesCarette? My thoughts are to proceed with the first option (if its consequences aren't too severe) as to not add anything to the to-do list for our chunk investigation, but we may need to have a more in-depth design discussion about this at some point (or just leave the Source field blank for now and tackle it later!)

smiths commented 1 year ago

@samm82 I wonder if we really need the Source column in the FR:Output-Values table? The same information is in the Output-Values requirement.

samm82 commented 1 year ago

The same information is in the current Output-Values requirement, but isn't in the new one generated by the existing commented-out code. My comment above also assumes that we want the output values to be listed in a table, as opposed to a list like we currently do; I think it makes sense to display the outputs as a table for clarity and consistency with the input table we already have for its related requirement.

smiths commented 1 year ago

Okay, I understand now. Yes, making the output requirement parallel the structure of the input requirement makes sense. I thought you were proposing to have both the table and the explicit list in the requirement. I agree that the table is better, especially if there is a large number of outputs. The information on the related instance models is already available for the existing requirement. Can't you use that information in the table for the Source column?

samm82 commented 1 year ago

Yes, but the issue I'm having is where to put it so that it can be "pulled out" and generated automatically, instead of manually being supplied by the user when creating the manual requirement. The two options that immediately come to mind are changing the outputs field of our SystemInformation to a tuple that associates each output to its source, or changing the output type to include a source. Each approach has its drawbacks: the first will complicate how outputs are stored (and potentially cause many functions to ignore the unneeded source information, adding extra complexity, while the second will require a deeper analysis to see which chunks would need to have a source field, whether or not the source should be mandatory (what if an instance of a chunk only sometimes gets used as an output?), and whether or not the chunk itself will be required to build the source (my guess is that this will always be an IM), potentially leading to a cyclical dependency. Of course, any change to chunks will complicate our already-complicated chunk investigation.

My plan is to proceed with the first option; this will let us see how complicated it is and work as a proof-of-concept for the final requirement. This can then be refactored later (either in the PR or as a separate issue!) if desired.

smiths commented 1 year ago

Okay, I understand now. I thought the previous requirement was also generated, but it makes sense that it was manually written.

Manually written isn't necessarily a bad thing. It gives us flexibility. We have had cases of instance models that produce an output, but we have chosen to not have a requirement to display that output. I'm thinking specifically of t_flight. In the initial version of Projectile, we used t_flight to find the d_offset (if I remember correctly), but we didn't actually output t_flight.

@samm82 you mention associating each output with its source, but isn't it easier to associate each source with its outputs? I have no idea what makes the most sense in Drasil, but that is how I think of it. An instance model has outputs, not an output has an instance model.

samm82 commented 1 year ago

That makes sense - an initial look makes me think that InstanceModels can be used as the outputs in SystemInformation, but this has consequences in terms of the definitions attached to them vs. the ones attached to the quantities itself. Will investigate further

samm82 commented 1 year ago

@smiths If there is only one output, should we change the generated requirement's text from "values" to "value" or do something else?

samm82 commented 1 year ago

@smiths Some questions about SWHS:

  1. Some examples have the calculations of values grouped as one requirement (same with their output), like Projectile: image However, SWHS has a separate requirement for the calculation and output of each value: image Is it OK if these requirements are grouped into a calculate requirement and an output requirement, as done in Projectile?
  2. "Calculate-PCM-Melt-Begin-Time" and "Calculate-PCM-Melt-End-Time" include $t\text{melt}^\text{init}$ and $t\text{melt}^\text{final}$ as outputs, but they are not included in the actual outputs of the program (shown below). Should they be?

https://github.com/JacquesCarette/Drasil/blob/1acb8ba1553d16d1c949e3b55049fb2242dfe555/code/drasil-example/swhs/lib/Drasil/SWHS/Unitals.hs#L409-L412

samm82 commented 1 year ago

an initial look makes me think that InstanceModels can be used as the outputs in SystemInformation

And it was just as easy as that!

image

However, it leads to a difference between the descriptions of each quantity (the top being what we have currently and the bottom being the results of this change)

<     \param s output message as a string
<     \param d_offset distance between the target position and the landing position: the offset between the target position and the landing position (m)
<     \param t_flight flight duration: the time when the projectile lands (s)
---
>     \param s output message
>     \param d_offset offset: the offset between the target position and the landing position (m)
>     \param t_flight calculation of landing time: the time when the projectile lands (s)

My instinct is that we could just have each IM inherit its description from the quantity it defines. Is this a change that we should be implementing globally (e.g., for all theories in all examples)? If so, I think that should be done in a separate PR.

However, I'm not sure if we want the freedom to be able to name a theory differently than its quantity, although I can't think of a place where we would want this. The only place where the IM descriptions are used in the SRS are in the newly generated output table (since it is built from the IMs themselves) and in the code comments (as shown above), so I think we should just be using the descriptions of the quantities themselves for consistency. Any thoughts? @smiths @balacij @JacquesCarette

smiths commented 1 year ago

@samm82 I'll first respond to your comment that "My instinct is that we could just have each IM inherit its description from the quantity it defines" I don't think we can have the IM inherit its description from the quantity it defines. There isn't always going to be a one-to-one relationship between quantities and IMs. Projectile and most of our other examples are simple enough that they make it look that way, but I don't think this will always apply. As we've seen from the double pendulum example (#3533), the output of an IM can be defined implicitly from the IM's equation. Moreover, we likely will have times when one IM alone doesn't provide an output. Again in double pendulum, we need two IMs to define two outputs because there is a coupling between the equations. We could move both models to one model, but then we don't have one output from the IM, unless we add a record type that puts both outputs together.

With respect to the specific projectile example above, the new descriptions look fine to me. They seem like a different wording for the same thing, so just using the one wording throughout our documentation would be an improvement.

samm82 commented 1 year ago

Just noting that the above change was made in 05f9eb3, along with "fixing" the term/definition of offset, which I think improves the generated documentation. Would we want a custom constructor to take the term of an IM from its quantity or just do that manually when applicable?

samm82 commented 1 year ago

@balacij While working on this issue, I noticed some changes to the stable JSONs for seeds and the DOTs for traceability matrices. It seems like there are new dependencies (?) from the generated requirements, which looks alright, but it also looks like the dependency information between the generated requirement and its relevant values is being lost. I think this is because this traceability matrix is determined from SystemInformation, which doesn't have the generated requirement in it yet.

https://github.com/JacquesCarette/Drasil/blob/2ba94e9ad951ee6f55a978f1f32ae43274c117a0/code/drasil-docLang/lib/Drasil/DocumentLanguage/TraceabilityGraph.hs#L47-L55

Is there a way to get this traceability information after these requirements have been generated? (My guess is that implementing this change will also add traceability information for the input requirements, which is likely desired.)

https://github.com/JacquesCarette/Drasil/blob/2ba94e9ad951ee6f55a978f1f32ae43274c117a0/code/drasil-docLang/lib/Drasil/Sections/Requirements.hs#L35-L39

EDIT: Would getting this information from the DocDesc instead of the SystemInformation be feasible? (The DocDesc will contain the generated requirements.)

https://github.com/JacquesCarette/Drasil/blob/2ba94e9ad951ee6f55a978f1f32ae43274c117a0/code/drasil-docLang/lib/Drasil/DocDecl.hs#L107-L108

Click for an example of Projectile's diff
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/.drasil/inverted_seeds.json build/projectile/.drasil/inverted_seeds.json
129a130,133
>     "ReqOutputs": [
>         "references",
>         "labelledContent"
>     ],
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/.drasil/problematic_seeds.json build/projectile/.drasil/problematic_seeds.json
96a97,100
>     "ReqOutputs": [
>         "references",
>         "labelledContent"
>     ],
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/.drasil/reverse_trace.json build/projectile/.drasil/reverse_trace.json
20a21,23
>     "ReqOutputs": [
>         "outputValues"
>     ],
42d44
<         "outputValues",
68d69
<         "outputValues",
79d79
<         "outputValues",
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/.drasil/seeds.json build/projectile/.drasil/seeds.json
79a80
>         "ReqOutputs",
137a139
>         "ReqOutputs",
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/.drasil/trace.json build/projectile/.drasil/trace.json
79,81c79
<         "flightduration",
<         "message",
<         "offset"
---
>         "ReqOutputs"
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/TraceyGraph/allvsall.dot build/projectile/TraceyGraph/allvsall.dot
53,55d52
<   outputValues -> flightduration;
<   outputValues -> offset;
<   outputValues -> message;
diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/TraceyGraph/allvsr.dot build/projectile/TraceyGraph/allvsr.dot
6,8d5
<   outputValues -> flightduration;
<   outputValues -> offset;
<   outputValues -> message;
samm82 commented 1 year ago

Interestingly, none of the seeds/dependencies were changed for generating the requirement for GlassBR 🤔

JacquesCarette commented 1 year ago

Going way back to last week's questions: should SI 's _output field be change to a pair? What I'm puzzled about it, why would SI still have such an explicit field? What does the "act of choosing outputs" correspond to? Is it a system design decision? What kind of coherence requirements should that _output have with the rest of the stuff we have? If I understand correctly, the current design means that a change to _output will change the requirements?

I agree with @smiths that, in general, we do want the description of an IM and the quantities it contains to (potentially) be different.

If you can adjust the grammar (value/values), great!

As to getting information from DocDesc: I have no memory of the order of processing of the various bits. And it's not like the implementation is "correct", quite the opposite -- for a long time we've sought better explanations for the pieces. So the feasibility needs to be investigated; if the answer is negative, we will have some further thinking to do; if positive, then just do it!

samm82 commented 1 year ago

In response to the first paragraph: As far as I understand, specifying the outputs identifies which quantities should be output by the output module during code generation. The design that I am currently building is to "hook" these outputs up to its associated requirement to reduce duplication and increase consistency. I am quickly learning that this is not a one-size-fits-all approach (for example, we currently already generate an input requirement, but not all examples do so for likely varying reasons) so I am only generating this requirements for the examples where it is intuitive to do so (at least for now).

In response to the rest: sounds good!

balacij commented 1 year ago

Sorry for the late response!

Is there a way to get this traceability information after these requirements have been generated? (My guess is that implementing this change will also add traceability information for the input requirements, which is likely desired.)

We can take as many snapshots of the ChunkDB as needed. So we would just need to take a snapshot pre/post this generation. However, that traceability information extends to a lot of chunks. With #2873, the traceability information is calculated as chunks are added to the ChunkDB, so I think that puts us in a happier place.

samm82 commented 1 year ago

Just making a note here about my progress:

I was able to add the necessary dependency information in 2794024 (\Why does GitHub allow a short commit ID to just be numbers? What if I just wanted to use that number and not refer to a commit?</side note>), which correctly populates the RefBy fields and traceability matrices relevant to the generated output requirement. I realized that, for some reason, the output FR showed up first in the list; apparently this already happens in our stable version (the following screenshot is from IM:calOfLandingTime from the Drasil website): image

Because this behaviour already occurred, figuring it out seems out of scope for what I'm doing, but the order still seems weird to me; I would think that the IMs would be first, then the FRs in the order they appear. Is this big enough that an issue should be created for determining how to sort sources in the RefBy fields? Maybe the ordering of the columns/rows in the traceability matrices could be used as a reference for how to sort them?

EDIT: Issue created (#3567)

Also note that using last $ lnames' [t] to get the UID of the FR may be a bit of a hack, but I feel like it's fairly safe to assume that the last UID in a requirement's table's caption will be the requirement it references, at least for now.

smiths commented 1 year ago

@samm82 I agree about the short commit ID. I think the GitHub creators realize that it could cause a problem with a legitimate number, not a GitHub ID, but they figure the chance is so low that they won't worry about it. The odds are probably less than 1 in 4329875 (or some other random 7-digit integer). :smile:

This is strange that the FRs are not grouped together. The most logical ordering, as you suggest, is to list the chunks in the order they appear in the document. Although it is a low priority, I agree that creating an issue for this is a good idea.

JacquesCarette commented 1 year ago

I don't think there's any code for sorting RefBy in any particular order. That would likely be easy to add, once some requirements for what it should be have been figured out.

samm82 commented 1 year ago

Somehow I forgot about GamePhysics when going through this issue, maybe because I hadn't implemented the source field at that time which meant that SystemInformation isn't impacted.

GamePhysics doesn't really have an output requirement since what it does is a bit more... nebulous: gamephysics... outputs?

This is what the generated table looks like with the outputs currently given for GamePhysics. None of these quantities seems specific enough to be useful. Removing the list of outputs entirely led to no changes in the SRS, and since this example doesn't generate code, I'm wondering what should be done. What are the desired outputs of this example? Are there any? @smiths (If this conversation is more involved, I'll make a separate issue.) image

This also gave me the chance to see what would happen if an output requirement was generated with no outputs specified: apparently that is totally allowed! @JacquesCarette mentioned earlier (I believe) that this kind of thing should be handled at a higher level; what did you have in mind for that? Should I still throw a warning (or error!) if an output requirement is attempted to be generated without any outputs (for a sanity/consistency check, if nothing else)?

empty outputs

samm82 commented 1 year ago

Generating the outputs for SWHS and NoPCM lose the time dependency of the output (see the following example for NoPCM):

nopcm output req original nopcm output table

Should this time dependency be added a la #3210/#3432 @smiths?

smiths commented 1 year ago

@samm82, when you encounter something new, can you create a new issue for it? It can be difficult to respond unambiguously to comments when they are interleaved with different questions/concerns. For instance, you asked me a question here and then another question here. I have to spend time thinking of how to reply, and thinking of how to indicate which comment I'm replying to. I understand that the comments are linked to the same investigation, which makes me think that a project might be the best way to organize things, or a central issue that uses hashtags to link to the related "sub-issues."

With respect to the game physics question, the desired outputs are in your table, along with a few things that we wouldn't typically care about. The goal of the game physics engine is to find the position (p(t)) and orientation (phi(t)) over time. We are also sometimes interested in the linear velocity and angular velocity. The two delta symbols in your table wouldn't be something that we are typically interested in.

The game physics example isn't our strongest example. It hasn't really be vetted to the extent that examples like GlassBR and Projectile have been. I think this is partly due to the fact that it doesn't generate code, and partly do to the fact that there is more physics in Game Physics, and not everybody enjoys physics. :smile: We should keep Game Physics reasonable, but we shouldn't go to extreme measures with it.

For the question about time-dependence in SWHS and NoPCM, I really would like it if we could keep the time dependence. There is a big difference between the type Real and the type Real -> Real. We've run into communication problems in the past by glossing over these details. Scientists and engineers gloss over these details all the time, saying to themselves that the readers will know from the context, but I think we should aim to do better than the status quo. :smile: Sometimes we will mean the energy at a specific time and sometimes we'll mean the function that defines the energy over time. For the output we mean the time history of energy. I wonder if there is a problem because we use E_w to represent both. Maybe we need E_w (change in heat energy of the water) and E_w(t) (time history of the change in each energy of the water) as two separate concepts?

JacquesCarette commented 1 year ago

If you attempt to generate an output requirement with no outputs, that should be a hard error.

Also, this shows why our concept of Output is too naive. There are too many meanings of 'output' (ranging from the result of a function to printing on the screen). A relational specification can be refined to point to some values as being 'output'. Some theories have multiple things that can be considered 'outputs'.

We should have more precise names for each of these concepts.

samm82 commented 1 year ago

Just noting that the work on #3573 and its related issues is necessary to be knocked out before the implementation with the source field populated can be finished.