JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License

Are we using too many typeclass constraints on `SystemInformation`? What is a `SystemInformation`? #3260

Open balacij opened 1 year ago

balacij commented 1 year ago

Small steps to understanding SystemInformation

Original Post:

Related code:

https://github.com/JacquesCarette/Drasil/blob/e4b3354f1d8586ff25fec845ea9618af4bff25fe/code/drasil-sysinfo/lib/SysInfo/Drasil/SystemInformation.hs#L37-L66

It's very difficult to pin down a definition of what SystemInformation is, and I'm hoping this ticket will help us understand it a bit. Let's go through the fields:

  { _sys         :: a

a "system" (?), which is both a common idea and an idea,

continuing, an SI has:

  , _kind        :: b

a kind/type, which is just an "idea",

  , _authors     :: [c]

a list of authors, which should be extracted from some polymorphic type,

  , _purpose     :: Purpose

a Purpose,

  , _background  :: Background

a Background,

  , _quants      :: [e]

quantity-like things that can be boiled down to quantities (Quantity e, Eq e, MayHaveUnit e),

  , _concepts    :: [f]

quantity-like things that can be boiled down to concepts?, (Quantity f, MayHaveUnit f, Concept f, Eq f)

  , _instModels  :: [InstanceModel]

a list of instance models (this should likely be replaced with ChunkDB usage -- my problem! Oops!),

  , _datadefs    :: [DataDefinition]

a list of data definitions (this should likely be replaced with ChunkDB usage too),

  , _configFiles :: [String]

a list of configuration files, provided as Strings?,

  , _inputs      :: [h]

some list of things that can be boiled down to quantities (Quantity h, MayHaveUnit h),

  , _outputs     :: [i]

some list of things that can be boiled down to quantities (Quantity i, MayHaveUnit i)

  , _defSequence :: [Block SimpleQDef]

I'm not fully sure what this is, but I would hope it is something that we could automatically pull from the composed instance models and data definitions?

  , _constraints :: [j] --TODO: Add SymbolMap OR enough info to gen SymbolMap

a list of "constraints", a list of things that have a UID and are "constrained" (HasUID j, Constrained j)

(is this another thing that should be moved to chunkdb usage?)

  , _constants   :: [ConstQDef]

a list of constant quantitydicts,

(I think this is another thing that should be moved to chunkdb usage?)

  , _sysinfodb   :: ChunkDB

the related "chunkdb",

  , _usedinfodb  :: ChunkDB

something marked for deletion (#1661) some time in the future,

  , refdb        :: ReferenceDB

a database of references (can likely be replaced with chunkdb usage + a bit of magic to sort citations per order of appearance in the SRS)

I think that simplifying the types on the fields in SystemInformation might help with pinning down a definition of what it is. Additionally, I think there are a few things we might want to try to remove, which will help us generally get closer to defining it, but I think we're getting closer :smile:

peter-michalski commented 1 year ago

I recall a comment a few weeks ago that suggested that SystemInformation could be pulled out of Body.hs as we expand the list of generated artifacts, implying that it would serve as a repository for information shared between artifacts.

> I think there are a few things we might want to try to remove

What do you suggest for removal?

balacij commented 1 year ago

Hmm, I'm not quite sure if I understand your first comment.

Regarding the "removal," it doesn't necessarily need to be "removal," it can also be "moving around."

The instance model list, data definition list, and reference db look like they could be replaced with chunkdb usage. Same with the _quants and _constraints.

The "inputs", "outputs", and "config files" sound like they would be related to the "kind" of the SRS, so perhaps there needs to be some extra chunk type here that ties them together?

The usedinfodb should be removed (see the linked ticket).

The constant QDefs might be something that should be elsewhere too.

I'm not sure about the Purpose and Background yet.

peter-michalski commented 1 year ago

Data structure for holding all of the requisite information about a system to be used in artifact generation.

I was commenting that SystemInformation does not belong in Body.hs files considering the latter looks to specifically be an SRS-centric file. Moving SystemInformation within each example dir may help with understanding how to restructure and define it.

JacquesCarette commented 1 year ago

I've self-assigned this one. I think I can reconstruct an explanation for it all, with no promises that it's going to be coherent.

JacquesCarette commented 1 year ago

But I can 'answer' the aspect of this related to the title: SystemInformation tries to be representation independent. So instead of giving a type for each of its fields, it gives a set of constraints. The constraints record the "what information must the implementation promise to provide" and accept any representation that meets that promise. In Haskell, interfaces are declared via typeclasses - so this is using that style.
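To make the constraint style concrete, here is a minimal sketch, not Drasil's actual definition. The classes `HasName` and `Quantity` below are simplified stand-ins with made-up methods (`nameOf`, `symbolOf`), and `MiniSystem`, `PlainAuthor`, `FullAuthor`, and `Var` are hypothetical names. The record fixes no concrete type for its fields; it only demands that whatever is supplied keeps the promise stated by the constraint.

```haskell
{-# LANGUAGE GADTs #-}

-- Simplified stand-ins for the real classes (assumption: the method names are invented).
class HasName a where
  nameOf :: a -> String

class Quantity a where
  symbolOf :: a -> String

-- Two unrelated representations of an author.
newtype PlainAuthor = PlainAuthor String
data FullAuthor = FullAuthor String String

instance HasName PlainAuthor where
  nameOf (PlainAuthor n) = n

instance HasName FullAuthor where
  nameOf (FullAuthor g f) = g ++ " " ++ f

-- One representation of a quantity.
newtype Var = Var String

instance Quantity Var where
  symbolOf (Var s) = s

-- The constructor hides the concrete types c and e; only the constraints remain visible.
data MiniSystem where
  MiniSystem :: (HasName c, Quantity e) => [c] -> [e] -> MiniSystem

-- Consumers can use only what the constraints promise.
authorNames :: MiniSystem -> [String]
authorNames (MiniSystem auths _) = map nameOf auths

symbols :: MiniSystem -> [String]
symbols (MiniSystem _ qs) = map symbolOf qs
```

Both `MiniSystem [PlainAuthor "A. Author"] [Var "x"]` and `MiniSystem [FullAuthor "A." "Author"] [Var "y"]` have the same type, which is the sense in which the record is independent of the representation.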

balacij commented 1 year ago

Sorry, I'm typing on a laptop and accidentally pressed to submit. I'll update and continue writing my comment!

balacij commented 1 year ago

First, I apologize for the late response. Today's CAS 741 lecture reminded me of this ticket, and I think it's the right place to put this commentary.

Second, I completely agree with your answer @JacquesCarette. In class, I was thinking about "Now, design aside, what does it capture? What is it representation independent of?" and #3003.

We were talking about the Module Interface Specifications (MIS) documents, and I asked @smiths two questions:

1) Does the MIS formulation affect what languages we can create code for?
2) Does the MIS approximately represent the "software solution" to the "problem" described in the SRS?

Paraphrasing @smiths's answers, with my extra commentary and bias :smile::

1) Yes. MIS formulations restrict what languages you can use. That isn't to say that, if you had an SRS, a MIG, and a preferred programming language whose constructs did not match the MIG, you couldn't have another MIG made for your preferred language.
2) Yes. Up to choices, it is a declaration of the "how" to solve a problem (as @smiths put it). In other words, it is the solution description.

Right now, all of Drasil's examples are "Input-Calculate-Output" (ICO) style programs. Given this, there exists a basic set of "modules" that all solutions would need to have: namely, input (1), calculation scheme (2), and output (3). There are likely others; I'm still new to the MIS document. Regardless, the MIS is a meaningful dissection of a specific solution (of which there might be many) to a specific problem. The modularization of the "solution" depends on the kind of software/problem we're describing. In other words, we have a layer of coherent "solution" thoughts before generating code; that is, a "ModelKinds" for programs exists (but I think we already knew this, again, ignoring design).

Going back to the SystemInformation analysis and the design, it would be interesting to see if we could define a "ModelKinds" equivalent for SystemInformation (of course, with just 1 constructor -- ICO). Then SystemInformation is reduced to a record with (a) problem metadata (authors, purpose, background, etc.), (b) problem-specific data captured through "ProblemKinds" (or otherwise named), and (c) the single chunk database. (b) would have an ICO constructor, which would designate particular inputs and outputs from the chunk database, by their typed UIDs.
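A rough sketch of what that reduced record could look like, purely to make the (a)/(b)/(c) split concrete. Every name here is hypothetical (`UID` with a phantom tag, `QuantityChunk`, `ProblemKind`, `System`, `resolvedInputs`), and the chunk database is modelled as a plain association list rather than Drasil's `ChunkDB`.

```haskell
-- All names are illustrative assumptions, not Drasil definitions.

-- A UID tagged with the kind of chunk it refers to (phantom type parameter).
newtype UID k = UID String deriving (Eq, Show)

-- Tag only; it has no values.
data QuantityChunk

-- Toy chunk database: raw UID -> description.
type ChunkDB = [(String, String)]

-- (a) problem metadata
data Metadata = Metadata
  { authors    :: [String]
  , purpose    :: String
  , background :: String
  }

-- (b) problem-specific data, with a single constructor for now
data ProblemKind = ICO
  { inputs  :: [UID QuantityChunk]
  , outputs :: [UID QuantityChunk]
  }

-- (c) the single chunk database, alongside (a) and (b)
data System = System
  { metadata :: Metadata
  , problem  :: ProblemKind
  , chunkDB  :: ChunkDB
  }

-- Look up one typed UID in the database.
resolve :: ChunkDB -> UID k -> Maybe String
resolve db (UID u) = lookup u db

-- Resolve every designated input; Nothing if any UID is missing from the database.
resolvedInputs :: System -> Maybe [String]
resolvedInputs sys = mapM (resolve (chunkDB sys)) (inputs (problem sys))
```

The inputs and outputs no longer carry their own data; they only point into the database, so a dangling UID shows up as a failed lookup.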

Note: Depending on how we view "ICO" and if solutions are necessary to posing problems, it might be better to just have "IO" and a partial constructor for "ICO"s from "IO"s and chunk databases.
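One possible reading of that "partial constructor", sketched under heavy assumptions: the chunk database is a toy association list from a defined chunk to the UIDs it is computed from, and `IOProblem`, `ICOProblem`, and `mkICO` are invented names. The constructor succeeds only when every output has a definition in the database.

```haskell
-- Hypothetical names throughout; the database is a toy stand-in.
type UID = String

-- Each defined chunk lists the UIDs it is computed from.
type ChunkDB = [(UID, [UID])]

-- A problem posed only by its inputs and outputs.
data IOProblem = IOProblem
  { ioInputs  :: [UID]
  , ioOutputs :: [UID]
  }

-- The same problem with the calculation step filled in.
data ICOProblem = ICOProblem
  { icoInputs  :: [UID]
  , icoCalcs   :: [(UID, [UID])]  -- definition found for each output
  , icoOutputs :: [UID]
  } deriving (Eq, Show)

-- Partial: Nothing if some output has no definition in the database.
mkICO :: ChunkDB -> IOProblem -> Maybe ICOProblem
mkICO db (IOProblem is os) = do
  defs <- mapM (\o -> fmap (\d -> (o, d)) (lookup o db)) os
  Just (ICOProblem is defs os)
```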

Then we can capture how "Solutions" (but we wouldn't do this with a "SolutionKinds"-like construct, to be clear) relate to "Problem"s with an approach similar to how we define transformers (though it might be beneficial to think about #2883 and #2896's discussion of capturing all transformers, printers, etc. using type-classes to group them all together). With this, we would have a layer between SRS and "genCode", and a formulation of the abstract "solution" to the problem described by the SRS. Furthermore, we might be able to generate MIS documents.

Note: I don't know the full story regarding MIS documents and Drasil, but as I've understood it, they were just "too simple." But, with this, we could have different kinds of MIS documents for different target languages (e.g., a Haskell or Matlab MIS might want a different scheme than the MIS that a Java program might want). We could also just "not create the MIS documents", but the meaning behind the creation of the MIS (the solution) is something that we might consider capturing.

Aside: I notice that "config files" is a field in the SI, but I think it could/should be a MIS-specific choice when pulling an MIS from a particular SRS abstraction.

With this, I think we can continue to improve SystemInformation. I also think I can make some suggestions later, assuming we would want to go down capturing the I(C)O information in the SI and re-examine the design, but I'll need to think more about it and return.

However, SystemInformation discussion aside, I want to talk about "compilers" (and #3003). Previously, we've all discussed Drasil as a network of compilers. I like calling Drasil a compiler, and I like thinking of it as "connecting" compilers. Outsiders might find our definition of "compiler" odd. The traditional compiler (TC) they refer to typically involves PLs and assembly languages, but that's not an important difference. TCs also have a particular ICO, where the inputs and outputs are externalized from the TC itself. An interpreter is similar, but it has a "loop" bit (i.e., REPL → ICOL). Are there any other meaningful differences? [I don't see any, but I could be wrong, of course :smile:]
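The ICO versus ICOL distinction can be sketched as two small combinators. This is only an illustration of the shapes involved; `runICO`, `runICOL`, and `doubler` are made-up names, and the "loop" is modelled as mapping over a list of input lines rather than real interactive I/O.

```haskell
-- Hypothetical helper names; a sketch of the shapes only.

-- One Input-Calculate-Output pass: parse the input, compute, render the result.
runICO :: (String -> i) -> (i -> o) -> (o -> String) -> String -> String
runICO parse calc render = render . calc . parse

-- The interpreter adds the "loop" bit: the same pass applied to each input line.
runICOL :: (String -> i) -> (i -> o) -> (o -> String) -> [String] -> [String]
runICOL parse calc render = map (runICO parse calc render)

-- A toy ICO program: read an Int, double it, show it.
doubler :: String -> String
doubler = runICO read (* (2 :: Int)) show
```

Under this view, the only difference between the two is whether the pass runs once or repeatedly; what `parse` and `render` accept and produce (primitive values, or whole programs) is a separate choice.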

Now, if we think about how the existing case studies parse their inputs, calculate, and output results, is there any difference (i.e., are the existing case studies "compilers?")? One difference is that the outputs are typically "console outputs", but that's not a hard requirement. I think some of the examples are supposed to output to files. In which case, what is the difference? To me, it seems that the inputs and outputs are typically "primitive" types. So does that mean that the existing case studies are compilers? In the traditional sense, no (because usually it has to be about programming languages), but in our "generation ≈ compiler" sense, yes!

To parse the primitive types, we use the standard facilities from Python/Java/etc, but there's no reason why we couldn't also create parsers for other input languages with complex data types, and consequently output complex data. One notable kind of input and output pairing is that of the TC. If we have an ICO where the input is a PL and the output is assembly, then we've created a TC in Drasil. Neat! We would also have MIS documents and SRS documents for compilers, and for quite a "cheap" price.

Ok, now, thinking about #3003. Let's assume we wanted to generate some sort of Haskell source code (but what? and how would it differ from the existing Drasil source code?) for another hypothetical Drasil compiler/interpreter. Well, if we can formulate an input language that represents the essence of the things we capture in Drasil, then we can nearly completely de-embed everything in Drasil, and re-generate specialized compilers. This last paragraph is still quite young. I haven't thought about it nearly as much as the above paragraphs, and I'll probably re-visit it soon (but in #3003), but it's a start. As of right now, I'm fairly convinced that "essence" should be captured through formulation through theory presentations, so I'm hoping to think about that more in the future.

In any case, @samm82 and @smiths, this is what I was trying to communicate at the end of lecture today.

@ all: Is what I'm saying making sense? To be honest, there are many tangents we could also go on from this, and I think it was a bit unfocused, but we could probably refine it if the foundation is solid.

EDIT: I discussed with @samm82 about this briefly on Discord (at least the bits about MIS capture and document generation). One more thing we could potentially think about with this is generating VnV Plans. @samm82 also noticed that it might be fruitful for his test case generation.

balacij commented 1 year ago

Ok, keeping it short, one last tangent: assuming everything above makes sense, and we capture the ICO(L), then we might be a smidge closer to resolving the recent "Characteristics of the Intended Reader vs background knowledge" discussion (#3301).

JacquesCarette commented 1 year ago

On "compiler": I think you're searching for the word 'transpiler'. We don't quite do source-to-source, but rather IR-to-IR. Though, really, in general we're doing DSL-to-DSL, which could be called "model transformation"!!! That's why I'm mostly thinking in terms of transformers. They do have a lot in common with compilers, but compilers bring up some unwanted connotations.

Yes, our solutions use a single 'architecture' that you label ICO. Note that this is at the level of the description of the solution. The actual solution must be traceable to the description, but doesn't have to match it! This is why a "big ball of mud" piece of code, with no modules or even functions, might still be an implementation. The link is via 'transformations' from the solution description that include inlining.

There's a lot more going on in this discussion, but I'd really want to draw some diagrams for some of the other pieces. And I'd really want the diagrams to be on a board, as interactive discussion might cause multiple redrawings of the diagrams.

balacij commented 1 year ago

I like the “transformers” terminology, so I'm content with this. I'll probably need to pick your brain about the terminology again later, specifically about the connotations.

Yes, and part of why I thought the conversation was most relevant here was because I was hoping it would help us pin down the SystemInformation definition. I suppose by “big ball of mud,” you're referring to #3247's code example. I think that's a good concrete example. From how I understood the MIS in @smiths' lecture, there are many possible MIS documents that can exist for any problem, but there should always be a basic 'intersection' set of 'modules' between all possible MIS documents.

Since our next meeting is in-person, I've added it to our agenda :smile:

balacij commented 1 year ago

Also relevant: #3259

smiths commented 1 year ago

@balacij great discussion and ideas. I'll need to see more concrete examples before I fully understand where your brain is going, but I like it. :smile: With respect to there always being a basic set of modules between all possible MIS documents, I don't think that is true. I know there are some modularizations that come up when you think in terms of information hiding and a hierarchical uses relation, but information hiding and a hierarchical uses relation are heuristics that humans use for design for change. Generated computer code doesn't necessarily need to follow these human abstractions. Also, as much as I like design for change, it isn't the only criterion for design.

balacij commented 1 year ago

Thank you @smiths!

That makes sense. Maybe I should have sat on the idea a bit more before writing, it definitely has some holes and jumps...

It would be interesting to see if we could capture how the software requirements (e.g., assumptions, requirements, etc.) can be axiomatized (or captured as theories) so that we can directly show how a particular code generator generates solutions, with "proofs", or if a "solution" satisfies the axiomatic requirements.

JacquesCarette commented 1 year ago

There's a time and place for brainstorming, and a time and place for systematic, organized thought (and various things in between).

Brainstorming and analogies are great for getting the creative juices going and providing fodder to a later organization pass that makes sure that all the pieces fit into a well organized picture.

There's also lots of room for some ping-ponging between these. Grand ideas need to be grounded in

  1. code that works, and
  2. code that can be easily explained.

(And I see that as two separate steps.) Staying at the idea level for too long is dangerous.

smiths commented 1 year ago

I agree with @JacquesCarette. There is also a useful activity between grand ideas and code - examples. I find it really helps me understand something if I can think of examples that use that something. For instance, in our Drasil meeting the other day the discussion about the relationship between assumptions and theories made more sense to me when we used the example of assumptions that restrict the dimensionality of a problem.

balacij commented 7 months ago

I have no idea where this issue discussion went in my notifications... Sorry!

I think my earlier discussion + a somewhat recent discussion on #3652 + #3481 + #3482 + #3259 are signals that we need a SystemKind-like data type that captures the kinds of programs we carry. An ICO SystemKind would automatically be associated with a "Functional Requirement" for output values and also be associated with a sanity check on ICO systems, that they were indeed defined as things with outputs of the discussed program families. I think this would close all of those tickets ... :sweat_smile:
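As a starting point for discussion, a SystemKind-like type might look roughly like the sketch below. Nothing here exists in Drasil: `SystemKind`, `impliedRequirements`, and `sanityCheck` are hypothetical names, UIDs are plain strings, and the particular check (at least one output, and every UID known) is only one guess at what the sanity check could be.

```haskell
import Data.List (intercalate)

-- Hypothetical names throughout.
type UID = String

-- Only one kind of system for now: inputs and outputs of an ICO program.
data SystemKind = ICO [UID] [UID]
  deriving (Eq, Show)

-- The functional requirement that an ICO system carries automatically.
impliedRequirements :: SystemKind -> [String]
impliedRequirements (ICO _ outs) =
  ["Output the values of: " ++ intercalate ", " outs]

-- Sanity check against the list of UIDs known to the chunk database.
sanityCheck :: [UID] -> SystemKind -> Either String ()
sanityCheck known (ICO ins outs)
  | null outs = Left "an ICO system must declare at least one output"
  | otherwise =
      case filter (`notElem` known) (ins ++ outs) of
        []      -> Right ()
        missing -> Left ("unknown UIDs: " ++ unwords missing)
```

Adding a new kind of system would then mean adding a constructor, and the compiler would point at every function that needs a new case.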

Speaking of SystemInformation, shouldn't we just drop the Information part? It seems redundant, no?

smiths commented 7 months ago

I like the SystemKind idea. We don't have that many kinds at the moment, but that will likely change in the future. Using the SystemKind to distinguish programs from libraries seems like a way to capture information that we've been aware of for a long time, but haven't really codified. I'd be fine with saying System instead of SystemInformation.

balacij commented 7 months ago

Updated the OP with it. I suppose this issue can just be used to track the evolution of System until we come up with a good definition of it.

JacquesCarette commented 7 months ago

Agree we can drop Information. And I agree that starting to enumerate the kinds of systems that we can talk about is a good idea. SystemKind is an excellent path forward.