JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License
142 stars, 26 forks

Where does information belong? #2195

Open JacquesCarette opened 4 years ago

JacquesCarette commented 4 years ago

This question has been central to Drasil for a long time, but issue #2123 is now bringing parts of the problem back to the fore.

In part, SystemInformation is a big hack. We never defined what a 'System' is, so it's very hard to know if some information belongs there or elsewhere.

For example: authorship. A single example can have multiple authors. One person might have written the (original) code, another the SRS, and yet a third the description of the 'system' as a whole. We need to give proper attribution (this is very important in science), so we need to define which entities within Drasil can have authors.
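To make "entities whose authorship is possible" concrete, here is a minimal sketch in which authorship is attached to each artifact rather than to one system-wide author list. All names (Person, ArtifactKind, Attributed, creditFor, allContributors) are hypothetical, not Drasil's API, and the three artifact kinds are only the ones mentioned above.

```haskell
import Data.List (nub)

-- Hypothetical sketch: none of these names come from Drasil.

-- | Someone who can be credited for a piece of work.
newtype Person = Person String
  deriving (Eq, Show)

-- | Entities whose authorship we track (only those named in the comment).
data ArtifactKind = Code | SRS | SystemDescription
  deriving (Eq, Show)

-- | An artifact together with the people who wrote it.
data Attributed = Attributed
  { artifactKind :: ArtifactKind
  , authorsOf    :: [Person]
  } deriving (Eq, Show)

-- | Who should be credited for a given kind of artifact.
creditFor :: ArtifactKind -> [Attributed] -> [Person]
creditFor k arts = concat [authorsOf a | a <- arts, artifactKind a == k]

-- | Everyone who contributed to anything, each listed once
-- (nub keeps the first occurrence).
allContributors :: [Attributed] -> [Person]
allContributors = nub . concatMap authorsOf

-- | Placeholder data: one author per artifact, plus a shared author.
example :: [Attributed]
example =
  [ Attributed Code [Person "Alice"]
  , Attributed SRS [Person "Bob"]
  , Attributed SystemDescription [Person "Carol", Person "Alice"]
  ]
```

Here creditFor SRS example gives [Person "Bob"], while allContributors example lists Alice, Bob and Carol once each. The design choice is that credit is a property of each artifact, and the system-wide author list is derived rather than stored.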

muhammadaliog3 commented 4 years ago

I will be dividing this comment into the following sections:

  1. Code Investigation: this time I explore systemInformation backwards, hopefully to gain a different perspective. You don't have to read it to understand the conceptual sections; it is more of a place to store my analysis.
  2. Conceptual Problems
  3. Conceptual (and High-Level Code) Solutions

Code Investigation

Potential new additions: _purpose (it should be a list of sentences) and _configfiles (it should be a list of files, probably wrapped in a dedicated datatype such as `data File = ConfigFile String`).
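A minimal sketch of these two additions, assuming the purpose is stored as a list of sentences and config files get their own wrapper type. The wrapper is a newtype, which (unlike a type synonym) gives config file paths a distinct type. Sentence here is a stand-in alias, and ConfigFile, ProposedFields and configPaths are hypothetical names.

```haskell
-- Hypothetical sketch of the two proposed fields. 'Sentence' is a stand-in
-- alias for illustration, NOT Drasil's real Sentence type.
type Sentence = String

-- | A dedicated type for configuration files, so a config path cannot be
-- mixed up with any other String.
newtype ConfigFile = ConfigFile FilePath
  deriving (Eq, Show)

-- | Just the two new fields; the rest of SystemInformation is omitted.
data ProposedFields = ProposedFields
  { _purpose     :: [Sentence]    -- one entry per sentence of the purpose
  , _configFiles :: [ConfigFile]  -- the proposed list of config files
  } deriving (Eq, Show)

-- | Unwrap the paths, e.g. for a generator that needs to copy the files.
configPaths :: ProposedFields -> [FilePath]
configPaths fields = [path | ConfigFile path <- _configFiles fields]
```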

Key findings: all the types seem to be fine, except that the authors should contain more than just author names (also their emails, phone numbers, and institutions), and _kind should not be restricted to a single value. _defSequence and _concepts, although conceptually useful, do not seem to be used in any of the artifacts.
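A sketch of the richer author information and a list-valued kind, assuming each author carries lists of emails, phone numbers, and institutions. Author, Kind, Attribution and byline are hypothetical names; Kind is a placeholder for whatever identifies a kind of document or system.

```haskell
-- Hypothetical sketch: richer author records and a list of kinds.
-- Field names are illustrative, not Drasil's.
data Author = Author
  { authorName   :: String
  , emails       :: [String]
  , phones       :: [String]
  , institutions :: [String]
  } deriving (Eq, Show)

-- | A stand-in for whatever identifies a kind of document or system.
newtype Kind = Kind String
  deriving (Eq, Show)

-- | The author-related fields, with _kind generalized to a list.
data Attribution = Attribution
  { _authors :: [Author]
  , _kinds   :: [Kind]
  } deriving (Eq, Show)

-- | A title-page byline: the name, plus the first institution if there is one.
byline :: Author -> String
byline a = case institutions a of
  []      -> authorName a
  (i : _) -> authorName a ++ " (" ++ i ++ ")"
```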

Conceptual Problems

I think what got this issue started was that system information includes so many different pieces of information (each of a different type and for a different artifact) that it looks disorganized.

Requirements: the solution should make the system information fields ordered, grouped, and consistently typed.

Hence I will start by investigating some possible discriminators or clusters within system information:

After we pick the appropriate groups, we need to pick the appropriate way to divide them. Here are some possibilities:

Some names I thought of: problemSpecifications/solutionSpecifications, personaInformation/systemInformation, SystemScience/(I couldn't think of the opposite), and systemChoices/systemSpecifications.

Conceptual (and High-Level Code) Solutions

I don't think we should have more than three groups, as that could complicate things. I also don't think we should have a separate GOOL choices record; rather, we should incorporate it into "Drasil choices".

I think that Hard Science vs. (Soft Science + Choices) would be the best split, along with a record of records. This means we keep the same overall structure: one big "systemInformation" record that contains all the raw chunk information.

Hard Science:

```haskell
  , _quants      :: [e]
  , _definitions :: [QDefinition] --FIXME: will be removed upon migration to use of [DataDefinition] below
  , _datadefs    :: [DataDefinition]
  , _inputs      :: [h]
  , _outputs     :: [i]
  , _constraints :: [j] --TODO: Add SymbolMap OR enough info to gen SymbolMap
  , _constants   :: [QDefinition]
  , _sysinfodb   :: ChunkDB
  , _usedinfodb  :: ChunkDB
```

Soft Science:

```haskell
    _sys         :: a
  , _kind        :: b
  , _authors     :: [c]
  , _purpose     :: d
  , _concepts    :: [f]
  , _defSequence :: [Block QDefinition]
  -- plus GOOL choices
```
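A minimal sketch of the proposed record of records, keeping only some of the fields listed above. All field types are simplified to String placeholders; Drasil's real fields use QDefinition, DataDefinition, ChunkDB and type parameters. HardScience, SoftScience, _goolChoices, inputCount and systemName are hypothetical names.

```haskell
-- Hypothetical sketch: SystemInformation as a record of two records.
-- Field types are String placeholders, not Drasil's real types.
data HardScience = HardScience
  { _quants      :: [String]
  , _datadefs    :: [String]
  , _inputs      :: [String]
  , _outputs     :: [String]
  , _constraints :: [String]
  , _constants   :: [String]
  } deriving (Eq, Show)

data SoftScience = SoftScience
  { _sys         :: String
  , _kind        :: String
  , _authors     :: [String]
  , _purpose     :: String
  , _goolChoices :: [String]  -- GOOL choices folded into this group
  } deriving (Eq, Show)

-- | One big record that still holds everything, now grouped.
data SystemInformation = SystemInformation
  { _hardScience :: HardScience
  , _softScience :: SoftScience
  } deriving (Eq, Show)

-- | A consumer that only needs the science can take just that sub-record.
inputCount :: HardScience -> Int
inputCount = length . _inputs

-- | Going through the outer record costs one extra projection.
systemName :: SystemInformation -> String
systemName = _sys . _softScience
```

The benefit of this shape is that a function such as inputCount can be typed against the sub-record it actually needs, which documents its dependencies.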

JacquesCarette commented 4 years ago

Ok, commenting on each part. [Excellent investigation BTW]

Code Investigation

I already see some problems in the above analysis:

  1. we don't know for sure what the intent of each component is supposed to be. [The name and the use are not necessarily good hints either]
  2. we still don't know what a "system description" is
  3. clearly some things are under-implemented, while other things have suffered from bitrot

Conceptual Problems

It's deeper than just not knowing what each piece of information is supposed to mean, there is also some uncertainty as to what this is supposed to represent!

Your discriminators are excellent. They, de facto, partly answer the question: what are the kinds of information that we've found useful to have as part of the description of a system? What the categories say, as subtext:

I agree that splitting system information into a record-of-records is probably the way to go. The danger is that we refactor this over and over. If we use lenses properly, it's not such a big deal. So it's probably OK if we don't quite get this right the first time.
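To illustrate why lenses lower the cost of refactoring, here is a sketch in which client code only uses a composed lens, so moving a field between sub-records changes one definition. Drasil itself uses the lens package (which provides ^., view and over); to stay dependency-free, this sketch hand-rolls the same van Laarhoven encoding. The records and lens names (SysInfo, Science, science, inputs, sysInputs) are hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- | A van Laarhoven lens, the same shape the lens package uses.
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- | Read through a lens.
view :: Lens s a -> s -> a
view l = getConst . l Const

-- | Modify through a lens.
over :: Lens s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

-- Hypothetical records, not Drasil's.
newtype Science = Science { _inputs :: [String] } deriving (Eq, Show)
data SysInfo = SysInfo { _science :: Science, _name :: String } deriving (Eq, Show)

science :: Lens SysInfo Science
science f s = fmap (\x -> s { _science = x }) (f (_science s))

inputs :: Lens Science [String]
inputs f s = fmap (\x -> s { _inputs = x }) (f (_inputs s))

-- | Clients use only this composed lens. If inputs move to a different
-- sub-record, only this definition needs to change.
sysInputs :: Lens SysInfo [String]
sysInputs = science . inputs
```

For example, view sysInputs (over sysInputs ("y" :) si) returns the updated input list, while every other field of si is left untouched.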

Nevertheless, we should still think a little harder about what a "system description" is. What are the ingredients that make it up?

Conceptual & Solutions

Here is my thinking about what is in "system information"

  1. background knowledge pertinent to the problem
  2. a definition of the problem
  3. constraints that describe a "good" solution
  4. structure of the solution
  5. choices made in the solution
  6. people involved in the creation of 1-5.

I put 1-5 in that order because, I think, they depend on each other in that order. 6 is different, in that it is more meta and applies to all.
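One way to encode this ordering, as a sketch: the six ingredients become an enumeration, 1-5 form a dependency chain, and "people" sits outside the chain and applies to all the others. The constructor names are hypothetical, and reading "depend on each other in that order" as "each may depend on every earlier ingredient" is an assumption.

```haskell
-- Hypothetical encoding of the six ingredients; constructor names are ours.
data Ingredient
  = Background           -- 1. background knowledge pertinent to the problem
  | ProblemDefinition    -- 2. a definition of the problem
  | GoodnessConstraints  -- 3. constraints that describe a "good" solution
  | SolutionStructure    -- 4. structure of the solution
  | SolutionChoices      -- 5. choices made in the solution
  | People               -- 6. people involved in creating 1-5
  deriving (Eq, Ord, Show, Enum, Bounded)

-- | Assumption: each of 1-5 may depend on every ingredient before it.
-- People is meta, so it sits outside the chain.
dependsOn :: Ingredient -> [Ingredient]
dependsOn People = []
dependsOn i      = takeWhile (< i) [minBound .. maxBound]

-- | People applies to every other ingredient.
appliesTo :: Ingredient -> [Ingredient]
appliesTo People = [i | i <- [minBound .. maxBound], i /= People]
appliesTo _      = []
```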

I definitely doubt that's the end of the story. But it's a (re)start.

smiths commented 4 years ago

The discussion above from @muhammadaliog3 and @JacquesCarette is very helpful. I definitely like the approach of "reverse engineering" what the Drasil code says, while remaining wary, because we know some of the code was developed in an ad hoc way.

My instinct is that for step 5 above (choices made in the solution), we should divide the choices into at least two categories: choices related to the requirements (physical model) and choices related to the design (software structure, data structures, algorithms). I believe that most of our decisions are currently related to design, but when we get further with the notion of a family of programs, we will also have physics related variabilities. For instance, in GlassBR, we currently assume that LSF (load share factor) is 1.0, because we have only one pane of glass. We could remove this assumption and have another member of the family available.
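A sketch of the two categories of choices, using the GlassBR example: two family members that differ only in a requirements choice (whether LSF is fixed at 1.0). The types and functions are hypothetical, and TargetLanguage "Python" is only a placeholder design choice.

```haskell
-- Hypothetical sketch: separating the two categories of choices.

-- | Choices about the requirements (the physical model).
data RequirementsChoice
  = FixParameter String Double  -- assume a parameter has a fixed value
  deriving (Eq, Show)

-- | Choices about the design (software structure, data structures, algorithms).
data DesignChoice
  = TargetLanguage String
  | DataStructure String
  deriving (Eq, Show)

data Choices = Choices
  { requirementsChoices :: [RequirementsChoice]
  , designChoices       :: [DesignChoice]
  } deriving (Eq, Show)

-- | The value a requirements choice fixes for a parameter, if any.
fixedValue :: String -> Choices -> Maybe Double
fixedValue name cs = lookup name [(n, v) | FixParameter n v <- requirementsChoices cs]

-- | Current member: one pane of glass, so LSF is assumed to be 1.0.
singlePane :: Choices
singlePane = Choices [FixParameter "LSF" 1.0] [TargetLanguage "Python"]

-- | Another family member: the LSF assumption is removed, design unchanged.
generalLSF :: Choices
generalLSF = singlePane { requirementsChoices = [] }
```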

@JacquesCarette during one of our group meetings last summer, you brought up the concept of refinement. I think you had a different name for it, though. The idea is that we have one model, we make some decisions, and then we have a new model. I don't know the proper terminology, but I remember feeling that what you were presenting could give us a structure within which to place our different ideas.

muhammadaliog3 commented 4 years ago

I think what is going on is that some things are obtained by using Map.elems on the Maps in ChunkDBs. In other words, sometimes you just want all of the information rather than a specific piece, which is probably why we don't need to store all the definitions/concepts separately.

Here is a good example of how the general definitions are accessed directly through the general definition map:

```haskell
  | t `elem` keys (s ^. gendefTable)          = makeRef2S $ gendefLookup      t (s ^. gendefTable)
```
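A minimal sketch of the pattern described: if definitions live in a Map keyed by UID, then "all of them" is just Map.elems, and a single one is a lookup, so no separate list needs to be stored. ToyChunkDB, GenDefn, mkDB, allGenDefs and lookupGenDef are hypothetical stand-ins for Drasil's ChunkDB machinery; only Data.Map's fromList, elems and lookup are real library functions.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for Drasil's UID, GenDefn and ChunkDB.
type UID = String

data GenDefn = GenDefn { gdUID :: UID, gdLabel :: String }
  deriving (Eq, Show)

newtype ToyChunkDB = ToyChunkDB { gendefMap :: Map.Map UID GenDefn }

mkDB :: [GenDefn] -> ToyChunkDB
mkDB gds = ToyChunkDB (Map.fromList [(gdUID g, g) | g <- gds])

-- | "All the general definitions": no separate list is stored;
-- the map's values (in ascending key order) are the list.
allGenDefs :: ToyChunkDB -> [GenDefn]
allGenDefs = Map.elems . gendefMap

-- | One definition by UID. Map.lookup combines the membership test
-- and the lookup that the guard above performs in two steps.
lookupGenDef :: UID -> ToyChunkDB -> Maybe GenDefn
lookupGenDef u db = Map.lookup u (gendefMap db)
```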

Possible fields we could include in system information. NOTE: I know that the Drasil philosophy is to only add things when they are needed, but I think that putting some extra useful information in system information would encourage other artifacts to reuse it. Hopefully this can spur some ideas.

Finally, answering the question of what describes or defines a system (@smiths could help with this):

Even more concretely, a system defines:

Current system information storage strategy:

NEW SYSTEMINFORMATION:

systemProblem

systemScience

systemExplanation

systemArtifacts

systemAuthors
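A sketch of the five proposed groups as a record of records. Since the contents of each group are not spelled out here, each group is left as a type parameter rather than a guessed concrete record; the withAuthors helper is hypothetical and shows that one group can be replaced (even changing its type) without touching the others.

```haskell
-- Hypothetical sketch: the five proposed groups as a record of records.
-- Each group's contents are a type parameter, not a guessed record.
data SystemInformation prob sci expl arts auth = SystemInformation
  { systemProblem     :: prob
  , systemScience     :: sci
  , systemExplanation :: expl
  , systemArtifacts   :: arts
  , systemAuthors     :: auth
  } deriving (Eq, Show)

-- | Replace the authors group, possibly changing its type; the other four
-- groups are carried over unchanged (a type-changing record update).
withAuthors
  :: (auth -> auth')
  -> SystemInformation p s e a auth
  -> SystemInformation p s e a auth'
withAuthors f si = si { systemAuthors = f (systemAuthors si) }
```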

JacquesCarette commented 4 years ago

@smiths the kinds of refinement I was thinking of are "theory specialization", when you take a generic theory and instantiate some of its parameters to something more specific [and often then simplify the resulting model.]
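A toy illustration of theory specialization as described: instantiate a parameter of a generic model, then simplify the result. The expression type, specialize and simplify are hypothetical, and the example theory (displacement under constant acceleration, s = v*t + (1/2)*a*t^2, specialized to a = 0) is chosen only for illustration, not taken from Drasil's examples.

```haskell
-- Hypothetical toy expression language; not Drasil's Expr.
data Expr
  = Var String
  | Lit Double
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- | Instantiate a parameter: replace every occurrence of it with a value.
specialize :: String -> Double -> Expr -> Expr
specialize p v (Var x)
  | x == p    = Lit v
  | otherwise = Var x
specialize _ _ (Lit c)   = Lit c
specialize p v (Add a b) = Add (specialize p v a) (specialize p v b)
specialize p v (Mul a b) = Mul (specialize p v a) (specialize p v b)

-- | Simplify bottom-up using x + 0 = x, x * 0 = 0, x * 1 = x,
-- and folding of constant subexpressions.
simplify :: Expr -> Expr
simplify (Add a b) = case (simplify a, simplify b) of
  (Lit 0, e)     -> e
  (e, Lit 0)     -> e
  (Lit x, Lit y) -> Lit (x + y)
  (x, y)         -> Add x y
simplify (Mul a b) = case (simplify a, simplify b) of
  (Lit 0, _)     -> Lit 0
  (_, Lit 0)     -> Lit 0
  (Lit 1, e)     -> e
  (e, Lit 1)     -> e
  (Lit x, Lit y) -> Lit (x * y)
  (x, y)         -> Mul x y
simplify e = e

-- | Generic theory: s = v*t + (1/2)*a*t*t.
displacement :: Expr
displacement =
  Add (Mul (Var "v") (Var "t"))
      (Mul (Mul (Lit 0.5) (Var "a")) (Mul (Var "t") (Var "t")))

-- | Specialized theory: constant velocity (a = 0), then simplified.
constantVelocity :: Expr
constantVelocity = simplify (specialize "a" 0 displacement)
```

Here constantVelocity evaluates to Mul (Var "v") (Var "t"), i.e. s = v*t: the generic model, instantiated and then simplified.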

@muhammadaliog3 Nice stab at "what describes or defines a system". @oluowoj this is the kind of speculating from observed data that we'd like you to be doing.

What you've provided is what I would call a really good first "brainstorming" of our current ingredients that describe/define a system. And you also gave a good first stab at categorizing that information. You are correct that all the information you list is part of a system description, but we need to know the origin of each piece of information, what it means, whether it is human-specified or derived from some other basic information, etc. I need to try to circle back to this some time this week.
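One way to record "the origin of each piece of information", as a sketch: tag every piece with whether it is human-specified or derived from other pieces. Provenance, Info, basicInputs and sourcesOf are hypothetical names, and the length/width/area data is a placeholder.

```haskell
-- Hypothetical sketch: tagging each piece of information with its origin.
type UID = String

data Provenance
  = HumanSpecified     -- written directly by a person
  | DerivedFrom [UID]  -- computed from other pieces of information
  deriving (Eq, Show)

data Info = Info
  { infoUID    :: UID
  , meaning    :: String
  , provenance :: Provenance
  } deriving (Eq, Show)

-- | The human-specified pieces: what a person actually has to supply.
basicInputs :: [Info] -> [UID]
basicInputs is = [infoUID i | i <- is, provenance i == HumanSpecified]

-- | Direct sources of a piece of information (Nothing if it is unknown).
sourcesOf :: [Info] -> UID -> Maybe [UID]
sourcesOf is u = case [provenance i | i <- is, infoUID i == u] of
  (DerivedFrom us : _) -> Just us
  (HumanSpecified : _) -> Just []
  []                   -> Nothing

-- | Placeholder data: two basic pieces and one derived from them.
example :: [Info]
example =
  [ Info "length" "length of a side" HumanSpecified
  , Info "width" "width of a side" HumanSpecified
  , Info "area" "area of the rectangle" (DerivedFrom ["length", "width"])
  ]
```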

JacquesCarette commented 4 years ago

@smiths I think we should have a meeting about this. Might even make sense to have an all-hands?

smiths commented 4 years ago

Yes, a meeting is a good idea, but I won't be available tomorrow or next week. Next week is a vacation week, and tomorrow is the day when I have to get all my work done so that I can go on vacation. :-) I'll try to keep up with e-mail, but I won't be able to do a meeting until the week of August 4th.

balacij commented 2 years ago

By any chance, did the meeting (mentioned above) occur? If so, are notes public anywhere?

smiths commented 2 years ago

@balacij - No, I do not believe that a meeting took place.

JacquesCarette commented 2 years ago

It didn't. This is still very much an open issue.

balacij commented 2 years ago

Thank you, @smiths! :smile:

Hopefully we can continue this discussion soon. After reading Dr. Carette's changes to the Information Encoding wiki page, I think I understand a lot more thanks to having it formalized and sitting in front of me. I've also had some thoughts related to this recently (albeit in a roundabout way, by analyzing our package structure). I just need to formalize them, and then I'll try to post my own thoughts too.

balacij commented 2 years ago

Ah, thank you as well, @JacquesCarette (I posted just 14s after you, but didn't see your comment until after I posted)! The latest changes to Information Encoding have been quite eye-opening.

smiths commented 2 years ago

Yes, we should discuss this issue again soon.