Open JacquesCarette opened 4 years ago
I will be dividing this comment into the following sections
Code Investigation
_sysinfodb -------- This is a map that contains almost all of a Drasil example: all symbols (except inputs and outputs), all concepts, all units, the traceability map, the reference map, all of the data definitions in a single map, the instance/theory/general definition map, the conceptInstance map, the section map, and the labelled content map (which is figures, but usually graphs).
_usedInfodb ------- I could not figure out what this was. What we do know is that this is a ChunkDB with at most 2 fields filled out (at least in the examples): a termMap and/or a ConceptMap.
_refbyMap ----------- self explanatory
_constraints, _constants, _inputs, _outputs, _datadefs -------- self explanatory, they contain all the instances
_defSequence -------------- this is a list of the following
data Block QDefinition = Coupled QDefinition QDefinition [QDefinition] | Parallel QDefinition [QDefinition]
It is empty in some examples (such as Projectile) and filled in other examples (such as SSP). However, it is not used within drasil-code or drasil-docLang. It could, therefore, be removed without issue. Or it could be something that is not implemented yet.
_concepts ------------- is always empty in all the examples and is not used within drasil-code or drasil-docLang
_definitions ------------- is an empty list in some examples and, in other examples, a list of QDefinitions from various places (theory+instance+genDefinition models, data definitions, etc.). In CodeSpec.hs they are used to get derived inputs in case there are no data definitions, and then ultimately to create an execution order of code definitions.
_quants --------------- all symbols that are not input and not output symbols
_authors -------------- list of persons
data Person = Person { _given :: String
, _surname :: String
, _middle :: [String]
, _convention :: Conv
} deriving (Eq)
It could potentially contain more, such as telephone, email, institution
_sys ---------- self explanatory, but it is a commonIdeaWithDict, hence it includes an abbreviation, a short title and a full title.
_kind ----------- self explanatory, but it is any chunk with an idea, so for example srs. There could be an improvement here to allow more than just one kind of artifact, which will most definitely happen in the future. Therefore it should be changed to a "[c]" where "c" is any chunk with an idea.
Potential new additions: _purpose (it should be a list of sentences), _configfiles (should be a list of files, it should probably be in some datatype such as type File = ConfigFile String)
Key findings: All the types seem to be fine, except that authors should contain more than just names (also their emails, phones, and institutions), and kinds should not be restricted to just one value. _defSequence and _concepts are conceptually useful, but there seems to be no practical use for them in any of the artifacts.
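As a rough sketch of the two changes proposed in the key findings (richer author information and a list of kinds), something like the following could work. The contact fields, the Conv constructors, and the Kind and Info types are assumptions made up for illustration; they are not taken from the Drasil code.

```haskell
-- Stand-in for Drasil's naming convention type (constructors assumed).
data Conv = Western | Eastern | Mono deriving (Eq, Show)

-- The Person record quoted above, with hypothetical contact fields added.
data Person = Person
  { _given       :: String
  , _surname     :: String
  , _middle      :: [String]
  , _convention  :: Conv
  , _email       :: Maybe String   -- assumed new field
  , _phone       :: Maybe String   -- assumed new field
  , _institution :: Maybe String   -- assumed new field
  } deriving (Eq, Show)

-- Stand-in for "any chunk with an idea"; only a name here.
newtype Kind = Kind String deriving (Eq, Show)

-- Only the two fields discussed in the key findings are shown.
data Info = Info
  { _authors :: [Person]
  , _kinds   :: [Kind]   -- a list, instead of the single _kind
  } deriving (Eq, Show)

example :: Info
example = Info
  { _authors =
      [ Person "Ada" "Lovelace" [] Western
          (Just "ada@example.org") Nothing Nothing
      ]
  , _kinds = [Kind "SRS", Kind "Code"]
  }
```

Using Maybe for the new fields keeps existing examples valid when the contact details are unknown.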
Conceptual Problems
I think what got this issue started was that system information included so many different pieces of information, (each of different types and for different artifacts), it looked disorganized.
Requirements: Solution should make system information fields ordered, make system information fields grouped and make system information field types consistent.
Hence I will start by investigating some possible discriminators or clusters within system information:
After we pick the appropriate groups we need to pick the appropriate way to divide the groups. Here are some possibilities:
make a different record for each group, that is split system information into 'n' records where 'n' is the number of groups
Make a record of records. I.e. make each of the groups a record within the system information, so system information will contain 'n' fields where 'n' is the number of groups. Each field will ITSELF be a record.
Leave system Information but split the groups with code COMMENTS
Leave most of system information the same but make the fields of contention (such as authors, purpose, kinds and name, like the fields that started this issue) a DATABASE within system information.
Some nice names I thought of: problemSpecifications/solutionSpecifications, personaInformation/systemInformation, SystemScience/(I couldn't think of the opposite), systemChoices/systemSpecifications
Conceptual (and high level code solutions)
I don't think we should have more than 3 groups as that could complicate things. I also don't think we should have a GOOL choices record; rather, we should have it incorporated into "drasil choices".
I think that Hardscience vs (Softscience + Choices) would be the best split, along with a record of records, meaning we keep the same structure of one big "systemInformation" record that contains all the raw chunk 'information'.
Hard Science:
, _quants :: [e]
, _definitions :: [QDefinition] --FIXME: will be removed upon migration to use of [DataDefinition] below
, _datadefs :: [DataDefinition]
, _inputs :: [h]
, _outputs :: [i]
, _constraints :: [j] --TODO: Add SymbolMap OR enough info to gen SymbolMap
, _constants :: [QDefinition]
, _sysinfodb :: ChunkDB
, _usedinfodb :: ChunkDB
Soft Science:
_sys :: a
, _kind :: b
, _authors :: [c]
, _purpose :: d
, _concepts :: [f]
, _defSequence :: [Block QDefinition]
, gool choices
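To make the record-of-records idea concrete, here is a minimal sketch of the Hard Science / Soft Science split. The newtypes are stand-ins for the real Drasil chunk types, the type parameters are replaced by String for brevity, and the ChunkDB, _concepts and _defSequence fields are left out; the names HardScience, SoftScience, _hard and _soft are assumptions.

```haskell
-- Stand-ins for the real Drasil types.
newtype QDefinition    = QDefinition String    deriving (Eq, Show)
newtype DataDefinition = DataDefinition String deriving (Eq, Show)

-- Group 1: the "hard science" content of a system.
data HardScience = HardScience
  { _quants      :: [String]
  , _definitions :: [QDefinition]
  , _datadefs    :: [DataDefinition]
  , _inputs      :: [String]
  , _outputs     :: [String]
  , _constraints :: [String]
  , _constants   :: [QDefinition]
  } deriving (Eq, Show)

-- Group 2: the "soft science" content (name, kind, people, purpose).
data SoftScience = SoftScience
  { _sys     :: String
  , _kind    :: String
  , _authors :: [String]
  , _purpose :: [String]
  } deriving (Eq, Show)

-- The outer record keeps one field per group; each field is itself a record.
data SystemInformation = SI
  { _hard :: HardScience
  , _soft :: SoftScience
  } deriving (Eq, Show)

projectile :: SystemInformation
projectile = SI
  { _hard = HardScience
      { _quants      = ["time"]
      , _definitions = []
      , _datadefs    = [DataDefinition "speed"]
      , _inputs      = ["launch angle"]
      , _outputs     = ["landing position"]
      , _constraints = []
      , _constants   = [QDefinition "gravitational acceleration"]
      }
  , _soft = SoftScience
      { _sys     = "Projectile"
      , _kind    = "SRS"
      , _authors = ["Some Author"]
      , _purpose = ["Predict whether a projectile hits its target."]
      }
  }
```

Access then goes through two levels, for example `_sys (_soft projectile)`.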
Ok, commenting on each part. [Excellent investigation BTW]
Code Investigation
defSequence
is supposed to define "sequences of definitions", either coupled or that can be done 'in parallel'. It's probably needed for something that was only partly implemented. I would look at PR #1664 and issue #287 for the origins of this. There are some problems I already see from the above analysis
Conceptual Problems
It's deeper than just not knowing what each piece of information is supposed to mean; there is also some uncertainty as to what this is supposed to represent!
Your discriminators are excellent. They, de facto, partly answer the question: what are the kinds of information that we've found useful to have as part of the description of a system? What the categories say, as subtext:
I agree that splitting system information into a record-of-records is probably the way to go. The danger is that we refactor this over and over. If we use lenses properly, it's not such a big deal. So it's probably ok if we don't quite get this right the first time.
Nevertheless, we should still think a little harder about "what is a system description". What are the ingredients that make that up?
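A small illustration of why lenses soften the cost of repeated refactoring: client code goes through a single `authors` lens, so if the field later moves to another sub-record only that one definition changes. This sketch hand-rolls van Laarhoven lenses with base only, so that it stays self-contained; in Drasil itself the lens library would provide `view`, `over` and `^.`. The record layout and all names here are assumptions.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Minimal lens operations, specialised to the functor each one needs.
view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

over :: ((a -> Identity a) -> s -> Identity s) -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

-- Hypothetical record of records.
newtype Soft = Soft { _authors :: [String] } deriving (Eq, Show)

data SystemInformation = SI
  { _soft   :: Soft
  , _inputs :: [String]
  } deriving (Eq, Show)

-- Lens onto the sub-record.
soft :: Functor f => (Soft -> f Soft) -> SystemInformation -> f SystemInformation
soft f s = (\x -> s { _soft = x }) <$> f (_soft s)

-- Lens onto the authors inside the sub-record.
softAuthors :: Functor f => ([String] -> f [String]) -> Soft -> f Soft
softAuthors f s = (\x -> s { _authors = x }) <$> f (_authors s)

-- The only definition that knows where authors are stored.
authors :: Functor f
        => ([String] -> f [String]) -> SystemInformation -> f SystemInformation
authors = soft . softAuthors

si :: SystemInformation
si = SI { _soft = Soft ["A"], _inputs = [] }
```

For example, `view authors si` reads the list and `over authors ("B" :) si` prepends to it, without either call mentioning `_soft`.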
Conceptual & Solutions
Here is my thinking about what is in "system information"
I put 1-5 in that order because, I think, they depend on each other in that order. 6 is different, in that it is more meta and applies to all.
I definitely doubt that's the end of the story. But it's a (re)start.
The discussion above from @muhammadaliog3 and @JacquesCarette is very helpful. I definitely like the approach of "reverse engineering" what the Drasil code says, but being wary because we know the ad hoc way some of the code was developed.
My instinct is that for step 5 above (choices made in the solution), we should divide the choices into at least two categories: choices related to the requirements (physical model) and choices related to the design (software structure, data structures, algorithms). I believe that most of our decisions are currently related to design, but when we get further with the notion of a family of programs, we will also have physics related variabilities. For instance, in GlassBR, we currently assume that LSF (load share factor) is 1.0, because we have only one pane of glass. We could remove this assumption and have another member of the family available.
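One possible way to encode the two categories of choices is sketched below. The type and constructor names are hypothetical, and the design choices shown are only examples of the kind of thing meant, not a list of what Drasil supports.

```haskell
import Data.Either (partitionEithers)

-- Choices about the requirements (physical model); illustrative only.
data RequirementChoice
  = FixLoadShareFactor Double   -- e.g. LSF = 1.0 for a single pane of glass
  deriving (Eq, Show)

-- Choices about the design (software structure, algorithms); illustrative only.
data DesignChoice
  = TargetLanguage String
  | UseModularStructure Bool
  deriving (Eq, Show)

-- A choice belongs to exactly one of the two categories.
type Choice = Either RequirementChoice DesignChoice

-- Separate a mixed list of choices into the two categories, keeping order.
splitChoices :: [Choice] -> ([RequirementChoice], [DesignChoice])
splitChoices = partitionEithers

glassBRChoices :: [Choice]
glassBRChoices =
  [ Left (FixLoadShareFactor 1.0)
  , Right (TargetLanguage "Python")
  , Right (UseModularStructure True)
  ]
```

Another family member would then differ only in its RequirementChoice values, for instance a different load share factor.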
@JacquesCarette during one of our group meetings last summer you brought up the concept of refinement. I think you had a different name for it though. The idea is that we have one model and then we make decisions and then we have a new model. I don't know the proper terminology, but I remember feeling that what you were presenting could give us a structure within which we could place our different ideas.
I think what is going on is that some things are obtained by using Map.elems on Maps from CDBs. In other words, sometimes you just want all of the information rather than specific information, which is probably why we don't need to store all the definitions/concepts...
Here is a good example of how all of the general definitions are used by just using the general definition map.
| t `elem` keys (s ^. gendefTable) = makeRef2S $ gendefLookup t (s ^. gendefTable)
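The difference between "all of the information" and "specific information" can be shown with a simplified table. This is only a sketch: the real tables in the ChunkDB are keyed by UID and hold richer values, and GenDefn, GendefTable and the two helper functions below are stand-ins.

```haskell
import qualified Data.Map as Map

-- Stand-in for a general definition chunk.
newtype GenDefn = GenDefn String deriving (Eq, Show)

-- Simplified table keyed by a String identifier.
type GendefTable = Map.Map String GenDefn

table :: GendefTable
table = Map.fromList
  [ ("gd2", GenDefn "second")
  , ("gd1", GenDefn "first")
  ]

-- Everything in the table, in ascending key order; no separate list
-- of general definitions needs to be stored alongside the map.
allGenDefs :: GendefTable -> [GenDefn]
allGenDefs = Map.elems

-- One specific entry, if present.
lookupGenDef :: String -> GendefTable -> Maybe GenDefn
lookupGenDef = Map.lookup
```

Here `allGenDefs table` yields the entry for "gd1" before the one for "gd2", because Map.elems follows key order rather than insertion order.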
Possible fields we could include in system information. NOTE: I know that the drasil philosophy is to only add things when they are needed, but I think that if we put some extra useful information in system information it would encourage other artifacts to reuse it in some way. This can hopefully spur some ideas.
A small description, such as a name and purpose.
Problem statement, that defines constraints and input chunks with the proper units/types.
All of the relevant background information, such as sample input files, config files, and work from other people that are mentioned in the citations
Solution/output, which includes the desired chunks, their units/types, and constraints
Appropriate quantities, instance/theory models, etc.: the hard science that provides a way between the input and output
Appropriate justification for using the methods defined in "hard sciences", such as theory models, derivations, concepts, assumptions
Specifies kinds of artifacts to present the solution and choices made in REPRESENTATION of the artifacts (choices made with regards to actual scientific content should belong in the justification section)
All the people involved
A possible split that I did not include has to do with separating the hard sciences. This split was separating defining symbols, defining equations, and solving equations. This is because these are often interwoven together so keeping them in multiple places could create confusion.
A possible section I did not include was “a system should be able to reference itself”, rather this should be included in the “presentation of the artifacts”.
Even more concretely a system defines:
Current system information storage strategy:
systemProblem
systemScience
systemExplanation
systemArtifacts
systemAuthors
@smiths the kinds of refinement I was thinking of are "theory specialization", when you take a generic theory and instantiate some of its parameters to something more specific [and often then simplify the resulting model.]
@muhammadaliog3 Nice stab on "what describes or defines a system". @oluowoj this is the kind of speculating from observed data that we'd like you to be doing.
What you've provided is what I would call a really good first "brainstorming" of our current ingredients that describes/define a system. And you also gave a good first stab at categorizing that information. You are correct that all the information you list is part of a system description -- but we need to know the origin of each piece of information, what it means, whether it is human-specified or derived from some other basic information, etc. I need to try to circle back to this some time this week.
@smiths I think we should have a meeting about this. Might even make sense to have an all-hands?
Yes, a meeting is a good idea, but I won't be available tomorrow or next week. Next week is a vacation week, and tomorrow is the day where I have to get all my work done so that I can go on vacation. :-) I'll try to keep up on e-mail, but I won't be able to do a meeting until the week of August 4th.
By any chance, did the meeting (mentioned above) occur? If so, are notes public anywhere?
@balacij - No, I do not believe that a meeting took place.
It didn't. This is still an issue that is quite open.
Thank you, @smiths! :smile:
Hopefully we can continue this discussion soon. After reading Dr. Carette's changes to the Information Encoding wiki page, I think I understand a lot more thanks to having it formalized and sitting in front of me. I've also had some thoughts related to this recently (albeit in a roundabout way -- analyzing our package structure). I just need to formalize them, and then I'll try to post my own thoughts too.
Ah, thank you as well, @JacquesCarette (I posted just 14s after you, but didn't see your comment until after I posted)! The latest changes to Information Encoding have been quite eye-opening.
Yes, we should discuss this issue again soon.
This issue has been central to Drasil for a long time, but now there is an issue (#2123) that is bringing parts of the problem back to the fore.
In part, SystemInformation is a big hack. We never defined what a 'System' is, so it's very hard to know if some information belongs there or elsewhere. For example: authorship. A single example can contain multiple authors. One person might have written the (original) code, another the SRS, and yet a third might have written the description of the 'system' as a whole. We need to give proper attribution (this is indeed very important to science), so we need to define the entities within Drasil that are things whose authorship is possible.
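One way to picture per-entity attribution is a wrapper that attaches authors to each thing that can be authored. This is a sketch under assumptions: Author, Authored, Example and the three fields are invented names, and which entities should actually carry authorship is exactly the open question.

```haskell
import Data.List (nub)

newtype Author = Author String deriving (Eq, Show)

-- Anything whose authorship we want to record, paired with its authors.
data Authored a = Authored
  { authorsOf :: [Author]
  , content   :: a
  } deriving (Eq, Show)

-- Hypothetical example with three separately attributed parts.
data Example = Example
  { exDescription :: Authored String
  , exSRS         :: Authored String
  , exCode        :: Authored String
  } deriving (Eq, Show)

-- Everyone who contributed to any part, without duplicates,
-- in order of first appearance.
allContributors :: Example -> [Author]
allContributors e =
  nub (concatMap authorsOf [exDescription e, exSRS e, exCode e])

sample :: Example
sample = Example
  { exDescription = Authored [Author "C"] "system description"
  , exSRS         = Authored [Author "B"] "requirements"
  , exCode        = Authored [Author "A", Author "B"] "original code"
  }
```

A single flat list of authors can still be derived with `allContributors`, while each artifact keeps its own attribution.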