Open balacij opened 3 years ago
Impressive work @balacij. Your observations seem on point to me, but @JacquesCarette is a better judge of the future direction of the design of Drasil. We should make a point of using the knowledge in this issue. In particular, the shallow analysis seems like a summary that should find its way (in some form) into one of our Wikis.
Thank you, @smiths! :smile: Hopefully so. I think some of the "shallow analysis" could also go into the main `package.yaml` files and the `README.md` files too.
Huge amount of information here. And lots of good questions. So, to be able to eventually close this issue, I'm going to spin off a bunch of issues, each of which is related to what's here, but contains more 'actionable' material. When it is more 'purely informational', I'll make comments here. Eventually, we may want to extract the knowledge from here and put it on the wiki and/or in the READMEs.
Thank you, that sounds like a great idea!
On `drasil-data`: it is a "database", done as a set of Haskell files that contain only declarations of 'chunks'. The 'chunk' part is not so important; the important part is that it uses a host of different encoding data-structures.
Furthermore, it spans from simple knowledge to rather complex knowledge (i.e. theories), with everything in between. There is also a cross-cutting arrangement where the 'knowledge' encoded comes from many different application domains.
Right now, we don't know how to organize this. For sure, internally to `drasil-data`, it's partly organized, partly a mess. To detangle things, we should understand our own knowledge encodings well enough to understand exactly what kinds of "level mismatches" we've created. We also need to understand what classification seems to be the most natural to use -- level? application domain? both? neither?
In other words, I don't think we're even close to ready to do something sensible with it.
> where should we define how encodings get printed into other?

An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about their pros and cons.

> The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?

That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in `drasil-printers` is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.
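To make the "semantic encoding, then abstract rendering language, then lowering" layering concrete, here is a minimal sketch. All names (`Expr`, `Layout`, `render`, `toLaTeX`, `toHTML`) are hypothetical stand-ins invented for illustration; they are not the actual `drasil-printers` API.

```haskell
module Main where

-- Hypothetical 'semantic' encoding (plays the role of XML in the analogy).
data Expr = Var String | Frac Expr Expr

-- Hypothetical abstract rendering language (the role drasil-printers aims for).
data Layout
  = Text String
  | Italic Layout
  | Fraction Layout Layout

-- Rendering instructions: how the semantic encoding is viewed as layout.
render :: Expr -> Layout
render (Var v)    = Italic (Text v)
render (Frac n d) = Fraction (render n) (render d)

-- Two independent 'lowerings' of the same abstract layout.
toLaTeX :: Layout -> String
toLaTeX (Text s)       = s
toLaTeX (Italic l)     = "\\mathit{" ++ toLaTeX l ++ "}"
toLaTeX (Fraction n d) = "\\frac{" ++ toLaTeX n ++ "}{" ++ toLaTeX d ++ "}"

toHTML :: Layout -> String
toHTML (Text s)       = s
toHTML (Italic l)     = "<em>" ++ toHTML l ++ "</em>"
toHTML (Fraction n d) = "<sup>" ++ toHTML n ++ "</sup>/<sub>" ++ toHTML d ++ "</sub>"

main :: IO ()
main = do
  let layout = render (Frac (Var "a") (Var "b"))
  putStrLn (toLaTeX layout)
  putStrLn (toHTML layout)
```

The point of the sketch is only that `render` is the single place where style choices live, while each target format needs to know nothing about `Expr`.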
On `drasil-database` and `SystemInformation`: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.
For example, we have all sorts of type classes for our "chunks" having all sorts of data. Higher-level routines can then be polymorphic: instead of asking for specific representations, they can ask for any representation that has the data needed.
Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pull that information out, and do their job.
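A minimal sketch of that style of polymorphism, where a routine asks only for the data it needs rather than for a specific representation. The classes and types here (`HasName`, `HasSymbol`, `Quantity`, `Concept`) are hypothetical illustrations, not the actual Drasil classes.

```haskell
module Main where

-- Hypothetical capability classes: each one names a piece of data a chunk may carry.
class HasName c where
  name :: c -> String

class HasSymbol c where
  symbol :: c -> String

-- Two hypothetical chunk representations carrying different data.
data Quantity = Quantity { qName :: String, qSym :: String, qUnit :: String }
data Concept  = Concept  { cName :: String, cDefn :: String }

instance HasName Quantity   where name   = qName
instance HasSymbol Quantity where symbol = qSym
instance HasName Concept    where name   = cName

-- Works for any representation that has a name.
glossaryEntry :: HasName c => c -> String
glossaryEntry c = "* " ++ name c

-- Works only for representations that have both a name and a symbol.
symbolRow :: (HasName c, HasSymbol c) => c -> String
symbolRow c = symbol c ++ " : " ++ name c

main :: IO ()
main = do
  putStrLn (glossaryEntry (Concept "rigid body" "a solid that does not deform"))
  putStrLn (symbolRow (Quantity "gravity" "g" "m/s^2"))
```

Here `symbolRow (Concept ...)` would be rejected at compile time, which is the property an artifact generator written in this style would get for free.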
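Our current design is instead to create one monster representation, which we could call `KitchenSink` instead of `SystemInformation`, which has everything any part could possibly want. So our current process is thus:

- collect all the information
- stick it all in one place
- pass it to everyone

That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.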
Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want.
That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for `drasil-website` without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the `KitchenSink` approach.
There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.
Re: `drasil-database`:

> In other words, I don't think we're even close to ready to do something sensible with it.

Sounds good, I can see why. Hopefully it will become more clear later on.
> An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about their pros and cons.

Thanks!

> The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?
>
> That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in `drasil-printers` is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.

Thanks. The lower-level encodings don't really dictate the conversion "well"/uniformly because the instructions are written as nearby "floating" functions (this is not as uniform a pattern as a function belonging to a typeclass).
A good example can be drawn from `drasil-printers`'s `Language.Drasil.Printing.Import.*`.
Specifically, we can see `.Space`:
https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Space.hs#L14-L18
Unit symbols, `.Symbol`:
https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Symbol.hs#L38-L43
Both of these are floating, but if we had a typeclass:

    class CanGenMathExprs t where
      toMathExpr :: t -> Printing.Expr

we would just use a common `toMathExpr` for any applicable type for which it's defined. Alternatively, I guess we can try to parameterize further with an output variable, and another for "named" typeclass instances:

    -- Needs MultiParamTypeClasses, DataKinds and AllowAmbiguousTypes:
    -- `ctx` does not appear in the method's type.
    class CanGen i o ctx where
      ctxPrint :: i -> o

    -- The body is left as a typed hole.
    instance CanGen Math.Symbol P.Expr 'SomeCtx where ctxPrint = _

I think we would just need type applications to access the ones for a particular 'context'. I'm not too sure how helpful this variant would be; I thought it might be an interesting way to get different 'printing' styles. Embedding 'printable' things in other 'printable' things means that the single type parameter used would need to be something each embedded 'printable' type is defined for, and it would also imply that a single type would need to carry enough information for the existing "PrintingInformation" for all embedded types. To overcome the `ChunkDB` part, we would just add it as a parameter to `ctxPrint`, but the other components might get messy for various combinations, or when PrintingConfiguration's size increases for extra configuration options. This would be a potential option for @cd155, primarily for the different styles in printing ODEs, I think, but I'm completely unsure if it's a good idea or not; it will require a bit more investigation. The interesting thing about `SomeCtx` is that it could be a whole "style" of printing things in a layout (e.g., SRS variants, etc, or subvariants of certain SRS variants, etc).
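As a rough, self-contained sketch of the context-indexed variant with type applications: the types `Symbol`, `PExpr` and the contexts `Plain`/`Fancy` below are hypothetical stand-ins (not `Math.Symbol`, `P.Expr` or any real Drasil context), and whether this design is a good idea remains open, as noted above.

```haskell
{-# LANGUAGE AllowAmbiguousTypes   #-}
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications      #-}
module Main where

-- Hypothetical input and output encodings.
newtype Symbol = Symbol String
newtype PExpr  = PExpr String deriving (Eq, Show)

-- Hypothetical printing contexts ("styles"), used only at the type level.
data Ctx = Plain | Fancy

-- `ctx` is not mentioned in the method's type, hence AllowAmbiguousTypes;
-- callers select an instance with a type application.
class CanGen i o (ctx :: Ctx) where
  ctxPrint :: i -> o

instance CanGen Symbol PExpr 'Plain where
  ctxPrint (Symbol s) = PExpr s

instance CanGen Symbol PExpr 'Fancy where
  ctxPrint (Symbol s) = PExpr ("\\mathit{" ++ s ++ "}")

plain, fancy :: PExpr
plain = ctxPrint @Symbol @PExpr @'Plain (Symbol "x")
fancy = ctxPrint @Symbol @PExpr @'Fancy (Symbol "x")

main :: IO ()
main = print plain >> print fancy
```

One visible cost of this encoding is that every call site must spell out the context, which may or may not be acceptable once printable things are nested inside each other.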
> In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.

Yes, taking this example, `drasil-printers` would become a package with, strictly, its own encoding of its "abstract rendering language" and an "instance" of some typeclass, as above, which "lowers" it into HTML/LaTeX/etc. The dependencies of `drasil-printers` would shrink to just the packages for the HTML/LaTeX/etc. generation, while `drasil-lang` would gain a dependency on `drasil-printers`, as it would also need to "instantiate" a typeclass for lowering things into the abstract rendering language of `drasil-printers`.
> On `drasil-database` and `SystemInformation`: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.

Would you say we have a good explanation of why `ChunkDB` exists? To my knowledge, within `drasil-database`, it was only `SystemInformation` that we didn't have a good definition for (this was one of the specific examples in #2195). With my version of `ChunkDB` (from #2873) and `UID`s, it would make sense, to me, that they would be a core bundle that would be the "least"-required components for Drasil to be used (e.g., all components would rely on it in some way [registration in a knowledge-base for usage, printing, etc]). As such, they'd become the only datatypes in `drasil-database` and `SystemInformation` would be removed or moved elsewhere, or they would be moved, together, into a new `drasil` "core" package/`drasil-core`.
> For example, we have all sorts of type classes for our "chunks" having all sorts of data. And then higher-level routines can be polymorphic by, instead of asking for specific representations, can ask for any representation that has the data needed. Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pulls it, and does its job.

If I'm understanding you correctly, I believe that this is the design I'm also thinking of with my above discussion of "reversing dependencies" and "forcing printing qualities to be properties of the higher-level encodings".
> Our current design is instead to create one monster representation, which we could call `KitchenSink` instead of `SystemInformation`, which has everything any part could possibly want. So our current process is thus
>
> - collect all the information
> - stick it all in one place
> - pass it to everyone
>
> That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.

Would we be able to remove `SystemInformation` completely in favour of using `cast` more often to assist in grabbing chunks from `ChunkDB`s instead? This would allow us to place completely "foreign" types into a `ChunkDB`, since we'd be using TypeReps to grab data en masse (e.g., for a specific type) and UIDs+TypeReps to grab singular data instances. This would be helpful for new user libraries that build on Drasil but are not upstreamed.
Though, I'm uncertain of 2 problems:

1. Is `Data.Typeable (.., cast)` usage problematic in any way? It looks like a safe version of `unsafeCoerce` but it might still be an anti-pattern.
2. Why does `SystemInformation` contain 2 kinds of `ChunkDB`s; "sysinfodb", and "usedinfodb"? It seems like a printer of a chunkdb should know which chunks are "sysinfo"/"used"/etc for itself by treating those chunk types as "relevant or not" to their printing goal.

> Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want.
>
> That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for `drasil-website` without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the `KitchenSink` approach.

Perfect, this will be great for `drasil-lang` / #2885. I think this is also relevant to the discussion in my last comment (above this one, regarding `CanGen`).
> There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.

Sounds good.
These discussions are getting too big - it would make sense to start new issues when the commentary is going to be more than just a few lines.
Re: printing and classes like `CanGenMathExprs` - maybe. I've got some emails about that from multiple years ago. It's actually kind of a tricky design point where Haskell classes don't always quite fit. So it needs a proper design, which means that it first needs a proper analysis. Certainly it's not worth creating classes (in general) if there is only a single instance of it. I'll also email you some design discussion on that topic from a while back.
Re: `drasil-database`, etc.
We have a currently adequate explanation of `ChunkDB`: it's a container of things with UIDs. Note that that's probably the full explanation as well, which is perhaps unsatisfactory!
I think `SystemInformation` was supposed to be just as its name implies: the necessary information for a "system". It was supposed to be arranged so that various different kinds of information would be assembled in it, and we would know a priori what that information was (thus the many fields). But it never quite got there. The rationale for each field has been lost, and that rationale cannot be adequately reconstructed. So it is probably best to get rid of it, until such a time as we can get a decent explanation for what it ought to be, and even whether it ought to exist.
> Would we be able to remove SystemInformation completely in favour of using cast more often to assist in grabbing chunks from ChunkDBs instead?

Here I'm less happy with the question: it mixes design and implementation/solution too much. The actual content of the question may be fine though, in that maybe `cast` could end up being part of the implementation of a good design.
To a certain extent, `cast` is an anti-pattern, in the sense that Haskell is great because of its static typing, and `cast` works firmly against that grain. `unsafeCoerce` is the same and different: if used wantonly, it's an anti-pattern. If used as part of a dedicated optimization pass in some efficiency-critical low level layer, that's quite different. Same with `cast`: it might be useful because the situation is fundamentally dynamic, or because Haskell's type system is not up to expressing the types in an ergonomic way.
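To make the trade-off concrete, here is a minimal sketch of what a `cast`-based lookup might look like when the situation is treated as fundamentally dynamic. This is an assumption-laden illustration, not the `ChunkDB` from #2873: `AnyChunk`, `insertChunk`, `findChunk`, `findAll`, `Author` and `Purpose` are all hypothetical names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import qualified Data.Map      as M
import           Data.Maybe    (mapMaybe)
import           Data.Typeable (Typeable, cast)

type UID = String

-- Hypothetical wrapper that forgets a chunk's static type but keeps Typeable.
data AnyChunk = forall a. Typeable a => AnyChunk a

newtype ChunkDB = ChunkDB (M.Map UID AnyChunk)

emptyDB :: ChunkDB
emptyDB = ChunkDB M.empty

insertChunk :: Typeable a => UID -> a -> ChunkDB -> ChunkDB
insertChunk u x (ChunkDB m) = ChunkDB (M.insert u (AnyChunk x) m)

-- `cast` succeeds only when the stored type matches the requested one.
unwrap :: Typeable b => AnyChunk -> Maybe b
unwrap (AnyChunk x) = cast x

-- UID + type: grab a single chunk.
findChunk :: Typeable a => UID -> ChunkDB -> Maybe a
findChunk u (ChunkDB m) = M.lookup u m >>= unwrap

-- Type only: grab every chunk of that type.
findAll :: Typeable a => ChunkDB -> [a]
findAll (ChunkDB m) = mapMaybe unwrap (M.elems m)

-- Hypothetical "foreign" chunk types a user library might define.
newtype Author  = Author String  deriving (Eq, Show)
newtype Purpose = Purpose String deriving (Eq, Show)

db :: ChunkDB
db = insertChunk "author:a" (Author "A. Author")
   . insertChunk "author:b" (Author "B. Author")
   . insertChunk "purpose"  (Purpose "demo")
   $ emptyDB

main :: IO ()
main = do
  print (findChunk "purpose" db :: Maybe Purpose)
  print (findChunk "purpose" db :: Maybe Author) -- wrong type requested
  print (findAll db :: [Author])
```

The sketch also shows the cost being pointed at: asking for the wrong type is no longer a compile-time error, only a `Nothing` at run time.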
If I recall, `sysinfodb` was meant to be where we assembled all the information that might be used in a system, and `usedinfodb` was the stuff that was actually used. So things reachable from `usedinfodb` should appear in the glossary, table of symbols, etc; but mere presence in `sysinfodb` wouldn't trigger anything like that.
Re: moving to new tickets: Will do. I will refrain for this response because I think it will be relatively short. The `CanGenMathExprs` discussion will almost definitely need to go into a new ticket, however.
Re: `SystemInformation`: Sounds good.
Re: Removing `SystemInformation`: I understand. I very briefly mentioned replacing the lists in `SystemInformation` in #2873, but I didn't expect that this would become the conclusion (assuming it is).
Re: `cast`: That makes sense, thank you. The dynamic nature of what I was planning was to have them be as open as possible, which might even be too open. It's interesting. I guess this can be something we revisit soon if it becomes problematic, but, immediately, it looks like a robust solution.
SUPER minor note here, but from our conversation in #3711, a quick Google search seems to indicate that "codebase" as one word is most common, but this isn't recognized by my Firefox spell checker. "Code base" also seems to be correct and the second most common, with "code-base" bringing up the rear. @JacquesCarette
I've been running into a few problems with dependencies (and I've also caused one... see: `drasil-code-base`) because I'm unsure of where code (primarily regarding 'printing') should be placed. I've also never seen any references (in the code) to the `SmithEtAl` template we base our generated artifacts on. Additionally, with #2873 in mind, I felt that `drasil-database`, as a package, was a bit peculiar because it contained `SystemInformation` in the same area as the `ChunkDB` (which, in my opinion, shouldn't have any dependencies, nor be related to any chunks [arguably, other than itself]). Finally, our package READMEs and descriptions are a bit confusing to me: they don't really describe the package dependencies. The main `Drasil.md` file makes sense, but appears to be outdated.
As such, with these things in mind, I am going to try to understand our packages, and how they relate to the foundational theory behind Drasil.
Shallow Analysis
First, let us start off by naively observing and analyzing the `drasil-*` packages:

- `drasil-build`:
  - `Makefile` language, and a printer for the AST to `Doc` (`pretty`)
  - `Makefile` AST, smart constructors for building up a `Makefile`, and a printer for the AST to be rewritten as a `Doc` (pretty).
- `drasil-code`:
  - `drasil-build`.
  - `drasil-code` data types.
  - `drasil-lang`, `drasil-printers`, and `drasil-theory`.
  - `drasil-code-base` entirely.
  - `package.yaml` file. It currently contains a manually written list of module files, caused by at least 1 HS file going unused.
- `drasil-code-base`:
  - `drasil-printers` and `drasil-code`.
  - `drasil-printers` relies on it, and why `drasil-code` relies on `drasil-printers`.
- `drasil-data`:
  - `package.yaml` file. It currently contains a manually written list of module files.
  - `drasil-data-physics`, `drasil-data-mathematics`, etc.
- `drasil-database`:
  - `ChunkDB` and `SystemInformation`, and has a few 'helper' functions for working with the `ChunkDB`.
- `drasil-docLang`:
  - `SystemInformation` out of `drasil-database`, I think it would be good to also move the code from `drasil-docLang` alongside it because the code is highly-coupled to the "SmithEtAl" template. This isn't to say that it shouldn't be exposed however, I think it should be exposed so that other printers/template engines can also base theirs off of Dr. Smith's template (potentially the updated variant that Dr. Smith mentioned on Monday's discussion).
  - `drasil-printers-smith-et-al` package?
- `drasil-example`:
  - `examples` just to move it away from the "fundamentals"/`drasil-*` namespace.
- `drasil-gen`:
  - `drasil-printers-smith-et-al` package since it's highly coupled together with them.
- `drasil-gool`:
  - `Doc`.
  - `drasil-utils`, and primarily for textual needs and list 'helper' functions. However, since `drasil-utils` relies on `drasil-lang`, this package also only builds after `drasil-lang`, when it really shouldn't be impacted by `drasil-lang`'s priority in the GHC construction plan.
- `drasil-lang`:
  - `drasil-lang` the "root"/"base" package in Drasil since most other packages import it.
- `drasil-metadata`:
- `drasil-printers`:
  - `drasil-lang`-things into its own "General Science Printing" AST.
- `drasil-theory`:
- `drasil-utils`:
  - `drasil-lang`.
  - `drasil-utils`, they also, by extension, have a potentially unused dependency on `drasil-lang`.
  - `drasil-lang`'s source files because not all packages need `drasil-lang` files to be compiled before them.
  - `base` package rather than any `drasil-*` package.
- `drasil-website`:
  - `SystemInformation`, but it contains no models, inputs/outputs, math, and isn't intended to generate any SRS or code. The `SystemInformation` seems inappropriate to be used (hence the empty lists). This is also likely evidence for a need to split `SystemInformation` into different variants.

Slightly deeper, but still fairly shallow, observations, and discussion:
With a focus on observing the packages:

- We have many "encodings" for things (ASTs, chunks, etc), and "printers" that either print "encodings" into other "encodings" or directly into artifacts (primarily, `Doc`s at the moment):
  - `drasil-build` and `drasil-gool` contain no dependencies, but contain encodings, an AST, and a printer for their ASTs (into `Doc`s). I would call them "near base-level" because, in the most obvious sense, they describe and produce end-user software artifacts. Of course, "base and higher" have a very different meaning when looking at encodings (they might mean different things, it's likely better to have relative terminology in the future). In some sense, other encodings might also be the target "end-user" artifacts, and we might call them "higher" than `drasil-build` and `drasil-gool`, so they can be thought of as both "high" and "low"; it just depends on your scope because there might be encodings that sit above them too.
  - `drasil-code` & `drasil-code-base` contain their own ASTs and encodings for various information related to code generation. They neatly tie together other ASTs and encodings from `drasil-build`, `gool`, `lang`, and `theory` as a part of another, larger, cohesive "printer" geared towards "generating software artifacts". It is an intermediary between `lang` & `theory` and `gool` & `build` with a larger goal of generating "distributable software". It's at a "higher level" than `drasil-gool` and `drasil-build` because it isn't intended to generate artifacts on its own, but through their artifact generators.
  - `drasil-printers` contains both a general "science/math document language", and multiple printers for printing various data encodings from `drasil-lang` into its document language.
  - `drasil-docLang` is another example of this, where it contains another layer of knowledge above the general "science/math document language" but with a specific ordering (currently, seemingly coupled to the "SmithEtAl" template).
- `pretty`'s `Doc`s [more on this later]). In other words, a (while still low) higher level encoding dictates how you can 'push out'/render it into a lower-level encoding.
- `A` encoding to `B` encoding, by understanding it as a property of `A`s encoding [aside: it can be thought of as "a component of a kind of more general version of ModelKinds"]. In other words, this will become declared as a property, which we should be able to encode later.
- `.Development` modules because we would be inverting many of our dependencies.
- `drasil-code-base` entirely, by merging its contents back into `drasil-code`, cleaning up dependencies in general, and making "finding printers" generally easier.
- `instance CanProduceLowerEncoding HigherLevelEncoding where toLowerEncoding = ...`). We can also add extra parameters for these printers for each different "configuration" we want to see from printers. This might also assist in Dong's current work with the rendering styles for Linear DE Models.
- `drasil-database`:
  - `ChunkDB` and maybe a few other chunks (realistically, I can only think of `UID`s from `drasil-lang`, so it might be singular) become a self-contained unit, and they are a rather fundamental component to collecting knowledge/chunks. In which case, the dependencies for `drasil-database` are all for `SystemInformation`. I wonder if it's appropriate to move `ChunkDB` into a new `drasil-core` package on its own (realistically, `ChunkDB`s are a fundamental component for all drasil "systems" & examples because they are where knowledge is collected for the top-most-level printer to use.). Alternatively, I wonder if `SystemInformation` should exist at all with the new functionality that the new `ChunkDB`s could provide, or if `SystemInformation` should be moved to another package...
  - `ChunkDB` & `UID`s into a new `drasil-core` package, this new package would contain strictly the fundamentals for "knowledge management". It's not necessarily fundamental to the theory behind Drasil, but it is an important component, nevertheless, in practice. This would leave `SystemInformation` as the only construction left inside of `drasil-database`...
- `drasil-database`'s `SystemInformation` & `drasil-docLang` are both seemingly connected by a common denominator; the code and the SRS documents (the template):
  - `SystemInformation` is the top-most-level encoding for the `SmithEtAl` template printing. It is used to print out an SRS, and to print out/generate code.
  - `drasil-docLang` contains a document language and a lot of components that are highly coupled with the `SmithEtAl` template. I wonder if we should be making `drasil-docLang` a slightly simpler document language in favour of moving the parts that are more coupled with the `SmithEtAl` template to a new `drasil-smithEtAl` package. This would potentially allow us to create other flavours of the document, or Dr. Smith's latest variant that he mentioned in our last meeting. The very nice functions used to build up an `SI` (`SystemInformation`) could also be used to restrict allowed "Chunks" into a "system" (undefined).
- `drasil-lang` should be replaced as the "root" package by either the new, potential, "drasil-core" package, or the slimmed `drasil-database` package.
- `drasil-printers`:
  - `drasil-lang` into it.
  - `Doc`s. I think this should be moved into `drasil-database`, alongside `ChunkDB`s [this was actually a part of my intended design for `ChunkDB`s] because it should be completely chunk-agnostic.
- `drasil-lang`:
  - `drasil-database`, I believe, will (and should) replace `drasil-lang` as the root package, assuming we move `UID` from `drasil-lang` into `drasil-database` (assuming we choose to keep `ChunkDB` inside of `drasil-database`).

The "SmithEtAl" template:

a. `drasil-docLang`, in the form of special attention to the sections of the "SmithEtAl" template (also, deals with `SystemInformation`). Since I'm primarily thinking of the "SmithEtAl" template as a template for "software requirements of scientific software" and I haven't had much exposure to too many other templates, I might be extending my own definition a bit too far, but there are still specific hard-coded components for general SRS documents, and the format we adhere to. I might be wrong, but I think we can still decompose further, to further add abstraction/ambiguity to the relationship or to allow for different printer configurations. However, the fact that we don't have any sort of "main entry point"/module/subpackage/package for containing the "SmithEtAl"-related code, but we generate SRS documents adhering to the template, should indicate a possible coupling issue.
b. `drasil-printers`, in the form of all of the printers containing code which is specific to SRS documents (and, since we realistically only generate "SmithEtAl" templated documents, it's likely primarily for the template)
c. `drasil-code`, in the form of "composing printers" in `drasil-printers` with `drasil-gool` and `drasil-build`
d. `drasil-database`, in the form of `SystemInformation`

Again, slightly deeper observations

At a sky-high level, everything is an encoding of either data/phenomena, or a translation of knowledge in encodings (often one-way -- "printing").

- `ChunkDB`]) <- "high" knowledge density (again, realistically, a relative, or even false, sense of depth). This would also contain properties of "pushouts" ("lower" encodings) as "views" of the higher encodings.
- Pretty's `Doc` is still a phenomenon to Drasil. The same goes for all imported libraries we use. It might be difficult, but we should consider not having any imports, but building all things from scratch (this effort would certainly not go to waste, because there will surely be a domain where these encodings are a part of the domain, and we shouldn't be constrained by using other libraries which we might not be able to edit easily). Afterwards, the final actions of the impure "IO"-related things will be the final phenomena (which I'm unsure of how we can sufficiently teach Drasil). Finally, through this, we will have a better understanding of how Drasil will need to, eventually, describe Drasil.

Final Observations

- `ChunkDB`/knowledge-base and do something with that), but I don't think that should disambiguate or diminish the openness of what a "system" is, as defined in common dictionaries.
- `SystemInformation` as the base entry point to the template, the requirements would be as follows:
  a. Knowledge-base must contain a list of authors
  b. Knowledge-base must contain a purpose
  c. Knowledge-base must have Input variables and Output variables (instead of directly placing these as QuantityDicts, we should place these wrapped in their own Input and Output data wrappers so that we can pull them directly from the `ChunkDB`)
  d. Knowledge-base must contain output constraints (these are very nice!
- `ChunkDB`. Then, we can have "systems" that impose "this knowledgebase should only include 1 X, 2 Ys, any amount of Zs, no As, etc". Of course, this is heavily reliant on the `ChunkDB` design I proposed in #2873.
- `ChunkDB`s, we can do that too. This would be a stricter definition, where requirements are imposed by gathering "the correct types" from the system (in other words, we would be bunching up our knowledge/chunks by their type representations [`TypeRep`s from `Data.Typeable`] and imposing restrictions based on those found in the `ChunkDB`). Then, the "process" component of the System interface (roughly `class System where process :: ChunkDB -> IO ()`; this would probably be different, but it should paint the right picture) should be fairly straightforward.

Thank you for reading :smile:! Hopefully, this all makes sense.
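For reference, the idea of a "system" imposing restrictions by `TypeRep` ("1 X, any amount of Zs, ...") and then running a `process :: ChunkDB -> IO ()` step could be sketched roughly as follows. Everything here is an assumption made for illustration: the `ChunkDB` is modelled as a plain map of `Dynamic` values rather than the design from #2873, and `Requirement`, `exactly`, `atLeast`, `violations`, `Author` and `Purpose` are hypothetical names.

```haskell
module Main where

import           Data.Dynamic  (Dynamic, dynTypeRep, toDyn)
import qualified Data.Map      as M
import           Data.Proxy    (Proxy (..))
import           Data.Typeable (TypeRep, Typeable, typeRep)

type UID     = String
type ChunkDB = M.Map UID Dynamic -- stand-in for a real knowledge base

-- Hypothetical chunk types a template might require.
newtype Author  = Author String
newtype Purpose = Purpose String

-- A restriction on how many chunks of one type the knowledge base may hold.
data Requirement = Requirement
  { reqType :: TypeRep
  , reqOk   :: Int -> Bool
  , reqDesc :: String
  }

exactly, atLeast :: Typeable a => Int -> Proxy a -> Requirement
exactly n p = Requirement (typeRep p) (== n) ("exactly " ++ show n ++ " x " ++ show (typeRep p))
atLeast n p = Requirement (typeRep p) (>= n) ("at least " ++ show n ++ " x " ++ show (typeRep p))

-- Bunch the chunks up by type representation and count them.
census :: ChunkDB -> M.Map TypeRep Int
census db = M.fromListWith (+) [ (dynTypeRep d, 1) | d <- M.elems db ]

-- Descriptions of every requirement the knowledge base fails to meet.
violations :: [Requirement] -> ChunkDB -> [String]
violations reqs db =
  [ reqDesc r | r <- reqs, not (reqOk r (M.findWithDefault 0 (reqType r) counts)) ]
  where counts = census db

srsRequirements :: [Requirement]
srsRequirements =
  [ atLeast 1 (Proxy :: Proxy Author)
  , exactly 1 (Proxy :: Proxy Purpose)
  ]

-- The "process" step: check the requirements, then (here) just report.
process :: ChunkDB -> IO ()
process db = case violations srsRequirements db of
  [] -> putStrLn "knowledge base accepted"
  vs -> mapM_ (putStrLn . ("unmet: " ++)) vs

goodDB, badDB :: ChunkDB
goodDB = M.fromList [("a1", toDyn (Author "A. Author")), ("p", toDyn (Purpose "demo"))]
badDB  = M.fromList [("a1", toDyn (Author "A. Author"))]

main :: IO ()
main = process goodDB >> process badDB
```

A real version would presumably generate artifacts in `process` instead of printing a message; the sketch only shows that the requirement check itself needs nothing beyond type representations and counts.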