Understanding Drasil's theory through observing its codebase

I've been running into a few problems with dependencies (and I've also caused one... see: drasil-code-base) because I'm unsure of where code (primarily regarding 'printing') should be placed. I've also never seen any references (in the code) to the SmithEtAl template we base our generated artifacts on. Additionally, with #2873 in mind, I felt that drasil-database, as a package, was a bit peculiar because it contained SystemInformation in the same area as the ChunkDB (which, in my opinion, shouldn't have any dependencies, nor be related to any chunks [arguably, other than itself]). Finally, our package READMEs and descriptions are a bit confusing to me: they don't really describe the package dependencies. The main Drasil.md file makes sense, but appears to be outdated.

As such, with these things in mind, I am going to try to understand our packages, and how they relate to the foundational theory behind Drasil.

Shallow Analysis

First, let us start off by naively observing and analyzing the drasil-* packages:

drasil-build:
- Defines a Makefile AST (an encoding of the Makefile language), smart constructors for building up a Makefile, and a printer for rewriting the AST as a Doc (pretty).
- No dependencies!
drasil-code:
- Currently, primarily contains functionality for tying some things together with GOOL & drasil-build.
- Contains a few printers for converting DataDefinitions and QDefinitions into specialized drasil-code data types.
- We should see if we can remove the dependency on drasil-lang, drasil-printers, and drasil-theory.
  - ... and if we can remove drasil-code-base entirely.
- We should work to clean up the package.yaml file. It currently contains a manually written list of module files, caused by at least 1 HS file going unused.
- Dependencies:
  - drasil-build
  - drasil-code-base
  - drasil-database
  - drasil-gool
  - drasil-lang
  - drasil-printers
  - drasil-theory
  - drasil-utils
drasil-code-base:
- Contains the definition for CodeExpr, and a few other things. It came to be as a half-measure to avoid a cyclical dependency between drasil-printers and drasil-code.
- Contains a printer for converting Expr into CodeExpr.
- We should work to see if we can remove it by understanding why drasil-printers relies on it, and why drasil-code relies on drasil-printers.
- Dependencies:
  - drasil-database
  - drasil-lang
  - drasil-utils
drasil-data:
- Strictly contains instances of chunks.
- We should work to clean up the package.yaml file. It currently contains a manually written list of module files.
- Worth considering splitting up into more packages drasil-data-physics, drasil-data-mathematics, etc.
- These packages should primarily contain theoretical models that aren't directly usable in systems, and various expressions and derivations. They would only be usable in systems if variables are specialized.
- Their variables used are rather "generic", and unintrusive to systems that import it.
- Dependencies:
  - drasil-lang
  - drasil-metadata
  - drasil-theory
  - drasil-utils
drasil-database:
- This package contains the definitions for ChunkDB and SystemInformation, and has a few 'helper' functions for working with the ChunkDB.
- Dependencies:
  - drasil-lang
  - drasil-theory
drasil-docLang:
- This package primarily contains functionality for a generic notebook language, but with special components for the "SmithEtAl" template. Specifically, it contains an AST for the "SmithEtAl" document template, and functionality for analyzing and forming it. It is a middleman between our Chunks and HTML/JupyterNotebooks/LaTeX.
- If we move the SystemInformation out of drasil-database, I think it would be good to also move the code from drasil-docLang alongside it because the code is highly-coupled to the "SmithEtAl" template. This isn't to say that it shouldn't be exposed however, I think it should be exposed so that other printers/template engines can also base theirs off of Dr. Smith's template (potentially the updated variant that Dr. Smith mentioned on Monday's discussion).
- This might mean we form a drasil-printers-smith-et-al package?
- Dependencies:
  - drasil-lang
  - drasil-data
  - drasil-database
  - drasil-printers
  - drasil-theory
  - drasil-utils
drasil-example:
- Just a folder carrying the examples.
- I think it would be appropriate to rename it to just examples just to move it away from the "fundamentals"/drasil-* namespace.
- However, let's not spend much time thinking about this (yet?).
drasil-gen:
- Extra functionality for tying together the "SmithEtAl" template + code, with a focus on designating which artifacts should be built.
- This could also be moved in together with a potential drasil-printers-smith-et-al package since it's highly coupled together with them.
- Dependencies:
  - drasil-lang
  - drasil-gool
  - drasil-build
  - drasil-code
  - drasil-printers
  - drasil-docLang
  - drasil-database
drasil-gool:
- GOOL encoding + printer for GOOL to languages (Java/Python/C#/C++/Swift) + printer for languages to Doc.
- It only relies on drasil-utils, and primarily for textual needs and list 'helper' functions. However, since drasil-utils relies on drasil-lang, this package also only builds after drasil-lang, when it really shouldn't be impacted by drasil-lang's priority in the GHC construction plan.
- Dependencies:
  - drasil-utils
drasil-lang:
- Contains encodings for:
  - Mathematical languages (which can become 'chunks' through containers, but are currently encodings): Expr/ModelExpr/Literals
  - Mathematical constructs: QuantityDict, QDefinition, RelationConcept, Uncertainty, ConstrConcept, ConstrainedChunk, etc
  - Symbols
  - Derivations (which I don't want to put next to mathematical languages because I think it can be made polymorphic)
  - Components of a Notebook/Document language: Partition, SecCons, Section, SecHeader, Content, Document, Notebook, TableOfContents, ListType, ItemType, etc (there are plenty, but I don't think they are all worth mentioning unless we are specifically investigating them)
  - Natural language: Sentence, NounPhrase, SentenceStyle, etc (again, there are plenty, but not all worth noting)
  - People
  - References & Citations
  - URIs (URI, Scheme, Authority, Port)
- It may be too large. It might be worth breaking this up into a few packages. In particular, I think we should have a package for the natural language, mathematics (and constructions), and documents. Of course, we can further decompose as we see fit/necessary.
- No dependencies!
- I would call drasil-lang the "root"/"base" package in Drasil since most other packages import it.
drasil-metadata:
- This is a very interesting package, but, at the moment, it contains very few files and very little information.
- I'm not quite sure of what we define as "metadata" in Drasil, but I wonder if we should be moving more things into it?
- Dependencies:
  - drasil-lang
drasil-printers:
- Contains a "General Science Printing" AST, with printers for: HTML, DOT files (I'm not entirely sure of what these are), and LaTeX, with incomplete encodings for Markdown, JSON
- Contains multiple printers for drasil-lang-things into its own "General Science Printing" AST.
- A general printer for Expr/ModelExpr/CodeExpr/Literals, Symbols, Sentences, etc into HTML/LaTeX/plain text/etc
- Dependencies:
  - drasil-data
  - drasil-code-base
  - drasil-database
  - drasil-lang
  - drasil-theory
  - drasil-utils
drasil-theory:
- Contains encodings for InstanceModels, DataDefinitions, TheoryModels, GenDefns, ModelKinds, ConstraintSets, and MultiDefns
- Contains CIs (CommonIdeas) for IMs, DDs, TMs, & GDs
- Dependencies:
  - drasil-lang
  - drasil-metadata
drasil-utils:
- Contains "utility" functions for other packages to use.
- Currently contains a lot of constructors for Sentences, NamedChunks, NPs, Contents, and other data types local to drasil-lang.
- Since many packages depend on drasil-utils, they also, by extension, have a potentially unused dependency on drasil-lang.
- I think these constructors should be pushed into drasil-lang's source files because not all packages need drasil-lang files to be compiled before them.
- The end result would be a "utils" package which supplements Haskell's base package rather than any drasil-* package.
- Dependencies:
  - drasil-lang
drasil-website:
- Builds the website.
- Currently relies on SystemInformation, but it contains no models, inputs/outputs, math, and isn't intended to generate any SRS or code. The SystemInformation seems inappropriate to be used (hence the empty lists). This is also likely evidence for a need to split SystemInformation into different variants.
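
The dependency lists above can be treated as data and queried. The following is a minimal sketch in plain Haskell (base only), not anything that exists in Drasil: the names directDeps, depsOf, transitiveDeps, and indirectOnly are hypothetical, and the table is transcribed by hand from the lists above (drasil-example and drasil-website are left out because no dependency lists are given for them).

```haskell
import Data.List (sort)
import Data.Maybe (fromMaybe)

-- Direct dependencies, transcribed by hand from the package list above.
directDeps :: [(String, [String])]
directDeps =
  [ ("drasil-build",     [])
  , ("drasil-code",      [ "drasil-build", "drasil-code-base", "drasil-database"
                         , "drasil-gool", "drasil-lang", "drasil-printers"
                         , "drasil-theory", "drasil-utils" ])
  , ("drasil-code-base", ["drasil-database", "drasil-lang", "drasil-utils"])
  , ("drasil-data",      [ "drasil-lang", "drasil-metadata", "drasil-theory"
                         , "drasil-utils" ])
  , ("drasil-database",  ["drasil-lang", "drasil-theory"])
  , ("drasil-docLang",   [ "drasil-lang", "drasil-data", "drasil-database"
                         , "drasil-printers", "drasil-theory", "drasil-utils" ])
  , ("drasil-gen",       [ "drasil-lang", "drasil-gool", "drasil-build"
                         , "drasil-code", "drasil-printers", "drasil-docLang"
                         , "drasil-database" ])
  , ("drasil-gool",      ["drasil-utils"])
  , ("drasil-lang",      [])
  , ("drasil-metadata",  ["drasil-lang"])
  , ("drasil-printers",  [ "drasil-data", "drasil-code-base", "drasil-database"
                         , "drasil-lang", "drasil-theory", "drasil-utils" ])
  , ("drasil-theory",    ["drasil-lang", "drasil-metadata"])
  , ("drasil-utils",     ["drasil-lang"])
  ]

-- Direct dependencies of one package (empty if the package is unknown).
depsOf :: String -> [String]
depsOf p = fromMaybe [] (lookup p directDeps)

-- Every package reachable from p through the table, sorted by name.
transitiveDeps :: String -> [String]
transitiveDeps p = sort (go [] (depsOf p))
  where
    go seen [] = seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (depsOf x ++ xs)

-- Dependencies that a package only picks up through another package.
indirectOnly :: String -> [String]
indirectOnly p = [d | d <- transitiveDeps p, d `notElem` depsOf p]
```

Loaded in GHCi, indirectOnly "drasil-gool" evaluates to ["drasil-lang"], which is the indirect dependency through drasil-utils noted under drasil-gool above.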

Slightly deeper, but still fairly shallow, observations, and discussion:

With a focus on observing the packages:

We have many "encodings" for things (ASTs, chunks, etc), and "printers" that either print "encodings" into other "encodings" or directly into artifacts (primarily, Docs at the moment):
- Notable examples:
  - Self-contained/base-level: drasil-build and drasil-gool have essentially no dependencies (drasil-build has none, and drasil-gool relies only on drasil-utils), but each contains encodings, an AST, and a printer for its AST (into Docs). I would call them "near base-level" because, in the most obvious sense, they describe and produce end-user software artifacts. Of course, "base" and "higher" have a very different meaning when looking at encodings (they might mean different things, so it's likely better to have relative terminology in the future). In some sense, other encodings might also be the target "end-user" artifacts, and we might call them "higher" than drasil-build and drasil-gool, so they can be thought of as both "high" and "low". It just depends on your scope, because there might be encodings that sit above them too.
  - Compose others and prints external encodings into itself/higher-level?: drasil-code & drasil-code-base contain their own ASTs and encodings for various information related to code generation. They neatly tie together other ASTs and encodings from drasil-build, drasil-gool, drasil-lang, and drasil-theory as a part of another, larger, cohesive "printer" geared towards "generating software artifacts". It is an intermediary between lang & theory and gool & build with a larger goal of generating "distributable software". It's at a "higher level" than drasil-gool and drasil-build because it isn't intended to generate artifacts on its own, but through their artifact generators.
  - Self-contained but can print external encodings into itself/?: drasil-printers contains both a general "science/math document language", and multiple printers for printing various data encodings from drasil-lang into its document language. drasil-docLang is another example of this, where it contains another layer of knowledge above the general "science/math document language" but with a specific ordering (currently, seemingly coupled to the "SmithEtAl" template).
- Since our development cycle starts off with some vague goal that we decompose into its pieces, I think it makes sense that we should consider how and where our encodings and printers should be written.
- So, I'd like to ask: where should we define how encodings get printed into other (likely lower-level) encodings?
  - With our self-contained examples above, it makes sense that they have their language encodings and a printer that converts their language encodings into raw artifacts (often, pretty's Docs [more on this later]). In other words, a (while still low) higher level encoding dictates how you can 'push out'/render it into a lower-level encoding.
  - With composed printers, it makes sense again to have a higher-level encoding that composes other lower-level encodings together via its printer.
  - The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?
  - We currently do the former, but I think we should be following the latter, because:
    - The lower-levels of encodings are the most "stable" (they are the most basic units, with the least data density [they're primarily just strings of words, in some form or another]). With the "bottom-up" approach we follow, we would continuously remove from hard-coded data by abstracting over the details of like-data through creating re-usable higher-level encodings (to be specific, with the desire of declaratively generating the lower-level bits through printing the higher-level instances). As such, I believe it makes sense to treat them as expectations and properties of the higher-level encodings, which we can also better see when placed next to the higher-level encodings.
    - Now, obviously, I'm not saying that an Expr should dictate how it should be translated into HTML, but I do think that it should dictate how it should be translated into its greatest-lower-bound (e.g., the mathematical printing language) w.r.t. the intent of printing (e.g., its GLB could be CodeExpr in the context of generating code). Then, depending on the use case, the mathematical printing language should dictate how it's being laid out into LaTeX, HTML [realistically, it's not laid out into HTML by itself; we primarily use MathJax, so it's still LaTeX], or into the 'plain' mathematical print.
    - Less importantly,
      - Additionally, I think it will make Drasil-in-Drasil easier because we will have a better understanding of the translations of A encoding to B encoding, by understanding it as a property of A's encoding [aside: it can be thought of as "a component of a kind of more general version of ModelKinds"]. In other words, this will become declared as a property, which we should be able to encode later.
      - With the latter option, we should be able to remove the majority of our .Development modules because we would be inverting many of our dependencies.
      - We should be able to remove drasil-code-base entirely, by merging its contents back into drasil-code, cleaning up dependencies in general, and making "finding printers" generally easier.
    - Concretely, I think we should create a series of typeclasses for each encoding (e.g., "class CanProduceLowerEncoding t where toLowerEncoding :: t -> LowerEncoding"), which the higher-level encoding would use as an interface to describe how the "pushout" would occur (e.g., "instance CanProduceLowerEncoding HigherLevelEncoding where toLowerEncoding = ..."). We can also add extra parameters to these printers for each different "configuration" we want to see from printers. This might also assist in Dong's current work with the rendering styles for Linear DE Models.
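
To make the typeclass idea concrete, here is a minimal, self-contained sketch. Only the class and method names (CanProduceLowerEncoding, toLowerEncoding) come from the text above. Everything else is a hypothetical stand-in: Expr is a toy higher-level encoding and PExpr a toy "mathematical printing language", neither of which is Drasil's real type, and Target stands in for one possible extra "configuration" parameter.

```haskell
-- Toy higher-level encoding (NOT Drasil's Expr).
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

-- Toy lower-level "mathematical printing language" (NOT drasil-printers' AST).
data PExpr = PInt Int | POp String | PRow [PExpr] | PParen PExpr

-- The higher-level encoding declares how it is viewed as the lower one.
class CanProduceLowerEncoding t where
  toLowerEncoding :: t -> PExpr

instance CanProduceLowerEncoding Expr where
  toLowerEncoding (Lit n)   = PInt n
  toLowerEncoding (Add a b) =
    PRow [toLowerEncoding a, POp "+", toLowerEncoding b]
  toLowerEncoding (Mul a b) = PRow [factor a, POp "*", factor b]
    where
      -- Sums need parentheses when they appear as factors.
      factor e@(Add _ _) = PParen (toLowerEncoding e)
      factor e           = toLowerEncoding e

-- The printing language, in turn, decides how it is laid out per target.
data Target = LaTeX | Plain

render :: Target -> PExpr -> String
render _     (PInt n)   = show n
render t     (PRow es)  = unwords (map (render t) es)
render LaTeX (POp "*")  = "\\cdot"
render _     (POp o)    = o
render LaTeX (PParen e) = "\\left(" ++ render LaTeX e ++ "\\right)"
render Plain (PParen e) = "(" ++ render Plain e ++ ")"
```

For example, render Plain (toLowerEncoding (Mul (Add (Lit 1) (Lit 2)) (Lit 3))) evaluates to "(1 + 2) * 3". The point of the sketch is only the placement: the instance sits next to the higher-level type, while the layout decisions stay with the lower-level language.
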
drasil-database:
- With #2873, ChunkDB and maybe a few other chunks (realistically, I can only think of UIDs from drasil-lang, so it might be singular) become a self-contained unit, and they are a rather fundamental component to collecting knowledge/chunks. In which case, the dependencies for drasil-database are all for SystemInformation. I wonder if it's appropriate to move ChunkDB into a new drasil-core package on its own (realistically, ChunkDBs are a fundamental component for all Drasil "systems" & examples because they are where knowledge is collected for the top-most-level printer to use). Alternatively, I wonder if SystemInformation should exist at all with the new functionality that the new ChunkDBs could provide, or if SystemInformation should be moved to another package...
  - REDUNDANT DISCUSSION: Assuming we were to move ChunkDB & UIDs into a new drasil-core package, this new package would contain strictly the fundamentals for "knowledge management". It's not necessarily fundamental to the theory behind Drasil, but it is an important component, nevertheless, in practice. This would leave SystemInformation as the only construction left inside of drasil-database...
drasil-database's SystemInformation & drasil-docLang are both seemingly connected by a common denominator; the code and the SRS documents (the template):
- It appears that the SystemInformation is the top-most-level encoding for the SmithEtAl template printing. It is used to print out an SRS, and to print out/generate code.
- Additionally, drasil-docLang contains a document language and a lot of components that are highly coupled with the SmithEtAl template. I wonder if we should be making drasil-docLang a slightly simpler document language in favour of moving the parts that are more coupled with the SmithEtAl template to a new drasil-smithEtAl package. This would potentially allow us to create other flavours of the document, or Dr. Smith's latest variant that he mentioned in our last meeting. The very nice functions used to build up an SI (SystemInformation) could also be used to restrict allowed "Chunks" into a "system" (undefined).
  - This would solve (2).
- In either case, drasil-lang should then be replaced as the "root" package by either the new, potential, "drasil-core" package, or the slimmed drasil-database package.
drasil-printers:
- We have many encodings and printers:
  - DOT: contains a DOT file encoding + various printers specifically intended for usage in the SRS
  - Printing: A general printing language for printing mathematical expressions and sentences + various printers for generally mathematics-related chunks from drasil-lang into it.
  - HTML: contains an HTML (w/out CSS) encoding + various printers, again, specifically intended for usage in the SRS
  - JSON: missing a JSON encoding, but contains bits and pieces of a Markdown and HTML encoding + printer. I think this is an active work-in-progress, so let's not spend much time on this right now.
  - Log: contains "dumping" mechanisms for dumping various chunk maps into Docs. I think this should be moved into drasil-database, alongside ChunkDBs [this was actually a part of my intended design for ChunkDBs] because it should be completely chunk-agnostic.
  - Markdown: missing a Markdown encoding, but contains raw Markdown for generating the existing READMEs used for our generated README.md software artifacts next to some generated code.
  - Plain: A 'plain' printer for various data types. I believe this primarily sees usage in normalizing symbol names, expressions and such related for usage in code generation.
  - TeX: Printing methodology for the printing language into TeX/LaTeX.
- It would be good to make internal encodings for JSON/HTML/CSS/Markdown so that we can lay other things into them as well instead of having to manually write these printers for other languages as well. Additionally, in general, we should be following the same guidelines as discussed above in (1).
drasil-lang:
- ...is mostly the "root" package (directly or indirectly) for most packages.
- It is always the first package to be built when building Drasil's entire codebase.
- With #2873, drasil-database, I believe, will (and should) replace drasil-lang as the root package, assuming we move UID from drasil-lang into drasil-database (assuming we choose to keep ChunkDB inside of drasil-database).
The "SmithEtAl" template:
- Components are currently scattered across:
  a. drasil-docLang, in the form of special attention to the sections of the "SmithEtAl" template (also, deals with SystemInformation). Since I'm primarily thinking of the "SmithEtAl" template as a template for "software requirements of scientific software" and I haven't had much exposure to too many other templates, I might be extending my own definition a bit too far, but there are still specific hard-coded components for general SRS documents, and the format we adhere to. I might be wrong, but I think we can still decompose further, to further add abstraction/ambiguity to the relationship or to allow for different printer configurations. However, the fact that we don't have any sort of "main entry point"/module/subpackage/package for containing the "SmithEtAl"-related code, but we generate SRS documents adhering to the template should indicate a possible coupling issue.
  b. drasil-printers, in the form of all of the printers containing code which is specific to SRS documents (and, since we realistically only generate "SmithEtAl" templated documents, it's likely primarily for the template)
  c. drasil-code, in the form of "composing printers" in drasil-printers with drasil-gool and drasil-build
  d. drasil-database, in the form of SystemInformation
- It might be beneficial to try to unify all of these components above into 1 single package to start. Afterwards, we should try to decompose it into smaller packages. As of right now, it seems there is high coupling. Through this, we should be able to generate different variations of the template and other artifacts.

Again, slightly deeper observations

At a sky-high level, everything is an encoding of either data/phenomena, or a translation of knowledge in encodings (often one-way -- "printing").
- Somewhere a bit lower, we can try to categorize the components of our packages into one of:
  a. Encodings that translate things into phenomena (e.g., "end-user" artifacts) <- "low" knowledge density due to it being an abstraction of "phenomena"
  b. Encodings that translate/compose some group of encodings into other encodings <- "medium" knowledge density (realistically, a relative, or even false, sense of depth)
  c. Data encodings of encodings (including encodings of maps of encodings [e.g., ChunkDB]) <- "high" knowledge density (again, realistically, a relative, or even false, sense of depth). This would also contain properties of "pushouts" ("lower" encodings) as "views" of the higher encodings.
- We should try to define the code in each package as belonging to one of these 3 types. This, I believe, would show stricter adherence to the foundational theory of "well-understood" domains of knowledge & Drasil in general.
Pretty's Doc is still a phenomenon to Drasil. The same goes for all imported libraries we use. It might be difficult, but we should consider not having any imports, but building all things from scratch (this effort would certainly not go to waste, because there will surely be a domain where these encodings are a part of the domain, and we shouldn't be constrained by using other libraries which we might not be able to edit easily). Afterwards, the final actions of the impure "IO"-related things will be the final phenomena (which I'm unsure of how we can sufficiently teach Drasil). Finally, through this, we will have a better understanding of how Drasil will need to, eventually, describe Drasil.

Final Observations

A "system" has a different meaning with respect to each intended usage of a knowledge base. I think we shouldn't describe a "system" ourselves, but instead, we should consider letting the "printers" describe it, themselves, through their requirements. This is because we can think of each printer as a system unto itself (a cohesive network of knowledge). Note: we might only really care for the "larger" systems that "do something in the end" (often, these will take in a ChunkDB/knowledge-base and do something with that), but I don't think that should disambiguate or diminish the openness of what a "system" is, as defined in common dictionaries.
- Taking the "SmithEtAl" "system" as an example using the SystemInformation as the base entry point to the template, the requirements would be as follows:
  a. Knowledge-base must contain a list of authors
  b. Knowledge-base must contain a purpose
  c. Knowledge-base must have Input variables and Output variables (instead of directly placing these as QuantityDicts, we should place these wrapped in their own Input and Output data wrappers so that we can pull them directly from the ChunkDB)
  d. Knowledge-base must contain output constraints (these are very nice!)
- These requirements would be imposed by checking that admissible "knowledge"/chunks are registered inside of the carrier "KnowledgeBase"/ChunkDB. Then, we can have "systems" that impose "this knowledgebase should only include 1 X, 2 Ys, any amount of Zs, no As, etc". Of course, this is heavily reliant on the proposed ChunkDB design I proposed in #2873.
- If we want to restrict only to the larger systems that act on ChunkDBs, we can do that too. This would be a stricter definition, where requirements are imposed by gathering "the correct types" from the system (in other words, we would be bunching up our knowledge/chunks by their type representations [TypeReps from Data.Typeable] and imposing restrictions based on those found in the ChunkDB). Then, the "process" component of the System interface ~~ class System where process :: ChunkDB -> IO () (this would probably be different, but it should paint the right picture) should be fairly straightforward.
- Classification of systems can potentially be gathered through creating generic requirements applicable to each system that they all must contain. They might become more specific versions of classifications through having extra requirements.
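
The "requirements by type representation" idea above can be sketched with nothing but base. This is an illustration under stated assumptions, not the ChunkDB design from #2873: KnowledgeBase is a toy list of Dynamic values standing in for a ChunkDB, the wrappers Author, Purpose, Input, and Output are hypothetical, and the System class only loosely follows the "class System where process :: ChunkDB -> IO ()" shape mentioned above.

```haskell
import Data.Dynamic (Dynamic, dynTypeRep, toDyn)
import Data.Typeable (Proxy (..), TypeRep, Typeable, typeRep)

-- Hypothetical chunk wrappers; real chunks would carry far more data.
newtype Author  = Author String
newtype Purpose = Purpose String
newtype Input   = Input String
newtype Output  = Output String

-- Toy stand-in for a ChunkDB: a bag of dynamically typed chunks.
type KnowledgeBase = [Dynamic]

data Bound = Exactly Int | AtLeast Int

-- "This knowledge base should contain this many chunks of this type."
data Requirement = Requirement { reqType :: TypeRep, reqBound :: Bound }

require :: Typeable a => Proxy a -> Bound -> Requirement
require p = Requirement (typeRep p)

-- Number of chunks in the knowledge base with the given type.
countOf :: TypeRep -> KnowledgeBase -> Int
countOf t = length . filter ((== t) . dynTypeRep)

satisfies :: KnowledgeBase -> Requirement -> Bool
satisfies kb (Requirement t b) = check b (countOf t kb)
  where
    check (Exactly n) c = c == n
    check (AtLeast n) c = c >= n

-- Types whose requirement the knowledge base fails to meet.
unmet :: [Requirement] -> KnowledgeBase -> [TypeRep]
unmet reqs kb = [reqType r | r <- reqs, not (satisfies kb r)]

-- A "system" is described by what it requires and what it does.
class System s where
  requirements :: s -> [Requirement]
  process      :: s -> KnowledgeBase -> IO ()

data SmithEtAl = SmithEtAl

instance System SmithEtAl where
  requirements _ =
    [ require (Proxy :: Proxy Author)  (AtLeast 1)
    , require (Proxy :: Proxy Purpose) (Exactly 1)
    , require (Proxy :: Proxy Input)   (AtLeast 1)
    , require (Proxy :: Proxy Output)  (AtLeast 1)
    ]
  process s kb = case unmet (requirements s) kb of
    []      -> putStrLn "All requirements met; artifacts could be generated."
    missing -> putStrLn ("Unmet requirements for: " ++ show missing)
```

A knowledge base is built with toDyn, e.g. [toDyn (Author "A"), toDyn (Purpose "p")]. Classifying systems would then amount to comparing their requirement lists.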
Assuming point (1) of "Again, slightly deeper observations" is "good", then we should consider building graphs of our knowledge encodings and the translation paths automatically (through somehow creating encoding printers). This should help with understanding the infrastructure: it would make it easier to see why many of Drasil's encodings (DSLs and all) exist as part of the larger Drasil "system."
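
As a rough picture of what such a graph could look like, the sketch below hand-writes a few of the translations mentioned in this write-up as edges, emits them in Graphviz's DOT syntax, and enumerates translation paths. It is only an assumption of how the data might be shaped: the node labels are informal names rather than module paths, the edge list is illustrative and incomplete, and translations, toDot, and paths are hypothetical names.

```haskell
-- An edge reads "the first encoding can be printed into the second".
type Edge = (String, String)

-- A few translations mentioned in this write-up (illustrative, incomplete).
translations :: [Edge]
translations =
  [ ("Expr", "CodeExpr")
  , ("Expr", "Printing.Expr")
  , ("ModelExpr", "Printing.Expr")
  , ("CodeExpr", "Printing.Expr")
  , ("Printing.Expr", "LaTeX")
  , ("Printing.Expr", "HTML")
  ]

-- Render the edges as a Graphviz DOT digraph.
toDot :: [Edge] -> String
toDot es = unlines (["digraph encodings {"] ++ map edge es ++ ["}"])
  where
    edge (a, b) = "  " ++ show a ++ " -> " ++ show b ++ ";"

-- All cycle-free translation paths from one encoding to another.
paths :: [Edge] -> String -> String -> [[String]]
paths es from to = go [from] from
  where
    go visited n
      | n == to   = [reverse visited]
      | otherwise =
          concat [ go (m : visited) m
                 | (a, m) <- es, a == n, m `notElem` visited ]
```

With this edge list, paths translations "Expr" "LaTeX" yields two routes, one directly through Printing.Expr and one that first passes through CodeExpr. The output of toDot can be fed to Graphviz to draw the graph.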
Compared to other compilers, our "intermediate representation" of knowledge allows us to get a lot more out of the lower-level compilers, and, with enough effort, to completely supersede them. Ultimately, it seems we get a lot more, for a lot less work, than standard software development/usage.

Thank you for reading :smile:! Hopefully, this all makes sense.

Impressive work @balacij. Your observations seem on point to me, but @JacquesCarette is a better judge of the future direction of the design of Drasil. We should make a point of using the knowledge in this issue. In particular, the shallow analysis seems like a summary that should find its way (in some form) into one of our Wikis.

Thank you, @smiths! :smile: Hopefully so, I think some of the "shallow analysis" could also go into the main package.yaml files and the README.md files too.

Huge amount of information here. And lots of good questions. So, to be able to eventually close this issue, I'm going to spin off a bunch of issues, each of which is related to what's here, but contains more 'actionable' material. When it is more 'purely informational', I'll make comments here. Eventually, we may want to extract the knowledge from here and put it on the wiki and/or in the READMEs.

Thank you, that sounds like a great idea!

On drasil-data: it is a "database", done as a set of Haskell files that contain only declarations of 'chunks'. The 'chunk' part is not so important; the important part is that it uses a host of different encoding data-structures.

Furthermore, it spans from simple knowledge to rather complex knowledge (i.e. theories), with everything in between. There is also a cross-cutting arrangement where the 'knowledge' encoded comes from many different application domains.

Right now, we don't know how to organize this. For sure, internally to drasil-data, it's partly organized, partly a mess. To detangle things, we should understand our own knowledge encodings well enough to understand exactly what kinds of "level mismatches" we've created. We also need to understand what classification seems to be the most natural to use -- level? application domain? both? neither?

In other words, I don't think we're even close to ready to do something sensible with it.

> where should we define how encodings get printed into other?

An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about the pros and cons of them.

> The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?

That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.

On drasil-database and SystemInformation: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.

For example, we have all sorts of type classes for our "chunks" having all sorts of data. Higher-level routines can then be polymorphic: instead of asking for specific representations, they can ask for any representation that has the data needed.

Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pull what they need from it, and do their job.

Our current design is instead to create one monster representation, which we could call KitchenSink instead of SystemInformation, which has everything any part could possibly want. So our current process is thus:

- collect all the information
- stick it all in one place
- pass it to everyone

That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.

Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want.
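
A small sketch of what "each generator states its own requirements" could look like with type classes follows. All names here are hypothetical (HasAuthors, HasPurpose, titlePage, footer, SrsInfo, WebsiteInfo) and are not Drasil's actual classes or records. The intent is only to show a website-like generator working from a small record instead of a KitchenSink padded with empty lists.

```haskell
import Data.List (intercalate)

-- Each class names one piece of information a generator may ask for.
class HasAuthors r where
  authors :: r -> [String]

class HasPurpose r where
  purpose :: r -> String

-- An SRS-like generator asks for both pieces.
titlePage :: (HasAuthors r, HasPurpose r) => r -> String
titlePage r =
  unlines [ "Purpose: " ++ purpose r
          , "Authors: " ++ intercalate ", " (authors r) ]

-- A website-like generator asks only for authors.
footer :: HasAuthors r => r -> String
footer r = "Maintained by " ++ intercalate ", " (authors r)

-- Two representations, each carrying only what its consumers need.
data SrsInfo = SrsInfo { srsAuthors :: [String], srsPurpose :: String }

instance HasAuthors SrsInfo where
  authors = srsAuthors

instance HasPurpose SrsInfo where
  purpose = srsPurpose

newtype WebsiteInfo = WebsiteInfo [String]

instance HasAuthors WebsiteInfo where
  authors (WebsiteInfo as) = as
```

Here footer accepts both SrsInfo and WebsiteInfo, while titlePage (WebsiteInfo ...) is rejected at compile time because WebsiteInfo has no purpose. The constraints on each generator double as a record of its requirements.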

That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for drasil-website without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the KitchenSink approach.

There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.

Re: drasil-database:

> In other words, I don't think we're even close to ready to do something sensible with it.

Sounds good, I can see why. Hopefully it will become more clear later on.

> An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about the pros and cons of them.

Thanks!

> The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?

> That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.

Thanks. The lower-level encodings don't really dictate the conversion "well"/uniformly, because the instructions are written as nearby "floating" functions rather than as methods of a shared typeclass, so the pattern is less uniform than it could be.

A good example can be drawn from drasil-printers' Language.Drasil.Printing.Import.*.

Specifically, we can see .Space: https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Space.hs#L14-L18

Unit symbols, in .Symbol: https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Symbol.hs#L38-L43

These both are floating, but if we had a typeclass:

class CanGenMathExprs t where
    toMathExpr :: t -> Printing.Expr

we would just use a common toMathExpr for every type that has an instance. Alternatively, I guess we can try to parameterize further with an output type variable, and another variable for "named" typeclass instances:

class CanGen i o ctx where
    ctxPrint :: i -> o

instance CanGen Math.Symbol P.Expr 'SomeCtx where ctxPrint = _

I think we would just need type applications to select the instance for a particular 'context'. I'm not too sure how helpful this variant would be; I thought it might be an interesting way to get different 'printing' styles. Embedding 'printable' things in other 'printable' things means that the single context type parameter would need an instance for each embedded 'printable' type, and it also implies that a single type would need to carry enough information to replace the existing "PrintingInformation" for all embedded types. To handle the ChunkDB part, we would just add it as a parameter to ctxPrint, but the other components might get messy for various combinations, or when PrintingConfiguration grows with extra configuration options. This would be a potential option for @cd155, primarily for the different styles of printing ODEs, I think, but I'm completely unsure whether it's a good idea; it will require a bit more investigation. The interesting thing about SomeCtx is that it could stand for a whole "style" of printing things in a layout (e.g., SRS variants, or subvariants of certain SRS variants).
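For reference, here is a minimal sketch of how the CanGen variant might be made to compile. Symbol, PExpr, and the Ctx kind with its Plain and Bracketed contexts are hypothetical stand-ins (not Drasil's real types), and the sketch assumes that AllowAmbiguousTypes is acceptable, since ctx never appears in the type of ctxPrint.

```haskell
{-# LANGUAGE AllowAmbiguousTypes   #-}
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications      #-}

-- Hypothetical stand-ins for Drasil's real types.
newtype Symbol = Symbol String                     -- stand-in for a math symbol
newtype PExpr  = PExpr String deriving (Eq, Show)  -- stand-in for Printing.Expr

-- Hypothetical printing contexts ("styles"), promoted to the type level.
data Ctx = Plain | Bracketed

-- ctx only names the instance; it does not occur in the method's type,
-- so callers must choose it explicitly with a type application.
class CanGen i o (ctx :: Ctx) where
  ctxPrint :: i -> o

instance CanGen Symbol PExpr 'Plain where
  ctxPrint (Symbol s) = PExpr s

instance CanGen Symbol PExpr 'Bracketed where
  ctxPrint (Symbol s) = PExpr ("[" ++ s ++ "]")

-- The same input printed under two different contexts.
plainX, bracketedX :: PExpr
plainX     = ctxPrint @Symbol @PExpr @'Plain     (Symbol "x")
bracketedX = ctxPrint @Symbol @PExpr @'Bracketed (Symbol "x")
```

Every call site has to spell out all three type arguments, which hints at how noisy this could become once 'printable' things are nested inside each other.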

In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.

Yes, taking this example, drasil-printers would become a package with, strictly, its own encoding of its "abstract rendering language" and an "instance" of some typeclass, as above, which "lowers" it into HTML/LaTeX/etc. The dependencies of drasil-printers would shrink to just the packages for the HTML/LaTeX/etc. generation, while drasil-lang would gain a dependency on drasil-printers, as it would also need to "instantiate" a typeclass for lowering things into the abstract rendering language of drasil-printers.
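A rough, single-file sketch of that reversed dependency follows. All names here (Render, Renderable, Term, toHTML, toLaTeX) are hypothetical and only illustrate the direction of the dependency; the lowerings are written as plain functions rather than typeclass instances to keep the sketch short, and they do no escaping of special characters.

```haskell
-- "drasil-printers" side (hypothetical): the abstract rendering language
-- and its lowerings know nothing about any higher-level encoding.
data Render
  = Text String
  | Emph Render
  | Seq [Render]

-- Lower the abstract language to HTML (no escaping in this sketch).
toHTML :: Render -> String
toHTML (Text s) = s
toHTML (Emph r) = "<em>" ++ toHTML r ++ "</em>"
toHTML (Seq rs) = concatMap toHTML rs

-- Lower the abstract language to LaTeX (no escaping in this sketch).
toLaTeX :: Render -> String
toLaTeX (Text s) = s
toLaTeX (Emph r) = "\\emph{" ++ toLaTeX r ++ "}"
toLaTeX (Seq rs) = concatMap toLaTeX rs

-- The class that higher-level packages would instantiate.
class Renderable a where
  render :: a -> Render

-- "drasil-lang" side (hypothetical): a higher-level encoding declares
-- how it is viewed in the abstract rendering language.
data Term = Term String String  -- name and definition

instance Renderable Term where
  render (Term n d) = Seq [Emph (Text n), Text ": ", Text d]
```

With this arrangement, toHTML (render t) and toLaTeX (render t) share the single Renderable instance, and only the Term side imports anything from the printing side.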

On drasil-database and SystemInformation: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.

Would you say we have a good explanation of why ChunkDB exists? My understanding was that, within drasil-database, it was only SystemInformation that we didn't have a good definition for (this was one of the specific examples in #2195). With my version of ChunkDB (from #2873) and UIDs, it would make sense, to me, that they would be a core bundle of the "least"-required components for Drasil to be used (e.g., all components would rely on it in some way [registration in a knowledge-base for usage, printing, etc.]). As such, they'd become the only datatypes in drasil-database and SystemInformation would be removed or moved elsewhere, or they would be moved, together, into a new Drasil "core" package/drasil-core.

For example, we have all sorts of type classes for our "chunks" having all sorts of data. Higher-level routines can then be polymorphic: instead of asking for specific representations, they can ask for any representation that has the data needed. Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pull it, and do their job.
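As a small illustration of "ask for any representation that has the data needed", here is a sketch with hypothetical capability classes (HasName, HasDefinition) and a hypothetical glossaryEntry generator; none of these are Drasil's actual classes.

```haskell
-- Hypothetical capability classes, in the spirit of the chunk classes.
class HasName a where
  nameOf :: a -> String

class HasDefinition a where
  defnOf :: a -> String

-- The generator asks only for the data it needs, not for a concrete type.
glossaryEntry :: (HasName a, HasDefinition a) => a -> String
glossaryEntry x = nameOf x ++ ": " ++ defnOf x

-- Two unrelated representations that both carry the needed data.
data Concept  = Concept String String          -- name, definition
data Quantity = Quantity String String String  -- name, definition, unit

instance HasName Concept where
  nameOf (Concept n _) = n

instance HasDefinition Concept where
  defnOf (Concept _ d) = d

instance HasName Quantity where
  nameOf (Quantity n _ _) = n

instance HasDefinition Quantity where
  defnOf (Quantity _ d _) = d
```

glossaryEntry works on both Concept and Quantity without either type knowing about the other, and without a single record that has to hold everything.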

If I'm understanding you correctly, I believe that this is the design I'm also thinking of with my above discussion of "reversing dependencies" and "forcing printing qualities to be properties of the higher-level encodings".

Our current design is instead to create one monster representation, which we could call KitchenSink instead of SystemInformation, which has everything any part could possibly want. So our current process is thus:

- collect all the information
- stick it all in one place
- pass it to everyone

That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.

Would we be able to remove SystemInformation completely in favour of using cast more often to assist in grabbing chunks from ChunkDBs instead? This would allow us to place completely "foreign" types into a ChunkDB, since we would be using TypeReps to grab data en masse (e.g., everything of a specific type) and UIDs+TypeReps to grab singular data instances. This would be helpful for new user libraries that build on Drasil but are not upstreamed.

Though, I'm uncertain about 2 problems:

- Is Data.Typeable (.., cast) usage problematic in any way? It looks like a safe version of unsafeCoerce, but it might still be an anti-pattern.
- Why does a SystemInformation contain 2 kinds of ChunkDBs, "sysinfodb" and "usedinfodb"? It seems like a printer of a ChunkDB should know for itself which chunks are "sysinfo"/"used"/etc., by treating those chunk types as "relevant or not" to its printing goal.
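To make the cast idea concrete, here is a minimal sketch of a container that accepts any Typeable value. It is not the ChunkDB from #2873: UID is a plain String stand-in, the store is an association list rather than a map indexed by TypeRep, and AnyChunk, insertChunk, lookupAs, allOf, and UserNote are hypothetical names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Maybe (mapMaybe)
import Data.Typeable (Typeable, cast)

type UID = String  -- stand-in for Drasil's UID type

-- Any value can be stored, as long as its type is Typeable.
data AnyChunk = forall a. Typeable a => AnyChunk a

-- An association list keeps the sketch dependency-free.
newtype ChunkDB = ChunkDB [(UID, AnyChunk)]

emptyDB :: ChunkDB
emptyDB = ChunkDB []

insertChunk :: Typeable a => UID -> a -> ChunkDB -> ChunkDB
insertChunk u x (ChunkDB kvs) = ChunkDB ((u, AnyChunk x) : kvs)

-- UID + type: Nothing if the UID is absent or stored at another type.
lookupAs :: Typeable a => UID -> ChunkDB -> Maybe a
lookupAs u (ChunkDB kvs) = lookup u kvs >>= \(AnyChunk x) -> cast x

-- Type only: every stored value of the requested type.
allOf :: Typeable a => ChunkDB -> [a]
allOf (ChunkDB kvs) = mapMaybe (\(_, AnyChunk x) -> cast x) kvs

-- A "foreign" type that the container knows nothing about.
newtype UserNote = UserNote String deriving (Eq, Show)
```

Note that cast returns a Maybe, so a lookup at the wrong type fails at runtime with Nothing rather than being rejected by the type checker.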

Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want. That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for drasil-website without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the KitchenSink approach.

Perfect, this will be great for drasil-lang / #2885. I think this also is relevant to my discussion in my last comment (above this one, regarding CanGen).

There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.

Sounds good.

These discussions are getting too big - it would make sense to start new issues when the commentary is going to be more than just a few lines.

Re: printing and classes like CanGenMathExprs - maybe. I've got some emails about that from multiple years ago. It's actually kind of a tricky design point where Haskell classes don't always quite fit. So it needs a proper design, which means that it first needs a proper analysis. Certainly it's not worth creating a class (in general) if there is only a single instance of it. I'll also email you some design discussion on that topic from a while back.

Re: drasil-database, etc.

We have a currently adequate explanation of ChunkDB: it's a container of things with UIDs. Note that that's probably the full explanation as well, which is perhaps unsatisfactory!

I think SystemInformation was supposed to be just as its name implies: the necessary information for a "system". It was supposed to be arranged so that various different kinds of information would be assembled in it, and we would know a priori what that information was (thus the many fields). But it never quite got there. The rationale for each field has been lost, and that rationale cannot be adequately reconstructed. So it is probably best to get rid of it, until such a time as we can get a decent explanation for what it ought to be, and even whether it ought to exist.

Would we be able to remove SystemInformation completely in favour of using cast more often to assist in grabbing chunks from ChunkDBs instead?

Here I'm less happy with the question: it mixes design and implementation/solution too much. The actual content of the question may be fine though, in that maybe cast could end up being part of the implementation of a good design.

To a certain extent, cast is an anti-pattern, in the sense that Haskell is great because of its static typing, and cast works firmly against that grain. unsafeCoerce is the same and different: if used wantonly, it's an anti-pattern. If used as part of a dedicated optimization pass in some efficiency-critical low-level layer, that's quite different. Same with cast: it might be useful because the situation is fundamentally dynamic, or because Haskell's type system is not up to expressing the types in an ergonomic way.

If I recall, sysinfodb was meant to be where we assembled all the information that might be used in a system, and usedinfodb was the stuff that was actually used. So things reachable from usedinfodb should appear in the glossary, table of symbols, etc.; but a thing's mere presence in sysinfodb wouldn't trigger anything like that.
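One way to read that recollection is as a reachability computation: start from the chunks that are used, follow their references through the full collection, and list only what is reached. The sketch below assumes exactly that reading; Chunk, its refs field, and glossaryUIDs are hypothetical, and the real databases are not plain lists.

```haskell
type UID = String  -- stand-in for Drasil's UID type

-- A hypothetical chunk: its own UID plus the UIDs it refers to.
data Chunk = Chunk { uid :: UID, refs :: [UID] }

-- UIDs reachable from the used roots, following refs through the
-- full collection; first-visit order, no duplicates.
glossaryUIDs :: [Chunk] -> [UID] -> [UID]
glossaryUIDs everything = go []
  where
    table = [(uid c, c) | c <- everything]
    go seen [] = reverse seen
    go seen (u : us)
      | u `elem` seen = go seen us
      | otherwise     = go (u : seen) (maybe [] refs (lookup u table) ++ us)
```

A chunk that sits in the full collection but is never reached from a used root (directly or through refs) is simply absent from the result, which matches "mere presence wouldn't trigger anything".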

Re: moving to new tickets: Will do. I will refrain for this response because I think it will be relatively short. The CanGenMathExprs will almost definitely need to go into a new ticket, however.

Re: SystemInformation: Sounds good.

Re: Removing SystemInformation: I understand. I very briefly mentioned replacing the lists in SystemInformation in #2873, but I didn't expect that this would become the conclusion (assuming it is).

Re: cast: That makes sense, thank you. The point of the dynamic nature of what I was planning was to have ChunkDBs be as open as possible, which might even be too open. It's interesting. I guess this can be something we revisit soon if it becomes problematic, but, for now, it looks like a robust solution.

SUPER minor note here, but from our conversation in #3711, a quick Google search seems to indicate that "codebase" as one word is most common, but this isn't recognized by my Firefox spell checker. "Code base" also seems to be correct and the second most common, with "code-base" bringing up the rear. @JacquesCarette
