Truebase-com / TruthStack

Monorepo for the Truth technology stack.

Implement Backer #6

Open paul-go opened 5 years ago


This is an umbrella issue that should give a high-level overview of the work that needs to be done on the backer library. In time, this should be broken up into smaller issues.

1. What Is Backer

Backer is a compilation back-end for Truth. Truth is a powerful system for describing and validating structure. However, to actually create something useful from Truth, we need to analyze it, and emit some artifact from the source code that serves some purpose.

Most compiler backends transform some AST format into an executable binary, or something similar that can be executed all by itself. Backer is a bit different in that it doesn't emit a stand-alone executable binary. Instead, it emits a JavaScript / TypeScript library that can be embedded in some other piece of software. These emitted libraries have the following functions:

2. Emitting Type Definitions

The first order of business will be to emit TypeScript type definitions, which are derived from the input Truth file. This may seem backwards ("why would we need to emit type definitions for a library that doesn't even exist?"). However, the most important thing is to get to a point where the emitter produces natural and obvious developer-facing APIs. This is also where the highest technical uncertainty exists, and so it needs to be dealt with first. The specifics of how the library actually behaves internally are less important.

The first block of work here will be to formalize some baselines, which will eventually be converted into the system's tests. We'll need to cover the entire gamut of features in the Truth language, and decide how the corresponding definitions would be generated.

We should likely strive for a line-for-line correspondence between the Truth code and the corresponding definitions: line N of the Truth file should conceptually correspond to line N of the definition file. The reason is that this will very likely open up significant performance optimizations for incremental compilation.

If we know that a change occurred on a specific line in a Truth file, we can (probably) guarantee that the only code that needs to be updated in the definition file is that same line.
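A minimal sketch of what that per-line update might look like, assuming the line-for-line correspondence holds. `DefinitionFile` and `EmitLine` are hypothetical names, and the toy `emitLine` in the usage only stands in for the real Truth-to-definition translation. Inserting or deleting lines (which shifts every later line) is not handled here.

```typescript
// Hypothetical: translates one line of Truth into one line of definition text.
type EmitLine = (truthLine: string) => string;

class DefinitionFile {
    private readonly lines: string[];
    private readonly emitLine: EmitLine;
    // Counts translations, to show how little work a single edit causes.
    emitCount = 0;

    constructor(truthLines: string[], emitLine: EmitLine) {
        this.emitLine = emitLine;
        this.lines = truthLines.map(line => this.emit(line));
    }

    private emit(truthLine: string): string {
        this.emitCount++;
        return this.emitLine(truthLine);
    }

    // Re-emits only the definition line that matches the edited Truth line (0-based).
    update(lineIndex: number, truthLine: string): void {
        this.lines[lineIndex] = this.emit(truthLine);
    }

    toString(): string {
        return this.lines.join("\n");
    }
}

// Toy translation (an assumption): only rewrites the list operator.
const toyEmit: EmitLine = line => line.replace("...", "[]");

const file = new DefinitionFile(
    ["Action", "    Name : String", "    Dependencies : Action..."],
    toyEmit
);

// A change to line 1 triggers exactly one additional translation.
file.update(1, "    Title : String");
```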

Below is an example of what the Truth => Definition mapping could look like:

String
Number
/[a-z]* : String

Action
    Name : String
    Description : String
    Dependencies : Action...

Bug : Action
    Level
        Amount : Number
    Severity : Level
        Min : Number
        Max : Number

The corresponding TypeScript definition would look like:

// String is ignored
// Number is ignored
// Patterns are ignored

declare const Action: { new(): Action }; declare interface Action extends Type {
    Name : String
    Description : String
    Dependencies : Action[] } 

declare const Bug: { new(): Bug } & Bug; declare interface Bug extends Action, Type { 
    Level : { new(): Omit<Bug["Level"], "">;
        Amount : Number }
    Severity : Bug["Level"] & {
        Min : Number
        Max : Number } }

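The bizarre formatting of the emitted TypeScript definition is intentional: closing braces are kept on the line of the last member so that each line of the definition stays aligned with the corresponding line of the Truth file.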

The Truth type declarations that will be ignored are simply the primitives declared in TypeScript.

Whether or not we should ignore JavaScript built-ins such as Date and RegExp will be decided later.

On Incremental Compilation

The emit needs to be incremental, because it will be plugged into an editor at some point, and we can't be re-generating (potentially thousands of) definitions on each keystroke. This also means that the file could potentially be in a broken state when the emit occurs. The Truth compiler API already does a good job of abstracting these cases (by simply not reporting annotations that don't validate), so hopefully no additional work should be necessary in the definition emitter.

3. Emitting Persistence Layer Abstractions

Once the definitions are emitting properly, the following step will be to emit Persistence Layer Abstractions. PLAs are JavaScript objects paired with constructor functions that loosely correspond to the physical representation of data persisted somewhere. We should strive for a persistence API that is as invisible as possible, which can be done by hiding the persistence operations within the semantics of JavaScript itself. Consider the following theoretical examples:

const customer = new Customer();
// Customer object created, and stored in the persistence layer

customer.name = "Bob";
// Customer object's .name property updated with the value "Bob" in the persistence layer.
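
One way this "invisible" persistence might be achieved is with a JavaScript `Proxy` that intercepts property writes. The following is only a sketch under stated assumptions: the in-memory `store` map is a hypothetical stand-in for a real persistence layer, and the hand-written `Customer` class stands in for an emitted PLA.

```typescript
// Hypothetical stand-in for a persistence layer: record id -> stored fields.
const store = new Map<number, Record<string, unknown>>();
let nextId = 1;

class Customer {
    name = "";

    constructor() {
        // Creating the object also creates its record in the store.
        const record: Record<string, unknown> = { name: this.name };
        store.set(nextId++, record);

        // Returning a Proxy from the constructor makes every later
        // property assignment pass through the "set" trap below.
        return new Proxy(this, {
            set(target, key, value) {
                if (typeof key === "string") record[key] = value;
                return Reflect.set(target, key, value);
            }
        });
    }
}

const customer = new Customer();
customer.name = "Bob"; // Also updates the record held in `store`.
```

Writes made through nested objects or array methods such as `push` would not reach this trap, which is one of the places where the practical limits show up.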

There are practical limits as to how far this can be taken, as well as design decisions that must be made before the full implementation is finalized, especially around the handling of arrays. For example:

Linking In Meta Data

In order to provide rich introspection capabilities of the data model, there should be a convenient way to access information about the type relating to an instance. For example, if we have an instance foo that is an instance of type Foo, it should be easy to get Foo from foo. With TypeScript, this can be done by statically typing the .constructor property of the emitted constructor function. For example:

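class A
{
    // "declare" emits no runtime field, so the inherited prototype.constructor is not shadowed.
    declare ["constructor"]: typeof A;
    static property = "value";
}
const a = new A();
a.constructor.property; // Validates in TypeScript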

More research needs to be done here to determine how exactly the .constructor property should be typed. For example, if there is a complex base graph of inheritance above the A type, do we merge all these in with intersection types? For example:

class BaseA
{
    static a = "a";
}

class BaseB
{
    static b = "b";
}

// Assume "Child" actually extends from BaseA and BaseB
// It's undetermined how the emit would actually work in TypeScript.
class Child
{
    declare ["constructor"]: typeof Child & typeof BaseA & typeof BaseB;
    static c = "c";
}

const child = new Child();
child.constructor.a === "a"; // true
child.constructor.b === "b"; // true

Prototype Chain vs Base Graph

JavaScript has a concept of a prototype chain, which is a single, linear line of prototype objects that specify the lineage of a particular object. This works for single-inheritance programming models. However, Truth is not a single-inheritance model. It's an unrestricted multiple inheritance model, which forms not a straight line of bases, but rather an entire DAG of bases (which we call the "Base Graph"). This presents a bit of an issue when trying to support accurate instanceof behavior. For example, consider the following Truth that creates a simple multiple inheritance hierarchy:

A
B
C : A, B

Presumably this would emit JavaScript constructor functions, allowing for the following code to work as expected:

const a = new A();
const b = new B();
const c = new C();

However, if we try to make instanceof work simply by using the single-inheritance supporting facilities built into JavaScript, there's no way we can achieve complete accuracy:

c instanceof A === true; // This could work
c instanceof B === true; // But we couldn't *also* make this work

The only solution I can see is to hack the emitted constructor functions by using Symbol.hasInstance, and provide custom behavior that returns true or false after manually inspecting the base graph. Symbol.hasInstance isn't supported in IE11 or earlier, but I think it's the best we can do.
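
A rough sketch of the Symbol.hasInstance idea, applied to the `C : A, B` example. The `baseGraph` registry and the `derivesFrom` and `isInstance` helpers are hypothetical names, and the classes are written by hand here rather than emitted.

```typescript
// Hypothetical registry: each constructor -> its direct bases in the Base Graph.
const baseGraph = new Map<Function, Function[]>();

// True when `target` is `type` itself or is reachable through its bases.
function derivesFrom(type: Function, target: Function): boolean {
    if (type === target) return true;
    return (baseGraph.get(type) ?? []).some(base => derivesFrom(base, target));
}

// Shared check. It deliberately avoids `instanceof`, so it cannot re-enter itself.
function isInstance(value: unknown, target: Function): boolean {
    if (typeof value !== "object" || value === null) return false;
    const proto = Object.getPrototypeOf(value);
    return proto !== null
        && typeof proto.constructor === "function"
        && derivesFrom(proto.constructor, target);
}

class A { static [Symbol.hasInstance](value: unknown) { return isInstance(value, A); } }
class B { static [Symbol.hasInstance](value: unknown) { return isInstance(value, B); } }
class C { static [Symbol.hasInstance](value: unknown) { return isInstance(value, C); } }

// C : A, B
baseGraph.set(C, [A, B]);

const c = new C();
const cIsA = c instanceof A; // true, resolved through the base graph
const cIsB = c instanceof B; // true as well, which a prototype chain alone could not give us
```

Note that this only changes what `instanceof` reports. Members of A and B would still need to be made available on C by some other means.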

4. Backer Search API

(This part needs specification)

5. Backer Database

As mentioned above, emitted Backer libraries must include the ability to interact with data residing in some persistence layer through the use of PLAs. The actual system that will be used here is still undetermined, but SnapDB and FlexSearch look interesting. LevelGraph may also play some role.

The architecture needs to support pluggable data sources. For example, at some point, the system will need to cross-compile Backer searches into GraphQL queries in order to be executed on a DGraph cluster.

(This part needs specification)

6. Backer Data Protocol

Also included in the emitted Backer library is a means to parse, validate, and generate code in the Truth Data format. The Truth Data format is yet to be specified. Its primary goal is to allow a potentially unlimited amount of Truth to be streamed into a validator, which validates the code and generates events. This is in contrast with the current Truth compiler, which requires the entire block of Truth code to exist in memory before the validation process can begin.

I'm currently envisioning the Truth data parser as being a parser that is generated from some input Truth schema file. The generated parser is then included in the emitted Backer library, and accessed through some provided utility functions. A generated parser can probably vastly outperform a general one in this case, because such a parser wouldn't be parsing general Truth code, but rather would be expecting Truth code conforming to a very specific schema.
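
To make the idea concrete, here is a toy sketch of a schema-specific streaming validator. Everything in it is an assumption, since the Truth Data format is unspecified: the `Schema` shape, the `ParseEvent` type, the `generateParser` name, and the textual one-declaration-per-line input are all hypothetical. It only illustrates lines being consumed one at a time, with an event produced for each.

```typescript
// Hypothetical schema shape: field name -> pattern its value must match.
type Schema = Map<string, RegExp>;

type ParseEvent =
    | { kind: "field"; line: number; name: string; value: string }
    | { kind: "error"; line: number; message: string };

// Returns a parser specialized to one schema. The parser pulls lines lazily,
// so the complete input never needs to be in memory at once.
function generateParser(schema: Schema) {
    return function* (lines: Iterable<string>): Generator<ParseEvent> {
        let line = 0;
        for (const text of lines) {
            line++;
            const sep = text.indexOf(":");
            const name = (sep < 0 ? text : text.slice(0, sep)).trim();
            const value = sep < 0 ? "" : text.slice(sep + 1).trim();
            const pattern = schema.get(name);

            if (pattern === undefined)
                yield { kind: "error", line, message: `Unknown field: ${name}` };
            else if (!pattern.test(value))
                yield { kind: "error", line, message: `Invalid value for ${name}: ${value}` };
            else
                yield { kind: "field", line, name, value };
        }
    };
}

const parse = generateParser(new Map([
    ["Min", /^\d+$/],
    ["Max", /^\d+$/],
]));

// One valid line, one invalid value, and one name the schema does not define.
const events = [...parse(["Min : 1", "Max : high", "Amount : 3"])];
```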

In order for this to work, the Truth Data format will be a reduced subset of the features available in the broader Truth language. Namely:

  1. No fragmented types
  2. No creation of new regular expression patterns (aliases still work)
  3. No URLs (which are on their way to deprecation anyway)
  4. No new type definitions; all structures must be defined in some Truth schema file somewhere.
  5. No unions other than at the root scope.
  6. Possibly other limitations.

I envision Truth Data code as being primarily a binary representation of Truth, possibly with a textual counterpart. I'm referring to this as a "protocol", because this compressed and streamable nature will make it ideal for exchanging data between networked endpoints.

(This part should be broken off into a separate issue)