btford opened 9 years ago
@btford: Thanks for getting this started. Here is a quick brain dump of my thoughts:
Currently, all processors can also be async, which means that they can return a promise to the docs collection. In terms of the Transducers approach above, I wonder if we would be better to make use of some functional reactive programming library such as https://github.com/Reactive-Extensions/RxJS or https://github.com/baconjs/bacon.js?
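For reference, a minimal sketch of what an async processor looks like today: `$process()` may return a promise for the (possibly extended) docs collection. The processor name and `readExampleFiles()` helper are illustrative, not real dgeni-packages APIs.

```ts
// Sketch of today's async behaviour: $process() may return a promise
// that resolves to the new docs collection.
const collectExamples = {
  name: 'collectExamples',
  $process(docs: any[]): Promise<any[]> {
    return readExampleFiles().then(examples => docs.concat(examples));
  },
};

// Assumed helper for the sketch: asynchronously loads example docs.
declare function readExampleFiles(): Promise<any[]>;
```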
I think this is a tricky problem. I played with a few different methods when originally designing Dgeni.
The problem with explicit ordering is that there are some packages, such as `examples`, which need to have two processors that run at particular points in the pipeline - i.e. collecting up the tagged examples and then injecting the rendered examples back into the final output. This is an implementation issue for that package, and the user of the package should not have to concern themselves with it. In the gulp scenario, the user of the package would need to ensure that these two processors were added at the correct points.
I am not sure how many use cases require this; the primary one relates to inline tags. If it were possible to refactor how this worked so that the user was no longer responsible for this ordering, then I believe an explicit ordering would be cleaner and clearer to use.
(By the way, my event hooks PR also includes a new method called `dgeni.info()`, which dumps out the full processor list in the order in which they will be run.)
As the complexity of dgeni implementations increased, I had to start adding services that contain additional information about the docs. For instance, the `moduleMap`, which is a hash of module names to docs, and the `exampleMap`, which is used to hold tagged examples that are parsed from the source content. Even more complex is the `aliasMap`, which is used extensively when resolving links to access a doc by its id and its aliases.
I began to wonder if what we really need is a concept similar to BroccoliJS trees. This would allow us to demote `docs` to just another collection (or tree) of data to be manipulated.
Another idea I have been playing with is the concept of each doc having a well-defined type, perhaps defined as a class in TypeScript, rather than simply relying on the `docType` property of a doc and hoping for the best with regards to the other properties that will appear.
In this sense, some processors, whose job is simply to attach new properties to a doc, would be better thought of as creating a new doc from the previous doc(s): one with a new type that in some way inherits from the previous doc(s). An alternative way of thinking about this is that processors attach traits to a doc as it passes through the pipeline. This might also play well with the idea of immutable docs.
This typing would allow the developer to reason more clearly about a document when it arrives at a particular processor or when trying to debug how a doc came about.
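To make that concrete, here is a rough sketch (all names invented) of how such traits might look as TypeScript interfaces:

```ts
// Illustrative only: a doc's type narrows as processors attach traits.
interface Doc {
  docType: string;
  content: string;
}

// The "rendered" trait that a rendering processor adds.
interface Rendered {
  renderedContent: string;
}

// A processor that adds a trait conceptually produces a new, more
// specific doc rather than mutating the old one.
function render(doc: Doc, template: (d: Doc) => string): Doc & Rendered {
  return { ...doc, renderedContent: template(doc) };
}
```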
That being said, I am a little fearful of over-constraining docs with such a type system, causing more bloat for not much benefit.
Another thing I have noticed as dgeni has evolved is that many processors are only interested in a particular kind of doc, yet they waste time iterating through all the docs to filter out the ones they care about. For instance, in the angular.js docs, the guide docs have a significantly different processing life cycle to the API docs.
This also means that the pipeline is often unnecessarily long for a number of doc types, and it is difficult to see what impact each processor has on different types of docs, particularly if you are not diving in and looking at each processor's source code.
I wonder if we could consider the idea of multiple pipelines that split apart for distinct processing and maybe merge back together later for shared processing (such as rendering or writing to files). This might be something that would work well with the reactive functional programming approach.
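As a rough illustration of the split/merge idea using RxJS (the processor functions here are placeholders, not real dgeni processors):

```ts
import { from, merge } from 'rxjs';
import { filter, map } from 'rxjs/operators';

declare const docs: any[];                     // the parsed docs collection
declare function processApiDoc(d: any): any;   // placeholder processors
declare function processGuideDoc(d: any): any;
declare function renderDoc(d: any): any;

const docs$ = from(docs);

// Split: each kind of doc gets its own, shorter pipeline...
const api$ = docs$.pipe(filter(d => d.docType === 'api'), map(processApiDoc));
const guide$ = docs$.pipe(filter(d => d.docType === 'guide'), map(processGuideDoc));

// ...then merge back together for shared steps such as rendering.
const rendered$ = merge(api$, guide$).pipe(map(renderDoc));
```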
I actually really like that templates can be overridden and inherit from each other but I agree that we need a better way of communicating what templates are being used for a particular document.
I would like to have a fair debate about the use of TypeScript. Would we require that all Dgeni components (packages, processors, services, etc.) also be written in a "typed" fashion? If we are going to add a build step to dgeni (i.e. TypeScript -> JavaScript), then should we consider whether a completely different programming language and platform (say Go or Ruby) would be preferable?
Transducers also work async; we could even use them with Rx. Please spend some time with them: the documentation isn't intuitive, but I think transducers are exactly the correct abstraction.
Explicit ordering (with merging) handles cases like tags. It's a bit tricky to explain, but the ordering communicates the prerequisites. The advantage is that the explicit ordering is easier to read because it's all in one place. Basically, a package should specify an array of refs to steps:

`[step1, <ref to imported step>, step2, step3]`

which is then "de-sugared" into the sort of "run before, run after" properties currently on each processor.
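A minimal sketch of that de-sugaring, assuming steps carry the `name`, `$runAfter`, and `$runBefore` metadata that dgeni 1.x processors already use:

```ts
interface Step {
  name: string;
  $runAfter?: string[];
  $runBefore?: string[];
}

// Turn an explicit array of steps into the pairwise ordering
// constraints the existing scheduler understands.
function desugar(steps: Step[]): Step[] {
  steps.forEach((step, i) => {
    if (i > 0) step.$runAfter = [...(step.$runAfter ?? []), steps[i - 1].name];
    if (i < steps.length - 1) step.$runBefore = [...(step.$runBefore ?? []), steps[i + 1].name];
  });
  return steps;
}
```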
I have never seen the inheritance/overriding/default feature used profitably. Is there a good example of the overriding feature making things easier?
I like that we package some templates with each package so they can be used as a springboard, but the fact that they are wired up and ready for use has only caused me grief. Even when I understand how the overriding works, I have to look in 3-4 spots in the code to figure out what will happen at runtime.
In most cases, there are only a handful of top-level templates (excluding partials, etc). I don't think making the templates explicit introduces much boilerplate.
:+1: for explicit interfaces for docs.
Again, if you use objects for DI tokens, you can statically analyze plugins to automatically produce the ordering information. So simply by saying your plugin wants the `aliasMap`, dgeni can ensure that your plugin is run after the plugin responsible for generating that resource.
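A sketch of what object tokens might look like (none of this is dgeni's current API; `provides`/`injects` are invented conventions):

```ts
// A token's identity, not its string name, links producers to consumers,
// so a static pass can order processors without explicit $runAfter.
class Token<T> {
  constructor(readonly description: string) {}
}

const ALIAS_MAP = new Token<Map<string, object[]>>('aliasMap');

const computeAliases = {
  name: 'computeAliases',
  provides: [ALIAS_MAP], // this step produces the aliasMap
  // ...
};

const resolveLinks = {
  name: 'resolveLinks',
  injects: [ALIAS_MAP],  // implies: run after computeAliases
  // ...
};
```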
Not sure what the "tree" architecture has to do with docs; I think the linked API makes little sense for dgeni.
I do agree that a doc is just one type of "thing" in a big collection of things that dgeni needs to process.
I think what we want is for a dgeni processor to be able to produce a "thing" that you can inject into a successive processor. That would handle these different "maps" quite elegantly. Perhaps each processor has the option of creating a child injector used for successive processors. That could be powerful.
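One hypothetical shape for that idea (invented, just to illustrate the child-injector concept):

```ts
// A processor's result could carry extra providers; dgeni would register
// them in a child injector visible only to downstream processors.
const computeModuleMap = {
  name: 'computeModuleMap',
  $process(docs: any[]) {
    const moduleMap = new Map(
      docs.filter(d => d.docType === 'module').map(d => [d.name, d])
    );
    // `provide` is an invented convention, not a real dgeni feature.
    return { docs, provide: { moduleMap } };
  },
};
```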
Have you profiled to see if this is really that costly? Also, transducers (if we chose to use them internally) can largely eliminate this cost: they'd only apply the filter operation once in most cases.
I'm skeptical that this is really worth complicating the architecture for; most of the time in the app is probably spent running the processors' function bodies, not dgeni doing upkeep. Unless there's good evidence to the contrary, I wouldn't worry about it.
I see no reason for us to require plugins use TypeScript. Why not publish a distributable that's compiled to JS like other libs written with TS?
I'm not opposed to dgeni 2 being written in an entirely other language, but it does presumably raise the barrier to entry for contributors.
I thought I would chime in as I'm using Dgeni in a perhaps slightly different manner than it was originally intended, and hence I've been working on my own ecosystem of packages rather than using dgeni-packages (the thing I'm talking about is here).
> The problem with explicit ordering is that there are some packages, such as examples, which need to have two processors that run at particular points in the pipeline. [...] I am not sure how many use cases there are that require this.
I use that feature all over the place, since my package design relies on much smaller modules than a typical dgeni installation.
I think one option is to have templates as an injectable component. This handles the use case of the user overriding templates from upstream packages, and is the same mechanism by which they can override any other dependency.
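A rough sketch of how that might look with dgeni's existing `factory()` registration (the component name and packages here are invented):

```ts
declare const basePackage: any; // an upstream dgeni Package
declare const myPackage: any;   // the user's dgeni Package

// The base package publishes its template path as an injectable value...
basePackage.factory('apiTemplate', function apiTemplate() {
  return 'api.template.html';
});

// ...and a downstream package overrides it like any other dependency,
// with no special template-resolution machinery involved.
myPackage.factory('apiTemplate', function apiTemplate() {
  return 'my-custom-api.template.html';
});
```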
Also :+1:s for explicit interfaces for docs and multiple pipelines.
@gampleman - thanks for chiming in. The more views the better at this stage. @btford - I promise to do further reading.
One more item for discussion, is the idea of being able to run partial documentation generation if only a small number of input files have been changed. Similar to the idea of only recompiling a single source code file and then linking in it with the other source code files that have been compiled previously - this is what Broccoli tries to achieve. This would enable fast (re)generation so that results of changes to the source files or template files could be immediately viewed during development.
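A toy sketch of that incremental idea: cache each file's processed output keyed by its mtime, and re-run the per-file pipeline only when the source changed (all names here are invented, Broccoli-style caching in miniature):

```ts
import { statSync } from 'fs';

const cache = new Map<string, { mtimeMs: number; output: unknown }>();

// Re-run the (expensive) per-file pipeline only when the source changed;
// otherwise reuse the previously computed output.
function processIfChanged(file: string, run: (f: string) => unknown): unknown {
  const { mtimeMs } = statSync(file);
  const hit = cache.get(file);
  if (hit && hit.mtimeMs === mtimeMs) return hit.output;
  const output = run(file);
  cache.set(file, { mtimeMs, output });
  return output;
}
```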
OK, so I am getting more acquainted with transducers. I still think that we should be using Observable flows for the async feature (rather than, say, CSP). This is a good page about integrating Transducers with Rx: https://xgrommx.github.io/rx-book/content/getting_started_with_rxjs/creating_and_querying_observable_sequences/transducers.html
Reading back through, I am not clear which aspect of template "inheritance/overriding" is the problem. There are at least two different ways that templates can be resolved; some are more unhelpful than others, but all can already be avoided if you wish:
- `templateFolders` - one can specify a number of folders that are considered in order when trying to load a template. This means that a base package can provide a folder containing a set of standard templates, and a later package can provide another folder that contains overridden templates, which will be loaded instead. The key thing to note is that when loading any kind of template file, even a partial or base template, the folders are traversed again, in order, not caring about the folder in which the referencing template was found. This can be confusing because you have to look at multiple folder locations to work out which file might be used in a template. It is probably safest to copy all the templates that you want into a single local package folder, and set up `templateFolders` to search only that one folder.
- Nunjucks template inheritance - any template can extend another template; the base template specifies blocks, which can be overridden in derived templates. I find this very helpful, especially for defining common layout for pages. Moreover, the Angular V1 templates take this even further, since there are a number of page types that are very similar but differ in only a few places. For example, in the ngdocs package we have:
```
base
-> module
-> api
-> object
-> function
-> service
-> providers
-> filter
-> directive
-> input
```
The Angular V2 docs also use this technique in a number of places, including the TypeScript type files generated for the project.
- [...] these also require `templateFolders` to be set up correctly. It is probably true that these should also be copied over rather than reused from base packages.

Have we ever done anything in this vein?
This project seems to be abandoned at this point.
I would love to but just don't have the capacity right now to do such major work on the library. dgeni and dgeni-packages are in significant use throughout the angular 1 and angular 2 projects (and a number of other projects), so it is by no means abandoned in general. Just not enough time to think about significant rewrites.
I've been so looking forward to a refactoring of this project because I find the documentation/usage to be REALLY convoluted.
For instance, I can't even figure out how to document components. I figured I might just be able to add a template, but that doesn't seem to work.
Any tips/tricks?
Here are my thoughts on the next version of dgeni.
Immutable data structs
I think these should have the additional restriction that you can only set keys on an object; you can't delete or modify them.
I think using `Object` keys instead of `string` keys would improve refactoring. If we're using ES6 module syntax, and tokens to access keys in steps in the processor, we could even go as far as using static analysis to determine the partial ordering of steps in the pipeline.
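A minimal sketch of the "set-only" restriction described above, using a Proxy (illustrative, not a proposed API):

```ts
// Keys may be added once; later writes and deletes throw.
function writeOnce<T extends object>(obj: T): T {
  return new Proxy(obj, {
    set(target, key, value) {
      if (key in target) {
        throw new Error(`key '${String(key)}' is already set`);
      }
      (target as any)[key] = value;
      return true;
    },
    deleteProperty(_target, key) {
      throw new Error(`cannot delete key '${String(key)}'`);
    },
  });
}

const doc = writeOnce<Record<string, unknown>>({ docType: 'api' });
doc.name = 'ngRepeat';    // ok: a new key
// doc.docType = 'guide'; // would throw: keys cannot be modified
```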
Transducers
See: http://jlongster.com/Transducers.js--A-JavaScript-Library-for-Transformation-of-Data
This should be an implementation detail, and invisible to end users. tl;dr: we can avoid some intermediate work.
I think this implementation is most promising: https://github.com/cognitect-labs/transducers-js
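To make the "avoid intermediate work" point concrete, a tiny sketch with transducers-js (the doc shape and steps are made up):

```ts
import * as t from 'transducers-js';

declare const docs: any[]; // an existing docs collection

// comp() fuses the filter and map into a single pass over `docs`:
// no intermediate array is allocated between the two steps.
const xform = t.comp(
  t.filter((d: any) => d.docType === 'api'),
  t.map((d: any) => ({ ...d, rendered: true }))
);

const out = t.into([], xform, docs);
```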
Building the pipeline: Constraint solver vs. explicit ordering
There are two ways to build up a pipeline:

1. A constraint solver: each step declares ordering constraints and dgeni computes a valid order (roughly what dgeni does today).
2. Explicit ordering: the user lists the steps in the order they should run (as gulp does).
We should consider the second approach. But we should be wary of repeating gulp's mistakes: we should not re-run common steps if they appear more than once.
Instead, we should use explicit pipelines to do partial ordering when "merging" pipes.
TypeScript
We should write it in TypeScript. Types are a-ok.
Developer experience
Templates
There should only be one set of templates used at the end of the generation process. Currently, there's some inheritance system that makes it hard to determine which template is actually used.