btford opened 9 years ago
@btford: Thanks for getting this started. Here is a quick brain dump of my thoughts:
Currently, all processors can also be async, which means that they can return a promise to the docs collection. In terms of the Transducers approach above, I wonder if we would be better to make use of some functional reactive programming library such as https://github.com/Reactive-Extensions/RxJS or https://github.com/baconjs/bacon.js?
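For reference, a minimal sketch of what an async processor looks like today: `$process()` may return a promise for the (possibly extended) docs collection. The processor name and `readExampleFiles()` helper are illustrative, not real dgeni-packages APIs.

```ts
// Sketch of today's async behaviour: $process() may return a promise
// that resolves to the new docs collection.
const collectExamples = {
  name: 'collectExamples',
  $process(docs: any[]): Promise<any[]> {
    return readExampleFiles().then(examples => docs.concat(examples));
  },
};

// Assumed helper for the sketch: asynchronously loads example docs.
declare function readExampleFiles(): Promise<any[]>;
```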
I think this is a tricky problem. I played with a few different methods when originally designing Dgeni.
The problem with explicit ordering is that there are some packages, such as `examples`, which need to have two processors that run at particular points in the pipeline - i.e. collecting up the tagged examples and then injecting the rendered examples back into the final output. This is an implementation issue for that package, and the user of the package should not have to concern themselves with it. In the gulp scenario, the user of the package would need to ensure that these two processors were added at the correct points.
I am not sure how many use cases require this; the primary one relates to inline tags. If it were possible to refactor how this worked so that the user was no longer responsible for this ordering, then I believe an explicit ordering would be cleaner and clearer to use.
(By the way, my event hooks PR also includes a new method called `dgeni.info()`, which dumps out the full processor list in the order in which they will be run.)
As the complexity of dgeni implementations increased, I had to start adding services that contain additional information about the docs. For instance, the `moduleMap`, which is a hash of module names to docs, and the `exampleMap`, which is used to hold tagged examples that are parsed from the source content. Even more complex is the `aliasMap`, which is used extensively when resolving links to access a doc by its id and its aliases.
I began to wonder if what we really need is a concept similar to BroccoliJS trees. This would allow us to demote `docs` to just another collection (or tree) of data to be manipulated.
Another idea I have been playing with is the concept of each doc having a well-defined type, perhaps defined as a class in TypeScript, rather than simply relying on the `docType` property of a doc and hoping for the best with regards to the other properties that will appear.
In this sense, some processors, whose job is simply to attach new properties to a doc, would be better thought of as creating a new doc from the previous doc(s): one with a new type that in some way inherits from the previous doc(s). An alternative way of thinking about this is that processors attach traits to a doc as it passes through the pipeline. This might also play well with the idea of immutable docs.
This typing would allow the developer to reason more clearly about a document when it arrives at a particular processor or when trying to debug how a doc came about.
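To make that concrete, here is a rough sketch (all names invented) of how such traits might look as TypeScript interfaces:

```ts
// Illustrative only: a doc's type narrows as processors attach traits.
interface Doc {
  docType: string;
  content: string;
}

// The "rendered" trait that a rendering processor adds.
interface Rendered {
  renderedContent: string;
}

// A processor that adds a trait conceptually produces a new, more
// specific doc rather than mutating the old one.
function render(doc: Doc, template: (d: Doc) => string): Doc & Rendered {
  return { ...doc, renderedContent: template(doc) };
}
```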
That being said, I am a little fearful of over-constraining docs with such a type system, causing more bloat for not much benefit.
Another thing I have noticed as dgeni has evolved is that many processors are only interested in a particular kind of doc, yet they waste time iterating through all the docs to filter out the ones they care about. For instance, in the angular.js docs, the guide docs have a significantly different processing life cycle to the API docs.
This also means that the pipeline is often unnecessarily long for a number of doc types, and it is difficult to see what impact each processor has on different types of docs, particularly if you are not diving in and looking at each processor's source code.
I wonder if we could consider the idea of multiple pipelines that split apart for distinct processing and maybe merge back together later for shared processing (such as rendering or writing to files). This might be something that would work well with the reactive functional programming approach.
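As a rough illustration of the split/merge idea using RxJS (the processor functions here are placeholders, not real dgeni processors):

```ts
import { from, merge } from 'rxjs';
import { filter, map } from 'rxjs/operators';

declare const docs: any[];                     // the parsed docs collection
declare function processApiDoc(d: any): any;   // placeholder processors
declare function processGuideDoc(d: any): any;
declare function renderDoc(d: any): any;

const docs$ = from(docs);

// Split: each kind of doc gets its own, shorter pipeline...
const api$ = docs$.pipe(filter(d => d.docType === 'api'), map(processApiDoc));
const guide$ = docs$.pipe(filter(d => d.docType === 'guide'), map(processGuideDoc));

// ...then merge back together for shared steps such as rendering.
const rendered$ = merge(api$, guide$).pipe(map(renderDoc));
```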
I actually really like that templates can be overridden and inherit from each other but I agree that we need a better way of communicating what templates are being used for a particular document.
I would like to have a fair debate about the use of TypeScript. Would we require that all Dgeni components (packages, processors, services, etc.) also be written in a "typed" fashion? If we are going to add a build step to dgeni (i.e. TypeScript -> JavaScript), then should we consider whether a completely different programming language and platform (say Go or Ruby) would be preferable?
Transducers also work async; we could even use them with Rx. Please spend some time with them: the documentation isn't intuitive, but I think transducers are exactly the correct abstraction.
Explicit ordering (with merging) handles cases like tags. It's a bit tricky to explain, but the ordering communicates the prerequisites. The advantage is that the explicit ordering is easier to read because it's all in one place. Basically, a package should specify an array of refs to steps:

`[step1, <ref to imported step>, step2, step3]`

which is then "de-sugared" into the sort of "run before, run after" properties currently on each processor.
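A minimal sketch of that de-sugaring, assuming steps carry the `name`, `$runAfter`, and `$runBefore` metadata that dgeni 1.x processors already use:

```ts
interface Step {
  name: string;
  $runAfter?: string[];
  $runBefore?: string[];
}

// Turn an explicit array of steps into the pairwise ordering
// constraints the existing scheduler understands.
function desugar(steps: Step[]): Step[] {
  steps.forEach((step, i) => {
    if (i > 0) step.$runAfter = [...(step.$runAfter ?? []), steps[i - 1].name];
    if (i < steps.length - 1) step.$runBefore = [...(step.$runBefore ?? []), steps[i + 1].name];
  });
  return steps;
}
```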
I have never seen the inheritance/overriding/default feature used profitably. Is there a good example of the overriding feature making things easier?
I like that we package some templates with each package so they can be used as a springboard, but the fact that they are wired up and ready for use has only caused me grief. Even when I understand how the overriding works, I have to look in 3-4 spots in the code to figure out what will happen at runtime.
In most cases, there are only a handful of top-level templates (excluding partials, etc). I don't think making the templates explicit introduces much boilerplate.
:+1: for explicit interfaces for docs.
Again, if you use objects for DI tokens, you can statically analyze plugins to automatically produce the ordering information. So simply by saying your plugin wants the `aliasMap`, dgeni can ensure that your plugin is run after the plugin responsible for generating that resource.
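A sketch of what object tokens might look like (none of this is dgeni's current API; `provides`/`injects` are invented conventions):

```ts
// A token's identity, not its string name, links producers to consumers,
// so a static pass can order processors without explicit $runAfter.
class Token<T> {
  constructor(readonly description: string) {}
}

const ALIAS_MAP = new Token<Map<string, object[]>>('aliasMap');

const computeAliases = {
  name: 'computeAliases',
  provides: [ALIAS_MAP], // this step produces the aliasMap
  // ...
};

const resolveLinks = {
  name: 'resolveLinks',
  injects: [ALIAS_MAP],  // implies: run after computeAliases
  // ...
};
```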
Not sure what the "tree" architecture has to do with docs; I think the linked API makes little sense for dgeni.
I do agree that a doc is just one type of "thing" in a big collection of things that dgeni needs to process.
I think what we want is for a dgeni processor to be able to produce a "thing" that you can inject into a successive processor. That would handle these different "maps" quite elegantly. Perhaps each processor has the option of creating a child injector used for successive processors. That could be powerful.
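One hypothetical shape for that idea (invented, just to illustrate the child-injector concept):

```ts
// A processor's result could carry extra providers; dgeni would register
// them in a child injector visible only to downstream processors.
const computeModuleMap = {
  name: 'computeModuleMap',
  $process(docs: any[]) {
    const moduleMap = new Map(
      docs.filter(d => d.docType === 'module').map(d => [d.name, d])
    );
    // `provide` is an invented convention, not a real dgeni feature.
    return { docs, provide: { moduleMap } };
  },
};
```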
Have you profiled to see if this is really that costly? Also, transducers (if we chose to use them internally) can largely eliminate this cost: they'd only apply the filter operation once in most cases.
I'm skeptical that this is really worth complicating the architecture for; most of the time in the app is probably spent running the processors' function bodies, not dgeni doing upkeep. Unless there's good evidence to the contrary, I wouldn't worry about it.
I see no reason for us to require plugins use TypeScript. Why not publish a distributable that's compiled to JS like other libs written with TS?
I'm not opposed to dgeni 2 being written in an entirely other language, but it does presumably raise the barrier to entry for contributors.
I thought I would chime in as I'm using Dgeni in a perhaps slightly different manner than it was originally intended, and hence I've been working on my own ecosystem of packages rather than using dgeni-packages (the thing I'm talking about is here).
> The problem with explicit ordering is that there are some packages, such as examples, which need to have two processors that run at particular points in the pipeline. [...] I am not sure how many use cases there are that require this.
I use that feature all over the place, since my package design relies on much smaller modules than a typical dgeni installation.
I think one option is to have templates as an injectable component. This handles the use case of the user overriding templates from upstream packages, and is the same mechanism by which they can override any other dependency.
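A rough sketch of how that might look with dgeni's existing `factory()` registration (the component name and packages here are invented):

```ts
declare const basePackage: any; // an upstream dgeni Package
declare const myPackage: any;   // the user's dgeni Package

// The base package publishes its template path as an injectable value...
basePackage.factory('apiTemplate', function apiTemplate() {
  return 'api.template.html';
});

// ...and a downstream package overrides it like any other dependency,
// with no special template-resolution machinery involved.
myPackage.factory('apiTemplate', function apiTemplate() {
  return 'my-custom-api.template.html';
});
```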
Also :+1:s for explicit interfaces for docs and multiple pipelines.
@gampleman - thanks for chiming in. The more views the better at this stage. @btford - I promise to do further reading.
One more item for discussion, is the idea of being able to run partial documentation generation if only a small number of input files have been changed. Similar to the idea of only recompiling a single source code file and then linking in it with the other source code files that have been compiled previously - this is what Broccoli tries to achieve. This would enable fast (re)generation so that results of changes to the source files or template files could be immediately viewed during development.
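A toy sketch of that incremental idea: cache each file's processed output keyed by its mtime, and re-run the per-file pipeline only when the source changed (all names here are invented, Broccoli-style caching in miniature):

```ts
import { statSync } from 'fs';

const cache = new Map<string, { mtimeMs: number; output: unknown }>();

// Re-run the (expensive) per-file pipeline only when the source changed;
// otherwise reuse the previously computed output.
function processIfChanged(file: string, run: (f: string) => unknown): unknown {
  const { mtimeMs } = statSync(file);
  const hit = cache.get(file);
  if (hit && hit.mtimeMs === mtimeMs) return hit.output;
  const output = run(file);
  cache.set(file, { mtimeMs, output });
  return output;
}
```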
OK, so I am getting more acquainted with transducers. I still think that we should be using Observable flows for the async feature (rather than, say, CSP). This is a good page about integrating Transducers with Rx: https://xgrommx.github.io/rx-book/content/getting_started_with_rxjs/creating_and_querying_observable_sequences/transducers.html
Reading back through, I am not clear which aspect of template "inheritance/overriding" is the problem. There are at least two different ways that templates can be resolved; some are more unhelpful than others, but all can already be avoided if you wish:
- `templateFolders` - one can specify a number of folders that are considered in order when trying to load a template. This means that a base package can provide a folder containing a set of standard templates, and a later package can provide another folder that contains overridden templates, which will be loaded instead. The key thing to note is that when loading any kind of template file, even a partial or base template, the folders are traversed again, in order, not caring about the folder in which the referencing template was found. This can be confusing because you have to look at multiple folder locations to work out which file might be used in a template. It is probably safest to copy all the templates that you want into a single local package folder, and set up `templateFolders` to search only that one folder.
- Nunjucks template inheritance - any template can extend another template; the base template specifies blocks, which can be overridden in derived templates. I find this very helpful, especially for defining common layout for pages. Moreover, the Angular V1 templates take this even further, since there are a number of page types that are very similar but differ in only a few places. For example, in the ngdocs package we have:
```
base
-> module
-> api
-> object
-> function
-> service
-> providers
-> filter
-> directive
-> input
```
The Angular V2 docs also use this technique in a number of places, including the TypeScript type files generated for the project.
- [...] these also require `templateFolders` to be set up correctly. It is probably true that these should also be copied over rather than reused from base packages.

Have we ever done anything in this vein?
This project seems to be abandoned at this point.
I would love to but just don't have the capacity right now to do such major work on the library. dgeni and dgeni-packages are in significant use throughout the angular 1 and angular 2 projects (and a number of other projects), so it is by no means abandoned in general. Just not enough time to think about significant rewrites.
I've been so looking forward to a refactoring of this project because I find the documentation/usage to be REALLY convoluted.
For instance, I can't even figure out how to document components. I figured I might just be able to add a template, but that doesn't seem to work.
Any tips/tricks?
Here are my thoughts on the next version of dgeni.
Immutable data structs
I think these should have the additional restriction that you can only set keys on an object; you can't delete or modify them.
I think using `Object` keys instead of `string` keys would improve refactoring. If we're using ES6 module syntax, and tokens to access keys in steps in the processor, we could even go as far as using static analysis to determine the partial ordering of steps in the pipeline.
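A minimal sketch of the "set-only" restriction described above, using a Proxy (illustrative, not a proposed API):

```ts
// Keys may be added once; later writes and deletes throw.
function writeOnce<T extends object>(obj: T): T {
  return new Proxy(obj, {
    set(target, key, value) {
      if (key in target) {
        throw new Error(`key '${String(key)}' is already set`);
      }
      (target as any)[key] = value;
      return true;
    },
    deleteProperty(_target, key) {
      throw new Error(`cannot delete key '${String(key)}'`);
    },
  });
}

const doc = writeOnce<Record<string, unknown>>({ docType: 'api' });
doc.name = 'ngRepeat';    // ok: a new key
// doc.docType = 'guide'; // would throw: keys cannot be modified
```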
Transducers
See: http://jlongster.com/Transducers.js--A-JavaScript-Library-for-Transformation-of-Data
This should be an implementation detail, and invisible to end users. tl;dr: we can avoid some intermediate work.
I think this implementation is most promising: https://github.com/cognitect-labs/transducers-js
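To make the "avoid intermediate work" point concrete, a tiny sketch with transducers-js (the doc shape and steps are made up):

```ts
import * as t from 'transducers-js';

declare const docs: any[]; // an existing docs collection

// comp() fuses the filter and map into a single pass over `docs`:
// no intermediate array is allocated between the two steps.
const xform = t.comp(
  t.filter((d: any) => d.docType === 'api'),
  t.map((d: any) => ({ ...d, rendered: true }))
);

const out = t.into([], xform, docs);
```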
Building the pipeline: Constraint solver vs. explicit ordering
There are two ways to build up a pipeline:

1. A constraint solver: each step declares ordering constraints and dgeni computes a valid order (roughly what dgeni does today).
2. Explicit ordering: the user lists the steps in the order they should run (as gulp does).
We should consider the second approach. But we should be wary of repeating gulp's mistakes: we should not re-run common steps if they appear more than once.
Instead, we should use explicit pipelines to do partial ordering when "merging" pipes.
TypeScript
We should write it in TypeScript. Types are a-ok.
Developer experience
Templates
There should only be one set of templates used at the end of the generation process. Currently, there's some inheritance system that makes it hard to determine which template is actually used.