Drup closed this issue 8 years ago.
Right now, extern is a block-level simple command. For added flexibility, I could offer an environment version of extern, for example:
\begin{extern}
dot commands go here
\end{extern}
The extension would have to take care of generating an SVG and outputting a picture element linking to it. But alas, outputting SVG as a textual element within the Html5 markup would indeed not be possible within the current architecture.
How would I recognize if it's a dot thing? In an ideal world, I would like:
\begin{dot}
\end{dot}
I think a good way would be to add a constructor to block_t, of type Extension of string (* name *) * string list (* options *) * string (and something similar in inline_t), and add some functions to the Extension signature for functors that handle arbitrary extensions. Maybe we can also provide a fallback in the constructor, in case an extension is not recognized by an output.
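A minimal sketch of the proposed constructor, assuming heavily simplified stand-ins for Lambdoc's inline_t and block_t. The constructor names Paragraph, Plain and Inl_extension, the handler type and the write_* functions are hypothetical. It illustrates the suggested fallback: when the writer's handler does not recognise an extension, the raw content is emitted as-is.

```ocaml
(* Hypothetical, heavily simplified stand-ins for Lambdoc's inline_t and
   block_t; the real types have many more constructors. *)
type inline_t =
  | Plain of string
  | Inl_extension of string (* name *) * string list (* options *) * string

type block_t =
  | Paragraph of inline_t list
  | Extension of string (* name *) * string list (* options *) * string

(* [handle_ext name options raw] is [Some output] if the current writer
   knows the extension, [None] otherwise. *)
type handler = string -> string list -> string -> string option

(* Fallback: an unrecognised extension is rendered as its raw content
   (no HTML escaping here, for brevity). *)
let write_inline (handle_ext : handler) = function
  | Plain s -> s
  | Inl_extension (name, opts, raw) -> (
      match handle_ext name opts raw with Some out -> out | None -> raw)

let write_block (handle_ext : handler) = function
  | Paragraph seq ->
      "<p>" ^ String.concat "" (List.map (write_inline handle_ext) seq) ^ "</p>"
  | Extension (name, opts, raw) -> (
      match handle_ext name opts raw with
      | Some out -> out
      | None -> "<pre>" ^ raw ^ "</pre>")
```

Each writer supplies its own handler, so an HTML writer can recognise dot while a plain-text writer ignores it and falls back to the raw source.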
For environment commands, it would indeed be easy to allow for custom-named blocks such as \begin{dot}...\end{dot}. For simple commands, things are a bit more complicated because of macros: there's no way of knowing whether an undeclared \foo{...} refers to a macro or an extension. However, I think most extensions only make sense in environment commands anyway.
Anyway, I'll add support for extensible environment commands to my TODO list...
I can implement it too, if you are happy with the way I plan to do it.
The other question is: what syntax for the other readers (Markdown in particular)?
It might be more efficient if I implement it, as I'm more familiar with the code. (Something like this feature was already on my mind for a while, so it's nothing unexpected.)
As for the other readers, it depends. For Markdown, I'm using OMD for the grunt work, so it would have to fit within whatever mechanisms OMD offers. For Lambwiki, however, I may consider skipping it altogether, as Lambwiki is supposed to be a minimalistic language which covers only a subset of Lambdoc anyway.
@drup: I've just pushed the new extension mechanism. It's still not documented, so I recommend taking a look at the tutorial in the examples directory. Basically, it allows the definition of custom inline or block commands (within some limits, especially for the latter). The example in part 5 of the tutorial illustrates how you can create a new \banner block command which feeds its argument to the banner utility, producing a verbatim-like block with the bannerised version of the argument.
Please check the Lambdoc_core.Extcomm module for the various supported syntaxes. For your \begin{plot}...\end{plot} example, it seems the appropriate syntax would be Synblk_envraw (i.e., an environment block command taking only raw text as argument).
Also, note that for now only the Lambtex parser has support for extensions. Supporting them in Markdown and the other languages is next on my list.
Anyway, let me know if this fits your needs!
Ah, nice! Thank you. :)
I took a look at it, and I mostly like the main design. The names you chose are terrifying, though (it took me 5 minutes to realize "syn" was for "syntax") :D
My main grudge is that the ext*_t types are too complicated and syntax-focused. In the core (extinl_t and extblk_t) we only need two variants, "raw" and "not raw". You can encode all the other things in that (basically, string list * string list and string list * Inline.seq_t list for "the options, then the content").
This syntax information means something for Lambtex, but not for the other formats. The core is quite format-agnostic (not as much as I would like, but still), so I don't think it's a good idea to introduce all those syntactic specificities.
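For comparison, here is a sketch of the two-variant core type suggested above. It assumes a stand-in Inline.seq_t (just a list of strings), and the constructor names Raw and Parsed and the describe function are hypothetical. The reader would flatten everything Lambtex-specific into the options list, so the core would only record whether the content was parsed.

```ocaml
(* Stand-in for Lambdoc's Inline module; the real seq_t is a list of
   inline nodes, not strings. *)
module Inline = struct
  type seq_t = string list
end

(* "The options, then the content", in raw or parsed form. *)
type ext_t =
  | Raw of string list * string list (* options, raw arguments *)
  | Parsed of string list * Inline.seq_t list (* options, parsed arguments *)

(* An extension pattern-matches on whichever form it expects. *)
let describe = function
  | Raw (opts, args) ->
      Printf.sprintf "raw: %d option(s), %d argument(s)" (List.length opts)
        (List.length args)
  | Parsed (opts, args) ->
      Printf.sprintf "parsed: %d option(s), %d argument(s)" (List.length opts)
        (List.length args)
```

The trade-off, raised in the reply that follows, is that an extension which gets arguments it did not expect has to detect and report that itself.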
I don't have much time to try to use it just now, but I'll try.
@Drup: Thanks for the feedback. I'm well aware that because Lambtex is the only markup which currently supports extensions, there is the danger of introducing Lambtexisms into the core. This danger should be mitigated once I add extension support for other markups, though. Consider the extension mechanism a work in progress...
In the ext*_t types, the simple/environment distinction is one of those Lambtexisms. I wanted to have some way of telling Lambtex whether some extension was a simple or environment command, and that's why it exists. However, I'm considering abandoning this distinction altogether, and allowing, for example, a command banner to take the form \banner{...} or \begin{banner}...\end{banner}. For consistency's sake, the two forms should also be allowed for built-in commands such as verbatim, of course.
Nevertheless, I don't agree that only "raw" and "not raw" variants are needed. I really do want to make the distinction between block commands which require, for instance,
Granted, the AST could very well support only a generic form as you suggested, but then it would be the extension's responsibility to report misuse (e.g. "you gave me an extra inline sequence, but I don't know what to do with it"). I think the AST compiler should do this job, which is why I would prefer extensions to report the exact type of parameters they expect. Moreover, note that this is a core issue, and not a Lambtexism.
(Also, allow me to page @edwintorok, as his feedback would also be welcome.)
I think abandoning the distinction simple/env is indeed a good idea. Of course the distinction inline/block should still be there. "Raw or not raw" was a bit overly simplistic, yes. :p
I think a nice general solution would be to have something of the form:
type content = Raw of string | Seq of Inline.seq_t (* we can potentially add other things *)
and extblk_t = content list
Then you can just pattern match on it and easily decide the shape you want (with any interleaving of Raw and Seq). The extension would provide the parser with a witness indicating the expected shapes (as in your current solution): type shape = [ `Seq | `Raw ] list (and the witness is a shape list). It preserves the current features while being simpler and giving more freedom.
It could be done with a GADT giving the shape of the accepted values and indicating the type (I might take a shot at that, just for the fun of it, but it's not necessarily a good idea :p). It would enforce that the witness and the accepted shapes do match.
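The list-based witness can be sketched with polymorphic variants, again assuming a stand-in Inline.seq_t. Here check is a hypothetical name for the validation the AST compiler could perform before calling an extension. This puts misuse reporting in the compiler rather than in each extension, as argued earlier in the thread.

```ocaml
module Inline = struct
  type seq_t = string list (* stand-in for Lambdoc's Inline.seq_t *)
end

type content = Raw of string | Seq of Inline.seq_t
type extblk_t = content list

(* One accepted sequence of argument kinds; an extension's witness is the
   list of all the shapes it accepts. *)
type shape = [ `Raw | `Seq ] list

let kind_of : content -> [ `Raw | `Seq ] = function
  | Raw _ -> `Raw
  | Seq _ -> `Seq

(* Compiler-side validation: accept the arguments only if their shape is one
   of those declared in the extension's witness. *)
let check (witness : shape list) (args : extblk_t) : (extblk_t, string) result =
  if List.mem (List.map kind_of args) witness then Ok args
  else Error "arguments do not match any shape accepted by the extension"
```

After a successful check, the extension still pattern matches on content. Only the GADT variant discussed above would remove that last step.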
In OMD, the extension mechanism is implemented by the node X, which contains output functions. We of course can't adopt this mechanism, because it would hardcode the possible writers, which is precisely what we want to avoid! :)
@darioteixeira thanks for the extension support, it is a good start! Here is my feedback:
Given the answer on #26, I would agree with abandoning the distinction between simple and environment commands, i.e. Extinl_sim* would be simplified to Extinl_* (inline commands are always simple), and Extblk_sim* would be merged with Extblk_env* (block commands can be either simple or environment).
There is already a BatResult.t; is there a reason for defining your own?
There is a Lambdoc_reader.Extension.S and Lambdoc_writer.Extension.S but there isn't a combined one, and combining them involves some boilerplate. Could you provide a signature for the combined extension type too?
Regarding the discussion above, I think there should be a short example for each as a comment.
Just by looking at their types, I'm not sure what the Lambtex input should look like.
simraw and simseq are clear, but if I also have order/label/style parameters, then how does that map to extinl_t and extblk_t? And what if I want to have multiple parameters like macros do?
I think there should be a sample extension that just dumps its input as sexp, and a sample document that exercises all the extinl_t and extblk_t variants.
That might also come in handy when developing/debugging other extensions.
The extension type is good for a low-level extension, but only allows me to define one extension.
If I want more than one extension, I have to write a functor that composes multiple extensions, but that requires hardcoding all possible extensions at build time.
To support dynamically selected extensions, there could be a combine function that takes a list of first-class modules. Maybe the extension types should be further split based on simseq vs simraw?
I'd prefer you provide something equivalent to combine, but please keep the Extension module too, as that allows overriding the Monad as well (I thought about putting the Monad signature inside the extensions in combine, but I haven't figured out how to write the type constraint for the first-class modules).
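Here is a sketch of such a combine function. It assumes a hypothetical, non-monadic EXT signature with a single read_extblk function. Lambdoc's actual Extension signatures are richer, and the Monad question raised above is left out. The extensions are chosen at run time, and the first one returning Some wins.

```ocaml
(* Hypothetical minimal extension signature (not Lambdoc's real one). *)
module type EXT = sig
  val name : string

  (* [read_extblk command argument] is [Some output] if this extension
     handles [command], and [None] otherwise. *)
  val read_extblk : string -> string -> string option
end

(* Combine extensions selected at run time into a single one: each is tried
   in order and the first [Some _] wins. *)
let combine (exts : (module EXT) list) : (module EXT) =
  (module struct
    let name =
      String.concat "+"
        (List.map (fun m -> let module E = (val m : EXT) in E.name) exts)

    let read_extblk command argument =
      let rec try_all = function
        | [] -> None
        | m :: rest -> (
            let module E = (val m : EXT) in
            match E.read_extblk command argument with
            | Some _ as result -> result
            | None -> try_all rest)
      in
      try_all exts
  end)
```

Since the result is itself an EXT, combined extensions can be combined again. The list order decides which extension wins when two of them claim the same command.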
There is no better way to evaluate how well an extension mechanism works than by actually writing an extension for something realistic. So I started writing an extension for parsing Org mode tables using mlorg. Although it is not complete yet (Org mode's table features don't map 1:1 to Lambdoc's; column groups are missing from Lambdoc), I was able to parse a simple table already, and I'm quite happy with the Lambdoc side: I could find my way around inline.mli and block.mli quite easily. (I hit more problems on the mlorg side than on the Lambdoc side: it doesn't quite parse the full Org mode table syntax, and I had to patch the build system to be able to use it as a library.)
I'll try to write a full mlorg-to-lambdoc extension, see what problems I hit (heading labeling/numbering seems complicated for example), and report back.
@edwintorok:
Why result_t?
Well, one of the things to do before a 1.0 release is to inventory all Batteries-specific functions. If their number/complexity is sufficiently low, then it might be worth putting them in a Lambdoc_util module, and thus remove the (big) dependency on Batteries. Hence, I'd rather not expose anything Batteries-specific in the API.
Type for read+write extension
The rationale behind the separation between reader and writer extensions is not obvious: basically, I want to support an architecture where the reading may take place in a different process from the writing. For resolving links and images, the reader/writer separation is crucial, but that's not the case for inline/block extensions, which can easily be done entirely on either side. Hence, I am now considering a different approach to inline/block extensions.
As for providing a combined module signature, that's easy to do once the API is settled. Until then it's just extra work that will be rendered obsolete anyway.
raw vs seq and ext_t
It's still not documented because the extension mechanism is in a state of flux...
Support for multiple extensions
Yeah, I've been thinking about the same thing. Instead of users providing a single extension, it might be better if they provide a list of independent extensions (using first-class modules). This will help with combining extensions from multiple origins.
Writing my first extension
Please do go ahead and play with it. Just be aware that the API will change, so be prepared to adapt your code in the future...
Maybe we could rely on open types instead of first-class modules? It would make things just as safe, but much, much easier to write.
@Drup:
> It could be done with a GADT giving the shape of the accepted values and indicating the type (I might take a shot at that, just for the fun of it, but it's not necessarily a good idea :p). It would enforce that the witness and the accepted shapes do match.
Yes, I agree that using GADTs and a type witness would be a good way of avoiding the silliness of forcing the extension to pattern match against a variant when only one of the cases is relevant (the example extensions currently use assert false for the non-relevant cases...)
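A GADT version of the witness could look roughly like the sketch below. All names (shape, decode, Ext, run) are hypothetical, and Inline.seq_t is again a stand-in. Because the witness fixes the argument type, the extension's function receives exactly the tuple it asked for and needs no assert false branch.

```ocaml
module Inline = struct
  type seq_t = string list (* stand-in for Lambdoc's Inline.seq_t *)
end

(* The type index records the OCaml type of the arguments an extension expects. *)
type _ shape =
  | Nil : unit shape
  | Raw : 'a shape -> (string * 'a) shape
  | Seq : 'a shape -> (Inline.seq_t * 'a) shape

(* Untyped arguments, as produced by a reader. *)
type content = C_raw of string | C_seq of Inline.seq_t

(* Check the untyped arguments against the witness and, on success, build a
   value of exactly the type the witness describes. *)
let rec decode : type a. a shape -> content list -> a option =
 fun shape args ->
  match (shape, args) with
  | Nil, [] -> Some ()
  | Raw rest, C_raw s :: tl -> (
      match decode rest tl with Some v -> Some (s, v) | None -> None)
  | Seq rest, C_seq q :: tl -> (
      match decode rest tl with Some v -> Some (q, v) | None -> None)
  | _, _ -> None

(* An extension packs its witness with a function over the typed arguments. *)
type ext = Ext : 'a shape * ('a -> string) -> ext

let run (ext : ext) (args : content list) : string option =
  match ext with
  | Ext (shape, f) -> (
      match decode shape args with Some v -> Some (f v) | None -> None)
```

For instance, Ext (Raw Nil, fun (s, ()) -> String.uppercase_ascii s) declares a command taking a single raw argument. The compiler would reject any other argument list before the function is ever called.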
> Maybe we could rely on open types instead of first-class modules? It would make things just as safe, but much, much easier to write.
Are you referring to the open extensible types introduced in 4.02? I haven't played with that feature yet. I'll have to investigate it better...
> Are you referring to the open extensible types introduced in 4.02?
Yes. Basically we would have an open type extblk. A new extension would add a new variant to the type and register a function of type extblk -> ... option (writer) or ... -> extblk option (reader) that matches only this variant, returning Some ... in this case or None otherwise.
Extension non-interference would be ensured by the fact that the new variant is not exposed and is kept private, so only the defined functions know about it.
The extension handler would possess the list of functions and try them all successively until one (or none) returns an element.
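Here is a minimal sketch of this registry using OCaml's extensible variant types (available since 4.02). It covers the writer side only; the reader side would mirror it with functions of type ... -> extblk option. The names writers, register_writer, write and Dot_ext are hypothetical. The Dot constructor is hidden by Dot_ext's signature, so only the extension's own code can construct or match it.

```ocaml
(* Open type exposed by the core; extensions add constructors to it. *)
type extblk = ..

(* Registered writer functions, tried in registration order. *)
let writers : (extblk -> string option) list ref = ref []

let register_writer f = writers := !writers @ [ f ]

(* Try every registered function until one returns [Some _]. *)
let write (b : extblk) : string option =
  let rec go = function
    | [] -> None
    | f :: rest -> (
        match f b with Some _ as result -> result | None -> go rest)
  in
  go !writers

(* A hypothetical "dot" extension: the signature hides the [Dot] constructor,
   so other code can only build its values through [make]. *)
module Dot_ext : sig
  val make : string -> extblk
end = struct
  type extblk += Dot of string

  let () =
    register_writer (function
      | Dot src -> Some ("<pre class=\"dot\">" ^ src ^ "</pre>")
      | _ -> None)

  let make src = Dot src
end
```

Registration happens as a side effect when the extension's module is initialised. Linking an extension is therefore enough to enable it, but the order of registration depends on link order.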
@Drup: that's pretty interesting, thanks! Could you recommend any paper/software that explores open extensible types, btw?
@Drup and @edwintorok: I'm considering another major change to the extension mechanism: for inline/block command extensions, the reader/writer split would be eliminated. Instead, these extensions would only be available on the reader side. Moreover, instead of outputting values of type Inline.seq or Block.frag, the extensions would output raw Reader.AST values. This approach has the advantage that extensions may use elements (notes, bib entries) that require processing by the compiler. What do you think? (This approach also has some disadvantages, of course.)
If you deal with just Reader.AST as output then you won't need the custom internal datatypes for extensions, so could extensions be just functions (paired with an identifier) instead of modules? If so that might simplify the extension interface, or at least composing of extensions, and as you noted allow full access to the same features you have in a lambtex input document, extensions could even define and call macros, which is something not possible with the current extensions.
So I like this idea for block and inline extensions. I'm not sure about image and link extensions as those seem more like a post-processing transformation (as opposed to defining new commands) and better fit for the current Extension module. In fact you could take this distinction further (edited):
read_extblk and read_extinl, except they output a Lambdoc_reader.Ast.t value directly
Inline.seq_t and Block.frag_t
How would you define an inline construction that maps to some custom HTML that way?
> @Drup: that's pretty interesting, thanks! Could you recommend any paper/software that explores open extensible types, btw?
Sorry, I don't have any example right now. You can search the mailing list archive a bit, but I think you will mostly find some simple tricks (to encode universe types, for example).
@edwintorok:
Yes, as I mentioned, eliminating the reader/writer split would only apply to inline/block command extensions. Resolving links/images would remain split between reader and writer, as I want to support the case that those two stages reside in different processes.
@Drup:
> How would you define an inline construction that maps to some custom HTML that way?
You wouldn't. However, note that in the current extension mechanism you can't do that anyway: extensions must output either Inline sequences or Block fragments. Note that extensions should be generic, and not tied to a particular writer like HTML. This of course limits them somewhat, but is there any concrete example where this limitation is a show-stopper?
> Sorry, I don't have any example right now. You can search the mailing list archive a bit, but I think you will mostly find some simple tricks (to encode universe types, for example).
Alright. I'll post a message to the caml-list if any doubts show up.
@edwintorok: Yes, the filter extension mechanism you suggest may indeed be the best way forward. For convenience's sake, it could also offer an AST mapper like the one provided by the new extension points feature in the OCaml compiler.
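For reference, the compiler's Ast_mapper represents a mapper as a record of functions. Each function receives the mapper itself, so an override can delegate back to the defaults. A filter extension in that style might look roughly like this. The toy inline_t/block_t AST and the names mapper, default_mapper and shout are hypothetical, not Lambdoc's actual types.

```ocaml
(* Toy document AST (hypothetical, far smaller than Lambdoc's). *)
type inline_t = Plain of string | Emph of inline_t list
type block_t = Paragraph of inline_t list | Section of string * block_t list

(* Each field receives the whole mapper, so overrides can recurse through it. *)
type mapper = {
  map_inline : mapper -> inline_t -> inline_t;
  map_block : mapper -> block_t -> block_t;
}

(* The default mapper rebuilds the tree unchanged. *)
let default_mapper =
  {
    map_inline =
      (fun m -> function
        | Plain s -> Plain s
        | Emph xs -> Emph (List.map (m.map_inline m) xs));
    map_block =
      (fun m -> function
        | Paragraph xs -> Paragraph (List.map (m.map_inline m) xs)
        | Section (title, bs) -> Section (title, List.map (m.map_block m) bs));
  }

(* A filter overrides only the case it cares about and defers the rest. *)
let shout =
  {
    default_mapper with
    map_inline =
      (fun m -> function
        | Plain s -> Plain (String.uppercase_ascii s)
        | other -> default_mapper.map_inline m other);
  }
```

Because recursion always goes through the mapper passed in, the override for Plain also applies to text nested inside Emph and Section, without the filter handling those cases itself.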
> This of course limits them somewhat, but is there any concrete example where this limitation is a show-stopper?
Yes, basically all the ones I want to implement. Of course the intermediate representation (in the core IR) is HTML-agnostic, but it's also not encodable in the rest of the IR (AFAIR), so I really need a custom variant.
I've pushed a preliminary version of the new extension mechanism. Highlights/caveats:
Ast values. I reckon the advantages of this approach outweigh its disadvantages.
Lambdoc_reader/Extension. Yes, some more general mechanism is probably desirable, but I reckon this suffices for now.
Blksyn_raw should be used for the former and Blksyn_lit for the latter.
link_reader or image_reader is supposed to return Some result if it can handle a link/image, and None otherwise. In the latter case, the next extension in the list is given a chance.
lambcmd_with_bookaml for a practical demonstration.
As always, feedback is welcome!
In the meantime, the extension mechanism now also supports extensions that add new ghost blocks. The most obvious application is solving @edwintorok's request about inline declaration of endnotes, and I've added a new instalment to the tutorial illustrating precisely this case.
I've made yet some more tweaks to the API and implementation of the extension mechanism. I reckon it is ready for more extensive testing, so please let me know what you think!
By the way, an interesting side-effect of the new extension mechanism is that it makes it trivial to embed markups within markups. I've added a new instalment to the tutorial illustrating this. Please see also this Lambtex source-file which embeds all four markups.
I think this ticket may be closed for now. The extension mechanism is fairly flexible and powerful already, and I reckon it won't require any further changes before 1.0 (famous last words...). Okay with you, @Drup and @edwintorok ?
OK to close, I think the extension mechanism is general enough now. Might have to write a few convenience functions on top of foldmapper to make it easier to write certain kinds of extensions, but those don't necessarily have to come with lambdoc, they can be part of the extension itself. The only way to know for sure is to actually try and write some extensions. Do you have a timeframe in mind for 1.0?
@edwintorok: I definitely want to refactor the Lambtex and Lambwiki parsers before 1.0. I'm just waiting for the next Menhir version to come out, which according to François Pottier should happen soon (the next Menhir version has features which should make on-the-fly lexer switching much easier and cleaner). I would also like to finalise the Lambtex language itself (fix issues #29 and #33). There are other smaller issues, but none of those are really show-stoppers.
@Drup and @edwintorok: I'm closing this issue, as I'm reasonably happy with the current extension mechanism. Feel free to reopen the issue if new ideas pop up!
I investigated the extension mechanism and I find it insufficient for my needs.
Imagine I would like to write a plugin where you include dot source and it gets replaced by an SVG. For that I need
In this example, I could probably get away with using a picture pointing to a dot file, but I have other examples where it's not possible to go through an external file.