bytesparadise / libasciidoc

A Golang library for processing Asciidoc files.
Apache License 2.0
199 stars, 24 forks

Feature request: Docbook export needed. #564

Open edl7878 opened 4 years ago

edl7878 commented 4 years ago

Great work!

I think it would be better to implement Docbook export first, because Asciidoc and Docbook are twins in different formats.

Docbook works well as an intermediate format because it is easy to convert to other formats.

There are several Go libraries for PDF output. The only problem I can foresee is images or graphs that span page breaks.

One remaining question:

Your documentation mentions a draft document that is generated when parsing the file. Is this draft document the AST, or is an AST only produced at the end of the parsing stage?

xcoulon commented 4 years ago

thanks for your feedback and your suggestion, @edl7878!

> I think it would be better to implement Docbook export first, because Asciidoc and Docbook are twins in different formats. Docbook works well as an intermediate format because it is easy to convert to other formats. There are several Go libraries for PDF output. The only problem I can foresee is images or graphs that span page breaks.

That's a good point. For now my focus is on HTML rendering because I would like to work on integrating it in Hugo afterwards. But a Docbook output would definitely make sense!

> Your documentation mentions a draft document that is generated when parsing the file. Is this draft document the AST, or is an AST only produced at the end of the parsing stage?

Yes, the "draft document" is an AST that is produced by the parser, but it's in a very "raw" form (and it's missing the location of the element in the doc).

This "draft doc" is then processed in https://github.com/bytesparadise/libasciidoc/blob/master/pkg/parser/document_processing.go#L27-L55, where attributes are substituted, list items are grouped into lists, sections are grouped in a hierarchical way, etc. This "final doc" is then ready to be rendered (in HTML for now).
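As an illustration of the "sections are grouped in a hierarchical way" step, here is a minimal sketch. The `Section` type and the `groupSections` function are hypothetical stand-ins written for this example, not the library's actual types or code. It assumes sections arrive as a flat list in document order, each carrying its level.

```go
package main

import "fmt"

// Section is a hypothetical, simplified stand-in for a parsed section.
type Section struct {
	Level    int
	Title    string
	Children []*Section
}

// groupSections nests a flat, document-ordered list of sections by level.
func groupSections(flat []*Section) []*Section {
	var roots []*Section
	var stack []*Section // currently open ancestors, shallowest first
	for _, s := range flat {
		// Close every open section that cannot be an ancestor of s.
		for len(stack) > 0 && stack[len(stack)-1].Level >= s.Level {
			stack = stack[:len(stack)-1]
		}
		if len(stack) == 0 {
			roots = append(roots, s)
		} else {
			parent := stack[len(stack)-1]
			parent.Children = append(parent.Children, s)
		}
		stack = append(stack, s)
	}
	return roots
}

func main() {
	flat := []*Section{
		{Level: 1, Title: "Intro"},
		{Level: 2, Title: "Scope"},
		{Level: 2, Title: "Terms"},
		{Level: 1, Title: "Usage"},
	}
	for _, r := range groupSections(flat) {
		fmt.Println(r.Title, len(r.Children)) // prints "Intro 2", then "Usage 0"
	}
}
```

The stack holds the chain of open ancestors, so each section is attached in a single pass without rescanning the list.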

Naming is hard. Maybe RawDocument or ASTDocument would make more sense than DraftDocument? 🤔

edl7878 commented 4 years ago

PEG definition of Asciidoc is fundamental

I think your PEG definition of Asciidoc is fundamental, since this definition should be common to all language implementations of Asciidoc parsers and renderers.

Could the representation of Asciidoc in Go be refined?

I checked the source of DraftDocument and Document definition:

// DraftDocument the linear-level structure for a document
type DraftDocument struct {
    FrontMatter FrontMatter
    Blocks      []interface{}
}

type FrontMatter struct {
    Content map[string]interface{}
}

// Document the top-level structure for a document
type Document struct {
    Attributes        DocumentAttributes
    Elements          []interface{} // TODO: rename to `Blocks`?
    ElementReferences ElementReferences
    Footnotes         []Footnote
}
type DocumentAttributes map[string]interface{}
type ElementReferences map[string]interface{}

According to these definitions, it is hard for client code to get useful information because everything is an interface{}. I am wondering how to traverse a Document value.
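For illustration, one way client code can walk an `[]interface{}` tree is with a type switch. The sketch below is hypothetical: `Paragraph` and `Section` are simplified stand-ins invented for this example, and the library's real element types differ.

```go
package main

import "fmt"

// Hypothetical element types, for illustration only.
type Paragraph struct{ Text string }

type Section struct {
	Title    string
	Elements []interface{}
}

// walk visits every element depth-first and reports its nesting depth.
func walk(elements []interface{}, depth int, visit func(depth int, e interface{})) {
	for _, e := range elements {
		visit(depth, e)
		// Only sections have children in this simplified model.
		if s, ok := e.(Section); ok {
			walk(s.Elements, depth+1, visit)
		}
	}
}

// describe returns a one-line label for an element.
func describe(e interface{}) string {
	switch v := e.(type) {
	case Section:
		return "section: " + v.Title
	case Paragraph:
		return "paragraph: " + v.Text
	default:
		return fmt.Sprintf("unknown: %T", v)
	}
}

func main() {
	doc := []interface{}{
		Section{Title: "Intro", Elements: []interface{}{Paragraph{Text: "hello"}}},
		Paragraph{Text: "bye"},
	}
	walk(doc, 0, func(depth int, e interface{}) {
		// Indent each label by two spaces per nesting level.
		fmt.Printf("%*s%s\n", depth*2, "", describe(e))
	})
}
```

The drawback is visible in `describe`: every new element type needs another case, which is the problem a shared interface would address.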

Basically, if a client could traverse the Document element by element, and if an Element interface had several methods common to all kinds of elements, it would be much easier for client code to refine the renderer and to define a macro based on the element type.

For example:

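type Element interface {
    Id() string
    Kind() string // based on the Asciidoc grammar
    LeftContent() string
    RightChildren() <-chan Element // receive-only, so callers can range over it
    Parent() Element               // nil for the root; interface values need no pointer
}

xcoulon commented 4 years ago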

hello @edl7878,

thanks for your feedback and your suggestion. I agree that having a common interface for all types of element would make sense and would be helpful to traverse the nodes in the AST.

Could you please elaborate on the purpose of the LeftContent and RightChildren methods? Also, why would you need to return a chan Element rather than just []Element?

edl7878 commented 4 years ago

hello @xcoulon,

The Element interface that I proposed is just an idea.

Suppose each element can be turned into an HTML tag based on its Kind, and the tag content consists of a plain string followed by other elements. Then the element can be rendered as <tag>element.LeftContent</tag>.

The right children are elements; we can render their HTML tags in goroutines and insert them into the tag as children.

I think that, initially, implementing RightChildren as a slice of Element is reasonable.

What I have in mind is that we could call go GetHtmlSec(&element, chan string). That way the rendering is divided among several goroutines, which is especially useful for real-time preview while editing asciidoc.
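A minimal sketch of that idea follows, under stated assumptions: `Element` here is a hypothetical struct rather than the interface proposed above, children are held in a slice, and `render` is an invented name. Each child is rendered in its own goroutine and written into a pre-sized slot so the output keeps document order.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Element is a hypothetical, simplified node: a tag kind, its own text,
// and its child elements.
type Element struct {
	Kind     string
	Content  string
	Children []Element
}

// render turns an element into an HTML-like string, rendering children
// concurrently. No escaping is performed; this is a sketch only.
func render(e Element) string {
	parts := make([]string, len(e.Children)) // one slot per child keeps order
	var wg sync.WaitGroup
	for i, c := range e.Children {
		wg.Add(1)
		go func(i int, c Element) {
			defer wg.Done()
			parts[i] = render(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	return "<" + e.Kind + ">" + e.Content + strings.Join(parts, "") + "</" + e.Kind + ">"
}

func main() {
	doc := Element{Kind: "div", Children: []Element{
		{Kind: "p", Content: "one"},
		{Kind: "p", Content: "two"},
	}}
	fmt.Println(render(doc)) // prints <div><p>one</p><p>two</p></div>
}
```

Indexed slots are used instead of a shared `chan string` because results from a channel arrive in completion order, not document order. Spawning one goroutine per node is unbounded, so a real implementation would probably limit concurrency.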

Generally speaking, any literal text in asciidoc can be treated as local, while attributes are global from the point where they are defined.

If goroutines are used to divide the rendering into pieces, anything that depends on global state, such as numbering, may cause issues, but for real-time preview while editing asciidoc it is very useful.

I like using the vscode asciidoc plugin to edit asciidoc in VS Code, but if the text is lengthy, the lag while typing is noticeable.

So I think Libasciidoc may provide a better solution for previewing in the future.

xcoulon commented 4 years ago

@edl7878 out of curiosity, may I ask why you closed this issue? I found this discussion interesting, especially since you provided some requirements.

edl7878 commented 4 years ago

@xcoulon Thanks.

I reopened this thread; hopefully it can enlighten others.

gdamore commented 4 years ago

The template work we did for HTML5 (and shortly XHTML5) will hopefully be useful as a starting point for this work.

gdamore commented 4 years ago

I'll probably wind up doing the docbook export, which is different from true AST support.

Having a true AST with high level objects with a common interface would be really nice indeed.

I was thinking about this in the context of renderers too -- having common methods, but I think that becomes a problem when you conflate the AST and specific renderings.

At any rate, I've even started playing with a docbook export. Unfortunately the PEG grammar is still a bit shy in some regards, so I don't think our grammar is really ready for others to make use of (though it does offer a starting point for discussion.)

edl7878 commented 4 years ago

If a true AST is needed, PEG would have to be replaced with another parser, which means giving up the current work.

As the output of the parser, DraftDocument is OK, I think.

But we need to document DraftDocument more clearly, so that people can build other converters on top of it.

At the same time, a more powerful DraftDocument interface is perhaps necessary too.

With a better-documented and more powerful DraftDocument interface, a DocBook converter could rely on that interface alone rather than on any low-level implementation.

gdamore commented 4 years ago

I'm not sure I agree with abandoning PEG. Plus there is work afoot to standardize some kind of syntax (which may not be precisely what we have today) in the asciidoctor wg at Eclipse. If what comes out of that is EBNF or something, then we can revisit I think, but it's silly to contemplate changing now until we see what happens there.

What we have now can support DocBook already -- please see the templates work I've done. It's not a big stretch to create a renderer that can emit DocBook. I've examined the effort for that, as well as EPUB and PDF. PDF is by far the hardest, but the SGML variants are relatively straightforward.
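To give a rough idea of what a template-based DocBook renderer could look like, here is a sketch using Go's standard `text/template` package. The `Section` type, the template text, and `renderDocBook` are all hypothetical and are not taken from the library's templates work. The output is a fragment, without the namespace declaration a complete DocBook 5 document would carry on its root element.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// Section is a hypothetical, simplified document node.
type Section struct {
	ID, Title  string
	Paragraphs []string
}

// docbookTmpl is an illustrative template, not one of the library's own.
// The built-in "html" function escapes &, <, >, and quotes.
var docbookTmpl = template.Must(template.New("section").Parse(
	`<section xml:id="{{html .ID}}">
<title>{{html .Title}}</title>
{{range .Paragraphs}}<para>{{html .}}</para>
{{end}}</section>
`))

// renderDocBook executes the template for one section.
func renderDocBook(s Section) (string, error) {
	var sb strings.Builder
	err := docbookTmpl.Execute(&sb, s)
	return sb.String(), err
}

func main() {
	out, err := renderDocBook(Section{
		ID:         "intro",
		Title:      "Intro",
		Paragraphs: []string{"Hello.", "Tom & Jerry"},
	})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

Because only the template text changes between output formats, the same document model could feed HTML and DocBook renderers alike.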

The harder part is extending the grammar we have today to cover the shortcomings -- which is mostly just a matter of work, although there are also compatibility concerns that arise. The Ruby-based asciidoctor project is just a bunch of regular expressions and iterative substitutions (multiple passes), nothing like a real grammar, which makes it difficult to write a formal grammar that emits precisely the same thing. (There are quite a few inconsistencies in how that project parses documents, but they are all really edge cases, unlikely to be encountered by authors writing reasonable documents, and easily worked around to eliminate the ambiguities if specific markup is needed.)

xcoulon commented 4 years ago

> If a true AST is needed, PEG would have to be replaced with another parser, which means giving up the current work.
>
> As the output of the parser, DraftDocument is OK, I think.
>
> But we need to document DraftDocument more clearly, so that people can build other converters on top of it.
>
> At the same time, a more powerful DraftDocument interface is perhaps necessary too.
>
> With a better-documented and more powerful DraftDocument interface, a DocBook converter could rely on that interface alone rather than on any low-level implementation.

Well, I'm thinking about making some changes to the DraftDocument while working on substitutions in delimited blocks and paragraphs. I think that the draft document will have all file inclusions resolved, but it will contain "raw" lines for paragraphs and blocks (that's the "pre-processing" mentioned in the Asciidoctor user manual). Then, the processing will take care of applying the substitutions and dealing with element ordering (lists, sections, etc.), inserting the table of contents, preamble, etc. In other words, I don't think that the DraftDocument will be useful for the DocBook support. Now, if this causes trouble, I can change my plans (it's not too late).
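To make the split between "raw" lines and later substitutions concrete, here is a small sketch of attribute substitution applied to raw lines. The `substitute` function and the simplified `{name}` pattern are assumptions made for this example. They do not reflect the library's implementation or the full Asciidoc attribute rules (names are treated as case-sensitive here, for instance).

```go
package main

import (
	"fmt"
	"regexp"
)

// attrRef matches a simplified attribute reference such as {author}.
var attrRef = regexp.MustCompile(`\{([A-Za-z0-9_][A-Za-z0-9_-]*)\}`)

// substitute replaces known attribute references in each raw line.
// References to undefined attributes are left untouched.
func substitute(lines []string, attrs map[string]string) []string {
	out := make([]string, len(lines))
	for i, line := range lines {
		out[i] = attrRef.ReplaceAllStringFunc(line, func(ref string) string {
			name := ref[1 : len(ref)-1] // strip the surrounding braces
			if v, ok := attrs[name]; ok {
				return v
			}
			return ref
		})
	}
	return out
}

func main() {
	raw := []string{"Written by {author}.", "Version {revnumber}, see {unknown}."}
	attrs := map[string]string{"author": "Jane", "revnumber": "1.0"}
	for _, line := range substitute(raw, attrs) {
		fmt.Println(line)
	}
}
```

Keeping the lines raw until this step means the same parsed input can be re-rendered with different attribute values without parsing again.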