asciidoctor / asciidoctor

A fast, open source text processor and publishing toolchain, written in Ruby, for converting AsciiDoc content to HTML 5, DocBook 5, and other formats.
https://asciidoctor.org

Support chunked (multi-page) HTML output in asciidoctor #626

Closed gAmUssA closed 5 years ago

gAmUssA commented 11 years ago

AsciiDoc can produce chunked HTML output via the DocBook toolchain: http://www.methods.co.nz/asciidoc/chunked/ch05.html#_asciidoc_docbook_xsl_stylesheets_drivers

mojavelinux commented 11 years ago

I'm interested to see if we can get this done without having to use the DocBook toolchain. Of course, the DocBook toolchain is always an option, and likely something we'll want to integrate into asciidoctor-fopdf. But it would be nice to keep HTML generation contained within Asciidoctor.

My idea here is to have some sort of Treeprocessor that walks the AST and produces multiple trees divided at a given section level. Splitting up the AST is the simple part. Maintaining all the cross-references is the harder part ;)
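To make the "divide at a given section level" idea concrete, here is a minimal sketch. It is a toy model that works on raw AsciiDoc source lines rather than on Asciidoctor's AST, and the names `Chunk`, `split_at_level`, and `SAMPLE` are hypothetical, chosen only for illustration. It assumes `==`-style headings and ignores cross-references entirely.

```ruby
# Toy model only: plain source lines, not Asciidoctor's document model.
# Chunk and split_at_level are hypothetical names used for illustration.
Chunk = Struct.new(:title, :lines)

# Start a new chunk at every heading whose level equals +level+
# (level 1 corresponds to "== Title", level 2 to "=== Title", and so on).
def split_at_level(source, level)
  marker = '=' * (level + 1)
  heading = /\A#{marker} +(\S.*)\z/
  chunks = [Chunk.new(nil, [])] # the first chunk collects the preamble
  source.each_line(chomp: true) do |line|
    if (m = heading.match(line))
      chunks << Chunk.new(m[1], [line])
    else
      chunks.last.lines << line
    end
  end
  # Drop the preamble chunk if it turned out to be empty.
  chunks.reject { |c| c.title.nil? && c.lines.all?(&:empty?) }
end

SAMPLE = <<~ADOC
  = Guide

  Intro text.

  == Install

  Run the installer.

  === Details

  More text.

  == Usage

  See <<Install>>.
ADOC

p split_at_level(SAMPLE, 1).map(&:title) # => [nil, "Install", "Usage"]
```

Deeper headings such as `=== Details` stay inside their parent chunk. The `<<Install>>` reference in the last chunk now points at a different chunk, which is exactly the cross-reference problem described above.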

AndorChen commented 10 years ago

:+1:

mojavelinux commented 10 years ago

I think we can definitely do this without the DocBook toolchain. The one barrier at the moment is that the AST (the document model) needs to get some refinements that allow it to be split apart. I have an early prototype which I'll share as soon as I can get back to it.

There will also be some design questions this will raise. When I push the prototype, I'll pose those questions here. If you want to try to take a stab at it before then, don't feel like you have to wait on me. Let's do this.

logemann commented 9 years ago

+1 for this feature.

mojavelinux commented 9 years ago

Here's a super quick hack using a custom converter.

https://gist.github.com/mojavelinux/d94372393950ca76d594

jxxcarlson commented 9 years ago

Great! Could you give me a use case? I tried

  $ asciidoctor -r ./multipage-html5-converter.rb -b multipage_html5 foo.ad

at the root of my backends folder (see below), but I got an error:

/Users/carlson/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.1/lib/asciidoctor.rb:1528:in `initialize': No such file or directory @ rb_sysopen - -r (Errno::ENOENT)

  $ ls -F
  README.adoc  haml/  slim/
  erb/  multipage-html5-converter.rb
  foo.ad  ruby/

http://noteshare.io


mojavelinux commented 9 years ago

@jxxcarlson When we get this converter published somewhere, I'll be sure to test and document it. Stay tuned!

jxxcarlson commented 9 years ago

OK! I want to use it in http://noteshare.io as soon as practical to do so.


rhattersley commented 9 years ago

Here's a super quick hack using a custom converter.

https://gist.github.com/mojavelinux/d94372393950ca76d594

Invoke via:

ruby multipage-html5-converter.rb <your-asciidoc-file>

Ensure you have the output directory already in place.

gAmUssA commented 9 years ago

@mojavelinux I tried multipage-html5-converter.rb. It throws an error:

/usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1362:in `rescue in load': asciidoctor: FAILED: <stdin>: Failed to parse source, undefined method `encoding' for nil:NilClass (NoMethodError)
    from /usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1281:in `load'
    from /usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1458:in `convert'
    from multipage-html5-converter.rb:33:in `document'
    from multipage-html5-converter.rb:18:in `convert'
    from /usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor/document.rb:1028:in `convert'
    from /usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1499:in `convert'
    from /usr/local/opt/rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1562:in `convert_file'
    from multipage-html5-converter.rb:94:in `<main>'

gAmUssA commented 9 years ago

I disabled the safe options. The conversion ends without errors, but the -chunked.html file is zero bytes.

mojavelinux commented 9 years ago

@gAmUssA that may be because the converter needs to be updated for the latest API. The best approach right now is to get this code migrated into the extensions-lab where we can hack on it and experiment with different approaches. I'll try to get that step done soon.

mdlinville commented 9 years ago

I played around with this, but I don't understand how you would create a TOC for the multiple pages, also using Asciidoctor.

connollyst commented 9 years ago

+1

mojavelinux commented 9 years ago

As I mentioned, I need to push this into extensions lab so we have a clearer starting point.

ob commented 9 years ago

Since I couldn't find the converter in the extensions lab, I did a quick hack to make it run with a semi-current asciidoctor (1.5.2)

https://gist.github.com/ob/4c122d882f2c1b201883

eskatos commented 9 years ago

:+1: for the feature, chunked html output with cross-references handling would be awesome!

lordofthejars commented 9 years ago

we need this at CloudBees too. I am going to move the code of @ob to extensions lab.

mlocati commented 9 years ago

I needed this too. I solved this issue in quite a hacky way:

  1. convert the asciidoc files to docbook
  2. generate the chunked html with saxon+docbook-xsl, with the TOC in a separate file
  3. read the generated files (both TOC and contents) and use a template HTML file to get the resulting chunked document with a better presentation than the one of saxon+docbook-xsl

The whole script that converts from our asciidoc manual to all our output formats (PDF, EPUB, HTML-single and HTML-chunked) can be found in this library file (it's called by one of these scripts).

Obviously, a direct conversion from asciidoctor to chunked-html would be much much easier and faster.

lordofthejars commented 9 years ago

Yes, currently we are doing something similar, passing through DocBook. Now I am trying to implement a Treeprocessor to see what's going on.

mojavelinux commented 8 years ago

Obviously, a direct conversion from asciidoctor to chunked-html would be much much easier and faster.

I want to emphasize this point to the group. Multi-page HTML5 is entirely possible from AsciiDoc if you use the DocBook toolchain. What isn't yet available out of the box (currently in prototype only) is direct AsciiDoc to multi-page HTML using only Asciidoctor.

I've published my prototype to the extensions lab so that we can start hacking on it. The major limitation of my implementation at the moment is that the xrefs break. The reason this is more complicated than it should be is that we don't parse xrefs (completely) until the convert phase. However, we do catalog them, so we might be able to use that catalog to build a postprocessor that knows where to fix them. That's where I left it.

https://github.com/asciidoctor/asciidoctor-extensions-lab/blob/master/lib/multipage-html5-converter.rb

I definitely plan to graduate this from the extensions lab into its own repository so it can be consumed as a gem. But we need to do some hacking on it first to make it a bit more viable.
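As a rough illustration of the postprocessor idea for broken xrefs, the sketch below rewrites same-page anchors in generated HTML so that they point at the page that owns the target ID. This is not the prototype's code. `fix_xrefs` and `ID_TO_PAGE` are hypothetical names, and the ID-to-page map is written by hand here, whereas a real implementation would have to derive it from whatever the splitter records.

```ruby
# Hypothetical map from anchor ID to the output page that contains it.
ID_TO_PAGE = {
  'install' => 'install.html',
  'usage'   => 'usage.html'
}.freeze

# Rewrite href="#id" links in +html+, which belongs to +current_page+.
def fix_xrefs(html, current_page, id_to_page)
  html.gsub(/href="#([^"]+)"/) do
    id = Regexp.last_match(1)
    page = id_to_page[id]
    if page.nil? || page == current_page
      %(href="##{id}")        # unknown or same-page target: leave as is
    else
      %(href="#{page}##{id}") # target lives on another page
    end
  end
end

html = '<a href="#install">Install</a> and <a href="#usage">Usage</a>'
puts fix_xrefs(html, 'usage.html', ID_TO_PAGE)
# prints: <a href="install.html#install">Install</a> and <a href="#usage">Usage</a>
```

A regular expression over HTML is a shortcut that only suits a sketch. It assumes double-quoted `href` attributes and does nothing for link text that depends on the target's title.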

mojavelinux commented 8 years ago

@ob Feel free to integrate any changes you made to the converter in your gist to the one in the extensions lab.

mojavelinux commented 8 years ago

One of the big gaps in the current implementation is that the attributes from the original document are not passed on to the pages. Those need to be passed on. I added a hack for imagesdir, but all the attributes need to be propagated.
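A minimal sketch of what propagating attributes could look like, using plain Hashes in place of real document attributes. `page_attributes` and `PAGE_LOCAL` are hypothetical, and the list of attributes that each page should set for itself is an assumption, not a decision made in this thread.

```ruby
# Assumed list of attributes that should not be inherited from the
# original document because every page needs its own value.
PAGE_LOCAL = %w[doctitle docname outfile].freeze

# Combine the original document's attributes with page-specific ones.
def page_attributes(parent_attrs, page_attrs)
  inherited = parent_attrs.reject { |name, _| PAGE_LOCAL.include?(name) }
  inherited.merge(page_attrs) # page-specific values win over inherited ones
end

parent = {
  'imagesdir'          => 'images',
  'source-highlighter' => 'highlight.js',
  'doctitle'           => 'Guide'
}
p page_attributes(parent, 'doctitle' => 'Install')
```

With this shape, `imagesdir` no longer needs a special case, since it is carried over with everything else that is not on the page-local list.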

mojavelinux commented 8 years ago

@gAmUssA The error you were seeing was due to a missing author attribute. I've added a guard for that (as did @ob in his impl).

mojavelinux commented 8 years ago

@mstanleyjones A TOC can be created by taking notes from @jxxcarlson. See http://home.noteshareblog.io/
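For readers wondering what a cross-page TOC might involve, here is a small sketch that turns a list of pages into an AsciiDoc list of links. It is not taken from the site linked above. `Page` and `toc_for` are hypothetical names, and the sketch assumes the splitter already knows each page's title, output file, and nesting level.

```ruby
# Hypothetical record describing one output page.
Page = Struct.new(:title, :file, :level)

# Build a nested AsciiDoc list in which every entry links to its page.
def toc_for(pages)
  pages.map do |page|
    "#{'*' * page.level} link:#{page.file}[#{page.title}]"
  end.join("\n")
end

pages = [
  Page.new('Install', 'install.html', 1),
  Page.new('Details', 'details.html', 2),
  Page.new('Usage',   'usage.html',   1)
]
puts toc_for(pages)
# prints:
# * link:install.html[Install]
# ** link:details.html[Details]
# * link:usage.html[Usage]
```

The resulting text could be written to an index document and converted like any other AsciiDoc source.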

gAmUssA commented 8 years ago

@mojavelinux thanks, I will try

mojavelinux commented 8 years ago

I want to point out that this is also an opportunity to define some missing APIs in Asciidoctor. Clearly, one missing API is the ability to cleanly create a new empty document without parsing for making a new page. Another is this reparenting stuff that's going on. This is a great opportunity to make these APIs more flexible and intelligent.

ob commented 8 years ago

@mojavelinux I added what I had in https://github.com/asciidoctor/asciidoctor-extensions-lab/pull/56. Cheers!

gAmUssA commented 8 years ago

@mojavelinux could you show me how to use Ruby extensions from a Gradle script, please?

mojavelinux commented 8 years ago

You use the requires directive, except instead of a library name, it has to be a file path (using the file function in Gradle).

asciidoctor {
  requires file('./path/to/extension.rb')
}

We really need to add this example to the README for the Asciidoctor Gradle plugin (https://github.com/asciidoctor/asciidoctor-gradle-plugin).

mojavelinux commented 8 years ago

@ob excellent, I'll have a look! Thanks!

gAmUssA commented 8 years ago

Apparently, the source-highlighter=highlight.js attribute doesn't work with chunked output. I tried the patched version by @ob. It still doesn't work. The patched version doesn't really work at all: there is no error, it just generates some stylesheets and that's all.

gAmUssA commented 8 years ago

@mojavelinux thanks for this tip

mojavelinux commented 8 years ago

I think we shouldn't be worrying about source highlighting at this stage. That's polishing. That will come once we get the attributes passed through correctly. What we should be focusing on right now is the structure of the output and getting xrefs working.

gAmUssA commented 8 years ago

@mojavelinux like I mentioned, @ob's patch doesn't work with attributes

mojavelinux commented 8 years ago

@gAmUssA I'm sure it just needs some debugging. We'll get there :)

ob commented 8 years ago

@gAmUssA Yeah, sorry... I just plopped the patch without testing it. I needed to reset the backend for it to work. @mojavelinux an open question is what attributes are worth keeping and which are worth resetting. I've updated the pull request with the bug fix.

mojavelinux commented 8 years ago

what attributes are worth keeping and which are worth resetting.

Good question. I need to give it some thought.

I've updated the pull request with the bug fix.

I'll integrate it asap.

ob commented 8 years ago

@mojavelinux I should probably rebase it once you review and approve it, right? To clean up the history...

mojavelinux commented 8 years ago

That would be ideal. Though to avoid having to do it twice, I'll review first just in case other changes are needed.

jaredmorgs commented 8 years ago

This is a really exciting opportunity. I know that some people really need the ability to chunk HTML but still retain the link logic. Having this ability would open up Asciidoctor to a larger audience IMHO.

Great work on this, and keep it coming.

mojavelinux commented 8 years ago

While in practice I agree we need to provide the ability to chunk the output (multiple output documents for a single input), I do want to point out that I think arbitrarily slicing a document at a fixed section level is a very bad practice in terms of usability. It leads to pages like https://docs.fedoraproject.org/en-US/Fedora/23/html/System_Administrators_Guide/s2-email-types-mua.html, which is a complete waste of a click because it lacks any context or substance. It's a single glossary term in the middle of a book. That's not a page.

I really like the emergent idea of topic-based authoring, information which is pre-chunked and can be combined together in different ways using includes to create different documents. The OpenShift Origin documentation is a great example of this (as supported by Asciibinder). See https://docs.openshift.org/latest/welcome/index.html

Where chunking is valuable is when a large book/document needs to be partitioned to avoid unnecessary grouping of information and lots of scrolling (like the current Asciidoctor user manual). But even then, that's a stop-gap measure. Topic-based authoring may be the correct solution.

mojavelinux commented 8 years ago

To put it another way, the author knows best where the natural breaks are and should author with that in mind so that the tool doesn't have to guess. Resist making multi-page HTML output just because someone said it is part of the deliverable. That helps no one.

jaredmorgs commented 8 years ago

The overarching Asciidoctor User Guide could be multiple targeted documents. I was thinking that very thing the other day. You do lose the ability to Ctrl+F to find any and all things about Asciidoctor though, which is less optimal for quick answers.

(totally off-topic for this issue, so I'll stop now).

mojavelinux commented 8 years ago

Actually, I really want to chat with you about using AsciiBinder to redo the documentation site for Asciidoctor. Let's touch base about it on the mailing list. I'll start a thread.

robertpanzer commented 8 years ago

CTRL+F is my most used feature of the Asciidoctor manual! Please don't remove it!


jaredmorgs commented 8 years ago

It's only because there isn't any other way to search for info. I do use it the same way though, and agree it is a very natural way to search for info.


roelvs commented 8 years ago

My main use for asciidoc is the creation of academic course materials, and I'm pretty sure my students would appreciate it a lot if their 200 page syllabus wasn't rendered on one single html page. This is one of the reasons I believe they now mostly rely on the pdf version to study... Having a structured document/site with multiple smaller pages would be a major advantage for this use case...

Example of my current book (html): http://roelvansteenberghe.ikdoeict.be/cursus/book_comparch.html

Example of a better structured output (gitbook): https://codegangsta.gitbooks.io/building-web-apps-with-go/content/index.html

mojavelinux commented 8 years ago

@roelvs I don't think this is about one or the other. What I'm debating is where the responsibility lies to split the document. I think the most sound approach is to split the AsciiDoc source into multiple files, then convert each file individually as well as an aggregate master that uses includes (for people who like to use Ctrl+F). That's possible today (though there's an open issue with controlling the numbering state value, see https://github.com/asciidoctor/asciidoctor/issues/1368).
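The "aggregate master that uses includes" can be sketched as a small generator that emits one include directive per chapter file. `master_document` and the file names are hypothetical. The `leveloffset=+1` attribute on the include is there on the assumption that each chapter is a standalone document with its own `=` title, which then becomes a section in the master.

```ruby
# Build the source of a master document that pulls in each chapter file.
def master_document(title, chapter_files)
  includes = chapter_files.map { |file| "include::#{file}[leveloffset=+1]" }
  # A blank line after each include keeps adjacent chapters from running together.
  (["= #{title}", ''] + includes.flat_map { |inc| [inc, ''] }).join("\n")
end

puts master_document('Guide', %w[install.adoc usage.adoc])
# prints:
# = Guide
#
# include::install.adoc[leveloffset=+1]
#
# include::usage.adoc[leveloffset=+1]
```

Each chapter file can then be converted on its own for the multi-page site, while the master gives Ctrl+F users a single page.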

eskatos commented 8 years ago

Whether a document is automatically split according to structure levels or by some authored markup, my main concern about split output is to get references/links working.