ggrossetie / asciidoctor-web-pdf

Convert AsciiDoc documents to PDF using web technologies
https://asciidoctor.org
MIT License
448 stars 92 forks source link

Ability to add TOC after preamble for book doctype #492

Open danyill opened 3 years ago

danyill commented 3 years ago

Thank you for asciidoctor-web-pdf, I am greatly enjoying it and trying out its features :smiling_face_with_three_hearts:

I think the ability to add the toc after the preamble has inadvertently been lost in asciidoctor-web-pdf alpha.12:

Here is my minimal reproduction:

= Just a little test
:doctype: book
:toc:
:toc-placement: preamble

// [discrete]
// == Revisions

This should be prior to the ToC

== Thing 1

This should be after the ToC

== Thing 2 

== Thing 3

I ran this in the simplest possible way:

$ asciidoctor-web-pdf -v
Asciidoctor Web PDF 1.0.0-alpha.12 using Asciidoctor.js 2.2.0 (Asciidoctor 2.0.10) [https://asciidoctor.org]
Runtime Environment (node v10.21.0 on linux)
CLI version 3.4.0
$ asciidoctor-web-pdf test.adoc 

Which yielded: test.pdf

But here's the thing. It worked before I upgraded to 1.0.0-alpha.12 so I thought I'd roll back and test again:

$ asciidoctor-web-pdf -v
Asciidoctor Web PDF 1.0.0-alpha.9 using Asciidoctor.js 2.2.0 (Asciidoctor 2.0.10) [https://asciidoctor.org]
Runtime Environment (node v10.21.0 on linux)
CLI version 3.4.0
$ asciidoctor-web-pdf test.adoc -o test2.pdf

Which yielded: test2.pdf

Looking into what happened I feel the most likely cause was in https://github.com/Mogztter/asciidoctor-web-pdf/blob/1fcb2d6a7e110815fa907defe316dc638a34dbe3/lib/document/document-converter.js#L303-L318

Where the following change occurred between 1.0.0-alpha.9 and 1.0.0-alpha.12:

-    if (doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {
+    if (!this.hasTitlePage(doc) && doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {

The associated commit is a244a791 however at this point I can't obviously match the commit message to the rationale associated with this change although PR #237 and issue #126.

The logic of the previous code was:

The logic of the current code is:

I don't understand the relevance of the title page here. It seems to me the code in alpha.9 is more correct with my understanding.

I checked my understanding of the :toc-placement: preamble worked with https://docs.asciidoctor.org/asciidoc/latest/toc/position/#beneath-preamble which seems to be in agreement. I also tried this using asciidoctor which behaved as I would expect. Finally I removed :doctype: book from the sample document which resulted in the TOC being generated after the preamble.

image

It seems that hasTitlePage is always true for :doctype: book from https://github.com/Mogztter/asciidoctor-web-pdf/blob/0b38875cb0f476f51b66a5a077a8388b7655414f/lib/document/document-converter.js#L402-L405 which would suggest that it is currently impossible in books to place the TOC after the preamble which (for me at least) is undesirable for my use case (putting a table of revisions prior to the table of contents in a report).

Happy to provide further details and work on a PR for a solution.

ggrossetie commented 3 years ago

Thanks for providing such a detailed analysis!

If I remove !this.hasTitlePage(doc) from:

https://github.com/Mogztter/asciidoctor-web-pdf/blob/1fcb2d6a7e110815fa907defe316dc638a34dbe3/lib/document/document-converter.js#L306

+if (doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {
-if (!this.hasTitlePage(doc) && doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {

and this.hasTitlePage(node) from:

https://github.com/Mogztter/asciidoctor-web-pdf/blob/1fcb2d6a7e110815fa907defe316dc638a34dbe3/lib/document/document-converter.js#L513

+if (node.hasSections() && node.hasAttribute('toc') && hasTocPlacementHeader) { 
-if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node))) { 

Here's what I get:

asciidoctor-web-pdf

As you can see, there are a few issues:

danyill commented 3 years ago

Let me see if I can help.

I'm not sure I can but I'm also not sure I can't... :wink:

danyill commented 3 years ago

Thanks for your investigation, I have tried to do a little of my own and have gained some insight but don't have a full solution yet -- perhaps with a few hints I might be able to continue.

Immediately after making the change in document-converter.js to:

      if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node) && tocPlacement !== 'preamble')) {
      }

We begin to get failing tests:

  69 passing (1m)
  2 failing

  1) PDF converter
       Page break
         should break pages accordingly when the document type is book, has preamble, has section and has TOC preamble:

      AssertionError: expected 5 to equal 4
      + expected - actual

      -5
      +4

      at Context.it (test/pdf_test.js:327:45)

This test is a scenario:

      {
        doctype: 'book',
        preamble: true,
        section: true,
        toc: 'preamble',
        'title-page-attribute': false,
        'expected-page-number': 4
      },

We check our output using:

$ ./bin/asciidoctor-web-pdf -a reproducible='' -a doctype=book -a toc=preamble -a title-page! document-with-title-preamble-and-section.adoc

This test failure is caused by generating table of contents twice:

When we have the preamble attribute set the first table of contents should not be generated. However the condition for providing this table of contents does not include any consideration of the preamble. I have disabled this table of contents generation if the preamble attribute is set because it is handled elsewhere.

I have done this in the tocHeader method:

       const tocPlacement = node.getAttribute('toc-placement', 'auto')
       // Add a toc in the header if the toc placement is auto (default), left or right
       const hasTocPlacementHeader = tocPlacement === 'auto' || tocPlacement === 'left' || tocPlacement === 'right'
-      if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node))) {
+      if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node) && tocPlacement !== 'preamble')) {
         return `<div id="toc" class="${node.getAttribute('toc-class', 'toc')}">
 <div id="toctitle">${node.getAttribute('toc-title')}</div>
 ${this.convert_outline(node, opts)}

After this there is a failure because it is not visually identical with the test fixture. Hence I would need to regenerate the PDF. Additionally the TOC is visually different if generated after the preamble. The width is slightly smaller and there is a grey border.

The css selector causing this seems to be from paged.js:

[data-id="content"] [data-id="toc"] {
    ...
    border-color: #e0e0dc;
    ...
}

This means that an element with data-id=toc must be a descendant of an element with data-id=content. This is matched in the preamble case because the toc is occurring in the "normal flow of the document" and is not a separate and prior entry.

I'm not sure how hard or desirable it is to override the paged.js styles so I shall await some feedback... Perhaps they have very fine reasons for applying these styles and the default toc styles should be modified to be consistent (?!).

In the meantime I have added to document.css:

[data-id="content"] [data-id="toc"] {
    border: none;
}

I feel this gives acceptable performance (to me at least)

The second test I expect (but haven't checked) is probably similar -- an additional page caused by toc generation twice.

  2) PDF converter
       Page break
         should break pages accordingly when the document type is article, has preamble, has section, has TOC preamble and has :title-page: attribute:

      AssertionError: expected 4 to equal 3
      + expected - actual

      -4
      +3

      at Context.it (test/pdf_test.js:327:45)

The next issue is the page numbering.

I note that in #251 there is the comment that:

"For reference, Asciidoctor PDF considers that the preamble is not a front matter page (by default, this is configurable via theme)."

I propose we leave the very complicated task of page numbering to the victor of #251 and not to attempt to do numbering prior to the TOC here. However it would be good to remove the numbering from the preamble and start correctly after the TOC. Do you think this would be a minimally acceptable starting point?

We currently start page numbering using css on #content. When there is a preamble we should start after the preamble in some fashion. If we have doctype = book and toc = preamble then surely the first section is the right place to start (?) However while inspecting the html output I noticed something looking at the page and the following seems to work:

[data-after-page="preamble"] {
  counter-reset: page 1;
}

I put this in my css but I'm not sure what the general solution is -- do we need to generate different css based on toc attributes? Is it possible for one css to rule all the different scenarios?

I couldn't determine how to remove the page numbering on the preamble page(s). I can't find good selectors or think my way around what they might be.

danyill commented 3 years ago

Interestingly I thought this might remove the preamble page number but it didn't appear to do anything

@page preamble {
  @bottom-right {
    margin: 0;
    padding: 0;
    border: none;
    content: none;
  }
  @bottom-left {
    margin: 0;
    padding: 0;
    border: none;
    content: none;
  }
}
ggrossetie commented 3 years ago

Here's what I get with my suggested changes:

Case Image Comment
page-break-book-preamble_true-section_true-toc_preamble-title-page-attribute_false image In this case, we might remove the border around the Table Of Contents since it has a dedicated page.
page-break-article-preamble_true-section_true-toc_preamble-title-page-attribute_false image I think the result is as expected 👌🏻
page-break-article-preamble_true-section_true-toc_preamble-title-page-attribute_true image Here, we have a title page on an article (document type), so the title page is on its own page. However, I don't know if the preamble should have its own page? What do you think?

For reference, I get two failing tests:

  1) PDF converter
       Page break
         should break pages accordingly when the document type is book, has preamble, has section and has TOC preamble:

      AssertionError: expected output/page-break-book-preamble_true-section_true-toc_preamble-title-page-attribute_false.pdf to be visually identical to reference/page-break-book-preamble_true-section_true-toc_preamble-title-page-attribute_false.pdf but has 8661 pixels difference
      + expected - actual

      -8661
      +0

      at Context.<anonymous> (test/pdf_test.js:330:34)

  2) PDF converter
       Page break
         should break pages accordingly when the document type is article, has preamble, has section, has TOC preamble and has :title-page: attribute:

      AssertionError: expected output/page-break-article-preamble_true-section_true-toc_preamble-title-page-attribute_true.pdf to be visually identical to reference/page-break-article-preamble_true-section_true-toc_preamble-title-page-attribute_true.pdf but has 11947 pixels difference
      + expected - actual

      -11947
      +0

      at Context.<anonymous> (test/pdf_test.js:330:34)

And here's the diff:

diff --git a/lib/document/document-converter.js b/lib/document/document-converter.js
index 34a252c..66da46d 100644
--- a/lib/document/document-converter.js
+++ b/lib/document/document-converter.js
@@ -303,7 +303,7 @@ ${titleElement}${node.getContent()}
   convert_preamble (node, opts = {}) {
     const doc = node.getDocument()
     let toc
-    if (!this.hasTitlePage(doc) && doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {
+    if (doc.isAttribute('toc-placement', 'preamble') && doc.hasSections() && doc.hasAttribute('toc')) {
       toc = `<div id="toc" class="${doc.getAttribute('toc-class', 'toc')}">
 <div id="toctitle">${doc.getAttribute('toc-title')}</div>
 ${this.convert_outline(doc, opts)}
@@ -510,7 +510,7 @@ ${faDom.css()}
       const tocPlacement = node.getAttribute('toc-placement', 'auto')
       // Add a toc in the header if the toc placement is auto (default), left or right
       const hasTocPlacementHeader = tocPlacement === 'auto' || tocPlacement === 'left' || tocPlacement === 'right'
-      if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node))) {
+      if (node.hasSections() && node.hasAttribute('toc') && hasTocPlacementHeader) {
         return `<div id="toc" class="${node.getAttribute('toc-class', 'toc')}">
 <div id="toctitle">${node.getAttribute('toc-title')}</div>
 ${this.convert_outline(node, opts)}

if (node.hasSections() && node.hasAttribute('toc') && (hasTocPlacementHeader || this.hasTitlePage(node) && tocPlacement !== 'preamble'))

I don't think the condition is correct. You probably need parenthesis because hasTocPlacementHeader and tocPlacement !== 'preamble' are somehow equivalent.

This test failure is caused by generating table of contents twice: once because it's a book once because we generate it again if we have a preamble

With my changes or with your changes?

This means that an element with data-id=toc must be a descendant of an element with data-id=content. This is matched in the preamble case because the toc is occurring in the "normal flow of the document" and is not a separate and prior entry.

I think it makes sense if the Table Of Contents is occurring in the "normal flow of the document" (and not on a dedicated page). I mean, you probably want the Table Of Contents to stand out from the preamble content.

The next issue is the page numbering. I propose we leave the very complicated task of page numbering to the victor of #251 and not to attempt to do numbering prior to the TOC here. However it would be good to remove the numbering from the preamble and start correctly after the TOC. Do you think this would be a minimally acceptable starting point?

Yes, let's address this issue in a follow-up pull-request/issue.

"For reference, Asciidoctor PDF considers that the preamble is not a front matter page (by default, this is configurable via theme)."

I guess that was true until Asciidoctor PDF supports preamble placement where the Table Of Contents is beneath the preamble. The Table Of Contents is definitely a front matter page so if the preamble is before the Table Of Contents, it should also be a front matter page right?

We currently start page numbering using css on #content. When there is a preamble we should start after the preamble in some fashion. If we have doctype = book and toc = preamble then surely the first section is the right place to start (?) However while inspecting the html output I noticed something looking at the page and the following seems to work: I put this in my css but I'm not sure what the general solution is -- do we need to generate different css based on toc attributes? Is it possible for one css to rule all the different scenarios? I couldn't determine how to remove the page numbering on the preamble page(s). I can't find good selectors or think my way around what they might be.

Page numbering is hard! There are also a few conventions:

Please note that this is especially difficult since the HTML structure is a bit over-complicated (for compatibility reasons with AsciiDoc.py). It will surely be easier once the new modern HTML converter is available but until then we will need to get by!

Anyway, let's discuss this issue on https://github.com/Mogztter/asciidoctor-web-pdf/issues/253

danyill commented 3 years ago

Here, we have a title page on an article (document type), so the title page is on its own page. However, I don't know if the preamble should have its own page? What do you think?

I think it should not have its own page. I could imagine (not that I have seen it very often) that one might include a dedication or mention of source of funding in an article and want that to be in the place of a preamble but I think it would be unlikely that the end user would want to devote a whole page to it, that is more :doctype: book.

I don't think the condition is correct. You probably need parenthesis because hasTocPlacementHeader and tocPlacement !== 'preamble' are somehow equivalent.

You are quite right, I am embarrassed :flushed:

I will consider some more.

danyill commented 3 years ago

OK, I have now discovered DEBUG=true npm run test which makes reproductions and comparisons easier.

Here, we have a title page on an article (document type), so the title page is on its own page. However, I don't know if the preamble should have its own page? What do you think?`

I think it should not have its own page. I could imagine (not that I have seen it very often) that one might include a dedication or mention of source of funding in an article and want that to be in the place of a preamble but I think it would be unlikely that the end user would want to devote a whole page to it, that is more :doctype: book.

OTOH once you have decided you want a title page it seems less clear. I still favour not devoting an extra page to it -- I might want my article to be compact and just slap a title page on it and see this as an undesirable side affect. I think "compact" is what :doctype: article means (?).