ggrossetie / asciidoctor-web-pdf

Convert AsciiDoc documents to PDF using web technologies
https://asciidoctor.org
MIT License

PDF output is unreliable #635

Closed siaccarino closed 2 years ago

siaccarino commented 2 years ago

I get the warning "Unable to find destination...." for a chapter that definitely doesn't contain umlauts.

The adoc contains:

=== Communication strategy

blah...

=== Problem analysis

blah...

=== Alert notification strategy

The outline dictionary generated in https://github.com/Mogztter/asciidoctor-web-pdf/blob/3e1bb2ecf1b781f06bf555556b0935496283ac15/lib/outline.js#L76 looks like

PDFName { encodedName: '/communication_strategy' } => PDFArray { array: [Array], context: [PDFContext] },
PDFName { encodedName: '/_alert_notification_strategy' } => PDFArray { array: [Array], context: [PDFContext] },

The HTML looks like:

<li class="toc-entry"><a href="#communication_strategy">5.3. Communication strategy</a></li>
<li class="toc-entry"><a href="#problem_analysis">5.4. Problem analysis</a></li>
<li class="toc-entry"><a href="#_alert_notification_strategy">5.5. Alert notification strategy</a></li>
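The mismatch can be made explicit by comparing the TOC anchors with the names in the outline dictionary. This is only a minimal sketch: `findMissingDestinations` is a hypothetical helper, not part of asciidoctor-web-pdf, and it assumes the destination names are available as plain strings with a leading slash, as in the dump above.

```javascript
// Hypothetical helper: list TOC anchors that have no named destination.
// `tocHrefs` are href values taken from the TOC links; `destinationNames`
// are the encoded names from the outline dictionary (with a leading slash).
function findMissingDestinations(tocHrefs, destinationNames) {
  const known = new Set(destinationNames.map((name) => name.replace(/^\//, '')));
  return tocHrefs
    .map((href) => href.replace(/^#/, ''))
    .filter((id) => !known.has(id));
}

// Values copied from the outline dictionary and the HTML in this report.
const tocHrefs = [
  '#communication_strategy',
  '#problem_analysis',
  '#_alert_notification_strategy'
];
const destinationNames = [
  '/communication_strategy',
  '/_alert_notification_strategy'
];

console.log(findMissingDestinations(tocHrefs, destinationNames));
// prints [ 'problem_analysis' ]
```

With the data from this report, the only anchor without a destination is `problem_analysis`, which is the section whose heading no longer has a matching entry in the outline dictionary.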

After inspecting the PDF, it looked like the PDF file generated by Puppeteer was wrong. Analysis finished: it is Paged.js that messes up the PDF output. It generates a page break right after the communication_strategy header, and the paragraph that follows vanishes from the DOM.

PDF copy & paste:

5.3. Communication strategy
------ Page 12 of 18
5.4. Problem analysis

blah...

siaccarino commented 2 years ago

Since asciidoctor-web-pdf relies heavily on Paged.js, it inherits its reliability. Content vanishing during pagination is the opposite of reliable. I'll check why Paged.js runs amok.

ggrossetie commented 2 years ago

The umlaut error message should be removed and replaced, see: https://github.com/Mogztter/asciidoctor-web-pdf/issues/267#issuecomment-980544430

It might be possible to intercept the message "Unable to layout item: ..." sent by Paged.js, but that would probably be more brittle.
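For illustration, intercepting that message could look roughly like the sketch below. It assumes the warning reaches the browser console and that its text starts with "Unable to layout item"; `collectLayoutWarnings` and `LAYOUT_WARNING_PREFIX` are hypothetical names, not existing code in asciidoctor-web-pdf. The only Puppeteer API used is the page's `console` event and `ConsoleMessage.text()`.

```javascript
// Hypothetical helper: collect Paged.js layout warnings from a Puppeteer page.
// Assumption: the warning arrives as a console message with this text prefix.
const LAYOUT_WARNING_PREFIX = 'Unable to layout item';

function collectLayoutWarnings(page) {
  const warnings = [];
  // Puppeteer emits a 'console' event with a ConsoleMessage object;
  // text() returns the message text as a string.
  page.on('console', (message) => {
    const text = message.text();
    if (text.startsWith(LAYOUT_WARNING_PREFIX)) {
      warnings.push(text);
    }
  });
  // The array is filled as messages arrive, so inspect it after rendering.
  return warnings;
}
```

The listener would have to be registered before the page is loaded so that no message is missed. After rendering, a non-empty array could be turned into a warning that some content may be missing from the PDF. Matching on message text is what makes this approach brittle: it breaks if Paged.js rewords the warning.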

siaccarino commented 2 years ago

This is one indicator of lost content, but there may also be data loss without any warning. I want to check what Paged.js is doing there, since there is no problematic HTML code near the lost content.

siaccarino commented 2 years ago

Debugging shows that the root cause of the lost content is 100% Paged.js: the lost content is still in the DOM, but it is invisible. I'll file a bug report there.

There is not much asciidoctor-web-pdf can do here other than write a warning.

siaccarino commented 2 years ago

(Screenshot: hiddenparagraph)

The content has not vanished; it is hidden.

siaccarino commented 2 years ago

Seems to be a known issue: https://gitlab.coko.foundation/pagedjs/pagedjs/-/issues/357

siaccarino commented 2 years ago

Intercepting the "Unable to layout item" message seems to be the best solution to me.

siaccarino commented 2 years ago

Added a workaround to the Paged.js issue.