docbook / xslTNG

DocBook xslTNG Stylesheets
https://xsltng.docbook.org
MIT License

Validation anomalies in xslt output #276

Closed g-vidal closed 1 year ago

g-vidal commented 1 year ago

We tested the XHTML and EPUB transformations with this minimal DocBook document. We noticed some anomalies when validating the output, even though the original DocBook document validates against the schema:

There is a discrepancy in the transformation behavior depending on whether <table> or <figure> tags are placed inside or between <para> blocks. Both positions are valid in the schema, but the transformation produces valid XHTML only if the <table> or <figure> sits between two <para> elements. Would it be possible to modify either the schema or the XSLT to fully accommodate one or both solutions?
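For anyone trying to reproduce this, the affected paragraphs can be located in the DocBook source before transforming it. The following is only a minimal sketch using the Python standard library: the function name `paras_with_blocks` and the list of element names treated as blocks are assumptions made for illustration, not anything provided by the stylesheets.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: an illustrative, incomplete list of block elements to look for.
BLOCK_NAMES = ("table", "informaltable", "figure", "informalfigure")
BLOCKS = {f"{{{DB}}}{name}" for name in BLOCK_NAMES}


def paras_with_blocks(root):
    """Return (xml:id, [block names]) for each para with a block as a direct child."""
    hits = []
    for para in root.iter(f"{{{DB}}}para"):
        # Only direct children matter: a para nested inside a figure is not a hit.
        kinds = [child.tag.split("}", 1)[1] for child in para if child.tag in BLOCKS]
        if kinds:
            hits.append((para.get(XML_ID), kinds))
    return hits
```

Calling `paras_with_blocks(ET.parse("book.xml").getroot())`, where `book.xml` is a placeholder file name, lists the paragraphs whose HTML output would contain a block inside a `<p>`.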

ndw commented 1 year ago

Tables and figures are allowed in paragraphs as well as between them because the question of whether or not paragraphs can contain such things is an editorial one, not a technical one. You will find authors and editors who hold very strongly to the opinion that this paragraph contains a table:

<para xml:id="p1">Consider the following table:
<informaltable>...</informaltable>
It very clearly shows that...</para>

They would argue that neither the paragraph nor the table can stand alone; that the table is an intrinsic part of the content of the paragraph; and, critically, that if you were to select the paragraph p1 for reuse in another document, it must contain the table and the prose that follows it.

The fact that HTML will, by design, only validate this markup:

<p id="p1">Consider the following table:</p>
<table>...</table>
<p>It very clearly shows that...</p>

is, some would argue, a design failing in HTML.

I am reluctant to attempt to transform paragraphs containing tables and figures into multiple paragraphs with interstitial figures and tables. While it's true that generating:

<p id="p1">Consider the following table:
<table>...</table>
It very clearly shows...</p>

means that the output won't validate against a prescriptive grammar for (X)HTML, it is equally true that the result matches the author's intent and if you had a system that allowed composition of the HTML, you'd want p1 to return the prose and the table.

I will investigate the possibility of adding an option to "unwrap" blocks in paragraphs, but the most straightforward answer is probably to author with a customization layer that doesn't allow block content in paragraphs. The customization layer is very simple:

default namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  db.para =
    element para {
      db.simpara.attlist, db.simpara.info, db.all.inlines*
    }
}

That's essentially why simpara exists.
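To make the "unwrap" idea concrete, here is a rough sketch of what such a step could look like if it were applied as post-processing to well-formed (X)HTML output. It is not how the stylesheets work and not a proposed implementation: the function names, the set of tags treated as blocks, and the choice to keep the attributes (including the id) on the first fragment only are all assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: an illustrative set of local names treated as block content.
BLOCKS = {"table", "figure"}


def local(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def has_content(el):
    """True if the element has non-whitespace text or any child elements."""
    return bool((el.text or "").strip()) or len(el) > 0


def split_paragraph(p):
    """Split one <p> into a list of <p> fragments and the blocks between them."""
    pieces = []
    # The first fragment keeps the original attributes. If the paragraph
    # starts with a block, this fragment is empty and the attributes are lost.
    current = ET.Element(p.tag, dict(p.attrib))
    current.text = p.text
    for child in p:
        if local(child.tag) in BLOCKS:
            if has_content(current):
                pieces.append(current)
            block = copy.deepcopy(child)
            trailing = block.tail  # prose that followed the block inside the <p>
            block.tail = None
            pieces.append(block)
            current = ET.Element(p.tag)
            current.text = trailing
        else:
            current.append(copy.deepcopy(child))  # inline content stays in place
    if has_content(current):
        pieces.append(current)
    return pieces


def unwrap_blocks(parent):
    """Recursively replace every <p> that contains a block with its split form."""
    rebuilt = []
    for child in list(parent):
        unwrap_blocks(child)
        if local(child.tag) == "p" and any(local(c.tag) in BLOCKS for c in child):
            pieces = split_paragraph(child)
            pieces[-1].tail = child.tail
            rebuilt.extend(pieces)
        else:
            rebuilt.append(child)
    for child in list(parent):
        parent.remove(child)
    parent.extend(rebuilt)
```

Run against the p1 example above, this yields a `<p id="p1">`, then the `<table>`, then an anonymous `<p>`. That is exactly the trade-off described earlier: the result validates, but selecting p1 no longer returns the table or the prose that follows it.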

ndw commented 1 year ago

I think these issues have all been addressed. Please let me know if you think I'm mistaken.