Open josteinaj opened 10 years ago
Input:
<h1 id="h1_4">1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific Debate</h1>
Storing the input with this:
<d:file method="xhtml" encoding="utf-8" indent="true" version="1.0" media-type="application/xhtml+xml" omit-xml-declaration="false" href="EPUB/DTB09004-05-chapter.xhtml"/>
Stores it like this:
<h1 id="h1_4">1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific
Debate
</h1>
While storing the input with this:
<d:file method="xhtml" encoding="utf-8" indent="false" version="1.0" media-type="application/xhtml+xml" omit-xml-declaration="false" href="EPUB/DTB09004-05-chapter.xhtml"/>
Stores it like this:
<h1 id="h1_4">1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific Debate</h1>
This only occurs for long texts, and it doesn't seem to be limited to headlines. I suspect it has to do with the serialization performed by p:store in Calabash (or one of Calabash's dependencies).
I'm failing to see the real issue here: when you set indent="true", you essentially leave it to the processor to apply its serialization rules, which conform to XSLT and XQuery Serialization (note also the more recent 3.0 version, which is not yet referenced by XProc).
Indenting XML (or whatever) typically means that you add whitespace characters.
Given that an HTML user agent will strip and collapse whitespace, what's wrong with the use case above? Do you have a rendering issue?
Mmm, on further reading of the serialization spec, it says that:
Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.
which would mean there is indeed a bug. AFAIK Calabash delegates to Saxon's serializer, so it would be interesting to check with the latest versions of both and report the issue if needed.
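One way to check whether the indented output only differs in whitespace, or whether the text content itself was altered, is to compare the text of the two serializations quoted at the top of this issue. The following is a minimal Python sketch using only the standard library. The helper names `text_of` and `classify` are made up for illustration, and the script does not reproduce Saxon's serializer; it only inspects the output pasted in this thread.

```python
import xml.etree.ElementTree as ET

# Input as given in the issue, and the result of storing it with indent="true".
ORIGINAL = ('<h1 id="h1_4">1 Introduction: Standpoint Theory as a Site of '
            'Political, Philosophic, and Scientific Debate</h1>')
INDENTED = ('<h1 id="h1_4">1 Introduction: Standpoint Theory as a Site of '
            'Political, Philosophic, and Scientific\n'
            'Debate\n'
            '</h1>')


def text_of(xml_string):
    """Return the concatenated text content of a small XML document."""
    return "".join(ET.fromstring(xml_string).itertext())


def classify(before, after):
    """Describe how the text content of two serializations differs."""
    a, b = text_of(before), text_of(after)
    if a == b:
        return "identical"
    # str.split() with no arguments splits on any run of whitespace,
    # so equal token lists mean only the whitespace changed.
    if a.split() == b.split():
        return "whitespace-only difference"
    return "content difference"


print(classify(ORIGINAL, INDENTED))
```

For the two snippets above this reports a whitespace-only difference: the words are unchanged, but the text node itself is no longer identical because of the line breaks inserted inside the element.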
This is still an issue. I tried removing the custom pretty-printing XSLT in the nordic migrator, but it seems it is still needed.
Thanks for checking. I created an XProcSpec test.
@josteinaj Is it an option for you to not use method="xhtml"?
For HTML I think this result might be correct because spaces at the end of blocks are not rendered. If this also happens with inline elements, there is a problem though.
EDIT: OK I tried with this example:
<h1><span>1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific Debate</span>.</h1>
It results in:
<h1>
<span>1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific
Debate
</span>.
</h1>
With method="xml" (and media-type="application/xhtml+xml") we get:
<h1>
<span>1 Introduction: Standpoint Theory as a Site of Political, Philosophic, and Scientific Debate</span>.</h1>
We need to use method="xhtml" because not all HTML elements can be self-closing. With method="xml" we'd end up with
<div epub:type="pagebreak" title="1"/>
instead of
<div epub:type="pagebreak" title="1"></div>
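The difference between these two forms can be reproduced outside of Calabash. As a rough analogy only (this uses Python's `xml.etree.ElementTree`, not Saxon's xhtml output method), the standard library's `short_empty_elements` flag switches between the same two serializations. The `epub:type` attribute is left out here to keep namespace handling out of the way.

```python
import xml.etree.ElementTree as ET

# Simplified version of the pagebreak element from the comment above.
div = ET.fromstring('<div title="1"/>')

# Default XML serialization collapses the empty element into a single tag.
collapsed = ET.tostring(div, encoding="unicode")

# With short_empty_elements=False an explicit end tag is written,
# which is the form an HTML parser needs for a non-void element like div.
expanded = ET.tostring(div, encoding="unicode", short_empty_elements=False)

print(collapsed)
print(expanded)
```

An HTML parser does not treat `<div .../>` as an empty element: it treats it as an open tag and nests the following content inside it. That is why the collapsed form is a problem for HTML-only readers.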
OK I see. So this is an issue of being compatible with HTML-only readers?
There's not much we can do about this apart from filing a bug report with Saxon. The issue doesn't appear to be listed in the change log, but let's try with Saxon 10 first.
See nlbdev/nordic-epub3-dtbook-migrator#94