Closed toshiya44 closed 8 years ago
As far as i can tell baka-tsuki literally does <br>
which is not allowed in xhtml.
Not including any special cases.
Welp
// Replace elements that are invalid in xhtml with their valid counter parts
BakaTsukiParser.prototype.replaceInvalidElements = function(element) {
// replace br tags
let brs = util.getElements(element, "br");
for(let br of brs) {
let newBr = document.createElement("br");
br.parentNode.replaceChild(newBr, br);
}
}
Still gives bogus error about <br />
being incorrect
@dteviot i believe a doctype tag is in order.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Although doing this means we will have to import nodes before appending them
@belldandu & @toshiya44
My understanding is that the only problem this is giving at the moment is epubcheck is reporting this as a violation. Readers are not having a problem with it. As such, I'm regarding it as a "trivial" problem and I'm ignoring it until more critical issues are fixed. See: my most recent post to https://baka-tsuki.org/forums/viewtopic.php?f=84&t=16114&start=45#p271810
If my understanding is incorrect, please let me know.
On Fri, Jul 1, 2016 at 6:39 AM, Belldandu notifications@github.com wrote:
Although doing this means we will have to import nodes before appending them
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dteviot/WebToEpub/issues/34#issuecomment-229750557, or mute the thread https://github.com/notifications/unsubscribe/AE6w2XlWrpMUkw5QDQp86Jgxvc3f5KGIks5qRA1lgaJpZM4JB7GM .
@dteviot well actually i already figured out a way to fix the br tags without breaking dom minipulation (its a bit hacky like the xml declaration thing)
Main problem for what i was doing before (removing them) is that it completely removed some line breaks that where needed in some places.
Now what i do is i replace the br with the proper br tag and then append a docType that accepts the new br tag to the xhtml immediately after the xml declaration.
This way it doesn't affect our dom manipulation and epub readers / epubcheck wont go nuts about the tag not being allowed where its at.
I will push the pull request in a bit.
I thought I was already adding that doc type tag. But maybe I was doing it at end, just before writing to the epub zip. In which case, need to do it when create initial xhtml file to fill in util. Then chrome should automatically close the tag.
On Fri, Jul 1, 2016 at 8:07 AM, Belldandu notifications@github.com wrote:
@dteviot https://github.com/dteviot well actually i already figured out a way to fix the br tags without breaking dom minipulation (its a bit hacky like the xml declaration thing)
Main problem for what i was doing (removing them) is that it completely removed some line breaks that where needed in some places.
Now what i do is i replace the br with the proper tag and then append a docType that accepts this br tag to the xhtml immediatly after the xml declaration.
This way it doesn't affect our dom manipulation and epub readers / epubcheck wont go nuts about the tag not being allowed where its at.
I will push the pull request in a bit.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dteviot/WebToEpub/issues/34#issuecomment-229773050, or mute the thread https://github.com/notifications/unsubscribe/AE6w2TnlulYZ13xJBiEAcBsG4Sx-ym4Yks5qRCHwgaJpZM4JB7GM .
You create a document however none of the xhtml files have an actual doctype. There are a couple of other things that go along with this. Almost done with PR with push in 5 minutes @dteviot
@dteviot @toshiya44 see #35
@belldandu BakaTsukiParser.replaceInvalidElements() which is supposed to close <br> tags does not appear to do anything useful.
See this test
test("replaceInvalidElements", function (assert) {
let dom = new DOMParser().parseFromString(
"<html>" +
"<head><title></title></head>" +
"<body>" +
"<p>SomeText</p>" +
"<br>" +
"<p>More</p>" +
"</body>" +
"</html>",
"text/html");
let parser = new BakaTsukiParser();
let content = dom.body.cloneNode(true);
parser.replaceInvalidElements(content);
// this assert fails.
assert.equal(content.innerHTML, "<p>SomeText</p><br /><p>More</p>");
});
I think Chrome automatically closes the <br> tag when converting nodes from html to xhtml
See this test
test("replaceInvalidElements", function (assert) {
let dom = new DOMParser().parseFromString(
"<html>" +
"<head><title></title></head>" +
"<body>" +
"<p>SomeText</p>" +
"<br>" +
"<p>More</p>" +
"</body>" +
"</html>",
"text/html");
let parser = new BakaTsukiParser();
let content = dom.body.cloneNode(true);
let xhtml = util.createEmptyXhtmlDoc();
let body = xhtml.getElementsByTagName("body")[0];
body.parentNode.replaceChild(content, body);
assert.equal(xhtml.getElementsByTagName("body")[0].outerHTML,
"<body xmlns=\"http://www.w3.org/1999/xhtml\"><p>SomeText</p><br /><p>More</p></body>");
});
I also checked a Baka-Tsuki story with <br> tags and confirmed they are being closed in the generator's output when replaceInvalidElements() has been removed. I think we need to look more carefully at what epubcheck is complaining about. Possibly all that's needed is the doctype element.
@dteviot weird and ok.
@dteviot @toshiya44 can this be closed?
@belldandu @toshiya44
can this be closed?
Only if epubcheck is now happy with the <br /> tags we're now using. If not, that needs to be fixed. Note, at current time I have not set up epubcheck to run on my system.
@dteviot will check tommorow
@dteviot
[kami@Index ~]$ epubcheck Toaru_Maju...dexVolume1.epub
Validating using EPUB version 2.0.1 rules.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0006_Part_5.xhtml(66,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0006_Part_5.xhtml(66,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0008_Part_7.xhtml(13,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0008_Part_7.xhtml(13,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0008_Part_7.xhtml(72,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0008_Part_7.xhtml(72,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0009_Chapter_2_...e_7th-Egde.xhtml(59,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0009_Chapter_2_...e_7th-Egde.xhtml(59,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0009_Chapter_2_...e_7th-Egde.xhtml(134,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0009_Chapter_2_...e_7th-Egde.xhtml(134,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0010_Part_2.xhtml(173,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0010_Part_2.xhtml(173,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(35,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(35,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(108,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(108,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(112,7): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): Toaru_Maju...dexVolume1.epub/OEBPS/Text/0018_Epilogue_T...ohibitorum.xhtml(112,115): Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
Check finished with errors
epubcheck completed
Welp
Was extension running under Chrome or Firefox? As noted in https://github.com/dteviot/WebToEpub/issues/49 I suspect Chrome closes <br> elements for us, but Firefox does not.
It was run under chromium @dteviot
@belldandu Can you send me some of the offending files? e.g. 0006_Part_5.xhtml and 0008_Part_7.xhtml, I'd like to take a closer look at exactly where/what the problem is.
I know exactly what the problem is. XD the change i made before where i just replaced the BR tags had actually fixed that issue. @dteviot The problem is inconsistencies with the BR tags on baka-tsuki.
I want to confirm the problem that epubcheck is reporting is <br> elements that are not closed. And not something else.
I've seen Chrome closing the tags, but if Chromium is not closing them, then it's possible that Chrome doesn't either, at least in some cases.
Deleting and re-creating the elements may result in closed tags, if it's creating XTHML elements, not HTML elements.
I suspect the correct way to solve the problem is using importNode() https://developer.mozilla.org/en-US/docs/Web/API/Document/importNode when we copy the "content" from the Baka-Tsuki HTML to the EPUB's XHTML. example: http://stackoverflow.com/questions/12092532/how-to-convert-html-to-valid-xhtml. Note, I also need to create the empty XHTML document correctly, using createElementNS(), and it's family.
Note, I have not yet tested this to confirm. At moment, it's just a working hypothesis.
I believe this issue is now fixed too. The plugin in sonako branch closes br
tags properly.
I believe I now understand where the <br> error is coming from. epubcheck wants the <br> enclosed in a <p> tag. e.g. This XHTML fragment
<body>
<p>Some Text</p>
<br />
<p>More Text</p>
</body>
gives the error message
Error while parsing file 'element "br" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul" (with xmlns:ns="http://www.w3.org/2000/svg")'.
This XHTML fragment does not
<body>
<p>Some Text</p>
<p><br /></p>
<p>More Text</p>
</body>
I've logged the issue with epubcheck https://github.com/IDPF/epubcheck/issues/700, and I'll see what they come back with. Note, they have had a similar issue in the past. https://github.com/IDPF/epubcheck/issues/209
@toshiya44, @belldandu Well, I got a response from the IDPF team
If you're creating an EPUB 2, then that's correct. The br element isn't allowed as a child of the body in XHTML 1.1. It's not an error in HTML5, and I can't reproduce in a conforming EPUB 3 publication.
I'm going to class this as a "won't fix". The behaviour is technically wrong, but:
@dteviot Agreed.
If i have time when i get back in 6 months and all the other issues have been fixed i might take a crack at it. But for now i agree with the "wontfix".
In BakaTsukiParser.js it says,
// discard br tags as epubcheck says they are invalid in the places they are at in xhtml util.removeElements(util.getElements(element, "br"));
br tags are not invalid, they only need to be closed properly in xhtml, like
<br/>
In Baka-Tsuki pages br tags show up as<br>
, that's why epubcheck didn't like it. Can you please replace all<br>
by<br/>
instead of removing it?