Open malcb opened 2 years ago
The same parse error also occurs when the web page itself contains errors. These can be invisible errors such as missing closing tags or corrupt tags, which the browser recovers from so that the page still renders fine. I think the browser simply ignores the error, so the text still displays and the problem goes unnoticed, but the parser in save-as-ebook throws the text out, so the ebook doesn't match the web page.
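To illustrate the difference I mean between a forgiving parser and a strict one, here is a small Python sketch. I don't know how save-as-ebook actually parses pages, so this is only an analogy: Python's strict XML parser stands in for the failing parser, and its lenient `html.parser` stands in for the browser. The names `TextCollector`, `strict_parse_ok` and `lenient_text` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Lenient parser: collects text and does not care about unclosed tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strict_parse_ok(markup: str) -> bool:
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def lenient_text(markup: str) -> str:
    """Return all text found in the markup, tolerating broken tags."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    # Two <p> tags are never closed; browsers render this without complaint.
    broken = "<div><p>First paragraph<p>Second paragraph</div>"
    print(strict_parse_ok(broken))  # strict parser rejects the page
    print(lenient_text(broken))     # lenient parser still recovers the text
```

If the extension used something like the lenient approach as a fallback, the text would at least survive even when the markup is broken.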
I have a workaround for anyone having similar problems. The rewriter extension lets you set up rules for rewriting a page, and these rules apply to the HTML as well. Rewriter seems to affect the whole page, not just the visible text. Hence rewriter can be set to remove all
I tried to convert a web page and got a parse error. I added an HTML validator to check the web site, and it suggested that the problem might be the </o:p> tags. These are not standard tags but are added by MS Word (typical!). I saved the web page, stripped the </o:p> tags, and then tried again with the local file. This time there was no parse error. Hence it looks like the problem is MS, as usual. Perhaps the fix would be to ignore unknown tags rather than throwing an error.
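For anyone who wants to do the same stripping on a saved copy of a page, a minimal Python sketch follows. It is not part of save-as-ebook; the function name `strip_office_tags` is hypothetical, and the pattern assumes the offending tags look like `<o:p>`, `</o:p>` or `<o:p/>`, possibly with attributes.

```python
import re

# Matches opening, closing, and self-closing <o:p> tags, with optional attributes.
# The \b stops it from matching other tags that merely start with "o:p".
O_P_TAG = re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE)


def strip_office_tags(html: str) -> str:
    """Remove the <o:p> tags themselves but keep any text between them."""
    return O_P_TAG.sub("", html)


if __name__ == "__main__":
    sample = '<p class="MsoNormal">Hello<o:p></o:p></p>'
    print(strip_office_tags(sample))
```

Read the saved HTML file into a string, pass it through the function, write the result back out, and then run the conversion on the cleaned local file.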