Open dteviot opened 7 years ago
Yep. They're supposed to be just <p> nodes. Apparently MS Word does that when a doc is converted to HTML.
https://stackoverflow.com/questions/7808968/what-do-op-elements-do-anyway
As for the <o:p></o:p> pairs that appear inside paragraphs, they can be safely deleted.
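For illustration, a minimal sketch of that cleanup in Python. The function name `strip_empty_op` is made up for this example, and it only removes pairs that are empty or contain whitespace; a pair with real text inside is left alone on the assumption that someone should look at it first.

```python
import re

# Matches an empty Word marker such as <o:p></o:p> or <o:p> </o:p>.
# Pairs that contain actual text do not match and are kept.
EMPTY_OP = re.compile(r"<o:p>\s*</o:p>", re.IGNORECASE)


def strip_empty_op(html: str) -> str:
    """Remove empty <o:p></o:p> pairs left behind by MS Word exports."""
    return EMPTY_OP.sub("", html)


if __name__ == "__main__":
    print(strip_empty_op("<p>Some text<o:p></o:p></p>"))
```

A regex is enough here only because the pairs are empty; anything that has to understand nesting would be better done on a parsed DOM.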
@typhoon71, @toshiya44, @dreamer2908
Latest commit to Experimental Tab Branch should now generate EPUB v3 files if you check the "Create EPUB 3" advanced option. I hope this will solve the problem with sometimes not being able to convert HTML into valid XHTML. (EPUB 3 uses HTML 5 instead of XHTML.) This requires your EPUB reader to support EPUB 3, but hopefully by now most do. Please try it out and let me know how well the EPUB 3 works with your readers. Thanks.
To be honest I'm not very knowledgeable about EPUB3. I'm using this Sigil plugin, which is supposed to contain EpubCheck 4.0.2, in order to check errors.
Calibre officially doesn't endorse EPUB3 due to various reasons, so the editor part of Calibre is not very specialized for this. Please see this thread. However, the viewer has no issues with rendering EPUB3 (this was also discussed in that thread).
Errors: In the OPF, the image is listed as "image/jpeg" even though it has a PNG extension (apparently the image is actually a BMP; no clue what's going on).
There's also an error notice for the "Form Feed" (value 0x0c) character that you mentioned in the issue.
URL: https://skythewood.blogspot.ca/2017/07/F15.html
Error: Epubcheck complains about having a name attribute (<a name="more"></a>).
By the way, according to this Stack Overflow thread, there doesn't seem to be any difference between the name and id attributes in the context of EPUB. So wouldn't it be fine to replace name with id? I've seen name attributes used as ids on WordPress sites as well.
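A rough sketch of what that replacement could look like, in Python. `name_to_id` is a hypothetical helper, not anything from the project. It works on the raw markup with regular expressions, so it assumes attribute values do not themselves contain text like ` name=`, and it leaves a tag untouched if it already has an id.

```python
import re

A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
# Require whitespace before the attribute so "?name=x" inside an href is ignored.
NAME_ATTR = re.compile(r"(\s)name(\s*=)", re.IGNORECASE)
ID_ATTR = re.compile(r"\sid\s*=", re.IGNORECASE)


def name_to_id(html: str) -> str:
    """Rename the name attribute of <a> tags to id when no id is present."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        if ID_ATTR.search(tag):
            return tag  # already has an id; do not create a duplicate
        return NAME_ATTR.sub(r"\1id\2", tag, count=1)

    return A_TAG.sub(fix, html)


if __name__ == "__main__":
    print(name_to_id('<a name="more"></a>'))
```

Links of the form href="#more" keep working after the rename, since fragment identifiers resolve against id as well.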
@toshiya44 Firstly, thanks very much for the prompt response.
Errors: In the OPF, the image is listed as "image/jpeg" even though it has a PNG extension (apparently the image is actually a BMP; no clue what's going on).
I assume you're referring to: https://cdn.royalroadl.com/mooderino/6edd9796-3cb7-434c-a5ad-bb7dece2967a.png. It's listed as "image/jpeg" in the OPF because, when the file was fetched from the web server, the "content-type" in the HTTP response was "image/jpeg", so that's what went into the OPF file. Images don't always have extensions, so I was relying on the content-type. That said, this doesn't seem to cause any problems with the reader.
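If a stricter check were ever wanted, one option is to look at the first few bytes of the downloaded file and fall back to the server's content-type only when they are not recognised. The sketch below is in Python and is not taken from the project's code; `sniff_media_type` and its `fallback` parameter are names invented for the example, and only four common formats are covered.

```python
# Well-known file signatures ("magic numbers") for common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]


def sniff_media_type(data: bytes, fallback: str) -> str:
    """Guess the media type from the leading bytes of the file.

    `fallback` is whatever the HTTP content-type header said; it is
    returned unchanged when no signature matches.
    """
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type
    return fallback


if __name__ == "__main__":
    # A BMP served with the wrong header would be reported as image/bmp.
    print(sniff_media_type(b"BM" + bytes(12), "image/jpeg"))
```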
There's also an error notice for the "Form Feed" (value 0x0c) character that you mentioned in the issue.
Um, yes, I haven't fixed the warnings yet. That said, the EPUB viewer previously would not show the chapter at all because of the Form Feed character. Now it shows the HTML without a problem.
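For reference, clearing that warning could be as simple as dropping the control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return). A minimal Python sketch, with `strip_illegal_xml_chars` being a made-up name; whether to delete the character or replace it with a space is a judgment call, and this version deletes it.

```python
import re

# C0 control characters that are not legal in XML 1.0 documents.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are allowed.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text: str) -> str:
    """Remove control characters such as Form Feed (0x0C) from the text."""
    return ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    print(repr(strip_illegal_xml_chars("end of page\x0cnext page")))
```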
There are a number of HTML pages that are not correctly converted to XHTML. e.g.
It's also annoying that the user is only aware of the problem when the EPUB reader faults on the page. The fix probably needs to include the following