dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

Google Play Books - Processing Failed #390

Closed: seadesert closed this issue 4 years ago

seadesert commented 4 years ago

I get a lot of "Processing Failed" errors on some of the epubs when I upload them to Google Play Books, but it works fine when I generate the .epub from the same website with EpubPress instead.

Dunno why Google Play Books does that, but I tried to solve it manually. I tried using

I think it has something to do with Cover Pages and Images, as sometimes clearing the Cover Image or using Skip Images does the trick and solves the issue. Or Google Play Books is just too sensitive to errors in the EPUB.

Thanks, appreciate your work!

dteviot commented 4 years ago

@ArunSriKrishna Give me

  1. An Epub that fails (I think you can just drag it onto this issue.)
  2. Link with instructions on how to upload to Google Play Books, and I'll see what I can do. (May take a while, I have other tasks in my ToDo queue.)
dteviot commented 4 years ago

@ArunSriKrishna Additionally, if the EPUB passes http://validator.idpf.org/, then the problem is not with the EPUB but with Google Play Books. In that case, you need to raise a support issue with them. Basically, go here: https://goo.gle/2QyJZSb, give them the failing epub, and tell them:

  1. It passes validator,
  2. You get a "Processing Failed" when you do the following steps.
seadesert commented 4 years ago

  1. I'll try to collect and add the different types of epub where I get that Processing Error. Here are some; I'll add more later.

    Sample 1: M_E_M_O_R_I_Z_E.zip

    Sample 2: Death_Mage.zip

  2. https://www.lifewire.com/upload-ebooks-to-google-1616145

seadesert commented 4 years ago


Yeah, I have considered that it's Google Play Books at fault, but since it works fine when I use EpubPress to generate the epub from the same site, I thought I'd raise the issue.

dteviot commented 4 years ago

@ArunSriKrishna I ran the MEMORIZE epub through epubcheck (https://github.com/w3c/epubcheck) and got this set of errors:

WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(16,232): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,261): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,377): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,494): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,613): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,729): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,851): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,968): Non-registered URI scheme type found in href.
WARNING(HTM-025): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0000_Information.xhtml(23,1084): Non-registered URI scheme type found in href.
FATAL(RSC-016): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0043_43_00043_Preparation_for_Emergency.xhtml(1,21667): Fatal Error while parsing file 'Element type "sunken" must be followed by either attribute specifications, ">" or "/>".'.
ERROR(RSC-005): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0043_43_00043_Preparation_for_Emergency.xhtml(-1,-1): Error while parsing file 'Element type "sunken" must be followed by either attribute specifications, ">" or "/>".'.
ERROR(RSC-005): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/Text/0264_264_From_00264_boss.xhtml(1,24073): Error while parsing file 'element "system" not allowed anywhere; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")'.
ERROR(RSC-005): E:\unpack\M_E_M_O_R_I_Z_E.epub/OEBPS/toc.ncx(1,123): Error while parsing file 'value of attribute "xml:lang" is invalid; must be an RFC 3066 language identifier'.
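When a report is this long, it can help to group the messages mechanically before reading them. The sketch below is a hypothetical Python helper, not part of epubcheck or WebToEpub. It assumes one message per line in the `SEVERITY(CODE): path(line,col): message` shape shown above, and it lists the affected file names under each (severity, code) pair.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumed line shape, taken from the epubcheck output quoted above:
#   SEVERITY(CODE): <path>(<line>,<col>): <message>
# The lazy path group stops at the first "(line,col): ", so Windows drive
# letters such as "E:" do not confuse it.
LINE_RE = re.compile(
    r"(?P<severity>FATAL|ERROR|WARNING)\((?P<code>[A-Z]{3}-\d{3})\): "
    r"(?P<path>.+?)\((?P<line>-?\d+),(?P<col>-?\d+)\): (?P<message>.*)"
)


def group_messages(report: str) -> dict[tuple[str, str], list[str]]:
    """Group epubcheck messages by (severity, code), listing the file of each hit."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for raw in report.splitlines():
        match = LINE_RE.search(raw)
        if match:
            # Keep only the file name after the last "/" or "\".
            file_name = re.split(r"[\\/]", match["path"])[-1]
            groups[(match["severity"], match["code"])].append(file_name)
    return dict(groups)
```

Run against the report above, this would put the nine HTM-025 warnings under 0000_Information.xhtml and separate the FATAL parse error from the schema and language errors.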

The errors fall into four groups:

  1. It doesn't like the hyperlinks with hrefs starting with "chrome-extension" in 0000_Information.xhtml
  2. Next is file 0043_43_00043_Preparation_for_Emergency.xhtml, which is complaining about <sunken[deep-set] eyes="">. This usually means the original text used angle brackets and the site didn't correctly escape them, so they were scraped as a tag. This is probably what broke the importer. (Note: this is listed as a FATAL error.)
  3. Similar issue with file 0264_264_From_00264_boss.xhtml
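  4. Final issue: the language code in toc.ncx is "en_US", and epub wants this to be "en". (RFC 3066 tags separate subtags with a hyphen rather than an underscore, so "en_US" is invalid; "en-US" would also be a valid identifier.)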
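Issue 4 is the one that is easy to fix automatically. A minimal Python sketch, assuming the toc.ncx has already been extracted as text; `fix_lang_tag` and `fix_ncx_lang` are hypothetical helper names, not part of any existing tool. It turns an underscore locale into a hyphenated tag, and optionally keeps only the primary subtag ("en"), which is what the manual fix used.

```python
import re

# Loose RFC 3066 shape: a 1-8 letter primary subtag, then optional
# "-"-separated subtags of 1-8 letters or digits.
RFC3066_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def fix_lang_tag(tag: str, primary_only: bool = False) -> str:
    """Turn a locale like "en_US" into "en-US" (or just "en" if primary_only)."""
    candidate = tag.strip().replace("_", "-")
    if not RFC3066_RE.fullmatch(candidate):
        return tag  # not something this sketch knows how to repair
    return candidate.split("-")[0] if primary_only else candidate


def fix_ncx_lang(ncx_text: str, primary_only: bool = False) -> str:
    """Rewrite every xml:lang="..." value in the text of a toc.ncx file."""
    return re.sub(
        r'xml:lang="([^"]*)"',
        lambda m: 'xml:lang="{}"'.format(fix_lang_tag(m.group(1), primary_only)),
        ncx_text,
    )
```

A plain regex substitution is used rather than an XML round trip so the rest of the file, including its DOCTYPE, is left byte-for-byte unchanged.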

Note, errors 2 and 3 are very difficult to fix automatically.
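Fixing errors 2 and 3 needs a human, but finding the first kind is easy. The sketch below is a hypothetical Python check that only reports XHTML pages that are not well-formed XML, which is what the FATAL error for 0043 means. It will not catch error 3 (`<system>` is well-formed but not allowed by the XHTML schema; that still needs epubcheck), and because it does not load the XHTML DTD, named entities such as `&nbsp;` would also be flagged.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def find_malformed_pages(pages: dict[str, bytes]) -> dict[str, tuple[int, int]]:
    """Return {file name: (line, column)} for every page that is not well-formed XML."""
    problems: dict[str, tuple[int, int]] = {}
    for name, content in pages.items():
        try:
            ET.fromstring(content)
        except ET.ParseError as err:
            # ParseError.position is the (line, column) where the parser gave up.
            problems[name] = err.position
    return problems
```

With an EPUB opened via `zipfile.ZipFile`, the input could be built as `{n: z.read(n) for n in z.namelist() if n.endswith(".xhtml")}`.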

Anyway, I manually fixed the issues and was able to upload the resulting epub to Google Play. M_E_M_O_R_I_Z_E.zip

dteviot commented 4 years ago

@ArunSriKrishna You could prove the problem by starting with the faulty epub, then replacing the faulty files with the good ones one at a time and seeing when Google Play accepts the file. My guess: after replacing file 2 it might work, and after file 3 it will probably work. (That would prove which errors it doesn't like.) I'd do it, but it's getting too late and I need to go to sleep.
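Building each test file by hand is tedious, so here is a minimal Python sketch of the "swap in the good files" step, assuming you have both the faulty epub and a fixed copy. `replace_entries` is a hypothetical helper. It copies the faulty EPUB and takes the named entries from the fixed one, writing `mimetype` first and uncompressed as the EPUB container format requires.

```python
import zipfile


def replace_entries(faulty, fixed, entry_names, out):
    """Copy the faulty EPUB to `out`, taking every entry in `entry_names` from `fixed`.

    `faulty`, `fixed` and `out` may be file paths or binary file objects.
    """
    wanted = set(entry_names)
    with zipfile.ZipFile(faulty) as bad, zipfile.ZipFile(fixed) as good, \
            zipfile.ZipFile(out, "w") as result:
        # "mimetype" must be the first entry and stored without compression.
        result.writestr(zipfile.ZipInfo("mimetype"), bad.read("mimetype"),
                        compress_type=zipfile.ZIP_STORED)
        for info in bad.infolist():
            if info.filename == "mimetype":
                continue
            source = good if info.filename in wanted else bad
            result.writestr(info.filename, source.read(info.filename),
                            compress_type=zipfile.ZIP_DEFLATED)
```

Passing a growing set of names (first the 0043 page, then the 0043 and 0264 pages, and so on) gives the cumulative series of test files described above, one upload per step.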

seadesert commented 4 years ago

I'll try that, Thanks!

dteviot commented 4 years ago

@ArunSriKrishna Looking at Death Mage, there appear to be a large number of <span> elements that have a class of "ezoic-ad" with attributes that are not valid in an epub.

I removed these elements (I've got a cleaner tool) and also fixed the xml:lang warning. I re-ran epubcheck and confirmed there were no other errors.
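The cleaner tool itself isn't shown here, but the step is simple enough to sketch. This is a hypothetical Python version, assuming the page is well-formed XHTML: it deletes every `<span>` whose class list contains "ezoic-ad" (including the span's own content) and keeps the text that followed it. ElementTree drops the XML declaration and DOCTYPE when serializing, so a real tool would need to put those back.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements with a default namespace instead of "ns0:".
ET.register_namespace("", XHTML_NS)


def remove_ad_spans(xhtml: str, ad_class: str = "ezoic-ad") -> str:
    """Remove every <span> whose class list contains ad_class, keeping surrounding text."""
    root = ET.fromstring(xhtml)
    # ElementTree nodes don't link to their parents, so build that map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    for span in list(root.iter(f"{{{XHTML_NS}}}span")):
        if ad_class not in span.get("class", "").split():
            continue
        parent = parents[span]
        index = list(parent).index(span)
        tail = span.tail or ""
        # Text after the span moves to the previous sibling's tail, or to parent.text.
        if index > 0:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(span)
    return ET.tostring(root, encoding="unicode")
```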

Then I tried uploading to Google Play, which ultimately failed with "Processing Failed". Then I noticed there's a WebP image file, and WebP isn't valid for EPUB images. WebToEpub should have warned you about this when you created the epub.

So I removed the WebP image (and the matching entry in content.opf), and it uploaded to Google Play.
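The content.opf half of that fix can be sketched in Python. This is a hypothetical helper, assuming an OPF in the standard `http://www.idpf.org/2007/opf` namespace: it drops manifest items that are declared as `image/webp` or whose bytes carry the WebP signature (a RIFF header with "WEBP" at bytes 8 to 11), and returns the hrefs so the files can also be removed from the zip. Any `<meta name="cover">` that points at a removed item's id would need attention as well; this sketch does not handle that.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def is_webp(data: bytes) -> bool:
    """WebP files are RIFF containers whose form type (bytes 8-11) is "WEBP"."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def drop_webp_items(opf_xml: str, files: dict[str, bytes]) -> tuple[str, list[str]]:
    """Remove WebP images from the content.opf manifest.

    `files` maps manifest hrefs to file contents, so mislabelled WebP files are
    caught too. Returns the new OPF text and the hrefs to delete from the zip.
    """
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    removed: list[str] = []
    if manifest is None:
        return opf_xml, removed
    for item in list(manifest):
        href = item.get("href", "")
        if item.get("media-type") == "image/webp" or is_webp(files.get(href, b"")):
            removed.append(href)
            manifest.remove(item)
    return ET.tostring(root, encoding="unicode"), removed
```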

Then I went back to the original epub you sent me, removed just the WebP image, and tried loading it again. It succeeded. (Working file attached.) Death_Mage.zip

In addition, the Fatal error for MEMORIZE is one that WebToEpub is aware of. I tried converting that page into an EPUB and WebToEpub gave a warning about the page. I assume you forgot about it.

However, you can load that EPUB into Calibre and have it fixed (I think). Steps:

  1. After loading file into Calibre
  2. Right click on epub and select "Edit"
  3. On new window, select "Tools" then "Check book"
  4. When it reports errors, click on "Try to correct all fixable errors automatically"
  5. If all errors are fixed, select "File" then "Save a copy"
dteviot commented 4 years ago

@ArunSriKrishna Additionally, I've just tried replacing only the file with the FATAL error (0043_43_00043_Preparation_for_Emergency.xhtml) in MEMORIZE.epub and uploading that to Google Play. It uploaded fine, so that's the only thing Google Play had a problem with.

seadesert commented 4 years ago

Yeah, I saw the warning. As I mentioned before, I had fixed it by removing the cover in WebToEpub; I just sent it as a sample. Really, Google Play Books is too inflexible.

Should I continue to send epubs where I get the processing failed? (might take some time)

Looks like it would be better to just manually fix the epubs as you've shown above and upload them to Google Play Books when I get that Processing Error, since the ones I've sent seem to be specific cases related to the source.

Anyways, thanks a lot!

dteviot commented 4 years ago

@ArunSriKrishna

I fixed it before by removing the cover in WebToEpub,

I think just removing the cover might not be enough. You need to make sure there are no WebP images in the epub. Probably the better solution is to convert the WebP images to JPEG and replace them in the epub: https://www.howtogeek.com/325864/how-to-save-googles-webp-images-as-jpeg-or-png/ (I suggest using the instructions for MS Paint.)

Should I continue to send epubs where I get the processing failed?

If you've replaced any WebP images and fixed all errors reported by Calibre, then feel free to send it to me to have a look. I'm going to close this incident as resolved (for the time being). If you find anything, you can still add it to this issue, and I'll reopen it.

dteviot commented 4 years ago

@ArunSriKrishna I've modified a program I have that may be of some use to you. It should

To use it:

  1. Download zip with program from https://drive.google.com/file/d/1kZjoj1OpBIzY4UhtKv2GIJLLVodeSkQF/view?usp=sharing
  2. Unzip file
  3. Run "MergeWebToEpub"
  4. On menu, click "File" -> "Open"
  5. Select Epub to check
  6. You'll get a popup message if there's a problem with the epub.
  7. To fix webp images, click on "Edit" then "Convert Webp to Jpeg"
  8. Click on "File" -> "Save"

Note, this code is still very much a prototype, so please let me know how it works.

seadesert commented 4 years ago

I get this exception while converting .webp to .jpeg. Test file: Isaac.zip

See the end of this message for details on invoking 
just-in-time (JIT) debugging instead of this dialog box.

************** Exception Text **************
System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at MergeWebToEpub.EpubUtils.FixupUrl(String uri, String itemPath, Dictionary`2 newAbsolutePaths)
   at MergeWebToEpub.EpubUtils.FixupReferences(XDocument doc, XName element, XName attributeName, String itemPath, Dictionary`2 newAbsolutePaths)
   at MergeWebToEpub.EpubUtils.FixupReferences(XDocument doc, String itemPath, Dictionary`2 newAbsolutePaths)
   at MergeWebToEpub.EpubUtils.UpdateXhtmlPage(EpubItem item, Dictionary`2 newAbsolutePaths)
   at MergeWebToEpub.EpubUtils.ConvertWebpImagesToJpeg(Epub epub)
   at MergeWebToEpub.Form1.changeWebpToJpegToolStripMenuItem_Click(Object sender, EventArgs e)
   at System.Windows.Forms.ToolStripItem.RaiseEvent(Object key, EventArgs e)
   at System.Windows.Forms.ToolStripMenuItem.OnClick(EventArgs e)
   at System.Windows.Forms.ToolStripItem.HandleClick(EventArgs e)
   at System.Windows.Forms.ToolStripItem.HandleMouseUp(MouseEventArgs e)
   at System.Windows.Forms.ToolStripItem.FireEventInteractive(EventArgs e, ToolStripItemEventType met)
   at System.Windows.Forms.ToolStripItem.FireEvent(EventArgs e, ToolStripItemEventType met)
   at System.Windows.Forms.ToolStrip.OnMouseUp(MouseEventArgs mea)
   at System.Windows.Forms.ToolStripDropDown.OnMouseUp(MouseEventArgs mea)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
   at System.Windows.Forms.ToolStrip.WndProc(Message& m)
   at System.Windows.Forms.ToolStripDropDown.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

************** Loaded Assemblies **************
mscorlib
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4250.0 built by: NET48REL1LAST_C
    CodeBase: file:///C:/Windows/Microsoft.NET/Framework/v4.0.30319/mscorlib.dll
----------------------------------------
MergeWebToEpub
    Assembly Version: 1.0.0.0
    Win32 Version: 1.0.0.0
    CodeBase: file:///C:/Users/Arun%20Krishna/Downloads/MergeWebToEpub.2020-10-08%20(1)/MergeWebToEpub.exe
----------------------------------------
System.Windows.Forms
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4250.0 built by: NET48REL1LAST_C
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Windows.Forms/v4.0_4.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4200.0 built by: NET48REL1LAST_C
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System/v4.0_4.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
System.Drawing
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4084.0 built by: NET48REL1
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Drawing/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------
System.Configuration
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4190.0 built by: NET48REL1LAST_B
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Configuration/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
----------------------------------------
System.Core
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4220.0 built by: NET48REL1LAST_C
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Core/v4.0_4.0.0.0__b77a5c561934e089/System.Core.dll
----------------------------------------
System.Xml
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4084.0 built by: NET48REL1
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Xml/v4.0_4.0.0.0__b77a5c561934e089/System.Xml.dll
----------------------------------------
Accessibility
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4084.0 built by: NET48REL1
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/Accessibility/v4.0_4.0.0.0__b03f5f7f11d50a3a/Accessibility.dll
----------------------------------------
System.Xml.Linq
    Assembly Version: 4.0.0.0
    Win32 Version: 4.8.4084.0 built by: NET48REL1
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Xml.Linq/v4.0_4.0.0.0__b77a5c561934e089/System.Xml.Linq.dll
----------------------------------------
DotNetZip
    Assembly Version: 1.13.8.0
    Win32 Version: 1.13.8
    CodeBase: file:///C:/Users/Arun%20Krishna/Downloads/MergeWebToEpub.2020-10-08%20(1)/DotNetZip.DLL
----------------------------------------

************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this
application or computer (machine.config) must have the
jitDebugging value set in the system.windows.forms section.
The application must also be compiled with debugging
enabled.

For example:

<configuration>
    <system.windows.forms jitDebugging="true" />
</configuration>

When JIT debugging is enabled, any unhandled exception
will be sent to the JIT debugger registered on the computer
rather than be handled by this dialog box.

dteviot commented 4 years ago

@ArunSriKrishna Well, that is embarrassing. I did not allow for the case where only some of the images were webp. An updated version of MergeWebToEpub has been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing
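The stack trace above ends in a dictionary lookup inside `FixupUrl` that throws `KeyNotFoundException` when a referenced file is not in `newAbsolutePaths`. As an illustration only, here is a hypothetical Python analogue (not MergeWebToEpub's actual C# code; the names just mirror the trace) of the "only some images are WebP" case: a missing key means "this file wasn't converted, leave the reference alone" instead of an error.

```python
from __future__ import annotations

import posixpath
from urllib.parse import quote, unquote, urlsplit


def fixup_url(uri: str, item_path: str, new_absolute_paths: dict[str, str]) -> str:
    """Point a relative reference at a renamed file; leave other references untouched."""
    parts = urlsplit(uri)
    if parts.scheme or parts.netloc or not parts.path:
        return uri  # external link, or a same-page "#fragment"
    base_dir = posixpath.dirname(item_path)
    absolute = posixpath.normpath(posixpath.join(base_dir, unquote(parts.path)))
    new_absolute = new_absolute_paths.get(absolute)
    if new_absolute is None:
        # Not converted (e.g. a JPEG or PNG in a book where only some images were WebP).
        return uri
    relative = quote(posixpath.relpath(new_absolute, base_dir or "."))
    return relative + ("#" + parts.fragment if parts.fragment else "")
```

Using `.get()` instead of indexing is the whole fix here; the rest just resolves the reference relative to the page that contains it, so that the lookup key is the file's full path inside the epub.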

Note, please log any new issues you find with MergeWebToEpub against https://github.com/dteviot/MergeWebToEpub/issues.