alexadam / save-as-ebook

Save a web page/selection as an eBook (.epub format) - a Chrome/Firefox/Opera Web Extension
MIT License
1.1k stars, 70 forks

Horrible Result #34

Status: Open (smaragdus opened 4 years ago)

smaragdus commented 4 years ago

Hello,

I tried to save a single page (this article) using the 'Save Page' command, and the result was horrible. I would say that the generated EPUB file is unusable.

Screenshots:

[Screenshot: CoolReader 3 0 56-42 - 2020-01-17 - 001]

[Screenshot: CoolReader 3 0 56-42 - 2020-01-17 - 002]

For me such EPUB files are unreadable.

I am using Save as eBook version 1.3.5 with Cent Browser (Chromium-based) version 4.1.7.182 on Windows 8.

I hope that this extension is still in development and that new releases will fix such issues.

Regards

alexadam commented 4 years ago

I tried to save the article with the latest Firefox and it doesn't even work: it freezes the extension. This is clearly a bug. But there are some problems with that web page too. You can see the HTML validator log here: https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.unz.com%2Farticle%2Fagainst-mishima%2F

If you find more pages that don't work as expected, please post the links here. I'll try to create some tests and then do a major release with more updates.

Thank you!

smaragdus commented 4 years ago

@alexadam

Thanks for your quick response. If I come across pages that cause problems, I will let you know by posting the links here.

Regards

Verfallsdatum commented 4 years ago

Hi, I'm also having this problem but with the scientificamerican.com website (article example here)

the output file is not great...

[Attachment: Cosmic String Gravitational Waves Could Solve Antimatter Mystery - Scientific American (3).zip]

thank you!

alexadam commented 4 years ago

@Verfallsdatum the SA page is full of HTML errors, plus a 'fatal' error that stopped the validator :) https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.scientificamerican.com%2Farticle%2Fcosmic-string-gravitational-waves-could-solve-antimatter-mystery%2F

I made some updates and I'm preparing a new release. In the next version you won't see that garbage HTML in the output ebook; you'll get an error code/message instead.

I use a simple HTML parser that cannot handle errors, and I'm still working on a way to fix this. The problem is that even if I 'force' the output of the ebook (e.g. by just copy-pasting the HTML without parsing it), most ebook readers won't open it and will throw an error because they cannot read it. So the solution is to find and fix the errors before the ebook is generated. Any ideas are welcome!
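One possible direction, sketched in Python rather than in the extension's own code: run the page through an error-tolerant parser and re-emit it as balanced markup before packaging. EPUB content documents are XHTML, so they have to be well-formed XML, which is why unrepaired HTML gets rejected by readers. The names `XhtmlRepairer`, `repair_html` and `is_well_formed` are hypothetical, and this is not the parser the extension uses. It only balances tags and escapes text; it does not apply HTML's nesting rules or sanitise odd attribute names.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XhtmlRepairer(HTMLParser):
    """Hypothetical helper: re-emits leniently parsed HTML as balanced markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments, joined at the end
        self.stack = []  # names of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Keep the first occurrence of each attribute; XML forbids duplicates.
        seen = {}
        for name, value in attrs:
            seen.setdefault(name, value or "")
        rendered = "".join(
            f' {n}="{escape(v, quote=True)}"' for n, v in seen.items()
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        # Close everything that was opened after the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape for XML.
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def repair_html(fragment):
    parser = XhtmlRepairer()
    parser.feed(fragment)
    return parser.result()


def is_well_formed(markup):
    """True if a strict XML parser accepts the markup (single root assumed)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


broken = '<div><p>Tom & Jerry<br><a href="/share">share</div>'
print(is_well_formed(broken))               # strict parser rejects the input
print(is_well_formed(repair_html(broken)))  # repaired markup is accepted
```

Checking the repaired output with a strict XML parser before writing the .epub would also give a natural place to report an error code instead of emitting garbage.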

P.S. I investigated further, and it seems that the error comes from a link in the social share box. A quick fix is to create a custom style that hides it, like this:

url regex: scientificamerican\.com\/article\/

css: .article-grid__share { display: none; } .share-box { display: none; }
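To illustrate how a rule like this can be read: the URL regex decides which pages the rule applies to, and the CSS is added only on those pages. The Python sketch below is an assumption about the matching logic, not the extension's actual code; `CustomStyle`, `STYLES` and `css_for_url` are hypothetical names, and only the regex and CSS values come from the comment above.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomStyle:
    url_regex: str  # pattern tested against the page URL
    css: str        # rules injected when the pattern matches


# The rule from the comment above, copied verbatim.
STYLES = [
    CustomStyle(
        url_regex=r"scientificamerican\.com\/article\/",
        css=".article-grid__share { display: none; } .share-box { display: none; }",
    ),
]


def css_for_url(url, styles):
    """Return the CSS of every rule whose regex matches somewhere in the URL."""
    return "\n".join(s.css for s in styles if re.search(s.url_regex, url))


article = ("https://www.scientificamerican.com/article/"
           "cosmic-string-gravitational-waves-could-solve-antimatter-mystery/")
print(css_for_url(article, STYLES))
```

Because the pattern includes `/article/`, the site's other pages are left untouched.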

Verfallsdatum commented 4 years ago

Ah yes indeed, removing all the elements that contain errors does fix this issue. Thank you!