bohdanbobrowski / blog2epub

Convert blog (blogspot.com, wordpress.com or another based on Wordpress) to epub using command line or GUI.
https://github.com/bohdanbobrowski/blog2epub
MIT License
35 stars, 4 forks

Bypass content warning #2

Closed: rosano closed this issue 4 years ago

rosano commented 5 years ago

I spent a while trying to use this to download the amazing words at http://brucecameronelliott.blogspot.com/

I managed to bypass the content warning by copying the interstitial parameter from the button and appending it to the url before downloading.

I also used the mobile version of the site because the script's pattern matching doesn't work on the desktop version.

Posting here in case anyone wants it: https://github.com/rosano/blogspot2epub/commit/687fe1e82d14224a9dc171d8b0a732452657382f
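For anyone who wants to try the URL trick without reading the commit, here is a minimal sketch of the idea, not the code from the commit above. It assumes the mobile layout is selected with Blogger's `m=1` query parameter, and `PLACEHOLDER_TOKEN` is a stand-in for whatever `interstitial` value you copy from the button on the warning page. The helper name `add_query_params` is made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_query_params(url, **params):
    """Return `url` with the given query parameters added or overwritten."""
    parts = urlparse(url)
    # Keep any parameters already present, then apply the new ones on top.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunparse(parts._replace(query=urlencode(query)))


# Assumption: m=1 requests the mobile layout; the interstitial value is a
# placeholder for the token copied from the content-warning button.
url = add_query_params(
    "http://example.blogspot.com/",
    m="1",
    interstitial="PLACEHOLDER_TOKEN",
)
print(url)  # http://example.blogspot.com/?m=1&interstitial=PLACEHOLDER_TOKEN
```

Parsing and rebuilding the query string, rather than concatenating text, avoids ending up with a second `?` when the URL already carries parameters.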

rosano commented 5 years ago

And one more change to allow the script to find older posts: https://github.com/rosano/blogspot2epub/commit/b161f1da99707f1395394e0cf200cdfcf3cda337

bohdanbobrowski commented 5 years ago

As I wrote in a comment on your pull request, I'm currently working on a code refactor. This script evolved directly from a kind of "proof of concept" thing... so these kinds of bugs don't surprise me at all ;-)

The idea of scraping the mobile version looks really cool to me. I think I will add this to the next version.

bohdanbobrowski commented 4 years ago

Added support for the Atom feed. The interstitial cookie value is set, but the logic behind it is more complex.

Currently, blogs with an "interstitial" landing page are crawled only to the last 25 posts, but half a loaf is better than nothing!
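To illustrate what crawling an Atom feed involves, here is a rough sketch of listing post titles and links from feed XML with the standard library. It is not blog2epub's actual implementation. The sample feed, its URLs, and the `list_posts` helper are invented for the example, and it parses an inline string instead of fetching a real feed.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical two-entry feed standing in for a downloaded blog feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Second post</title>
    <link rel="alternate" type="text/html"
          href="http://example.blogspot.com/2015/02/second.html"/>
  </entry>
  <entry>
    <title>First post</title>
    <link rel="alternate" type="text/html"
          href="http://example.blogspot.com/2015/01/first.html"/>
  </entry>
</feed>"""


def list_posts(feed_xml):
    """Return (title, url) pairs for every entry in an Atom feed string."""
    root = ET.fromstring(feed_xml)
    posts = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        # The rel="alternate" link points at the human-readable post page.
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        posts.append((title, link.get("href") if link is not None else None))
    return posts


for title, url in list_posts(SAMPLE_FEED):
    print(title, url)
```

A feed only lists as many entries as the server returns in one response, which fits the 25-post limit mentioned above. Getting older posts would mean requesting further pages of the feed.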