bohdanbobrowski / blog2epub

Convert blog (blogspot.com, wordpress.com or another based on Wordpress) to epub using command line or GUI.
https://github.com/bohdanbobrowski/blog2epub
MIT License
35 stars, 4 forks

Is it possible to parse archived blogs with a new host? #18

Open ctnoir opened 1 month ago

ctnoir commented 1 month ago

Thanks for this software! I have used it successfully to archive a lot of older blogs for offline reading on my phone.

I've been trying to get it to work on this archived blog, which was originally hosted on blogspot: https://thearchdruidreport-archive.200605.xyz/2017/05/index.html

It should have the same folder and link structure, but I get a 403 error when trying to convert it into an epub.

Is there anything I can do about this issue?
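
A 403 comes from the server, so one way to narrow this down is to request the same page outside blog2epub and compare the status codes. The sketch below assumes, without confirmation anywhere in this thread, that the new host may be rejecting requests based on their User-Agent header. `build_request`, `probe`, and `BROWSER_UA` are hypothetical names for illustration and are not part of blog2epub.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string (an assumption for this test);
# any realistic browser value would serve the same purpose.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def probe(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server sends for this request."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 arrive as exceptions; report the code.
        return err.code
```

Calling `probe(url)` and then `probe(url, BROWSER_UA)` with the archive URL above would show whether the header makes a difference. If both calls return 403, the cause is probably something else, such as bot protection or IP filtering, and this check does not settle it.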

bohdanbobrowski commented 1 month ago

First of all, I would like to point out that this is my private project, which I have been developing for several years - with varying degrees of success - but such comments certainly give me a lot of motivation to continue working. Thank you!

To answer the main question: such scraping should be possible, but I haven't tested it, so I'm not surprised you got an error. I will leave this issue open and try to implement it soon. I'm currently working on version 1.3.0 (there is a branch), which will bring a lot of changes to the UX as well as a code refactor that will make further changes easier and faster to implement.

ctnoir commented 1 month ago

I live in a rural area with intermittent internet, so it is really nice for me to be able to turn websites into epubs. In the past I did this manually, downloading with wget and then feeding the raw HTML files to Calibre to create something readable on my phone, but arranging things by hand was always time-consuming. Plus, sometimes the comments would not be captured, and that's where a lot of really good insight is.

So this software is an absolute godsend for me and I was so happy to discover it one day. I have created around 20 epubs so far with it and they all work perfectly with no additional work needed on my side. I can now happily read about the history of Western civilisation or about small town life without needing any additional internet connection in the woods :D Thank you so much for your work on this project.