bohdanbobrowski / blog2epub

Convert a blog (blogspot.com, wordpress.com or another WordPress-based site) to EPUB using the command line or a GUI.
https://github.com/bohdanbobrowski/blog2epub
MIT License
35 stars, 4 forks

Three errors from ./blog2epubcli.py #17

Open meedstrom opened 1 month ago

meedstrom commented 1 month ago

Command ./blog2epubcli.py https://eukaryotewritesblog.com failed on or after post 37: Nemesis club (next post would be Biodiversity for heretics)

File "/src/blog2epub/crawlers/wordpress.py", line 121, in get_images_with_captions
  img_caption = img_tr.xpath('//p[@class="wp-caption-text"]/text()').pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: pop from empty list
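
For context on the first traceback: an XPath query returns an empty list when nothing matches, and calling `.pop()` on an empty list raises `IndexError`. The sketch below shows one defensive pattern, not the project's actual fix. The helper name `pop_or_default` is hypothetical, and plain lists stand in for what `img_tr.xpath(...)` would return. As a side note, an XPath expression that starts with `//` searches the whole document rather than only below `img_tr`; `.//` would keep the search relative, if that is what was intended.

```python
def pop_or_default(results, default=""):
    """Return the last item of `results` (like list.pop(), but without
    mutating the list), or `default` when the list is empty."""
    return results[-1] if results else default


# Plain lists stand in for the result of img_tr.xpath(...).
caption_found = pop_or_default(["First caption", "Second caption"])
caption_missing = pop_or_default([])

print(repr(caption_found))
print(repr(caption_missing))
```

With a helper like this, an image without a `wp-caption-text` paragraph would simply get an empty caption instead of aborting the whole download.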

Command ./blog2epubcli.py https://agentyduck.blogspot.com/ failed on or after post 91: Mental Postures (next post would probably be Simulating Confusion)

File "src/lxml/apihelpers.pxi", line 1736, in lxml.etree._htmlTagValidOrRaise
ValueError: Invalid HTML tag name 'li"'

Command ./blog2epubcli.py https://kajsotala.fi/ failed at the start. Just an HTTP error, so it may be on my end, but I wonder if you get the same.

File "/usr/local/lib/python3.12/urllib/request.py", line 639, in http_error_default
  raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
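
One possible cause of a 403 like this, not confirmed for this site, is that some servers reject the default `Python-urllib/3.x` User-Agent that `urllib` sends. The sketch below only builds a request with an explicit User-Agent and does not contact the server. The helper name `build_request` and the User-Agent string are assumptions made for illustration, not anything blog2epub is known to use.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-fetcher)"):
    """Build a Request that carries an explicit User-Agent header
    instead of urllib's default 'Python-urllib/3.x'."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://kajsotala.fi/")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen(req)` would then send the custom header. Whether that avoids the 403 depends on the server.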

bohdanbobrowski commented 1 month ago

This actually shows three separate bugs. Expect to see fixes for them in version 1.3.0, which I will try to deliver by the end of next week.

Thanks @meedstrom for creating this issue. It's always very exciting for me when someone uses my software :-)

meedstrom commented 1 month ago

Yeah, a tool like this fills a distinct niche. I haven't found anything else, so I'm reduced to using the EpubPress web extension and manually clicking each article to download. On top of that, it has a maximum size per book, you don't know exactly when you'll hit it, and if the download fails for that reason you have to click through everything all over again... Surprisingly, not many people seem to want to read a whole blog?

Thanks for creating this project :)

bohdanbobrowski commented 1 month ago

The release of the new version is taking a bit longer than I estimated above, but it should be available soon (maybe this weekend). I'm polishing the latest changes and fixing build errors (the build broke a bit after I switched from venv to poetry, but it's okay now, at least on Windows). Progress can be tracked on the 1.3.0 branch. Meanwhile, you can see the newly added functionality: the ability to select which chapters (articles) to include from all the downloaded ones. Stay tuned!

[Image: screenshot of the new chapter selection feature]