bohdanbobrowski / blog2epub

Convert blog (blogspot.com, wordpress.com...) or any website to epub using GUI, CLI or Python.
https://github.com/bohdanbobrowski/blog2epub
MIT License
40 stars, 6 forks

Three errors from ./blog2epubcli.py #17

Closed meedstrom closed 1 week ago

meedstrom commented 4 months ago

Command ./blog2epubcli.py https://eukaryotewritesblog.com failed on or after post 37: Nemesis club (next post would be Biodiversity for heretics)

File "/src/blog2epub/crawlers/wordpress.py", line 121, in get_images_with_captions
  img_caption = img_tr.xpath('//p[@class="wp-caption-text"]/text()').pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: pop from empty list
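
The traceback shows `.pop()` being called on an empty XPath result, which is what happens when an image has no `wp-caption-text` paragraph. Below is a minimal sketch of a guard, not the project's actual fix: `last_or_default` is a hypothetical helper name, and the plain lists stand in for whatever `img_tr.xpath(...)` returns. Separately, an XPath that starts with `//` searches the whole document rather than only under `img_tr`; whether that is intended here is not clear from the traceback.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def last_or_default(results: Sequence[T], default: T) -> T:
    """Return the last item of an XPath-style result list, or `default` if empty.

    Mirrors list.pop(), which takes the last item, without raising IndexError.
    """
    return results[-1] if results else default


# Plain lists stand in for the result of img_tr.xpath('.../text()').
caption_found = last_or_default(["A cat", "A dog"], "")
caption_missing = last_or_default([], "")  # no caption: falls back to ""
```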

Command ./blog2epubcli.py https://agentyduck.blogspot.com/ failed on or after post 91: Mental Postures (next post would probably be Simulating Confusion)

File "src/lxml/apihelpers.pxi", line 1736, in lxml.etree._htmlTagValidOrRaise
ValueError: Invalid HTML tag name 'li"'

Command ./blog2epubcli.py https://kajsotala.fi/ failed at the start. Just a HTTP error so it may be on my end, but I wonder if you get the same too.

File "/usr/local/lib/python3.12/urllib/request.py", line 639, in http_error_default
  raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
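
A 403 from `urllib` is often, though not always, a server rejecting the default `Python-urllib` User-Agent; nothing in this thread confirms that as the cause here. A minimal sketch of sending an explicit header, where `build_request` and the User-Agent string are illustrative assumptions rather than anything blog2epub is known to do:

```python
import urllib.request

# Illustrative value only; any realistic browser User-Agent string would do.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) blog2epub-example"


def build_request(url: str) -> urllib.request.Request:
    """Build a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


request = build_request("https://kajsotala.fi/")
# To actually fetch the page (needs network access):
# with urllib.request.urlopen(request) as response:
#     html = response.read()
```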

bohdanbobrowski commented 4 months ago

This actually shows three separate bugs. Expect fixes for them in version 1.3.0, which I will try to deliver by the end of next week.

Thanks @meedstrom for creating this issue. It's always very exciting for me when someone uses my software :-)

meedstrom commented 4 months ago

Ya, a tool like this fills a distinct niche. I haven't found anything else, so I'm reduced to using the EpubPress web extension and manually clicking each article to download. On top of that, it has a max size per book, you don't know exactly when you'll hit it, and if the download fails for that reason you have to click through everything all over again... Surprisingly, not that many people seem to want to read a blog whole?

Thanks for creating this project :)

bohdanbobrowski commented 4 months ago

The release of the new version is taking a bit longer than I estimated above, but it should be available soon (maybe this weekend). I'm polishing the latest changes and fixing build errors (the build broke a bit after switching from venv to poetry, but it's okay now, at least on Windows). Progress can be tracked on branch 1.3.0. Meanwhile, you can see the newly added functionality: the ability to select which chapters (articles) to include from all the downloaded ones. Stay tuned!

[image]

bohdanbobrowski commented 1 week ago

@meedstrom it looks like both bugs are finally fully resolved. I've been working on a deep code refactor for a while now... it's definitely not over yet... but it works much better now. It's currently available on the dev branch and will be in the next release, 1.5.0.

blog2epub https://eukaryotewritesblog.com -l=10
blog2epub https://agentyduck.blogspot.com -l=10
blog2epub https://kajsotala.fi -l=10

All these commands (note the slightly different syntax after the CLI refactor) produce shiny epubs, for example:

[image] [image]

meedstrom commented 1 week ago

That’s nice to hear! Great work!

I’ll aim to try it soon. Maybe report new bugs on dev for your benefit 😉

bohdanbobrowski commented 1 week ago

@meedstrom new issues will be very welcome! :-)

meedstrom commented 5 days ago

Would it be possible to reverse the order of posts so that it's oldest-first? :-)

bohdanbobrowski commented 4 days ago

Hmmm, it should be sorted that way. In which example does it start from the newest?

meedstrom commented 4 days ago

You seem to be right. I assumed it does newest-first because I saw that https://eukaryotewritesblog.com/ has the newest post as the first page, but then the second page has some old post, so it's not consistently ordered.

But we can take this discussion to #33.
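
For reference, an oldest-first ordering can be sketched as a sort on publication date. This is an illustration only: the `Post` class, its fields, and the sample dates are hypothetical and do not reflect how blog2epub stores articles.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical stand-in for a downloaded article.
    title: str
    published: date


def oldest_first(posts: list[Post]) -> list[Post]:
    """Return a new list of posts sorted by publication date, oldest first."""
    return sorted(posts, key=lambda post: post.published)


posts = [
    Post("Third", date(2024, 3, 1)),
    Post("First", date(2024, 1, 1)),
    Post("Second", date(2024, 2, 1)),
]
ordered_titles = [post.title for post in oldest_first(posts)]
```

Sorting on an explicit date field, rather than relying on the order in which pages were crawled, would keep the result stable even when a blog's own index is not consistently ordered.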