Closed jabofh closed 10 months ago
As for SOL specifically, updates fail, too. The log window says:
Status | Title | Author | Comment | URL
-- | -- | -- | -- | --
Skipped | AMA: The Boyfriend | BreaktheBar | Existing epub contains 147 chapters, web site only has 1. Use Overwrite or force_update_epub_always to force update. | https://storiesonline.net/s/31154/ama-the-boyfriend/i
Relevant parts from the debug log:
FF: DEBUG: 2023-11-19 14:10:58,777: story.py(737): use_flaresolverr_proxy:
FFF: DEBUG: 2023-11-19 14:11:00,874: calibre_plugins.fanficfare_plugin.fff_plugin(1125): FanFicFare v4.29.0
FFF: INFO: 2023-11-19 14:11:01,033: calibre_plugins.fanficfare_plugin.prefs(216): Using default settings
FFF: DEBUG: 2023-11-19 14:11:01,058: story.py(737): use_flaresolverr_proxy:
FFF: DEBUG: 2023-11-19 14:11:01,069: configurable.py(1078): use_browser_cache:
FFF: DEBUG: 2023-11-19 14:11:01,070: configurable.py(1098): use_basic_cache:true
FFF: DEBUG: 2023-11-19 14:11:01,072: adapter_storiesonlinenet.py(155): URL: https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,072: cache_basic.py(116):
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,072: fetcher_requests.py(114):
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,929: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:01,929: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:01,930: decorators.py(123): random sleep(0.50-1.50):1.00
FFF: DEBUG: 2023-11-19 14:11:02,930: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:02,931: adapter_storiesonlinenet.py(116): Will now login to URL (https://storiesonline.net/sol-secure/login.php) as (***@***.***)
FFF: DEBUG: 2023-11-19 14:11:02,931: cache_basic.py(116):
========== MISS (GET) BasicCache
https://storiesonline.net/sol-secure/login.php
FFF: DEBUG: 2023-11-19 14:11:02,931: fetcher_requests.py(114):
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/sol-secure/login.php
FFF: DEBUG: 2023-11-19 14:11:03,809: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:03,810: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:03,810: decorators.py(123): random sleep(0.50-1.50):0.86
FFF: DEBUG: 2023-11-19 14:11:04,673: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:04,690: cache_basic.py(116):
========== MISS (POST) BasicCache
https://login.wlpc.com/index.php
FFF: DEBUG: 2023-11-19 14:11:04,690: fetcher_requests.py(114):
---------- REQ (POST) RequestsFetcher
https://login.wlpc.com/index.php
FFF: DEBUG: 2023-11-19 14:11:05,008: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:05,008: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:05,008: decorators.py(123): random sleep(0.50-1.50):1.12
FFF: DEBUG: 2023-11-19 14:11:06,128: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:06,129: cache_basic.py(116):
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:06,130: fetcher_requests.py(114):
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:10,546: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:10,547: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:10,547: decorators.py(123): random sleep(0.50-1.50):0.77
FFF: DEBUG: 2023-11-19 14:11:11,314: requestable.py(55): Encoding:utf8
FFF: INFO: 2023-11-19 14:11:11,331: adapter_storiesonlinenet.py(189): use url: https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:11,331: cache_basic.py(116):
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:11,332: fetcher_requests.py(114):
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:15,639: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:15,640: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:15,641: decorators.py(123): random sleep(0.50-1.50):0.79
FFF: DEBUG: 2023-11-19 14:11:16,429: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:16,528: cache_basic.py(116):
========== MISS (GET) BasicCache
https://storiesonline.net/a/breakthebar/1
FFF: DEBUG: 2023-11-19 14:11:16,528: fetcher_requests.py(114):
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/a/breakthebar/1
FFF: DEBUG: 2023-11-19 14:11:16,663: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:16,663: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:16,663: decorators.py(123): random sleep(0.50-1.50):0.87
FFF: DEBUG: 2023-11-19 14:11:17,538: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:17,587: adapter_storiesonlinenet.py(318): Found story row on page 1
FFF: DEBUG: 2023-11-19 14:11:17,657: calibre_plugins.fanficfare_plugin.fff_plugin(1435): update existing id:327
bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
These sites have changed their story chapter URL format.
Before: https://finestories.com/s/1111:2222/chapter-1-story-title
After: https://finestories.com/s/1111/story-title/1
It looks like they've changed it for all stories, not just new ones, removing a--presumably--unique chapter ID number (2222 in the example above) and chapter title in favor of story title and chapter number.
This has a straightforward fix--just look for the new chapter URLs. It's a one line change.
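To make the two URL shapes concrete, here is a minimal sketch that tells them apart with regular expressions. The pattern names, the named groups, and the `classify_chapter_url` helper are hypothetical, inferred only from the two example URLs above; this is not the adapter's actual code.

```python
import re

# Assumed shapes, inferred only from the 'Before' and 'After' examples in this thread.
OLD_CHAPTER_URL = re.compile(
    r"^https://(?P<site>[\w.]+)/s/(?P<story>\d+):(?P<chapter_id>\d+)/(?P<slug>[^/]+)$"
)
NEW_CHAPTER_URL = re.compile(
    r"^https://(?P<site>[\w.]+)/s/(?P<story>\d+)/(?P<slug>[^/]+)/(?P<number>\d+)$"
)


def classify_chapter_url(url):
    """Return 'old', 'new', or 'unknown' for a chapter URL (hypothetical helper)."""
    if OLD_CHAPTER_URL.match(url):
        return "old"
    if NEW_CHAPTER_URL.match(url):
        return "new"
    return "unknown"
```

Note that the old form carries a chapter ID and a slug, while the new form carries only a chapter number, so the two cannot be compared directly.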
But that fix alone invalidates existing chapters. When you update an existing EPUB, FFF will download all chapters again because the chapter URLs in existing EPUB chapters won't match what the site now shows.
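The effect can be shown with a deliberately simplified model, assuming chapters are matched purely by exact URL. The `chapters_to_download` function is hypothetical and is not FFF's real update logic; it only illustrates why a format change makes every chapter look new.

```python
def chapters_to_download(existing_urls, site_urls):
    """Return the site's chapter URLs that have no exact match in the existing EPUB.

    Hypothetical, simplified model of matching chapters by URL.
    """
    known = set(existing_urls)
    return [url for url in site_urls if url not in known]


# The same chapter, stored under the old form and now served under the new form.
existing = ["https://finestories.com/s/1111:2222/chapter-1-story-title"]
site = ["https://finestories.com/s/1111/story-title/1"]

# Nothing matches, so the chapter is treated as new and fetched again.
needs_fetch = chapters_to_download(existing, site)
```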
FFF has a mechanism (adapters' normalize_chapterurl() method) to avoid that, but I don't have enough examples of 'before' stories from these sites to implement it. Is chapter-1 in the 'before' URL the chapter title as chosen by the author, or simply chapter-#?
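Can either of you look inside some pre-existing EPUBs for these sites to help answer that? Specifically, stories with chapter names that aren't just 'Chapter 1', 'Chapter 2'.
- Open EPUB in Edit book (assuming Calibre)
- In the EPUB, open file0001.xhtml (or other chapter file)
- Look for HTML near the top that looks like: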
<meta name="chapterurl" content="XXXXXXX" />
<meta name="chapterorigtitle" content="1. Chapter 1" />
<meta name="chaptertoctitle" content="1. Chapter 1" />
<meta name="chaptertitle" content="1. Chapter 1" />
What I want to know is, what is the value for chapterurl when chapterorigtitle isn't just Chapter 1; when it's 'Prologue', 'Epilogue', 'Chapter XIV', 'Chapter 1: It all goes wrong'. Ideally, I'd like to see several examples.
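If opening each chapter file by hand is tedious, the same values could be pulled out with a short script. This is only a sketch: it assumes the meta tags are written exactly in the form shown above (name first, then content, self-closed), and the `chapter_meta` function is a hypothetical helper, not part of FFF.

```python
import html
import re
import zipfile

# Matches the two meta tags of interest, assuming the attribute order shown above.
META_RE = re.compile(
    r'<meta\s+name="(chapterurl|chapterorigtitle)"\s+content="([^"]*)"\s*/>'
)


def chapter_meta(epub_path):
    """Yield (file name, {meta name: content}) for each XHTML file with chapter metadata."""
    with zipfile.ZipFile(epub_path) as epub:
        for name in sorted(epub.namelist()):
            if not name.endswith(".xhtml"):
                continue
            text = epub.read(name).decode("utf-8", errors="replace")
            # Decode entities such as &amp; so URLs read as they would in a browser.
            found = {key: html.unescape(value) for key, value in META_RE.findall(text)}
            if found:
                yield name, found
```

For example, `for name, meta in chapter_meta("story.epub"): print(name, meta)` would list each chapter file with its stored URL and original title, where `story.epub` stands in for a real file path.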
I don't have many examples; these are two that I found with chapter titles that contain more than just "Chapter XX":
This is the second chapter of that story, named chapter one:
Another story:
Second chapter:
And a story with a single chapter, just in case it matters.
Thank you! That's all I need to see.
The first case, 'Prologue', proves we can't normalize the pre-existing chapter URLs to match the new chapter URL form.
Don't click the chapter links above. Not only do they not work, they also cause the site to block that story & its chapters from loading in the same browser session.
Test versions posted in the usual places.
I've tried the same credentials from my browser and they work flawlessly. On Friday (I think) it still worked without any problem; now it starts and fails before it parses any chapters.
I've also tried other WLPC sites and they fail at the same stage.