JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Browser Cache Refactor & open_pages_in_browser feature #905

Closed JimmXinu closed 1 year ago

JimmXinu commented 1 year ago

This version contains a significant refactor of FFF's network fetching & caching layers plus a new browser cache feature.

  1. In the past, when an FFF download started, the browser cache feature scanned all recent browser cache entries looking for ones that matched the few sites with browser cache support (fanfiction.net, fictionpress.com & ficbook.net).
  2. This version instead looks for cache entries by hashed URL (like the browser itself does). This means that FFF will find updated/new cache entries instead of being limited to those found at start; but it also means the URLs have to match exactly.
  3. This version changes the fanfiction.net adapter to include the story title in chapter URLs for that exact match. Epub update still seems fine.
  4. It may be necessary (especially with ffnet) to reload the first viewed chapter page if CloudFlare intervened, because that page will have been cached with extra CF parameters.
  5. I tried to keep FFF able to read ffnet cache entries as filled by WebToEpub.
  6. This version is not limited to a few hard-coded sites. However, in my testing so far, many (most?) FFF supported sites explicitly serve story pages tagged "no-cache", so the browser does not cache them. Also, POST requests are not supported, so most (all?) logins are impossible.
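
To illustrate why lookup by hashed URL requires an exact match, here is a minimal sketch. It is not FFF's code and not any browser's real on-disk format: the `cache_key` function, the `HashedCache` class, the SHA-1 choice and the example.com URLs are all assumptions made for illustration. It also assumes the extra CloudFlare parameters show up as part of the cached URL.

```python
import hashlib


def cache_key(url: str) -> str:
    """Hypothetical key scheme: hex SHA-1 of the exact URL string."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class HashedCache:
    """Toy in-memory stand-in for a browser cache indexed by hashed URL."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body):
        self._entries[cache_key(url)] = body

    def lookup(self, url):
        # No scanning: the URL is hashed and looked up directly,
        # so entries added after startup are found too.
        return self._entries.get(cache_key(url))


cache = HashedCache()
cache.store("https://example.com/s/123/1/Some-Title", b"<html>chapter 1</html>")

# Same URL, same hash: found.
hit = cache.lookup("https://example.com/s/123/1/Some-Title")

# Any difference in the URL gives a different hash: not found.
miss_no_title = cache.lookup("https://example.com/s/123/1/")
miss_extra_param = cache.lookup("https://example.com/s/123/1/Some-Title?extra=1")
```

Because even a one-character difference changes the hash, the adapter has to build the same URL the browser requested, and a page cached under a URL with extra parameters will not be found under the plain URL.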

This version introduces the open_pages_in_browser setting/feature.

If all of these settings are true:

browser_cache_path:...(your browser cache)...
use_browser_cache:true
use_browser_cache_only:true
open_pages_in_browser:true

...Then if FFF can't find a page in the browser cache, it will try to open it in your default browser as if you had clicked a link in a (non-browser) app, wait briefly, then look in the cache again. Sort of like a kludged proxy.

  1. This will only work if you use the browser cache feature with your default browser.
  2. This feature is very intrusive: in most OSes, opening a URL in the browser like this will force your browser to the top and give it focus. You will almost certainly not be able to use this in the background while you do other things. The intended use cases are small numbers of updates, or as a last resort.
  3. The open_pages_in_browser feature has retries, sleeps and 'quit trying after X fails' code that will probably need some tweaking. There's also currently a ton of debug output that will be reduced later.
  4. Notably, the check_next_chapter:true ffnet adapter feature will often open pages for nonexistent 'next' chapters a couple of times, because the failure isn't cached. I'd recommend turning check_next_chapter off.
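
The open-then-recheck behavior described above can be sketched roughly as follows. This is an assumption-laden illustration, not FFF's implementation: `fetch_via_browser`, `cache_lookup`, `wait_seconds` and `max_tries` are hypothetical names and values. The only real API used is the standard library's `webbrowser.open`, which opens a URL in the default browser.

```python
import time
import webbrowser


def fetch_via_browser(url, cache_lookup, open_url=webbrowser.open,
                      sleep=time.sleep, wait_seconds=2.0, max_tries=3):
    """Return the cached page for url, opening it in the browser if needed.

    cache_lookup is a hypothetical callable returning the cached body or None.
    Returns None after max_tries unsuccessful attempts.
    """
    body = cache_lookup(url)
    if body is not None:
        return body  # already cached, no need to disturb the browser

    for _ in range(max_tries):
        open_url(url)        # default browser loads (and hopefully caches) the page
        sleep(wait_seconds)  # give the browser time to write the cache entry
        body = cache_lookup(url)
        if body is not None:
            return body

    return None  # quit trying after max_tries failures
```

Since a page that never gets cached (such as a nonexistent next chapter) costs `max_tries` browser openings in this sketch, it also shows why failed lookups are expensive with this approach.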

I've been using this refactored code since mid-December with few issues. Users who don't use browser cache shouldn't notice any difference.

Browser cache users who try this version without open_pages_in_browser may need to load/reload the page that CloudFlare intervened on more often. But hopefully that's it.

(This text is largely from this MR post.)