dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

more rapid updating from WebToEpub without spamming new versions #1353

Open gamebeaker opened 3 months ago

gamebeaker commented 3 months ago

Is your feature request related to a problem? Please describe. The time until a website is added to the addon in the store after the parser is finished is random.

Describe the solution you'd like Implement a JSON file which can be updated. (If someone presses update, it downloads the latest JSON file from GitHub and saves it in local storage.) Some pages aren't that complicated and only the CSS selectors have to be found. If the selectors are in the JSON, they could be used as variables depending on the website. Alternative: take inspiration from the Default Parser, where you can change the CSS selectors. The JSON example below is not complete. (For each function in the parser file, create a subkey with the same name.)

    {
      { (CSS selectors)
        Hostname: example.com,
        getChapterUrls: css,
        findContent: css,
        extractTitleImpl: css,
        extractAuthor: css,
        ...
      },
      { (Default Parser with predefined arguments)
        Hostname: example.com,
        URL_of_first_chapter: example.com/chapter1,
        css_content: css,
        css_title_chapter: css,
        css_to_remove
      }
    }

Potential problems to fix:
  1. How to check whether CSS arguments exist in the JSON file for a hostname.
  2. Should the JSON overwrite the parser file? (Example: site1 updates its CSS and now the parser file is deprecated.)
  3. How should versioning work? Does the JSON file have its own version number, independent of the release number?
  4. What is the legal situation if the addon requests the update file from GitHub? (Update to the Privacy Policy?)
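
To make the first open question concrete, here is a minimal sketch of checking whether selectors exist for a hostname. It assumes the JSON is an array of per-site objects keyed by a `hostname` field; the helper names `buildConfigIndex` and `findConfigForUrl` and all selector values are hypothetical placeholders, not part of WebToEpub.

```javascript
// Hypothetical config, shaped like the JSON example above. All selector
// values are placeholders, not real selectors for any site.
const siteConfigs = [
    {
        hostname: "example.com",
        getChapterUrls: "div.chapter-list a",
        findContent: "div.chapter-content",
        extractTitleImpl: "h1.novel-title",
        extractAuthor: "span.author"
    }
];

// Build a hostname -> config map once, so each lookup is cheap.
function buildConfigIndex(configs) {
    const index = new Map();
    for (const config of configs) {
        index.set(config.hostname.toLowerCase(), config);
    }
    return index;
}

// Return the config for a URL's hostname, or null if none exists.
// On null, the caller would fall back to a hand-written parser.
function findConfigForUrl(index, url) {
    const hostname = new URL(url).hostname.toLowerCase().replace(/^www\./, "");
    return index.get(hostname) ?? null;
}
```

Whether a JSON entry should win over an existing parser file for the same hostname is a separate policy decision that this lookup deliberately leaves to the caller.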

Kiradien commented 3 months ago

This... is actually fairly feasible for a lot of websites and would only take some slight modifications to a GenericParser I threw together a few years ago... It'd only need an additional code chunk to retrieve and update the json.

I'll note, it wouldn't work with every case, some sites still need special logic to work, so it wouldn't be an immediate fix... But this is probably worth looking into a complete implementation for some of the sites. I dug up my old code, I'll throw a link in case dteviot wants to look into a starting point for it... But it's really not too complex. https://github.com/Kiradien/WebToEpub/blob/GenericParser/plugin/js/parsers/GenericParser.js
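
As a rough idea of the "retrieve and update the JSON" chunk, the sketch below separates the version check from the I/O. It assumes the JSON file carries an integer `version` field and is stored under a `siteConfigs` key; `fetchJson` and `storage` are hypothetical injected helpers that, in the extension, might wrap `fetch()` and `chrome.storage.local`.

```javascript
// Decide whether a downloaded config should replace the stored one.
// Assumes each config file carries an integer "version" field (hypothetical).
function isNewerConfig(stored, downloaded) {
    if (!downloaded || !Number.isInteger(downloaded.version)) {
        return false; // reject malformed downloads
    }
    return !stored || downloaded.version > stored.version;
}

// fetchJson(url) resolves to the parsed JSON file.
// storage.get(key) / storage.set(key, value) are a hypothetical async
// key-value interface; both are injected so the logic stays testable.
async function updateSiteConfigs(fetchJson, storage, url) {
    const downloaded = await fetchJson(url);
    const stored = await storage.get("siteConfigs");
    if (!isNewerConfig(stored, downloaded)) {
        return false; // keep what is already stored
    }
    await storage.set("siteConfigs", downloaded);
    return true;
}
```

Giving the JSON its own version number, independent of the extension release, is one possible answer to the versioning question; it is an assumption here, not a decision made in this thread.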

Regardless, I'll probably play with this idea myself if nothing else.

dteviot commented 3 months ago
  1. My reading of https://developer.chrome.com/docs/webstore/program-policies/mv3-requirements is that this would be allowed. Assuming it's just loading CSS selectors. Although I think it might need to be declared when I submit to the Chrome and Firefox stores. Which may invite a manual inspection and delay approval.
  2. Also, I'm not sure I want Google taking too close a look at this extension. It could, theoretically, be used to violate copyright. Which is something of a no-no for extensions.
  3. Looking at the last few commits for web sites, to see if they could have been done using this.

Incidentally, IIRC, URL of first chapter is used in Default Parser to load a page so user can inspect the results of the supplied CSS. It's not actually used by the Default Parser.

gamebeaker commented 1 month ago

@dteviot I interpret it so that the major pain point is the fetching of the chapter list. As an idea: a lot of sites have a "next chapter" button. I think it would be easier to select these buttons with CSS than to load a chapter list. Pro: faster parser creation, because it needs only simple CSS selection and no custom JS to fetch the chapter list. Con: it would break the reading list and, because of that, the Library functionality. It would also make it impossible to download only specific chapters.

dteviot commented 1 month ago

@gamebeaker The other problem with following the "Next Chapter" buttons is that WebToEpub wasn't designed to crawl the individual chapters. It gets a list of chapters, and then fetches them. Following the "next page" links requires non-trivial changes to the main loop.
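
For illustration only, a minimal sketch of discovering chapters by following "next chapter" links. It is not WebToEpub code: `collectChapterUrls` and the injected `loadPage` helper are hypothetical. `loadPage(url)` is assumed to resolve to `{ nextUrl }`, with `nextUrl` set to `null` on the last chapter; in a browser it might fetch the page and read the `href` of the element matching a "next chapter" CSS selector.

```javascript
// Follow "next chapter" links from a starting URL and collect chapter URLs.
// loadPage is a hypothetical injected function (see the lead-in above).
async function collectChapterUrls(firstUrl, loadPage, maxChapters = 5000) {
    const urls = [];
    const seen = new Set(); // guards against sites whose last page links back
    let current = firstUrl;
    while (current && !seen.has(current) && urls.length < maxChapters) {
        seen.add(current);
        urls.push(current);
        const page = await loadPage(current);
        // Resolve relative links against the current page.
        current = page.nextUrl ? new URL(page.nextUrl, current).href : null;
    }
    return urls;
}
```

The sketch shows the structural difference raised above: the full list of URLs exists only after every page has been loaded once, so the chapter list cannot be presented to the user before crawling, and each page must be visited in sequence.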