Open gamebeaker opened 3 months ago
This... is actually fairly feasible for a lot of websites and would only take some slight modifications to a GenericParser I threw together a few years ago... It'd only need an additional code chunk to retrieve and update the json.
I'll note that it wouldn't work in every case; some sites still need special logic, so it wouldn't be an immediate fix. But a complete implementation is probably worth looking into for some of the sites. I dug up my old code and I'll throw in a link in case dteviot wants a starting point for it. It's really not too complex: https://github.com/Kiradien/WebToEpub/blob/GenericParser/plugin/js/parsers/GenericParser.js
Regardless, I'll probably play with this idea myself if nothing else.
Incidentally, IIRC, URL of first chapter is used in Default Parser to load a page so user can inspect the results of the supplied CSS. It's not actually used by the Default Parser.
@dteviot As I understand it, the major pain point is fetching the chapter list. As an idea: a lot of sites have a "next chapter" button. I think it would be easier to select these buttons with CSS than to load a chapter list.
Pro: faster parser creation, because only a simple CSS selector is needed and no custom JS to fetch the chapter list.
Con: it would break the reading list and, because of that, the Library functionality. It would also be impossible to download only specific chapters.
@gamebeaker The other problem with following the "Next Chapter" buttons is that WebToEpub wasn't designed to crawl the individual chapters. It gets a list of chapters, and then fetches them. Following the "next page" links requires non-trivial changes to the main loop.
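One way to picture the "next chapter" idea without touching the main loop is to follow the links up front and hand the resulting list to the existing "get a list, then fetch" flow. The sketch below only illustrates that loop and is not WebToEpub code. `collectChapterUrls`, `getNextUrl`, and `maxChapters` are hypothetical names, and the page loading is replaced by a plain lookup so the example runs outside a browser. In the extension, `getNextUrl` would have to load each chapter and apply the CSS selector for the button, so it would be asynchronous and every chapter would be loaded once just to discover the list.

```javascript
// Build a chapter list by following "next chapter" links from the first chapter.
// getNextUrl is a hypothetical callback: given a chapter URL, it returns the URL
// behind that page's "next chapter" button, or null when there is none.
function collectChapterUrls(firstUrl, getNextUrl, maxChapters = 5000) {
    const seen = new Set();   // guards against pages that link back to themselves
    const urls = [];
    let current = firstUrl;
    while (current && !seen.has(current) && urls.length < maxChapters) {
        seen.add(current);
        urls.push(current);
        current = getNextUrl(current);
    }
    return urls;
}

// Usage with a fake three-chapter site whose last page links to itself.
const fakeSite = {
    "https://example.com/chapter1": "https://example.com/chapter2",
    "https://example.com/chapter2": "https://example.com/chapter3",
    "https://example.com/chapter3": "https://example.com/chapter3"
};
const chapters = collectChapterUrls(
    "https://example.com/chapter1",
    url => fakeSite[url] || null
);
console.log(chapters.length); // 3
```

The `seen` set and the `maxChapters` cap are there because a "next" link on the last chapter often points somewhere unexpected; without them a crawl like this could loop forever.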
Is your feature request related to a problem? Please describe.
The time between a parser being finished and the website being supported by the addon in the store is unpredictable.
Describe the solution you'd like
Implement a JSON file which can be updated. (If someone presses update, it downloads the latest JSON file from GitHub and saves it in local storage.) Some pages aren't that complicated and only the CSS selectors have to be found. If the selectors are in the JSON, they could be used as variables depending on the website.
Alternative: take inspiration from the Default Parser, where you can change the CSS selectors.
JSON example, not complete. (For each function in the parser file, create a subkey with the same name.)
{
  { (CSS selectors)
    Hostname: example.com,
    getChapterUrls: css,
    findContent: css,
    extractTitleImpl: css,
    extractAuthor: css,
    ...
  },
  { (Default Parser with predefined arguments)
    Hostname: example.com,
    URL_of_first_chapter: example.com/chapter1,
    css_content: css,
    css_title_chapter: css,
    css_to_remove
  }
}

Potential problems to fix:
How to check whether CSS arguments exist in the JSON file for a hostname.
Should the JSON overwrite the parser file? (Example: site1 updates its CSS and now the parser file is outdated.)
How should versioning work? Does the JSON file have its own version number, independent of the release number?
What is the legal situation if the addon requests the update file from GitHub? (Does the Privacy Policy need an update?)
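Two of the questions above, the hostname lookup and the independent version number, can be sketched in a few lines. This is only an illustration under assumptions: the config shape (`version`, `sites`, `hostname`), the selector values, and the helper names `findSiteConfig` and `pickNewerConfig` are all hypothetical and not part of WebToEpub. It also assumes the JSON file carries a simple integer version of its own.

```javascript
// Hypothetical config shape, loosely following the JSON example above.
const bundledConfig = {
    version: 3,   // assumed: the file's own version, independent of the release number
    sites: [
        {
            hostname: "example.com",
            findContent: "div.chapter-content",   // placeholder selectors
            extractTitleImpl: "h1.title"
        }
    ]
};

// Return the selector entry for the URL's hostname, or null if the file has none.
// Stripping a leading "www." is an assumption about how hostnames would be stored.
function findSiteConfig(config, url) {
    const hostname = new URL(url).hostname.replace(/^www\./, "");
    return config.sites.find(site => site.hostname === hostname) || null;
}

// Keep the stored file unless the downloaded one has a higher integer version.
function pickNewerConfig(stored, downloaded) {
    if (!downloaded || !Number.isInteger(downloaded.version)) {
        return stored;   // ignore a missing or malformed download
    }
    return downloaded.version > stored.version ? downloaded : stored;
}

// Usage: a null result means "no JSON entry", so a regular parser would be used.
const entry = findSiteConfig(bundledConfig, "https://www.example.com/novel/1");
console.log(entry ? entry.findContent : "no JSON entry for this site");
```

Whether a JSON entry should win over an existing parser file for the same hostname is a separate policy decision that this sketch does not settle.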