domenic / worm-scraper

Scrapes the web serial Worm, its sequel Ward, and the bridge series Glow-worm into an ebook format

Adapt to other books #9

Open StylinGreymon opened 6 years ago

StylinGreymon commented 6 years ago

Not really an issue, since this does exactly what it says on the tin, but how would I go about making this work for the author's other books, like the Worm sequel?

domenic commented 6 years ago

The main files that are going to be Worm-specific are convert.js and substitutions.json. Much of convert.js might still be applicable, but a lot is Worm-specific, and some stuff like next chapter/previous chapter detection might need tweaking as well.

download.js also contains the code for detecting the chapter title and next chapter URL, which might work out of the box, but might need fixing.
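To make the "chapter title and next chapter URL" step concrete, here is a minimal sketch of that kind of detection. It is not the code in download.js: the helper name `extractChapterInfo`, the `entry-title` class, and the rule "the first link whose text starts with Next" are all assumptions for illustration, and the regexes stand in for whatever DOM parsing the project actually uses.

```javascript
// Hypothetical helper, NOT the project's download.js implementation.
// Assumptions: the title lives in <h1 class="entry-title">, and the next
// chapter is the first <a> whose visible text starts with "Next".
function extractChapterInfo(html, pageUrl) {
  const titleMatch =
    /<h1[^>]*class="[^"]*entry-title[^"]*"[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  // Strip any nested tags from the heading and trim whitespace.
  const title = titleMatch ? titleMatch[1].replace(/<[^>]+>/g, "").trim() : null;

  let nextUrl = null;
  const linkRegex = /<a\s[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi;
  let match;
  while ((match = linkRegex.exec(html)) !== null) {
    const linkText = match[2].replace(/<[^>]+>/g, "").trim();
    if (/^next\b/i.test(linkText)) {
      // Resolve relative links against the page they were found on.
      nextUrl = new URL(match[1], pageUrl).href;
      break;
    }
  }

  // Either field is null when the page does not match the assumptions.
  return { title, nextUrl };
}

// Usage with made-up markup and a placeholder URL:
const sampleHtml =
  '<h1 class="entry-title">Chapter One</h1>' +
  '<p><a href="/prev/">Previous Chapter</a> <a href="/chapter-two/">Next Chapter</a></p>';
console.log(extractChapterInfo(sampleHtml, "https://example.com/chapter-one/"));
```

If a different book's site marks up its titles or navigation links differently, these two patterns are the part that would need fixing.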

scaffold.js contains the metadata which you'd want to change appropriately.

worm-scraper.js contains the default start URL.

Hope that helps!

fridokus commented 6 years ago

download.js works out of the box as you said. Simply changing the default start URL gets all the current chapters of the sequel down. Without any substitutions or fixes, of course.
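For anyone who would rather not edit the default in place, a small sketch of overriding the start URL from the command line follows. Everything named here is an assumption for illustration: the `--start-url` flag, the `resolveStartUrl` helper, and the placeholder default are not taken from worm-scraper.js.

```javascript
// Placeholder default, NOT a real first-chapter address.
const DEFAULT_START_URL = "https://example.com/first-chapter/";

// Hypothetical helper: returns the URL given after a "--start-url" flag,
// or the default when the flag (or its value) is absent.
function resolveStartUrl(args, defaultUrl = DEFAULT_START_URL) {
  const flagIndex = args.indexOf("--start-url");
  const candidate = flagIndex !== -1 ? args[flagIndex + 1] : undefined;
  if (candidate === undefined) {
    return defaultUrl;
  }
  // new URL() throws a TypeError for strings that are not absolute URLs,
  // so a typo fails here instead of at download time.
  return new URL(candidate).href;
}

// Usage: in a Node script this would typically receive process.argv.slice(2).
console.log(resolveStartUrl(["--start-url", "https://example.com/sequel/1-1/"]));
console.log(resolveStartUrl([]));
```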

Hate9 commented 6 years ago

Is there an easy way to run it without doing any of the substitutions?

domenic commented 4 years ago

This project now works out of the box for Ward. Its architecture should also be easier to generalize to other books now. Roughly:

There may still be tweaks needed for things like chapter title/next chapter URL, e.g. I had to make the changes in https://github.com/domenic/worm-scraper/commit/2593540551c4df55060dbb274ff6dc16b6afc95e and https://github.com/domenic/worm-scraper/commit/559681e4ece0cc5eb76a6b6af8c4fb0cf83a97ff for Ward. And while the general fixups in convert.js will be applied, you'd need to add any one-offs (keyed by URL) in substitutions.json, e.g. as done in https://github.com/domenic/worm-scraper/commit/9e112adb816d99bd2c23cdb6768ed345db362ca6.
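As an illustration of "one-offs keyed by URL", here is a minimal sketch of how such a lookup could be applied to a chapter. The data shape (`{ url: [{ before, after }] }`) and the helper `applySubstitutions` are assumptions for this example; check substitutions.json and the commit linked above for the real format.

```javascript
// Hypothetical shape, assumed for illustration only:
// { "<chapter URL>": [{ before: "text to find", after: "replacement" }, ...] }
function applySubstitutions(html, chapterUrl, substitutions) {
  // Chapters with no entry get no one-off fixes.
  const entries = substitutions[chapterUrl] || [];
  let result = html;
  for (const { before, after } of entries) {
    // Fail loudly so a stale entry is noticed when the source text changes.
    if (!result.includes(before)) {
      throw new Error(`Substitution not found in ${chapterUrl}: ${before}`);
    }
    // Replaces the first occurrence only; the function form keeps "$"
    // sequences in `after` from being treated as replacement patterns.
    result = result.replace(before, () => after);
  }
  return result;
}

// Usage with a placeholder URL and made-up text:
const exampleSubstitutions = {
  "https://example.com/chapter-one/": [{ before: "teh", after: "the" }]
};
console.log(
  applySubstitutions("<p>teh end</p>", "https://example.com/chapter-one/", exampleSubstitutions)
);
```

In this sketch, passing an empty object as `substitutions` leaves every chapter untouched, which is one way to run a new book through the pipeline before any one-offs have been written for it.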

I'll keep the project Worm/Ward-focused for now, i.e. I won't be changing anything in the current code or accepting pull requests adding support for the other books at this time. But this should make it easier if folks want to fork or play around with things locally.

Although I am interested in how to handle Glow-worm. I could either make it part 0 of Ward, or a third book. (Right now it's not included in Ward.)