domenic / worm-scraper

Scrapes the web serial Worm, its sequel Ward, and the bridge series Glow-worm into an ebook format

Twig #7

Closed demoran23 closed 4 years ago

demoran23 commented 7 years ago

In attempting to use worm-scraper with Twig, I encountered the following issue:

When I passed the --start-url parameter incorrectly, the scraper began to download Worm by default. After that, every run downloaded Worm, which led me to think I was still passing the start URL incorrectly.

Clearing the cache folder resolved the problem.

This appears to happen because the scraper resumes from the latest position in the existing manifest.

domenic commented 7 years ago

Oh, interesting; I never thought of using this for Twig. Does this work well otherwise? I guess the title and cover image and stuff will be different.

I suppose what should happen here is that we should invalidate the cache if --start-url is different from the previous --start-url.
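A minimal sketch of that invalidation idea, written in Python purely for illustration and not taken from worm-scraper's code. The marker file name `start-url.json`, its `startUrl` key, and the function `ensure_cache_matches` are all hypothetical. The sketch records the start URL next to the cached data and wipes the cache when a later run passes a different URL.

```python
import json
import shutil
from pathlib import Path


def ensure_cache_matches(cache_dir: Path, start_url: str) -> bool:
    """Wipe cache_dir if it was built from a different start URL.

    Returns True if an existing cache was invalidated.
    """
    # Hypothetical marker file that remembers which start URL built the cache.
    marker = cache_dir / "start-url.json"
    invalidated = False

    if marker.exists():
        previous = json.loads(marker.read_text(encoding="utf-8")).get("startUrl")
        if previous != start_url:
            # The cached manifest belongs to another book, so discard it.
            shutil.rmtree(cache_dir)
            invalidated = True

    # (Re)create the directory and record the URL used for this run.
    cache_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"startUrl": start_url}), encoding="utf-8")
    return invalidated
```

A cache created before the marker existed has no recorded URL, so this sketch leaves it alone; a real fix would need to decide whether to treat that case as stale.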

demoran23 commented 7 years ago

Yeah, it seems to work just fine. I just needed to edit the epub description, cover, and title.

I was thinking that the cache directory could be prefixed with the book title. That way, you could resume / refresh two books without wiping out your progress on the previous one.
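Roughly what I mean, as a Python sketch for illustration only. The function name `cache_dir_for` and the `cache` root folder are made up here; they are not part of worm-scraper. Each book title is turned into a filesystem-safe name so every book gets its own cache folder.

```python
import re
from pathlib import Path


def cache_dir_for(book_title: str, root: Path = Path("cache")) -> Path:
    """Return a per-book cache directory derived from the book title."""
    # Lowercase the title and collapse anything that is not a letter or
    # digit into a single hyphen, so the result is safe as a folder name.
    slug = re.sub(r"[^a-z0-9]+", "-", book_title.lower()).strip("-")
    if not slug:
        raise ValueError("book title must contain at least one letter or digit")
    return root / slug


print(cache_dir_for("Worm"))
print(cache_dir_for("Mother of Learning"))
```

With separate folders, a Twig run would never see the manifest left behind by a Worm run.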

The other thing I was thinking of was supporting a configuration file. This would contain all of the info regarding the book, and people could push configs to the repo and then you could be like "Hey, download using this configuration file".
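As a rough illustration of what such a file might hold, here is a Python sketch. The JSON keys (`title`, `startUrl`, `coverImage`, `description`), the `BookConfig` class, and the example URL are all assumptions for the sake of the example; none of them exist in worm-scraper.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BookConfig:
    """Hypothetical per-book settings that a config file could carry."""
    title: str
    start_url: str
    cover_image: Optional[str] = None
    description: str = ""


def parse_book_config(text: str) -> BookConfig:
    """Parse a JSON config string, requiring a title and a start URL."""
    raw = json.loads(text)
    missing = [key for key in ("title", "startUrl") if key not in raw]
    if missing:
        raise ValueError("missing required config keys: " + ", ".join(missing))
    return BookConfig(
        title=raw["title"],
        start_url=raw["startUrl"],
        cover_image=raw.get("coverImage"),
        description=raw.get("description", ""),
    )


# Placeholder values only; the URL is not a real chapter address.
example = '{"title": "Twig", "startUrl": "https://example.com/first-chapter"}'
config = parse_book_config(example)
print(config.title, config.start_url)
```

Keeping the title, cover, and description in one file would replace the manual epub edits I had to make.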

I'd like to see the scraper abstracted enough that it could handle books from other sources, like Mother of Learning.

domenic commented 7 years ago

I guess I'd prefer other people work on generalizing this, as this project remains about Worm for me (see e.g. the Worm-specific text fixes). Happy to link to any work you or others do from the README, though.

I'm still interested in fixing this bug with regard to caching vs. start-url though, one way or another.

demoran23 commented 7 years ago

That's cool. You may get a pull request from me with some architectural plumbing =)

domenic commented 4 years ago

Let's roll this into #9.