Closed by demoran23 4 years ago
Oh, interesting; I never thought of using this for Twig. Does this work well otherwise? I guess the title and cover image and stuff will be different.
I suppose the fix is to invalidate the cache whenever --start-url differs from the one used on the previous run.
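A minimal sketch of that invalidation idea, in Python purely for illustration. This is not worm-scraper's actual code: the `prepare_cache` function, the `manifest.json` file name, and the `start_url` key are all assumptions made up for this example.

```python
import json
import shutil
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name, not taken from the project


def prepare_cache(cache_dir: Path, start_url: str) -> bool:
    """Wipe the cache if it was built from a different start URL.

    Returns True when an existing cache was invalidated.
    """
    manifest_path = cache_dir / MANIFEST_NAME
    invalidated = False

    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text(encoding="utf-8"))
        if previous.get("start_url") != start_url:
            # The cached chapters belong to another book, so discard them.
            shutil.rmtree(cache_dir)
            invalidated = True

    cache_dir.mkdir(parents=True, exist_ok=True)
    if not manifest_path.exists():
        # Record the start URL so the next run can compare against it.
        fresh = {"start_url": start_url, "chapters": []}
        manifest_path.write_text(json.dumps(fresh), encoding="utf-8")
    return invalidated
```

Calling `prepare_cache` before any download step would mean a changed --start-url starts from a clean cache, while an unchanged one still resumes where it left off.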
Yeah, it seems to work just fine. I just needed to edit the epub description, cover, and title.
I was thinking that the cache directory would be prefixed with the book title. That way, you could resume / refresh two books without wiping out your progress on the previous one.
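One way the per-book cache directory could look, sketched in Python for illustration only. The `cache_dir_for` helper and the slug rule are assumptions, not anything the project defines.

```python
import re
from pathlib import Path


def cache_dir_for(base_dir: Path, title: str) -> Path:
    """Return a cache directory unique to the given book title."""
    # Lowercase the title and collapse anything that is not a letter or
    # digit into a single hyphen, so the result is safe as a folder name.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("book title must contain at least one letter or digit")
    return base_dir / slug
```

With this, `cache_dir_for(Path("cache"), "Worm")` and `cache_dir_for(Path("cache"), "Twig")` point at separate folders, so refreshing one book leaves the other's progress alone.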
The other thing I was thinking of was supporting a configuration file. This would contain all of the info regarding the book, and people could push configs to the repo and then you could be like "Hey, download using this configuration file".
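A rough sketch of what reading such a configuration file might involve, in Python for illustration. The JSON format, the field names (`title`, `start_url`, `description`, `cover_image`), and the `parse_book_config` helper are all hypothetical. The project does not define a config format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BookConfig:
    """Per-book settings; every field name here is an assumption."""
    title: str
    start_url: str
    description: str = ""
    cover_image: Optional[str] = None


def parse_book_config(text: str) -> BookConfig:
    """Parse a JSON config string and check the required keys."""
    data = json.loads(text)
    missing = [key for key in ("title", "start_url") if key not in data]
    if missing:
        raise ValueError(f"config is missing required keys: {', '.join(missing)}")
    return BookConfig(
        title=data["title"],
        start_url=data["start_url"],
        description=data.get("description", ""),
        cover_image=data.get("cover_image"),
    )


# Placeholder values only; the URL is not a real chapter address.
example = '{"title": "Twig", "start_url": "https://example.com/twig/first-chapter"}'
config = parse_book_config(example)
```

A shared config like this would also cover the description, cover, and title edits mentioned earlier, since they could live in the file instead of being changed by hand.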
I'd like to see the scraper abstracted enough that it could handle books from other sources, like Mother of Learning.
I guess I'd prefer other people work on generalizing this, as this project remains about Worm for me (see e.g. the Worm-specific text fixes). Happy to link to any work you or others do from the README, though.
I'm still interested in fixing this bug with regard to caching vs. start-url though, one way or another.
That's cool. You may get a pull request from me with some architectural plumbing =)
Let's roll this into #9.
In attempting to use worm-scraper with Twig, I encountered the following issue:
When I passed the --start-url parameter incorrectly, the scraper fell back to downloading Worm. After that, every run downloaded Worm, which led me to think I was still passing the start URL incorrectly.
Clearing the cache folder resolved the problem.
The cause appears to be that the scraper resumes from the latest position in the existing manifest.