Closed Edgeburn closed 3 months ago
@Edgeburn Hi, this looks like an issue with aiohttp_client_cache. You can switch CachedSession(...) to aiohttp.ClientSession(headers=headers) in the following lines:
1. https://github.com/TheOnlyWayUp/WattpadDownloader/blob/3f6eb6ed7c1bdecf4ddab95671380c066b709958/src/api/src/create_book.py#L19
2. https://github.com/TheOnlyWayUp/WattpadDownloader/blob/3f6eb6ed7c1bdecf4ddab95671380c066b709958/src/api/src/create_book.py#L40
3. https://github.com/TheOnlyWayUp/WattpadDownloader/blob/3f6eb6ed7c1bdecf4ddab95671380c066b709958/src/api/src/create_book.py#L61
This is the quick and dirty fix, and it removes caching. Discord is better for support. Out of curiosity, what are you working on?
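A minimal sketch of that swap (fetch_page is a hypothetical helper for illustration; the real call sites are the three create_book.py lines linked above):

```python
import aiohttp

async def fetch_page(url, headers=None):
    # Plain aiohttp.ClientSession in place of CachedSession(...):
    # no cache backend is involved, so concurrent requests no longer collide.
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()
```

Opening a session per call is fine for a quick fix; reusing one session across requests would be more efficient, but that's a separate change.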
If you don't end up pushing these changes, I'll definitely go ahead and fork to make them. Much appreciated!
I'm building a book library + archiver for my girlfriend's enormous Wattpad book collection, and I'm using a few instances of your project via API calls to download copies of all 2k+ books. It's worked fantastically aside from this issue, and I appreciate your work!
Hey @Edgeburn, that sounds sickkk. Are you using Calibre to store the library?
On the topic of caching, I likely won't be removing it on the master branch.
Caching is especially useful during ratelimits: if we're downloading a 200-part book and get ratelimited on the 50th part, the user can retry and continue from the 51st (the first 50 will hit the client cache, which is valid for 12 hours).
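The retry-resume behaviour doesn't depend on the library's internals; a toy cache with a 12-hour expiry shows the idea (this is an illustration only, not how aiohttp_client_cache is actually implemented):

```python
import time

class ResponseCache:
    """Toy URL -> body cache: entries older than expire_after are misses."""

    def __init__(self, expire_after=12 * 60 * 60):
        self.expire_after = expire_after
        self._store = {}  # url -> (fetch_time, body)

    def get(self, url):
        # Return the cached body if it exists and hasn't expired, else None.
        entry = self._store.get(url)
        if entry and time.time() - entry[0] < self.expire_after:
            return entry[1]
        return None

    def put(self, url, body):
        self._store[url] = (time.time(), body)
```

On a retried download, parts 1-50 come back from get() instantly and only part 51 onward actually hits Wattpad again.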
Tbf, I didn't know aiohttp_client_cache didn't hold up to rapid requests; I'll have to try it out later (I haven't seen this issue on my instance).
If you really wanna speed things up, you can use asyncio.gather on this function, after changing its return value and removing the router decorator:
https://github.com/TheOnlyWayUp/WattpadDownloader/blob/master/src/api/src/main.py#L20C1-L51C6
You'd have to find a way to deal with ratelimits; the backoff library plus semaphores, or chunking, should help.
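A stdlib-only sketch of gather plus a semaphore and exponential backoff (fetch is a placeholder for whatever coroutine does the real request; the broad except is deliberately generic and should ideally catch only the ratelimit error):

```python
import asyncio
import random

async def fetch_with_backoff(fetch, url, sem, retries=5, base=1.0):
    # `fetch` is a placeholder coroutine for the real request logic.
    # On failure, wait base * 2**attempt (plus jitter) before retrying.
    async with sem:
        last_exc = None
        for attempt in range(retries):
            try:
                return await fetch(url)
            except Exception as exc:  # ideally: a specific ratelimit exception
                last_exc = exc
                await asyncio.sleep(base * 2 ** attempt + random.random() * base)
        raise last_exc

async def download_all(fetch, urls, concurrency=5, base=1.0):
    # The semaphore caps in-flight requests; gather runs them concurrently.
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(
        *(fetch_with_backoff(fetch, u, sem, base=base) for u in urls)
    )
```

Chunking (downloading the part list in fixed-size batches) is a simpler alternative to the semaphore if you'd rather not tune concurrency.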
There's no obligation to do any of this haha, I'm hoping our thread will be useful to others trying to download a lot of books.
I'd love to hear more about your project, how can I get in touch?
Although I've used Calibre to convert some Amazon Kindle books to epub, my project is entirely custom-built. It stores everything in a MariaDB database, including the epubs encoded in base64.
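Encoding an epub for storage in a text column is straightforward with the standard library (sketched in Python for consistency with the thread; encode_epub/decode_epub are hypothetical helpers, not part of either project):

```python
import base64

def encode_epub(path):
    # Read the epub's raw bytes and return them as a base64 string,
    # suitable for a TEXT/LONGTEXT column in MariaDB.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def decode_epub(b64):
    # Reverse the encoding to recover the original file bytes.
    return base64.b64decode(b64)
```

Note that base64 inflates storage by about a third; a BLOB column holding the raw bytes would avoid that overhead.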
The rate limiting on Wattpad's end definitely was a bit of an issue, and my hacky workaround was basically just to deploy 4 instances of your program across 4 separate servers and have mine simply cycle through them all, which worked well enough.
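That instance-cycling workaround can be sketched as a simple round-robin over the deployed URLs (the addresses here are made up):

```python
from itertools import cycle

# Made-up addresses standing in for the four deployed downloader instances.
INSTANCES = cycle([
    "http://downloader-1.example.com",
    "http://downloader-2.example.com",
    "http://downloader-3.example.com",
    "http://downloader-4.example.com",
])

def next_instance():
    # Each call returns the next instance in rotation, spreading
    # Wattpad's per-IP ratelimit across all four servers.
    return next(INSTANCES)
```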
I appreciate your interest in my project! It's written in Java using the Spark framework, with basic HTML, CSS, and JavaScript webpages for the frontend. (Web frontend is not my strong suit at all, lol.)
The project is still in very early stages and is not yet public on GitHub. However, I intend to create a publicly usable version. The master branch was forked for a more basic version to get her book collection archived as quickly as possible.
If you're interested in it, please feel free to email me at edgeburn@edgeburnmedia.com or message me on Discord at @edgeburn02.
I am running an instance of this application as part of my own project that sends multiple requests to it in parallel. Although the application works perfectly when responding to one request at a time, it fails when hit with multiple at once, and outputs this to the logs