Closed TheLethalCode closed 5 years ago
By using multiprocessing you can't increase the speed by more than 2x. Using more than 2 processes gives a 429 error while making the HTTP requests, i.e. too many requests at the same time.
Are you sure? I am working on a project that uses multiprocessing for brute-forcing a password. See this for reference: http://blog.adnansiddiqi.me/how-to-speed-up-your-python-web-scraper-by-using-multiprocessing/ .
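For context, the approach in that article is to hand a list of URLs to a `multiprocessing.Pool`. A minimal sketch of the pattern follows; the `fetch` function just sleeps to simulate network latency, and the URLs are placeholders, not the actual project code:

```python
# Sketch of multiprocessing a scraper, in the spirit of the linked article.
# fetch() simulates an HTTP request with a sleep; in the real script it
# would call requests.get() and parse the response.
import time
from multiprocessing import Pool

def fetch(url):
    time.sleep(0.2)  # stand-in for network latency
    return url, 200  # pretend every request succeeded

if __name__ == "__main__":
    urls = ["https://example.com/anime/%d" % i for i in range(8)]
    start = time.time()
    with Pool(processes=4) as pool:       # workers run requests in parallel
        results = pool.map(fetch, urls)   # blocks until all URLs are done
    elapsed = time.time() - start
    # 8 requests at 0.2 s each: ~1.6 s serially, noticeably less with 4 workers.
    print("%d pages in %.2fs" % (len(results), elapsed))
```

The catch, as discussed below, is that the server itself rate-limits, so adding workers past its threshold just produces 429 responses faster.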
I tried that as well, but it still says too many requests.
Dude, I am not even sure what you did, but I just got it working on my laptop, so I know it is possible. You might have overlapping arguments.
What do you mean by overlapping arguments? And what exactly did you do?
My bad. The server is throttling the connection. The only way around it is to change the IP, which is too much work. If you want to try it, have a look. But since this is a one-time script, we will just run it for the 10 hours.
I can do it, but it would be difficult to make it work on any Wi-Fi on campus. I can make it work over mobile data. Should I do it?
As I said, it's overkill for a one-time script run.
umm.. making it work over mobile data isn't that much work. Anyway, it's fine to run it for 10 hours. I guess there will be errors, though: the script uses "soup.find()", and in a few cases there is no description for the anime, which might raise an error. Should I fix it? It only needs a try/except.
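The failure mode being described is that `soup.find()` returns `None` when the description block is absent, so chaining `.text` onto it raises `AttributeError`. A rough sketch of the try/except guard, assuming illustrative tag and class names rather than the script's actual selectors:

```python
# Guarding a missing description with try/except.
# The HTML structure and class name here are assumptions for illustration.
from bs4 import BeautifulSoup

html = "<div><h1>Some Anime</h1></div>"  # page with no description block
soup = BeautifulSoup(html, "html.parser")

try:
    # soup.find() returns None when the tag is missing,
    # so .text raises AttributeError on such pages.
    description = soup.find("p", class_="description").text.strip()
except AttributeError:
    description = ""  # fall back to an empty description

print(repr(description))
```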
Nah, it is actually more work than you think. If you go that route, you have to keep changing your IP before every request, and the change itself takes time. And anyway, there are no errors up to 10000; I have handled them.
Yeah, there was a pretty nice article on rotating your IPs, but fine anyway. Cool, then I guess this thing is done.
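For the record, the IP-rotation idea usually means cycling requests through a pool of proxies rather than changing the machine's own address. A minimal sketch using requests' `proxies` parameter; the proxy addresses below are placeholders, not working proxies:

```python
# Rough sketch of rotating IPs via a proxy pool with requests.
# The proxy addresses are placeholder documentation IPs, not real proxies.
import itertools
import requests

proxy_pool = itertools.cycle([
    "http://203.0.113.1:8080",
    "http://203.0.113.2:8080",
    "http://203.0.113.3:8080",
])

def fetch_with_rotation(url):
    proxy = next(proxy_pool)  # each call advances to the next proxy
    # requests routes the request through the chosen proxy for both schemes.
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
```

As noted above, this trades the 429 throttling for the overhead of maintaining a working proxy list, which is why it was judged overkill for a one-time run.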
Right now, scraping all the anime details will take around 10 hours on a rough estimate. We have to speed it up. One possible option is to use multiprocessing.