cbanack / comic-vine-scraper

An add-on script for ComicRack that lets you copy details from Comic Vine into your comic books.

Scrape limit of 200/hr #494

Closed: solidus0079 closed this issue 3 months ago

solidus0079 commented 3 months ago

I know they've limited the number of scrapes for a long time, but it looks like about a month ago they started enforcing 200/hr. I'm not sure what the limit was previously, but I seem to be hitting the wall again.

cbanack commented 3 months ago

Unfortunately, there isn't really much that I can do to change how Comic Vine limits access to their API.

If you are trying to scrape a large number of comics, the standard solution to this problem is to use the SCRAPE_DELAY advanced setting. You may have to adjust it a bit to find the right value that slows things down enough so that you don't go over the limit. I've heard people say that something between 30 and 40 seconds seems to work. Then scraping becomes brutally slow. :(

The SCRAPE_DELAY setting is described in more detail near the bottom of the page here:

https://github.com/cbanack/comic-vine-scraper/wiki/Advanced-Settings
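
The setting goes into the Advanced Settings as a plain NAME=VALUE line, something like the following (35 is just an illustrative value from the 30-40 second range mentioned above, not a recommendation):

```
SCRAPE_DELAY=35
```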

solidus0079 commented 3 months ago

Scrape delay sounds like all I’ll need, cheers!

ucapato commented 3 months ago

My two cents here:

SCRAPE_DELAY=19 (1 hour = 3600 seconds; divide by 200 comic books and you get 18, so 19 is the magic number). It lets you scrape more than 200 comics (taking more than an hour) in a single run. Of course it takes time, so it's advisable to run it overnight; by the next morning your list will be settled. I did this with more than 500 comics at once, and it worked flawlessly.
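
Here is that arithmetic as a quick back-of-the-envelope sketch in plain Python (not the scraper's own code; it simply assumes one Comic Vine request per scraped book):

```python
import math

RATE_LIMIT = 200          # Comic Vine requests allowed per hour
SECONDS_PER_HOUR = 3600

# Smallest whole-second delay that keeps the average rate under the limit.
min_delay = SECONDS_PER_HOUR / RATE_LIMIT        # 18.0 seconds
scrape_delay = math.floor(min_delay) + 1         # 19 seconds, with a little headroom

# Rough duration estimate for a large overnight batch (500 books, as above).
comics = 500
hours = comics * scrape_delay / SECONDS_PER_HOUR
print(f"SCRAPE_DELAY={scrape_delay} -> roughly {hours:.1f} hours for {comics} comics")
# SCRAPE_DELAY=19 -> roughly 2.6 hours for 500 comics
```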

solidus0079 commented 3 months ago

Yeah, that's the math I came up with too, and my overnight job last night worked just fine.

giotte commented 3 months ago

Yes, agreed. I found that a SCRAPE_DELAY somewhere between 18 and 20 is the magic range where CVS averages nearly 200 books per hour without hitting the limit. Letting it run overnight with these settings is really the only way to go at this point for large scraping sessions.