Use Wayback Machine Archive to Prevent Accidental AntiDDOS Blocking During Scraping

QasimK / mal-scraper

MyAnimeList web scraper is a Python library for gathering data for analysis

MIT License

19 stars, 9 forks

So I know many people have trouble because MyAnimeList no longer whitelists new IP addresses in its anti-DDoS software, which makes it hard to scrape data from the website. A workaround I discovered is to access the website's archive.org backup instead of the website itself. Does this package allow you to do this? If not, it doesn't seem like it would be very difficult to add as a nice feature. You could even update archive.org's backup by requesting that pages not yet indexed by the Wayback Machine be added (through archive.org's API).
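To make the idea concrete, here is a rough sketch of what I mean. It assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and the Save Page Now URL prefix (`https://web.archive.org/save/`). All the function names are hypothetical helpers made up for illustration, not anything that exists in mal-scraper, and the Save Page Now step may be rate limited or need extra handling in practice.

```python
import json
import urllib.parse
import urllib.request

# Assumed archive.org endpoints (not part of mal-scraper).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(page_url):
    """Build the availability-API query URL for a given page."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded availability
    response, or None if the page has no usable snapshot."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def save_request_url(page_url):
    """Build the URL that asks the Wayback Machine to capture a page."""
    return SAVE_ENDPOINT + page_url


def fetch_archived(page_url, timeout=30):
    """Fetch the archived HTML for page_url, or None if no snapshot exists.

    Hypothetical helper: performs network requests against archive.org
    only, never against the original site.
    """
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        payload = json.load(resp)
    snapshot = closest_snapshot(payload)
    if snapshot is None:
        return None
    with urllib.request.urlopen(snapshot, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A scraper could call `fetch_archived(url)` first and, when it returns `None`, request `save_request_url(url)` so the page gets archived for a later run. Keep in mind that archived snapshots can be out of date compared with the live site.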

Use Wayback Machine Archive to Prevent Accidental AntiDDOS Blocking During Scraping #12