Open Skylion007 opened 8 years ago
Using the Internet Archive as a secondary source is a good idea, as it (currently) indexes MAL. The package does not allow this yet, as it is still very much a WIP. I plan to add this as a post-version-1 feature.
The Wayback Machine supports a simple API allowing you to discover the history of a page and the right URLs to use. The page itself should parse similarly.
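For anyone who wants to experiment before this lands in the package, here is a minimal sketch of that lookup against the Wayback Machine availability endpoint (`https://archive.org/wayback/available`). The function names are hypothetical and not part of this package, and the response handling assumes the documented `archived_snapshots.closest` JSON shape.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded response, or None."""
    # Assumption: an unarchived page yields an empty "archived_snapshots" object.
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url: str, timestamp: Optional[str] = None,
                  timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network and return the snapshot URL, if any."""
    with urlopen(availability_url(page_url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `find_snapshot("https://myanimelist.net/anime/1")` should return an archived URL if one exists, which can then be fetched and handed to the same parser as the live page. The URL building and response parsing are kept separate from the network call so they can be tested offline.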
The "Memento" protocol (RFC 7089) may offer additional sources/metadata.
I know many people have trouble because MyAnimeList no longer whitelists new IP addresses in its anti-DDoS software, which leaves many people struggling to scrape data off the website. A workaround I discovered is to access the website's archive.org backup instead of the website itself. Does this package allow you to do this? If not, it doesn't seem like it would be very difficult to add as a nice feature. You could even update archive.org's backup by requesting that pages not yet indexed by the Wayback Machine be added (through archive.org's API).
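As a rough illustration of the last point, the sketch below asks the Wayback Machine to capture a page through its "Save Page Now" URL form (`https://web.archive.org/save/<url>`). The helper names and the User-Agent string are made up for this example, and the endpoint may rate-limit or reject anonymous requests, so treat it as a starting point only.

```python
from urllib.request import Request, urlopen

# "Save Page Now" prefix; the target URL is appended as-is.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request(page_url: str,
                 user_agent: str = "mal-archive-sketch/0.1") -> Request:
    """Build (but do not send) a capture request for page_url."""
    # The User-Agent value is a placeholder for this example.
    return Request(SAVE_ENDPOINT + page_url, headers={"User-Agent": user_agent})


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Send the capture request and return the HTTP status code."""
    with urlopen(save_request(page_url), timeout=timeout) as resp:
        return resp.status
```

A scraper could call `request_capture(url)` only when no existing snapshot is found, so that it does not trigger needless captures.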