appledaily needs new mechanism

disinfoRG / ZeroScraper

Web scraper made by 0archive.

MIT License

Appledaily has changed how their website works. The list of articles is now loaded dynamically, so we can no longer get any new articles: the page the scraper receives is stuck before the list appears.

I have tried using Selenium, but it makes the current login fail, so we need to look into whether there is a better way to resolve this problem.

One possible approach is to use Selenium to collect article URLs, and then use another spider to log in and grab the content of the articles.
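A rough sketch of the first stage of that split, under several assumptions: `render_list_page`, `extract_article_urls` and the `is_article` predicate are hypothetical names, not existing ZeroScraper code, and waiting for "any link is present" is only a placeholder for whatever actually signals that the list has loaded. The browser is used only to render the list page; the returned URLs would then be handed to the existing logged-in spider.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_article_urls(rendered_html, base_url, is_article):
    """Return absolute, de-duplicated article URLs in page order.

    `is_article` is a caller-supplied predicate (hypothetical), since the
    real article URL pattern is not known here.
    """
    collector = _LinkCollector()
    collector.feed(rendered_html)
    urls = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if is_article(url) and url not in urls:
            urls.append(url)
    return urls


def render_list_page(list_url, timeout=10):
    """Stage 1: let a real browser run the page's JavaScript, return the HTML."""
    # Imported lazily so the parsing helper above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(list_url)
        # Assumption: the dynamic list has loaded once any link is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "a"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Keeping Selenium out of the login path this way might avoid the login failure mentioned above, since the second spider can keep its current session handling unchanged.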

Alternatively, we might be able to use their API. An example endpoint is https://tw.appledaily.com/pf/api/v3/content/fetch/collections?query={"id":"xxx","website":"tw-appledaily"}&d=70&_website=tw-appledaily. The "id" parameter is required, so we need to figure out how to obtain that id.
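For reference, a minimal sketch of how that URL could be assembled once an id is known. `build_collections_url` is a hypothetical helper; the `d=70` and `_website` values are copied from the example URL as-is, and their meaning is not known here. The sketch percent-encodes the JSON `query` value, on the assumption that the server accepts the encoded form.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_BASE = "https://tw.appledaily.com/pf/api/v3/content/fetch/collections"


def build_collections_url(collection_id, website="tw-appledaily", d=70):
    """Build the collections URL for a given collection id.

    The `query` parameter is a compact JSON object, as in the example URL.
    """
    query = json.dumps(
        {"id": collection_id, "website": website},
        separators=(",", ":"),
    )
    params = {"query": query, "d": d, "_website": website}
    return API_BASE + "?" + urlencode(params)
```

How to discover a valid `collection_id` is still the open question; the helper only covers the request shape.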

appledaily needs new mechanism #111