clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Integrate API changes (pattern.web) #181

Open markus-beuckelmann opened 7 years ago

markus-beuckelmann commented 7 years ago

Some of the APIs for services (such as Bing Search API, Google Translate API and Facebook) supported by pattern.web are deprecated, require a paid subscription or have changed in some other way.

  1. Bing yields URLError: Invalid header value 'Basic OlZuSkVLNEhUbG50RTNTeUY1OFFMa1VDTHAvNzh0a1lqVjFGbDNKN2xIYTA9\n' when running the example in examples/01-web/03-bing.py.
  2. Google (Translate) yields HTTP401Authentication: Google translate API is a paid service when running the example in examples/01-web/02-google-translate.py.
    • In fact, the Google Cloud Translation API is a paid service now and apparently has no free quota.
  3. Facebook yields HTTP401Authentication when running the example in examples/01-web/11-facebook.py.
    • It seems that the Facebook Graph API has changed.
  4. Yahoo yields HTTP401Authentication: Yahoo search API is a paid service.
    • The BOSS JSON Search API has been discontinued since March 31, 2016.
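
The Bing error in item 1 shows a header value that ends in a newline, which Python's HTTP client rejects. One plausible cause (an assumption, not confirmed against the pattern.web source) is that the Basic auth credentials are encoded with a base64 helper that appends a trailing newline. A minimal sketch of the difference, using a placeholder key:

```python
import base64

key = "EXAMPLE-KEY"  # placeholder, not a real license key
credentials = (":" + key).encode("utf-8")  # empty user name, key as password

# encodebytes() (the successor of the old encodestring()) appends "\n".
with_newline = base64.encodebytes(credentials)
# b64encode() returns the encoded bytes without a trailing newline.
without_newline = base64.b64encode(credentials)

header_bad = "Basic " + with_newline.decode("ascii")
header_ok = "Basic " + without_newline.decode("ascii")

print(repr(header_bad))
print(repr(header_ok))
```

If this is indeed the cause, switching to `base64.b64encode` (or stripping the encoded value) would make the header valid, although the request would still fail if the key itself is no longer accepted.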

We should find out whether any of the authentication errors are caused by outdated tokens that simply need to be updated, or whether the services actually require a (new) subscription. If they require a subscription, we should find out whether they offer a free quota (e.g. 100 requests/day). If not, users will have to deal with license keys themselves. In that case, we should raise an exception when no license key is provided.
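
As a rough sketch of the last point, a check like the following could run before any request is sent. The names `LicenseKeyRequired`, `PAID_SERVICES` and `require_license` are hypothetical and not part of pattern.web; the set of services is only an illustration based on the list above.

```python
class LicenseKeyRequired(Exception):
    """Hypothetical exception for services that need a user-supplied key."""


# Hypothetical registry of services that no longer work without a key.
PAID_SERVICES = {"bing", "google-translate", "yahoo"}


def require_license(service, license=None):
    """Return the license key, or raise if a paid service has none."""
    if service in PAID_SERVICES and not license:
        raise LicenseKeyRequired(
            "{} requires a license key; pass license=...".format(service)
        )
    return license


try:
    require_license("bing")
except LicenseKeyRequired as e:
    print("error:", e)
```

Failing early with a clear message would be easier to diagnose than an HTTP 401 that only appears after the request has been made.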

dsaw commented 5 years ago

The Bing API is outdated and has been replaced by Bing Web Search v7. Searching requires an access key, which can be obtained after opening a free account. I would like to update the API and open a PR soon. Note that another account would be needed for testing.
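
For orientation, a minimal sketch of what a Bing Web Search v7 request might look like, using only the standard library. The endpoint URL and the `Ocp-Apim-Subscription-Key` header are taken from Microsoft's v7 documentation as I recall it and should be verified before use; `build_bing_request` is a hypothetical helper, not pattern.web code. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed v7 endpoint; check the current Microsoft documentation.
BING_V7_ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/search"


def build_bing_request(query, key, count=10):
    """Build (but do not send) a Bing Web Search v7 request."""
    if not key:
        raise ValueError("Bing Web Search v7 requires an access key")
    url = BING_V7_ENDPOINT + "?" + urlencode({"q": query, "count": count})
    # The key goes into a request header rather than HTTP Basic auth.
    return Request(url, headers={"Ocp-Apim-Subscription-Key": key})


req = build_bing_request("web mining", "EXAMPLE-KEY")  # placeholder key
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual search, which requires a valid key.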