ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

text download idea #187

Open jeremymanning opened 6 years ago

jeremymanning commented 6 years ago

We could add a function that automatically downloads text for a given keyword (or set of keywords) from one of a pre-selected set of sources (e.g. Wikipedia, Twitter, YouTube comments, etc.). E.g.:

words = hyp.tools.textlookup(['apple', 'banana', 'cat', 'dog'], text_source='twitter', text_params={'n_tweets': 1000})

would return a list (length 4) of lists of tweets, one list per keyword. Or

words = hyp.tools.textlookup(['apple', 'banana', 'cat', 'dog'], text_source='wikipedia')

would return a list (length 4) of strings, each containing the text of the corresponding Wikipedia article.

Either result should then be plottable or analyzable via hyp.plot(words).
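To make the proposed return shapes concrete, here is a minimal sketch of how the dispatch could work. Nothing in it exists in hypertools today: `textlookup`, `TEXT_SOURCES`, `register_source`, and the two placeholder fetchers are hypothetical names, and the fetchers return canned strings instead of contacting Wikipedia or Twitter.

```python
from typing import Callable, Dict, List, Optional, Union

# Hypothetical fetcher signature: fetcher(keyword, **params) returns either
# one string (e.g. an article) or a list of strings (e.g. tweets).
Fetcher = Callable[..., Union[str, List[str]]]

# Hypothetical registry mapping a text_source name to its fetcher.
TEXT_SOURCES: Dict[str, Fetcher] = {}


def register_source(name: str, fetcher: Fetcher) -> None:
    """Make a fetcher available under the given text_source name."""
    TEXT_SOURCES[name] = fetcher


def textlookup(keywords: Union[str, List[str]],
               text_source: str = 'wikipedia',
               text_params: Optional[dict] = None) -> list:
    """Return one entry per keyword, fetched from the chosen source."""
    if isinstance(keywords, str):
        keywords = [keywords]  # allow a single keyword
    if text_source not in TEXT_SOURCES:
        raise ValueError(f"unknown text_source: {text_source!r}")
    fetch = TEXT_SOURCES[text_source]
    params = text_params or {}
    return [fetch(keyword, **params) for keyword in keywords]


# Placeholder fetchers (assumptions for illustration only). Real versions
# would wrap a Wikipedia or Twitter client and need network access.
def placeholder_wikipedia(keyword: str) -> str:
    return f"Placeholder article text about {keyword}."


def placeholder_twitter(keyword: str, n_tweets: int = 10) -> List[str]:
    return [f"placeholder tweet {i} about {keyword}" for i in range(n_tweets)]


register_source('wikipedia', placeholder_wikipedia)
register_source('twitter', placeholder_twitter)

tweets = textlookup(['apple', 'banana', 'cat', 'dog'],
                    text_source='twitter', text_params={'n_tweets': 3})
articles = textlookup(['apple', 'banana', 'cat', 'dog'],
                      text_source='wikipedia')
```

Here `tweets` is a list of 4 lists and `articles` is a list of 4 strings, matching the two shapes described above. Keeping the sources in a registry would let new ones (e.g. YouTube comments) be added without touching `textlookup` itself.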

As a shortcut, the user could also pass: hyp.plot(['apple', 'banana', 'cat', 'dog'], text_source='twitter'), which would (with that one command):

Example Twitter scraper: https://github.com/ContextLab/storytelling-with-data/blob/master/data-stories/twitter-finance/twitter-finance.ipynb