Data4Democracy / far-right-analysis

Analysis related to the behavior of extreme far right online communities

Exploratory NLP on Breitbart articles #9

Open sjacks26 opened 7 years ago

sjacks26 commented 7 years ago

@bstarling put together a notebook that explains how to access the Breitbart articles for analysis in Python. If you need access to the data.world dataset, ping @jonathon or @sharon in Slack.

To start analysis, here are some basic NLP ideas:

  1. Generate word counts in article leads across the whole dataset (filtering stopwords)
     1b. Generate word counts (and/or tf-idf) for article leads grouped by category
     1c. Generate word counts (and/or tf-idf) for article leads grouped by author (possibly excluding Breitbart News and Breitbart TV)

  2. Search article leads for keywords of interest (Trump, Putin, alt-right, pepe, etc.)
     2b. Plot the number of article leads with a given keyword over time (for example, the number of article leads mentioning Trump by week)

  3. Search for trends in links to other websites (for example, are there more links to nytimes.com during national political campaigns?)
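
As a starting point, the three ideas above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's method: the record layout (`lead`, `date`, `links`), the sample articles, the tiny stopword list, and the function names are all assumptions made for the example, and the real data.world dataset may use different field names.

```python
import re
from collections import Counter
from datetime import date, timedelta
from urllib.parse import urlparse

# Hypothetical records; the real dataset's field names may differ.
articles = [
    {"lead": "Trump meets Putin at the summit", "date": date(2017, 1, 2),
     "links": ["https://www.nytimes.com/a", "http://example.com/b"]},
    {"lead": "The summit ends and Trump returns", "date": date(2017, 1, 4),
     "links": ["https://www.nytimes.com/c"]},
    {"lead": "Putin comments on the election", "date": date(2017, 1, 10),
     "links": []},
]

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "at", "and", "on", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps 'alt-right' whole)."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def word_counts(records):
    """Idea 1: count non-stopword tokens across all article leads."""
    counts = Counter()
    for record in records:
        counts.update(t for t in tokenize(record["lead"]) if t not in STOPWORDS)
    return counts


def keyword_by_week(records, keyword):
    """Idea 2b: count leads containing the keyword, keyed by each week's Monday."""
    kw = keyword.lower()
    weekly = Counter()
    for record in records:
        if kw in tokenize(record["lead"]):
            monday = record["date"] - timedelta(days=record["date"].weekday())
            weekly[monday] += 1
    return weekly


def link_domains(records):
    """Idea 3: count outbound links per domain, ignoring a leading 'www.'."""
    domains = Counter()
    for record in records:
        for url in record["links"]:
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            domains[host] += 1
    return domains


print(word_counts(articles).most_common(3))
print(sorted(keyword_by_week(articles, "Trump").items()))
print(link_domains(articles).most_common())
```

Grouping by category or author (1b, 1c) would amount to calling `word_counts` on each subset of records, and the weekly counts can be passed to any plotting library. For tf-idf, a library such as scikit-learn is probably a better fit than hand-rolled counting.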

Have more ideas for this dataset? Post them here, or propose them in #far-right in Slack.

GuiMarthe commented 7 years ago

Will start working on it right now!

brucerowan commented 7 years ago

@GuiMarthe @sjacks26 Is this still in progress?

GuiMarthe commented 7 years ago

I've done 1 but didn't send a PR for whatever reason!

gati commented 7 years ago

Nice! Can you submit a PR? I'd love to include it!

On Tue, Jul 18, 2017, 7:29 AM Guilherme Marthe notifications@github.com wrote:

I've done 1 but didn't send a PR for whatever reason!
