Data4Democracy / far-right-analysis

Analysis related to the behavior of extreme far right online communities

Collect data from alt-right Facebook pages #21

Open gati opened 7 years ago

gati commented 7 years ago

Using collect-social (https://github.com/data4democracy/collect-social), or a library you're more familiar with (@gati/@jonathon in Slack can help with this), grab the posts, comments, users, etc. from far-right/alt-right Facebook pages. A list of pages associated with hate groups, Holocaust deniers, etc. is attached.

Fair warning: the content will be unpleasant.

Ideally you'll drop the data in a sqlite database, series of JSON files, CSV files, or another format that's easy for you, and we'll upload to data.world!
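For the sqlite option, a minimal sketch might look like the following. It uses only Python's standard `sqlite3` module and does not use collect-social's own API. The function name `save_posts`, the `posts` table, and its three columns are hypothetical, chosen only for illustration.

```python
import sqlite3


def save_posts(db_path, page_name, posts):
    """Append (kind, text) pairs for one page to a sqlite file.

    The table name and columns are assumptions made for this sketch,
    not a schema prescribed anywhere in this issue.
    """
    rows = [(page_name, kind, text) for kind, text in posts]
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" wraps the statements in a transaction and commits on success.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS posts ("
                "page TEXT NOT NULL, kind TEXT NOT NULL, message TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO posts (page, kind, message) VALUES (?, ?, ?)", rows
            )
    finally:
        conn.close()
    return len(rows)
```

Calling `save_posts("pages.sqlite", "Some Page", [("feed", "post text")])` would create the file if needed and add one row.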

gati commented 7 years ago

Facebook.Submission.Rd.1.Listed.Hate.Groups.xlsx

ybot1122 commented 7 years ago

Looking into this. I think I can raise a PR by the end of the weekend.

gati commented 7 years ago

Sounds awesome @ybot1122! Just let me know if I can help with anything.

ybot1122 commented 7 years ago

Sorry for the lateness. I hope it can still be used for future analysis. I have a very hacky, unoptimized Python script that takes an array of page IDs and returns a flat array of all the statuses, feed posts, and comments from each page.

Output directory structure:

output/
    Holy Nation of Odin/
        feed.txt
        statuses.txt
        comments.txt

.txt file structure (each file holds a JSON array of strings):

[
    "Latest Radio Series is now online. Wilmot Robertson\u2019s The Dispossessed Majority \u2013 Part 2\nhttp://tinyurl.com/a2z-radio",
    "Check out the latest Radio Show - I discuss Wilmot Robertson\u2019s 1972 book, \u201cThe Dispossessed Majority.\u201d \n\nhttp://tinyurl.com/a2z-radio"
]
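Since each .txt file holds a JSON array, the output can be read back with Python's standard `json` module. The sketch below assumes the layout shown above (one folder per page containing feed.txt, statuses.txt, and comments.txt) and that the files are UTF-8 encoded. The helper names `load_page_output` and `load_all` are hypothetical and are not part of the linked script.

```python
import json
from pathlib import Path

KINDS = ("feed", "statuses", "comments")


def load_page_output(page_dir):
    """Read the three JSON-array .txt files from one page's folder.

    A missing file is treated as an empty list (an assumption of this sketch).
    """
    page_dir = Path(page_dir)
    data = {}
    for kind in KINDS:
        path = page_dir / f"{kind}.txt"
        if path.exists():
            with path.open(encoding="utf-8") as f:
                data[kind] = json.load(f)
        else:
            data[kind] = []
    return data


def load_all(output_root="output"):
    """Map each page folder name under output_root to its loaded data."""
    root = Path(output_root)
    return {
        p.name: load_page_output(p)
        for p in sorted(root.iterdir())
        if p.is_dir()
    }
```

With the unzipped output in the working directory, `load_all("output")` would return a dict keyed by page name, and escapes such as `\u2019` are decoded to the corresponding characters on load.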

Deliverables

The script: https://gist.github.com/ybot1122/93c216072d9564ea99250b33fecf6680

The output for the pages listed: https://s3-us-west-2.amazonaws.com/random-stuff-toby/output.zip (P.S. Repent Amarillo returned a page not found.)