Data4Democracy / far-right-analysis

Analysis related to the behavior of extreme far-right online communities

Exploratory analysis of 4chan forum posts #20

Open gati opened 7 years ago

gati commented 7 years ago

The D4D community has acquired a few million recent 4chan posts. This issue is to explore that data using techniques such as topic modeling, network analysis, and social media analytics (who posts most often, what times of day are most active, whose posts get the most replies, average comment thread length, etc.).
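As a starting point for the social media analytics listed above, here is a minimal sketch that assumes each post is a dict shaped loosely like the public 4chan API output: `no` (post number), `resto` (thread number, 0 for an opening post), `time` (UNIX timestamp), `id` (per-thread poster ID, only on some boards) and `com` (comment HTML). The actual D4D dataset schema may differ, and `summarize_posts` is a hypothetical helper, not part of any existing notebook.

```python
import html
import re
from collections import Counter
from datetime import datetime, timezone
from statistics import mean

# Assumed schema (hypothetical, modeled on 4chan API fields):
#   "no", "resto", "time", "id", "com". Quotes in "com" look like ">>12345",
#   possibly HTML-escaped as "&gt;&gt;12345".
QUOTE_RE = re.compile(r">>(\d+)")


def summarize_posts(posts):
    """Compute simple activity statistics over a list of post dicts."""
    # Who posts most often, by poster ID (posts without an ID are skipped).
    posters = Counter(p["id"] for p in posts if p.get("id"))

    # Posting volume per UTC hour of the day.
    hours = Counter(
        datetime.fromtimestamp(p["time"], tz=timezone.utc).hour for p in posts
    )

    # Whose posts get the most replies: count quote links pointing at each post.
    # A post quoting the same target twice is counted once.
    replies = Counter()
    for p in posts:
        text = html.unescape(p.get("com", ""))
        for target in set(QUOTE_RE.findall(text)):
            replies[int(target)] += 1

    # Thread length = opening post plus every post whose "resto" points at it.
    threads = Counter(p["resto"] or p["no"] for p in posts)
    avg_thread_len = mean(threads.values()) if threads else 0.0

    return {
        "top_posters": posters.most_common(5),
        "posts_by_hour": dict(sorted(hours.items())),
        "most_replied": replies.most_common(5),
        "avg_thread_length": avg_thread_len,
    }
```

The (quoting post, quoted post) pairs found by `QUOTE_RE` are also the edges of a reply graph, which is one possible input for the network analysis mentioned above.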

Fair warning: This content will likely be a little gross, because it's 4chan.

C-Hipple commented 7 years ago

I'll be working on this today; my Slack handle is @hipplec.

Is this one of the datasets in the S3 bucket, or one of the private data.world datasets? If anyone wants to follow my progress throughout the day, I'll make sure to push commits to my fork often.

gati commented 7 years ago

Awesome, thanks @C-Hipple! We haven't been sure what's valuable/interesting in the 4chan data, so it'll be really helpful to have a sense of what's in there.

C-Hipple commented 7 years ago

I opened a PR against the assemble repo with two notebooks: one that addresses this issue by exploring and cleaning the sample 4chan dataset in the S3 bucket, and another that aggregates the bidaily JSON scrapes in the bucket into a single dataframe to make analysis simpler for others.
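The aggregation step can be sketched as follows, assuming the scrape files have already been copied from the S3 bucket into a local directory and that each file holds a JSON list of post objects with a unique `no` field. The function name `load_scrapes`, the file layout and the `source_file` column are hypothetical, not taken from the PR's notebooks.

```python
import glob
import json
import os


def load_scrapes(directory, pattern="*.json"):
    """Merge every scrape file in `directory` into one list of post records.

    Assumption: each file is a JSON list of post dicts keyed by a unique "no".
    """
    posts_by_no = {}
    # Files are processed in sorted filename order; if filenames sort
    # chronologically, later scrapes overwrite earlier copies of a post.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, encoding="utf-8") as f:
            posts = json.load(f)
        for post in posts:
            record = dict(post)
            # Remember which scrape each record came from.
            record["source_file"] = os.path.basename(path)
            posts_by_no[post["no"]] = record
    return list(posts_by_no.values())
```

Deduplicating on the post number matters because overlapping scrapes can capture the same post more than once. The resulting list of dicts can be passed straight to `pandas.DataFrame(...)` to get the dataframe for analysis.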

https://github.com/Data4Democracy/assemble/pull/60