Data4Democracy / discursive

Twitter topic search and indexing with Elasticsearch

Community detection #4

Closed by hadoopjax 7 years ago

hadoopjax commented 7 years ago

There are lots of ways to do community detection using Twitter data. We'll want to discuss the nuts and bolts on Slack, but once we select an implementation we like we can track progress here. There's also a lot of neat emerging research we could try out (e.g. https://arxiv.org/pdf/1608.01771v1.pdf)!

alejandrox1 commented 7 years ago

I would like to help on this. I see this is a relatively old issue. Is there already something set up?

hadoopjax commented 7 years ago

Hi @alejandrox1, nope, this one's on the list but not yet started. I'll DM you on Slack to talk about getting started!

alejandrox1 commented 7 years ago

Hello, time to get started!

There are many ways to approach this, and there is no single consensus on the best tools and methods or on which data matters most for community detection in social media. I have collected a couple of references I found interesting here: https://github.com/alejandrox1/References

First of all, I would like to encourage everyone to contribute whatever references they have found interesting so that we can have them all in one place. I think it would be best if everyone who wants to contribute tried to replicate the results from one of the reference materials. That way we all learn by doing, with a benchmark for comparison, and by keeping in touch through GitHub and Slack it will become obvious what the common issues, benefits, and shortcomings of the different methods are.

Possible projects within this issue include building networks, visualization, analysis, prediction, and performance.

To get started on any of these topics, check out these tutorials:

These tutorials briefly cover how to build networks, how to visualize them, some measures that can be used to analyze a network, and link prediction. @nick and @grichardson are working on building graphs.

Also, there is the community library (python-louvain), which works on top of networkx and is used for community detection: https://bitbucket.org/taynaud/python-louvain
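For anyone who wants to see what Louvain partitioning looks like in practice, here is a minimal sketch. It is an illustration only: it uses the louvain_communities function built into recent networkx releases rather than python-louvain itself, and the toy graph (two 5-node cliques joined by one edge) is made up for the example, not project data.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy graph (an assumption for illustration): two 5-node cliques
# connected by a single bridging edge.
G = nx.barbell_graph(5, 0)

# Run Louvain; the seed only fixes the node visiting order for repeatability.
communities = louvain_communities(G, seed=42)

# Flatten the list of node sets into a node -> community index mapping,
# which is the shape most downstream analysis expects.
membership = {node: idx for idx, nodes in enumerate(communities) for node in nodes}

for idx, nodes in enumerate(communities):
    print(idx, sorted(nodes))
```

With python-louvain, the roughly equivalent call is best_partition, which returns the node-to-community mapping directly.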

Graph Databases

acompa commented 7 years ago

Hey there! I might be able to help out with this -- I worked on both the algorithmic and engineering sides of community detection at Scale Model. I'll share some scattered thoughts below.

We used friendships between users to build graphs (e.g. user A follows user B => A -> B), although we had to drop directionality since, IIRC, Louvain (which we also used) cannot partition directed graphs.
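A rough sketch of that construction, in plain Python so it does not depend on either graph library. The follows list and the to_undirected_adjacency helper are hypothetical names invented for this example, and this is not the code we actually ran.

```python
from collections import defaultdict

# Hypothetical follow relationships: (follower, followed) means follower -> followed.
follows = [
    ("alice", "bob"),
    ("bob", "alice"),    # mutual follow; collapses into one undirected edge
    ("carol", "alice"),
    ("dave", "carol"),
]

def to_undirected_adjacency(edges):
    """Build an undirected adjacency map from directed (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # ignore self-follows
        # Record the edge in both directions, discarding who follows whom.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)

adjacency = to_undirected_adjacency(follows)

# Each undirected edge appears once, regardless of the original direction(s).
undirected_edges = {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}
print(len(undirected_edges), "undirected edges")
```

Note that a mutual follow and a one-way follow end up looking identical after this step, which is the information lost by dropping directionality.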

I've used both igraph and networkx for building and partitioning Twitter subgraphs. Note that igraph is actually a C-optimized graph library similar to networkx. You'll find networkx to be easy but slow, while igraph has a more esoteric API that runs way faster.

I actually think the best large-scale graph solution is something like GraphX, while igraph is best for partitioning smaller graphs efficiently (if we don't have money to throw at this problem, like your average startup :) ).

Feel free to message me on D4D Slack (achompas) or on here if you have any specific questions.

hadoopjax commented 7 years ago

I'm closing this issue as it has moved to Assemble.