Data4Democracy / assemble

NOT AN ACTIVE PROJECT -- Check readme for data sources
MIT License

Community detection using spectral matrix analysis and clustering #23

Open henripal opened 7 years ago

henripal commented 7 years ago

The idea here is to treat the graph matrix as a feature matrix and to use traditional dimension reduction/clustering techniques on these features.

An example workflow would be:

A good testing ground is the Twitter #far-right data.

Also check out the great post and tutorial by @alejandrox1: https://github.com/Data4Democracy/discursive/issues/4 and https://github.com/Data4Democracy/tutorials
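To make the idea concrete, here is a minimal sketch of treating the adjacency matrix as a feature matrix: reduce its rows with a truncated SVD, then cluster the reduced rows. Everything in it is an assumption for illustration, not the project's code: the toy graph (two 4-node cliques joined by one edge), the helper names `svd_embedding` and `kmeans`, the choice of 2 components, and the small deterministic k-means standing in for whatever clustering method is eventually picked.

```python
import numpy as np


def svd_embedding(adjacency, n_components):
    """Reduce each node's adjacency row to n_components coordinates via SVD."""
    A = np.asarray(adjacency, dtype=float)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Scale the leading left singular vectors by their singular values.
    return U[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy graph: cliques {0,1,2,3} and {4,5,6,7}, bridged by edge 3-4.
A = np.zeros((8, 8))
for clique in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in clique:
        for j in clique:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

embedding = svd_embedding(A, n_components=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On a real follower graph the matrix would be sparse and far larger, so a sparse truncated SVD would replace the dense `np.linalg.svd` call used here.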

ashkan-leo commented 7 years ago

I can start working on this one. I'm thinking of doing spectral clustering on the graph Laplacian (instead of the adjacency matrix itself). How are we going to test the algorithm, though? Do we have labels? I also don't know where to find the #far-right data.
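As a rough illustration of the Laplacian variant, the sketch below splits a graph in two using the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of L = D - A). The unnormalized Laplacian, the restriction to two communities, the function name `fiedler_partition`, and the toy graph are all assumptions made for the example; a real run would likely use several eigenvectors plus a clustering step.

```python
import numpy as np


def fiedler_partition(adjacency):
    """Split an undirected graph in two by the sign of the Fiedler vector."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A  # unnormalized graph Laplacian L = D - A
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = fiedler_partition(A)
print(labels)
```

Which side gets label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.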

bstarling commented 7 years ago

@ashkan-leo I added you to the GitHub org so I can assign this one to you. Please ping me on Slack (@bstarling) to get the far-right data.

henripal commented 7 years ago

@ashkan-leo we don't have labels. How to evaluate the results is a great question. We could rank users (by number of followers or PageRank), then try to manually identify some communities using the top-ranked users as a guideline, and compare those to the algorithmically generated communities?
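One way this label-free check might look, as a sketch only: compute PageRank by power iteration, then list the top-ranked users inside each algorithmic community so a human can judge whether the grouping makes sense. The `follows` dictionary, the `communities` assignment, the function names, and the damping factor of 0.85 are all hypothetical choices for the example.

```python
def pagerank(follows, damping=0.85, n_iter=100):
    """Power-iteration PageRank on {user: [users they follow]}."""
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(n_iter):
        # Rank held by users who follow nobody is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


def top_users_by_community(rank, communities, top_n=3):
    """Return the top_n highest-ranked users inside each community."""
    top = {}
    for user in sorted(rank, key=rank.get, reverse=True):
        bucket = top.setdefault(communities[user], [])
        if len(bucket) < top_n:
            bucket.append(user)
    return top


# Hypothetical follower data: an edge points from a follower to the followed user.
follows = {
    "a": ["hub1"], "b": ["hub1"], "c": ["hub1", "a"],
    "x": ["hub2"], "y": ["hub2"],
    "hub1": [], "hub2": [],
}
# Hypothetical output of a community detection run.
communities = {"a": 0, "b": 0, "c": 0, "hub1": 0, "x": 1, "y": 1, "hub2": 1}

rank = pagerank(follows)
top = top_users_by_community(rank, communities, top_n=1)
print(top)
```

The manual step would then be to look at each community's top accounts and ask whether they plausibly belong together.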

gvdr commented 7 years ago

Hi all.

My personal taste for large-scale linear algebra problems is to first give it a go with Julia. The base svds is as powerful as I need: http://docs.julialang.org/en/stable/stdlib/linalg/#Base.svds

We also have improved algos for large networks through https://github.com/nassarhuda/MatrixNetworks.jl and https://github.com/JuliaGraphs/LightGraphs.jl

gvdr commented 7 years ago

e.g., truncated SVD (10 singular values computed) on a sparse 45600x45600 matrix on my laptop: 16.078000 seconds (3.13 M allocations: 1.117 GB, 0.80% gc time)

henripal commented 7 years ago

@gvdr I'm prototyping in Julia as well and love it, but it's definitely not a problem if anyone else wants to prototype in their favorite language at this stage, I guess?

gvdr commented 7 years ago

Absolutely! I was thinking in terms of infrastructure: if we end up setting up a virtual environment in which to do the analysis (wherever it is), let's make it open to Julia as well, not only Python and R ;-)

bstarling commented 7 years ago

I reached out to the Eventador folks about adding a Julia kernel to the existing notebook (they already added R). The next question will probably be about packaging. Do you have a list of the most common packages you would want pre-installed that you can post in the channel or DM me? FYI, Domino, which has donated compute infrastructure, has a Julia kernel as well.

JoeMcEwen commented 6 years ago

Curious, is this still being worked on? I am interested in helping.