Lab41 / Circulo

Community Detection Research Effort
http://lab41.github.io/Circulo/
Other
79 stars 39 forks source link

Trying to identify algorithm types mapping to data types #18

Open Lab41PaulM opened 10 years ago

ymt123 commented 10 years ago

Components:

1) Looking at comparing the different algorithms using omega score

2) Try to understand what (if any) properties of a graph determine which class of algorithm performs well

ymt123 commented 9 years ago

Addressing item 1: Implemented the relative omega scores creating a N^2 matrix of relative distances. An example for the flights data is attached (the diagonal is the comparison against ground truth): image I then used spectral clustering on the resulting distance matrix to determine how often (across all the datasets) that two algorithms ended up in the same cluster (varying the # of clusters that the algorithm was looking for as well). The results of that is a proxy for how similar two datasets are. The top couple are here: clauset_newman_moore--fastgreedy 50 leading_eigenvector--multilevel 31 conga--congo 27 infomap--spinglass 27 multilevel--walktrap 26 label_propagation--radicchi_strong 25 multilevel--spinglass 25

I also looked at how often different algorithm results got clustered with the ground truth results: groundtruth--infomap 20 groundtruth--radicchi_weak 14 groundtruth--multilevel 12 fastgreedy--groundtruth 11 groundtruth--radicchi_strong 11

Ultimately I'm not sure about using relative omega score for distance. I that it might be more relevant to compare community by community rather than results set vs results set.