Open Lab41PaulM opened 10 years ago
Addressing item 1: Implemented the relative omega scores creating a N^2 matrix of relative distances. An example for the flights data is attached (the diagonal is the comparison against ground truth): I then used spectral clustering on the resulting distance matrix to determine how often (across all the datasets) that two algorithms ended up in the same cluster (varying the # of clusters that the algorithm was looking for as well). The results of that is a proxy for how similar two datasets are. The top couple are here: clauset_newman_moore--fastgreedy 50 leading_eigenvector--multilevel 31 conga--congo 27 infomap--spinglass 27 multilevel--walktrap 26 label_propagation--radicchi_strong 25 multilevel--spinglass 25
I also looked at how often different algorithm results got clustered with the ground truth results: groundtruth--infomap 20 groundtruth--radicchi_weak 14 groundtruth--multilevel 12 fastgreedy--groundtruth 11 groundtruth--radicchi_strong 11
Ultimately I'm not sure about using relative omega score for distance. I that it might be more relevant to compare community by community rather than results set vs results set.
Components:
1) Looking at comparing the different algorithms using omega score
2) Try to understand what (if any) properties of a graph determine which class of algorithm performs well