evaluation of embeddings

boersmamarcel commented 5 years ago

If our embeddings are good, they should yield useful clusters. In one paper (Zhang et al., "Learning Node Embeddings in Interaction Graphs") I found the following paragraph, which describes how the quality of the clusters can be evaluated:

Clustering. We first use K-Means to test embeddings on the unsupervised task. We use Normalized Mutual Information (NMI) [23] score to evaluate clustering results. The NMI score is between 0 and 1. The larger the value, the better the performance. A labeling will have score 1 if it matches the ground truth perfectly, and 0 if it is completely random. Since entities in the Yelp dataset are multi-labeled, we ignore the entities that belong to multiple categories when calculate NMI score.
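The NMI evaluation in the quoted paragraph can be sketched in plain Python. This is a minimal sketch, assuming arithmetic-mean normalisation (scikit-learn's default for `normalized_mutual_info_score`; the variant in the paper's reference [23] may differ) and assuming each entity's ground truth is given as a set of categories. The helper names `nmi` and `drop_multi_labeled` are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information between two labelings of the same entities."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    count_t, count_p = Counter(truth), Counter(pred)
    # sum over cells: p(t, p) * log(p(t, p) / (p(t) * p(p)))
    return sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )


def nmi(truth, pred):
    """NMI in [0, 1], normalised by the arithmetic mean of the two entropies."""
    h_t, h_p = entropy(truth), entropy(pred)
    if h_t + h_p == 0:  # both labelings are a single group: treat as a perfect match
        return 1.0
    return mutual_information(truth, pred) / ((h_t + h_p) / 2)


def drop_multi_labeled(truth_sets, pred):
    """Keep only entities with exactly one ground-truth category, as in the quote."""
    kept = [(next(iter(s)), p) for s, p in zip(truth_sets, pred) if len(s) == 1]
    return [t for t, _ in kept], [p for _, p in kept]


truth_sets = [{"food"}, {"food"}, {"bar", "food"}, {"bar"}, {"bar"}]
clusters = [0, 0, 1, 1, 1]  # e.g. K-Means cluster ids per entity
truth, pred = drop_multi_labeled(truth_sets, clusters)
print(round(nmi(truth, pred), 3))  # → 1.0
```

Because NMI only compares the partitions, renaming cluster ids does not change the score.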

With our toy set we can create ground-truth labels and evaluate the embedding technique. We can even compare it with directly applying other techniques (metapath2vec, DeepWalk, etc.).
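The thread does not say how the toy set is generated, so the following is a hypothetical sketch: it represents a process as a set of (source, target) edges, perturbs copies of each core process by randomly dropping and adding edges (an assumed noise model), and records the index of the originating core process as the ground-truth label. The same labels could then score any embedding method, including metapath2vec or DeepWalk applied directly.

```python
import random


def make_toy_set(core_processes, variants_per_core, p_drop=0.1, p_add=0.1, seed=0):
    """Hypothetical toy generator. Returns (processes, labels): noisy copies of
    each core process plus the index of the core process they came from."""
    rng = random.Random(seed)
    # sorted so that results do not depend on set iteration order
    nodes = sorted({n for core in core_processes for edge in core for n in edge})
    processes, labels = [], []
    for label, core in enumerate(core_processes):
        for _ in range(variants_per_core):
            # drop each edge of the core process with probability p_drop
            edges = {e for e in sorted(core) if rng.random() >= p_drop}
            if not edges:  # never produce an empty process
                edges = {rng.choice(sorted(core))}
            # with probability p_add, add one random edge as extra noise
            if rng.random() < p_add:
                edges.add((rng.choice(nodes), rng.choice(nodes)))
            processes.append(edges)
            labels.append(label)  # ground truth: the originating core process
    return processes, labels


cores = [
    {("A", "B"), ("B", "C")},
    {("C", "D"), ("D", "E"), ("E", "A")},
]
processes, labels = make_toy_set(cores, variants_per_core=3, seed=42)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```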

For the real data sets no ground truth is known, so we must evaluate the clusters in a different way.
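One label-free option, offered as a suggestion rather than something proposed in this thread or by Zhang et al., is an internal index such as the silhouette coefficient. It scores a clustering only by how compact and well separated the clusters are in embedding space, so it cannot say whether the clusters are meaningful. A minimal sketch, assuming embeddings are tuples of floats compared by Euclidean distance:

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]. Higher values mean compact,
    well-separated clusters; no ground-truth labels are needed."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, hence the division by len(own) - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(pts, [0, 0, 1, 1]), 2))  # → 0.9
print(silhouette(pts, [0, 1, 0, 1]) < 0)        # → True
```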

boersmamarcel commented 5 years ago

@AlexWorldD if you have other metrics in mind, I'm open to suggestions.

boersmamarcel commented 5 years ago

It would be nice to have a notebook that evaluates the results on the toy example. It should present:

[x] a couple of example processes (3 would be okay) together with the processes the algorithm marks as similar
[x] a t-SNE plot with the labels shown as colours (hopefully we get some nice groups)
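For the first checklist item, the similar processes can be listed by ranking the other embeddings by similarity to a chosen example. This sketch assumes one embedding vector per process keyed by a process id, and that "similar" means highest cosine similarity; the algorithm's own notion of similarity may differ, and `most_similar` is a hypothetical helper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query_id, embeddings, k=3):
    """Ids of the k processes whose embeddings are most cosine-similar to query_id."""
    q = embeddings[query_id]
    scored = [(cosine(q, v), pid) for pid, v in embeddings.items() if pid != query_id]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pid for _, pid in scored[:k]]


emb = {
    "p1": (1.0, 0.0),
    "p2": (0.9, 0.1),
    "p3": (0.0, 1.0),
    "p4": (0.8, 0.3),
}
print(most_similar("p1", emb, k=2))  # → ['p2', 'p4']
```

For the t-SNE item, scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(X)` returns 2-D coordinates that can be scatter-plotted with one colour per label.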

boersmamarcel commented 5 years ago

Can we apply a clustering algorithm? In the toy example we put in, for example, 10 core processes and added realistic noise. If I apply a clustering algorithm, would I then be able to recover the 10 core processes correctly?
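One way to answer this is to cluster the embeddings of the noisy variants with k equal to the number of core processes and check whether clusters and core processes match one to one. The sketch below hand-rolls K-Means (Lloyd's algorithm with k-means++ seeding, which is also scikit-learn's default initialisation) to stay dependency-free; the 2-D points are made up, and `recovers_cores` is a hypothetical, deliberately strict test (NMI gives a graded score instead).

```python
import random
from math import dist


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j])) for p in points]


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-Means with k-means++ seeding and several restarts.
    Returns (labels, centres) of the run with the lowest within-cluster
    sum of squared distances."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # k-means++: first centre uniformly, later ones with probability
        # proportional to the squared distance to the nearest chosen centre
        centres = [rng.choice(points)]
        while len(centres) < k:
            d2 = [min(dist(p, c) for c in centres) ** 2 for p in points]
            centres.append(rng.choices(points, weights=d2)[0])
        for _ in range(max_iter):
            labels = assign(points, centres)
            new = []
            for j in range(k):
                members = [p for p, l in zip(points, labels) if l == j]
                # mean of the members; keep the old centre if the cluster is empty
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else centres[j])
            if new == centres:  # converged
                break
            centres = new
        labels = assign(points, centres)
        inertia = sum(dist(p, centres[l]) ** 2 for p, l in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centres)
    return best[1], best[2]


def recovers_cores(core_labels, cluster_labels):
    """True if clusters and core processes correspond one to one
    (cluster ids are arbitrary, so only the pairing matters)."""
    pairs = set(zip(core_labels, cluster_labels))
    return len(pairs) == len(set(core_labels)) == len(set(cluster_labels))


# made-up 2-D embeddings: three noisy variants of each of three core processes
points = [(0, 0), (0.2, 0.1), (0.1, 0.3),
          (10, 0), (10.2, 0.1), (9.9, 0.2),
          (0, 10), (0.1, 10.2), (0.3, 9.8)]
cores = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels, _ = kmeans(points, k=3)
print(recovers_cores(cores, labels))  # → True
```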

boersmamarcel commented 5 years ago

I would like to use this to evaluate the clusters obtained on real data: once the processes are classified as belonging to one core process or another, I can discuss the resulting clusters with an expert.
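To prepare such a discussion, one hedged option is to summarise each cluster by its size and the few processes closest to its centroid, so the expert reviews typical members first. This assumes the centroids come from a clustering such as K-Means and that each process has an id; `cluster_report` and its output format are hypothetical.

```python
from math import dist


def cluster_report(process_ids, points, labels, centroids, n_examples=3):
    """Per cluster: its size and the ids of the processes closest to the
    centroid, as a starting point for discussion with a domain expert."""
    report = {}
    for c, centre in enumerate(centroids):
        members = [(dist(p, centre), pid)
                   for pid, p, l in zip(process_ids, points, labels) if l == c]
        members.sort(key=lambda m: m[0])  # closest to the centroid first
        report[c] = {"size": len(members),
                     "examples": [pid for _, pid in members[:n_examples]]}
    return report


ids = ["t1", "t2", "t3", "t4", "t5", "t6"]
pts = [(0, 0), (0, 1), (0, 5), (10, 0), (10, 1), (10, 8)]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(0, 2), (10, 3)]  # the cluster means of the points above
print(cluster_report(ids, pts, labels, centroids, n_examples=2))
# → {0: {'size': 3, 'examples': ['t2', 't1']}, 1: {'size': 3, 'examples': ['t5', 't4']}}
```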

AlexWorldD / NetEmbs

evaluation of embeddings #7