Open redouane-dziri opened 4 years ago
I have been reading papers on co-clustering. Co-clustering clusters unlabeled data points and, at the same time, the features that describe them. Text documents are a good example: each document is made up of words, and the idea is to cluster documents together WITH the words that characterize them. If we see the problem as a bipartite graph (documents on one side, words on the other, an edge wherever a word occurs in a document), then co-clustering is a partitioning of that bipartite graph with a minimum cut.
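To make the cut idea concrete, here is a minimal sketch in plain Python that scores a candidate co-clustering by the total weight of document-word edges it cuts. The toy corpus, the cluster assignments and the function name `cut_weight` are all made up for illustration. Note that a plain cut is trivially zero if everything goes into one cluster, so practical methods usually minimize a normalized variant; this sketch only shows the quantity being counted.

```python
def cut_weight(doc_word_counts, doc_cluster, word_cluster):
    """Total weight of document-word edges whose endpoints lie in different clusters."""
    total = 0
    for doc, words in doc_word_counts.items():
        for word, weight in words.items():
            if doc_cluster[doc] != word_cluster[word]:
                total += weight
    return total


# Hypothetical toy corpus: edge weight = number of times the word occurs in the document.
doc_word_counts = {
    "d1": {"graph": 2, "cut": 1},
    "d2": {"graph": 1, "cut": 3, "neural": 1},
    "d3": {"neural": 2, "embedding": 2},
}

# Two candidate co-clusterings that share the same document assignment.
docs = {"d1": 0, "d2": 0, "d3": 1}
good_words = {"graph": 0, "cut": 0, "neural": 1, "embedding": 1}
bad_words = {"graph": 1, "cut": 0, "neural": 0, "embedding": 1}

print(cut_weight(doc_word_counts, docs, good_words))  # 1
print(cut_weight(doc_word_counts, docs, bad_words))   # 5
```

The first assignment only cuts the single "neural" occurrence in d2, so it is the better co-clustering under this measure.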
Here are the papers I have read:
I have implemented the first two, but that will go in another issue.
The last paper mentioned is really interesting: it builds a new graph with exactly k connected components, which become our k clusters. It is a very beautiful article.
The first two give a very good introduction to the linear algebra of graphs, especially bipartite ones.
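As a rough illustration of the "k connected components = k clusters" idea from the last paper, the sketch below reads co-clusters off a bipartite graph that already splits into components. It does not show how the paper constructs such a graph; the edge list and the function name `co_clusters` are hypothetical.

```python
from collections import defaultdict, deque


def co_clusters(edges):
    """Return one (documents, words) pair per connected component of the bipartite graph."""
    adjacency = defaultdict(set)
    for doc, word in edges:
        # Tag each node with its side so a document and a word can share a name.
        adjacency[("doc", doc)].add(("word", word))
        adjacency[("word", word)].add(("doc", doc))

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        docs, words = set(), set()
        while queue:
            kind, name = queue.popleft()
            (docs if kind == "doc" else words).add(name)
            for neighbour in adjacency[(kind, name)]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append((sorted(docs), sorted(words)))
    return clusters


# Hypothetical graph with exactly two components, hence two co-clusters.
edges = [
    ("d1", "graph"), ("d2", "graph"), ("d2", "cut"),
    ("d3", "neural"), ("d3", "embedding"),
]
result = co_clusters(edges)
print(result)
```

Each component groups documents with the words they use, which is exactly the pairing of rows and features that co-clustering is after.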
Probably a must-read for everyone on the team:
A LITERATURE STUDY OF EMBEDDINGS ON SOURCE CODE
https://arxiv.org/pdf/1904.03061.pdf
Will comment later with my thoughts.
GitHub implementations of code embeddings that work for C/C++ and stand out from the review:
https://github.com/defreez-ucd/func2vec-fse2018-artifact
3. Using RNN on Contextual Flow Graphs: https://github.com/spcl/ncc
We should all keep reading about what other people are doing on similar problems and link articles with fresh ideas here.
Hoping to get Yorgos' Deep Learning references sometime soon to get cracking on that front if it rocks anyone's boat :)