Zhongdao / gcn_clustering

Code for CVPR'19 paper Linkage-based Face Clustering via GCN
MIT License
360 stars 86 forks

pred will contain nan if k_at_hop is large #6

Open luzai opened 5 years ago

luzai commented 5 years ago

Thank you very much for your inspiring work!

As suggested in the paper, "In the testing phase, it is not necessary to keep the same configuration with the training phase.", setting k_at_hop=[20,5] in test.py is reasonable for fast testing. However, pred and loss seem to become nan when k_at_hop=[200,10]. Can you reproduce this phenomenon on your side, and do you know why nan occurs?

luzai commented 5 years ago

I find that it is related to here. On the test dataset, there may exist some nodes that are not connected to any other node, which causes A/D to contain some entries of 1/0. May I ask your advice on this phenomenon? Also, what was your reason for avoiding self-loops in the graph convolution?
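
For readers following along, here is a minimal NumPy sketch of how an isolated node can produce nan. It assumes the adjacency matrix is normalized by dividing each row by its degree and that there are no self-loops. `normalize_adjacency` and its `guard` flag are hypothetical names used only for illustration; the repository's actual code may differ.

```python
import numpy as np

def normalize_adjacency(A, guard=False):
    """Row-normalize A by node degree (hypothetical helper, not the repo's code)."""
    A = np.asarray(A, dtype=np.float64)
    D = A.sum(axis=1, keepdims=True)  # degree of each node, shape (N, 1)
    if guard:
        # Treat isolated nodes as degree 1 so their rows stay all-zero.
        D = np.where(D == 0, 1.0, D)
    # Silence the divide-by-zero warnings so the nan result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return A / D

# Node 2 has no edges and no self-loop, so its degree is 0.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])

naive = normalize_adjacency(A)                # row 2 is divided by zero
guarded = normalize_adjacency(A, guard=True)  # row 2 stays all-zero

print(np.isnan(naive[2]).all())
print(guarded[2])
```

In this sketch the unguarded version leaves nan in the row of the isolated node, and any later matrix product involving that row propagates it to pred and loss.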

Zhongdao commented 5 years ago

Sorry for my late reply. We follow GraphSAGE in the design of the GCN. We dropped the self-loop because we want to keep the information of the node separate from that of its neighbors.
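
As a rough illustration of this design, the sketch below shows a GraphSAGE-style mean-aggregation layer in NumPy. The adjacency matrix has no self-loops, so the neighbor mean excludes the node itself, and the node's own features enter through concatenation. The function name `sage_mean_layer`, the shapes, and the ReLU are assumptions for illustration, not the repository's implementation.

```python
import numpy as np

def sage_mean_layer(X, A_norm, W):
    """One GraphSAGE-style layer (illustrative sketch only).

    X:      (N, F) node features
    A_norm: (N, N) row-normalized adjacency WITHOUT self-loops
    W:      (2F, F_out) weights applied to [self features, neighbor mean]
    """
    neighbor_mean = A_norm @ X                       # neighbors only
    H = np.concatenate([X, neighbor_mean], axis=1)   # keep self and context separate
    return np.maximum(H @ W, 0.0)                    # ReLU

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])
A_norm = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])      # node 2 is isolated
W = np.vstack([np.eye(2), np.eye(2)])     # simply sums the two halves

out = sage_mean_layer(X, A_norm, W)
print(out)
```

With this layout, the first F rows of W act on the node's own features and the last F rows act on the aggregated neighbor features, so the two sources of information are weighted independently.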

In your case, I think you can simply ignore such nodes (that are not connected to any others). It means the local context cannot be found, so there is no need to predict the linkage with other nodes.
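
One way to sketch this suggestion is as a post-processing step that drops predicted links whose score is not finite. It assumes that isolated nodes show up as nan scores in the output; `drop_invalid_links`, `edges`, and `scores` are hypothetical names, not identifiers from test.py.

```python
import numpy as np

def drop_invalid_links(edges, scores):
    """Keep only links whose predicted score is finite (illustrative sketch)."""
    edges = np.asarray(edges)
    scores = np.asarray(scores, dtype=np.float64)
    keep = np.isfinite(scores)  # False for nan and +/-inf
    return edges[keep], scores[keep]

edges = [[0, 1], [2, 3], [4, 5]]
scores = [0.9, float("nan"), 0.2]

kept_edges, kept_scores = drop_invalid_links(edges, scores)
print(kept_edges)
print(kept_scores)
```

The remaining links can then be passed to the clustering step as usual.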

luzai commented 5 years ago

Thank you very much for your advice! It is a good idea to ignore nan linkages for large k_at_hop, and indeed k_at_hop=[20,5] is enough for high-performance clustering.

xxx2974 commented 5 years ago

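Hello, I have a question. When k_at_hop=[20,5], the one-hop nodes should be the top 20 entries in knn_graph, right? Then when k_at_hop=[50,5], the one-hop nodes should be the top 50 entries in knn_graph? However, when I use the edge_labels values to indicate whether a one-hop node has the same id as the center node, I find that the 1 values in edge_labels appear further back for k_at_hop=[50,5] than for k_at_hop=[20,5]. Why is that?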

@luzai @Zhongdao