DEEP-PolyU / AANE_Python

Accelerated Attributed Network Embedding, SDM 2017

I got problems about the BlogCatalog dataset #3

Open Tomposon opened 5 years ago

Tomposon commented 5 years ago

I tried to run Runme.py, which trains on the BlogCatalog dataset. But when I used the embedding for node classification, the performance was terrible; the micro-F1 was around 0.2. Why?

xhuang31 commented 5 years ago

Thanks for your interest. Did you make the "Indices" in your evaluation consistent with the ones used in the embedding learning?

Thanks.

Tomposon commented 5 years ago

My label file follows the node IDs, from 0 to... What order do the rows of Embedding.mat follow in your source code?

xhuang31 commented 5 years ago

CombG = G[Group1+Group2, :][:, Group1+Group2]

The row order in Embedding.mat follows "Group1+Group2".

It is for evaluation. Sorry for the confusion. I just directly released the code from my evaluation. I will update it when I get time.

Thanks.
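To illustrate the alignment being discussed, here is a minimal sketch with toy data (in practice, Group1 and Group2 are the index lists from Runme.py and the labels come from the dataset): the label vector, ordered by original node ID, is reordered to match the row order of Embedding.mat.

```python
# Hypothetical toy indices; in Runme.py, Group1/Group2 come from the .mat file.
Group1 = [0, 2, 4]
Group2 = [1, 3, 5]

# Labels indexed by original node ID.
labels = [10, 11, 12, 13, 14, 15]

# Rows of Embedding.mat follow the order Group1 + Group2,
# so reorder the labels the same way before evaluation.
order = Group1 + Group2
aligned_labels = [labels[i] for i in order]
print(aligned_labels)  # [10, 12, 14, 11, 13, 15]
```

If the labels are evaluated in plain node-ID order instead, every prediction is compared against the wrong node, which explains a micro-F1 near chance level.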

Tomposon commented 5 years ago

Thank you very much. I have another question: in your source code, is the whole network used for training, rather than removing the edges between training and test data as mentioned in your paper?

xhuang31 commented 5 years ago

Yes. CombG = G[Group1+Group2, :][:, Group1+Group2]

We use the whole network to train Embedding.mat. After getting Embedding.mat, you could do cross validation on it.

Thanks.
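The workflow described above (train the embedding on the whole network, then cross-validate a classifier on it) can be sketched as follows. This is an assumption-laden illustration, not the authors' Matlab evaluation: it uses scikit-learn's LinearSVC and random stand-in data in place of the real Embedding.mat and labels.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 16))    # stand-in for the learned embedding rows
y = rng.integers(0, 3, size=100)  # stand-in labels, aligned to H's row order

# Five-fold cross-validation of a linear SVM on the fixed embedding.
scores = cross_val_score(LinearSVC(), H, y, cv=5, scoring="f1_micro")
print(scores.mean())
```

The key point is that the embedding H is computed once from the full network; only the classifier on top of it is cross-validated.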

Tomposon commented 5 years ago

I made the "Indices" in my evaluation consistent with the ones in the embedding learning, but the performance on the Flickr dataset was still lower than in the paper. I just used the default parameters in your implementation. @xhuang31

xhuang31 commented 5 years ago

How about BlogCatalog? I used the SVM in Matlab to perform the classification in my papers.

As long as you use the same classifier, you will get similar results for AANE and the baselines. They may all become worse together, but relatively AANE should outperform the baselines in general.

Thanks.

Tomposon commented 5 years ago

I also used a linear SVM. I used 30% of the BlogCatalog nodes to train the classifier, and the micro-F1 was around 0.82. @xhuang31 Thanks for your attention.

xhuang31 commented 5 years ago

It is five-fold cross-validation, so 80% of the data should be used for training. Please check the paper. Thanks.
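The 80% figure follows directly from the five-fold setup: each fold holds out one fifth of the nodes for testing and trains on the other four fifths. A minimal dependency-free sketch of the index splitting (not the authors' Matlab code):

```python
# Five-fold index splitting: each fold tests on 20% of nodes, trains on 80%.
n = 100  # toy number of nodes
k = 5
indices = list(range(n))
fold_size = n // k

for fold in range(k):
    test_idx = indices[fold * fold_size:(fold + 1) * fold_size]
    test_set = set(test_idx)
    train_idx = [i for i in indices if i not in test_set]
    # train_idx covers 80% of the nodes in every fold
    print(fold, len(train_idx), len(test_idx))
```

Training on only 30% of the nodes, as in the comment above, is a harder setting than the paper's, which would explain part of the gap in micro-F1.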