Node classification - Githubissues

siri-ius commented 4 years ago

Hi,

I would like to replicate the results on the node classification task. Could you tell me what I should do to replicate your results?

Thank you! Looking forward to your answer!

panda0881 commented 4 years ago

You need to download the data following the instructions in the readme and then train the embeddings with the provided code. You can then train a simple classifier on the embeddings.

siri-ius commented 4 years ago

Thanks for your answer! I have downloaded the same dataset you used for node classification (CKM), but I cannot reproduce the results. Could you please guide me on how to run it? Also, I would like to ask if there is a link to a solution for the problem with gensim and the C compiler. That would be very helpful. Thank you!

panda0881 commented 4 years ago

Hi,

Which part is bugging you? Can you generate the embeddings? I currently do not have the classification code on my new laptop, but I know someone who reproduced the results this summer, so maybe I can put you in touch with him.

Best Regards, Hongming

siri-ius commented 4 years ago

Hi,

I have found a paper that was published this year, DGMI, and the authors have published results for node clustering, if that is what you are talking about. Yes, I can generate embeddings, but gensim is running very slowly, and that is my main issue. If you could help with this, that would be great!

Thank you!

panda0881 commented 4 years ago

Oh, the gensim part is slow because, rather than changing the C++ code, I replaced the optimization part with Python.

At the current speed, I was able to handle fewer than 1 million nodes in 2-3 days on a single machine. If you have more, you need to either use a cluster of machines or revise the C++ code.

Best Regards, Hongming

empty-id commented 4 years ago

Hi @panda0881, I am also looking for the code to reproduce the node classification results. At the very least, could you provide the CKM dataset (2-fold cross-validation version)?

By the way, @siri-ius what is DGMI? Could you provide a reference?

Thank you!

panda0881 commented 4 years ago

Hi Marcin,

Sorry to bother you again. It seems like others also have problems reproducing the classification results and need the code.

Since I do not have it on my new laptop, I am wondering whether, by any chance, you still have it. Maybe you could open another pull request and then I can merge it?

Best Regards, Hongming

siri-ius commented 4 years ago

Hi @null-id ,

Actually, it is DMGI, a published paper. Here is a link to the GitHub repository: https://github.com/pcy1302/DMGI.

Best!

empty-id commented 4 years ago

Thank you @siri-ius !

mpietrasik commented 4 years ago

I never reproduced the exact results from the paper since the paper uses a logistic regression classifier and I used an SVM classifier. If you want to test the embeddings on an SVM classifier you could try something like this:

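import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Learned node embeddings, one row per node.
embeddings = np.load('embeddings.npy')
# Ground-truth class of each node, in the same row order as the embeddings.
classes = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Every embedding row needs exactly one label.
assert embeddings.shape[0] == classes.shape[0]

# Random 50/50 split; no random_state is set, so the result varies between runs.
training_input, testing_input, training_output, testing_output = train_test_split(embeddings, classes, test_size=0.5)

# Fit a linear-kernel SVM on the training half.
classifier = svm.SVC(decision_function_shape='ovo', kernel='linear')
classifier.fit(training_input, training_output)

# Predict the held-out half and report the accuracy.
prediction = classifier.predict(testing_input)

print(accuracy_score(testing_output, prediction))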

First run train_model.py on the Vickers-Chan dataset provided. Then run the code above on the learned embeddings. The classification accuracy should be between 90% and 100% with an average around 97%. These results are consistent with other models found in the literature.
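
Because a single random 50/50 split on so few nodes gives a noisy number, the range and average quoted above are easier to check by repeating the split many times. The following is only a sketch under assumptions: it swaps in scikit-learn's `LogisticRegression` with default settings (the paper's exact classifier configuration is not stated in this thread), `mean_accuracy` is a hypothetical helper name, and the embeddings are synthetic stand-ins so that the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit


def mean_accuracy(embeddings, classes, n_repeats=20, seed=0):
    """Mean, min and max test accuracy over repeated stratified 50/50 splits."""
    splitter = StratifiedShuffleSplit(n_splits=n_repeats, test_size=0.5, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(embeddings, classes):
        # Assumption: default logistic regression, not necessarily the paper's setup.
        classifier = LogisticRegression(max_iter=1000)
        classifier.fit(embeddings[train_idx], classes[train_idx])
        scores.append(classifier.score(embeddings[test_idx], classes[test_idx]))
    return float(np.mean(scores)), float(np.min(scores)), float(np.max(scores))


# Synthetic stand-in data (assumption): 29 nodes, 12 in class 0 and 17 in class 1,
# drawn from two shifted Gaussian clusters. Replace with np.load('embeddings.npy')
# and the real labels to evaluate learned embeddings.
rng = np.random.default_rng(0)
classes = np.array([0] * 12 + [1] * 17)
embeddings = rng.normal(size=(29, 16)) + 3.0 * classes[:, None]

mean_acc, min_acc, max_acc = mean_accuracy(embeddings, classes)
print(f"accuracy: mean={mean_acc:.3f}, min={min_acc:.3f}, max={max_acc:.3f}")
```

Stratified splitting keeps both classes present in every training half, which matters when there are only 29 labelled nodes.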

As for the CKM dataset, I never ended up using it in my work. If I recall correctly, there were issues with misaligned nodes and various ways of interpreting ground truth node labels.
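
One way to guard against that kind of node misalignment, on any dataset, is to match labels to embedding rows by node ID instead of relying on row order. This is a minimal sketch with hypothetical names and toy data (`align_labels`, `embedding_ids`, `label_by_node`); it makes no assumption about how the CKM files are actually laid out.

```python
import numpy as np


def align_labels(embedding_ids, label_by_node):
    """Return (row_indices, labels, missing) for nodes with both an embedding and a label."""
    rows, labels, missing = [], [], []
    for row, node_id in enumerate(embedding_ids):
        if node_id in label_by_node:
            rows.append(row)
            labels.append(label_by_node[node_id])
        else:
            # Node has an embedding but no ground-truth label.
            missing.append(node_id)
    return np.array(rows, dtype=int), np.array(labels), missing


# Hypothetical toy input: node IDs in embedding row order, plus a label lookup.
embedding_ids = ["n3", "n1", "n2", "n4"]
label_by_node = {"n1": 0, "n2": 1, "n3": 1}

rows, labels, missing = align_labels(embedding_ids, label_by_node)
print(rows, labels, missing)
```

The returned `rows` can be used to index the embedding matrix (`embeddings[rows]`) so that it and `labels` are guaranteed to refer to the same nodes, and `missing` shows which nodes were dropped.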

panda0881 commented 4 years ago

Dear Marcin,

Thank you so much for the reply!

Best Regards, Hongming

HKUST-KnowComp / MNE

Node classification #27