Closed siri-ius closed 4 years ago
You need to download the data following the instructions in the README and then train the embeddings with the provided code. You can then train a simple classifier on the embeddings.
Thanks for your answer! I have downloaded the same dataset you used for node classification (CKM), but I cannot reproduce the results. Could you please guide me on how to run it? Also, I would like to ask whether there is a link to a solution for the problem with gensim and the C compiler. That would be very helpful. Thank you!
Hi,
Which part is bugging you? Can you generate the embeddings? I currently do not have the code for the classification on my new laptop, but I know someone who has just repeated the results this summer, maybe I can point him to you.
Best Regards, Hongming
Hi,
I have found a paper that was published this year, DGMI, and the authors have published results for node clustering, if that is what you are talking about. Yes, I can generate embeddings, but gensim is running very slowly right now; this is my main issue. If you could help with this, that would be great!
Thank you!
Oh, the gensim part is slow because I did not modify the C++ code; instead, I replaced the optimization part with Python.
At the current speed, I was able to handle fewer than 1 million nodes in 2-3 days on a single machine. If you have more, you need to either use a cluster of machines or revise the C++ code.
Best Regards, Hongming
Hi @panda0881, I am also looking for the code to reproduce the node classification results. At the very least, could you provide the CKM dataset (2-fold cross validation version)?
By the way, @siri-ius what is DGMI? Could you provide a reference?
Thank you!
Hi Marcin,
Sorry to bother you again. It seems that others are also having problems reproducing the classification results and need the code.
Since I do not have it on my new laptop, I am wondering whether, by any chance, you still have it. Maybe you could open another pull request, and then I can merge it?
Best Regards, Hongming
Hi @null-id ,
actually it is DMGI, a published paper. Here is a link to github: https://github.com/pcy1302/DMGI.
Best!
Thank you @siri-ius !
I never reproduced the exact results from the paper since the paper uses a logistic regression classifier and I used an SVM classifier. If you want to test the embeddings on an SVM classifier you could try something like this:
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Learned node embeddings, one row per node
embeddings = np.load('embeddings.npy')
# Ground-truth class of each node, in the same order as the embedding rows
classes = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
assert embeddings.shape[0] == classes.shape[0]

# Random 50/50 split into training and testing halves
training_input, testing_input, training_output, testing_output = train_test_split(embeddings, classes, test_size=0.5)
# Fit a linear SVM on the training half
classifier = svm.SVC(decision_function_shape='ovo', kernel='linear')
classifier.fit(training_input, training_output)
# Predict the held-out half and report the accuracy
prediction = classifier.predict(testing_input)
print(accuracy_score(testing_output, prediction))
First run train_model.py on the Vickers-Chan dataset provided. Then run the code above on the learned embeddings. The classification accuracy should be between 90% and 100% with an average around 97%. These results are consistent with other models found in the literature.
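Because a single random 50/50 split of 29 nodes gives a noisy accuracy, the range and average quoted above are easier to check by repeating the split several times. The following is only a minimal sketch of that idea: the `mean_accuracy` helper, the number of runs, the fixed seeds, and the stratified split are assumptions added for illustration, and the random vectors merely stand in for the learned embeddings.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def mean_accuracy(embeddings, classes, n_runs=20):
    """Return (mean, min, max) linear-SVM test accuracy over n_runs random 50/50 splits."""
    scores = []
    for seed in range(n_runs):
        # A different seed per run yields a different split; stratify keeps
        # both classes present in each half (an assumption, not in the original snippet).
        x_train, x_test, y_train, y_test = train_test_split(
            embeddings, classes, test_size=0.5, random_state=seed, stratify=classes
        )
        classifier = svm.SVC(kernel="linear")
        classifier.fit(x_train, y_train)
        scores.append(accuracy_score(y_test, classifier.predict(x_test)))
    return float(np.mean(scores)), float(np.min(scores)), float(np.max(scores))


# Placeholder data only: 29 random vectors standing in for the learned
# embeddings, with the same 12/17 class split as the snippet above.
rng = np.random.default_rng(0)
classes = np.array([0] * 12 + [1] * 17)
embeddings = rng.normal(size=(29, 16)) + classes[:, None]

mean_acc, min_acc, max_acc = mean_accuracy(embeddings, classes)
print(f"mean={mean_acc:.3f} min={min_acc:.3f} max={max_acc:.3f}")
```

To use it on real output, replace the placeholder arrays with `np.load('embeddings.npy')` and the actual label vector.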
As for the CKM dataset, I never ended up using it in my work. If I recall correctly, there were issues with misaligned nodes and various ways of interpreting ground truth node labels.
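For anyone who wants to get closer to the setup discussed in this thread (a logistic regression classifier and 2-fold cross validation), something along these lines might be a starting point. This is a hedged sketch, not the paper's evaluation code: the thread does not state the regularization settings or the fold assignment, so `max_iter`, the shuffling, and the seed are assumptions, and the random vectors are placeholders for real embeddings and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data only; replace with np.load('embeddings.npy') and the real labels.
rng = np.random.default_rng(0)
classes = np.array([0] * 12 + [1] * 17)
embeddings = rng.normal(size=(29, 16)) + classes[:, None]

# Two stratified folds: each half is used once for training and once for testing.
folds = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
classifier = LogisticRegression(max_iter=1000)

# One accuracy value per fold; their mean is the cross-validated accuracy.
fold_scores = cross_val_score(classifier, embeddings, classes, cv=folds, scoring="accuracy")
print(fold_scores, fold_scores.mean())
```

Swapping `LogisticRegression` for `svm.SVC(kernel='linear')` in the same loop gives a like-for-like comparison between the two classifiers.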
Dear Marcin,
Thank you so much for the reply!
Best Regards, Hongming
Hi,
I would like to replicate the results on the node classification task. Could you guide me on what I should do in order to replicate your results?
Thank you! Looking forward to your answer!