adrinta / MAGNET

MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network

the entire code #2

Open haoc111 opened 3 years ago

haoc111 commented 3 years ago

Hello, thank you for your great work! I am currently doing research on multi-label classification. Can you upload the entire code structure? I want to reproduce your work. Thank you!

adrinta commented 3 years ago

@haoc111 Sorry for the slow response, I have just updated this repository. You can run this project by running train.ipynb. However, I am still struggling with how to use BERT as the embedding: I have tried BERT both as a contextualized embedding and as a static embedding, but the results are far from those reported in the paper. Another problem is that the official Reuters dataset with 10,788 data points that I get from NLTK does not have the same split as in the paper. The paper reports an 8,630 / 2,158 train/test split, while the official split I get from NLTK is 7,769 / 3,019. Also, if you want to use GloVe or another embedding, you have to download it yourself from the official website.
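
For reference, the NLTK split can be inspected like this (a minimal sketch; it assumes the reuters corpus has already been downloaded via NLTK):

import nltk
from nltk.corpus import reuters

nltk.download('reuters')  # one-time download of the ApteMod Reuters-21578 corpus

# The official split is encoded in the file ids ('training/...' vs. 'test/...')
train_ids = [f for f in reuters.fileids() if f.startswith('training/')]
test_ids = [f for f in reuters.fileids() if f.startswith('test/')]
print(len(train_ids), len(test_ids))  # 7769 / 3019 in the NLTK distribution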

If you have any other questions, or if you have an answer to my problems implementing the paper, please let me know.

Thank you

haoc111 commented 3 years ago

Thanks for sharing. I will try to use your code to reproduce your work. I am still researching it, and I'll share with you if I make any progress. Thanks again for sharing.

haoc111 commented 3 years ago

Thanks for your code. I have reproduced it. I would like to ask how the multilabelbinarizer.pickle file is generated. Hope to get your answer.

adrinta commented 3 years ago

You can try this code:

import pickle
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Fit the binarizer on the label lists in the training data and save it
df = pd.read_pickle('train.pickle')
multilabelbinarizer = MultiLabelBinarizer()
multilabelbinarizer.fit(df.label.values)
print(multilabelbinarizer.classes_)
pickle.dump(multilabelbinarizer, open('multilabelbinarizer.pickle', 'wb'))
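
The saved binarizer can then be loaded in the training/evaluation code to one-hot encode the label lists; a small usage sketch (the label names below are just examples from the Reuters category set):

import pickle

with open('multilabelbinarizer.pickle', 'rb') as f:
    mlb = pickle.load(f)

# Turn a list of label lists into a binary indicator matrix
y = mlb.transform([['acq', 'earn']])
print(mlb.classes_)
print(y.shape)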

haoc111 commented 3 years ago

Thank you for sharing.

haoc111 commented 3 years ago

Excuse me, now I want to try the AAPD dataset, but I have run into some problems. How are train.pickle and test.pickle generated? If you could help me, I would appreciate it very much. I also want to add a capsule network to the feature extraction part of the model; does it improve the results?
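
For reference, a minimal sketch of how train.pickle and test.pickle could be produced from the NLTK Reuters data; the 'text' column name and overall layout here are assumptions, so check train.ipynb for the exact format the notebook expects. For AAPD you would build equivalent DataFrames from its own text and label files.

import pandas as pd
from nltk.corpus import reuters

# Columns assumed here: 'text' (raw document) and 'label' (list of categories)
rows = [{'text': reuters.raw(fid),
         'label': reuters.categories(fid),
         'split': 'train' if fid.startswith('training/') else 'test'}
        for fid in reuters.fileids()]
df = pd.DataFrame(rows)

df[df.split == 'train'].drop(columns='split').to_pickle('train.pickle')
df[df.split == 'test'].drop(columns='split').to_pickle('test.pickle')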

lincan1 commented 3 years ago

Sorry to interrupt you, could you upload the entire code structure? I tried using BERT as a contextualized embedding, but the result is far from that reported in the paper.

sk0829 commented 6 months ago

Sorry to bother you, could you please upload the entire code structure? I tried using BERT as a contextualized embedding, but the results are far from those reported in the paper.

sk0829 commented 4 months ago

How should I run this project using BERT as the embedding?

sk0829 commented 1 month ago

Could you explain how to run this project with BERT as the embedding?
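
For anyone asking about BERT: a minimal sketch (not part of this repository) of extracting contextual token embeddings with the Hugging Face transformers library, which could then be fed to the model in place of static GloVe vectors:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

text = "oil prices rise after opec meeting"  # example sentence
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)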