Open · haoc111 opened this issue 3 years ago
@haoc111 Sorry for the slow response; I just updated this repository. You can run this project by running train.ipynb. However, I am still struggling with how to use BERT as an embedding: I have tried BERT as both a contextualized embedding and a static embedding, but the results are far from those reported in the paper. Another problem is that the official Reuters dataset with 10,788 data points that I get from NLTK does not have the same split as in the paper. The paper reports an 8630 / 2158 train-test split, while the official split I get from NLTK is 7769 / 3019. Also, if you want to use GloVe or another embedding, you have to download it yourself from the official website.
If you have another question, or an answer to my problem implementing the paper, please let me know.
Thank you
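For reference, a minimal sketch of what "BERT as a contextualized embedding" versus "BERT as a static embedding" could look like with the Hugging Face transformers library (the checkpoint bert-base-uncased and the mean pooling are assumptions, not necessarily what this repository does):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
bert = AutoModel.from_pretrained('bert-base-uncased')
bert.eval()

texts = ['grain exports rose sharply last month']
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128,
                   return_tensors='pt')

with torch.no_grad():
    outputs = bert(**inputs)

# Contextualized embedding: one vector per subword token from the last
# encoder layer, shaped (batch, seq_len, 768).
contextual = outputs.last_hidden_state

# Static embedding: look up BERT's wordpiece embedding table directly,
# so every occurrence of a token gets the same vector.
static = bert.get_input_embeddings()(inputs['input_ids'])

# One common way to get a single document vector: mean-pool the
# contextualized token vectors over non-padding positions.
mask = inputs['attention_mask'].unsqueeze(-1)
doc_vector = (contextual * mask).sum(dim=1) / mask.sum(dim=1)

Either tensor can then be fed into the downstream classifier in place of a GloVe lookup.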
Thanks for sharing. I will try to use your code to reproduce your work. I am still researching it and will share with you if I make any progress. Thanks again for sharing.
Thanks for your code. I have reproduced it. I would like to ask how the multilabelbinarizer.pickle file is generated. I hope to get your answer.
You can try this code:

import pickle
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.read_pickle('train.pickle')
multilabelbinarizer = MultiLabelBinarizer()
multilabelbinarizer.fit(df.label.values)
multilabelbinarizer.classes_
pickle.dump(multilabelbinarizer, open('multilabelbinarizer.pickle', 'wb'))
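Later, when preparing labels for training or evaluation, the saved binarizer can be loaded back and applied; a small usage sketch (the topic names below are just example Reuters categories):

import pickle

with open('multilabelbinarizer.pickle', 'rb') as f:
    mlb = pickle.load(f)

print(mlb.classes_)                    # all label names seen during fit
y = mlb.transform([['acq', 'grain']])  # multi-hot label matrix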
Thank you for sharing.
Excuse me, I now want to try the AAPD dataset, but I have run into some problems. How are train.pickle and test.pickle generated? If you could help me, I would appreciate it very much. I also want to add a capsule network to the feature-extraction part of the model; would that improve the results?
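Not confirmed by the repository, but a hedged guess at how train.pickle and test.pickle could be produced from the NLTK Reuters corpus (the column names text and label are assumptions, chosen to match the MultiLabelBinarizer snippet above):

import pandas as pd
from nltk.corpus import reuters
# import nltk; nltk.download('reuters')  # only needed once

def build_frame(prefix):
    # The official NLTK split marks documents as 'training/...' or 'test/...'
    ids = [f for f in reuters.fileids() if f.startswith(prefix)]
    return pd.DataFrame({
        'text': [reuters.raw(f) for f in ids],
        'label': [reuters.categories(f) for f in ids],
    })

train_df = build_frame('training/')   # 7769 documents in the NLTK split
test_df = build_frame('test/')        # 3019 documents in the NLTK split

train_df.to_pickle('train.pickle')
test_df.to_pickle('test.pickle')

For AAPD, producing the same shape (a text column and a list-of-labels column) should be enough to reuse the rest of the pipeline.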
Sorry to interrupt you, but can you upload the entire code structure? I tried using BERT as a contextualized embedding, but the result is far from what is reported in the paper.
How should I run this project with BERT as the embedding?
Hello, thank you for your great work! I am doing research on multi-label classification. Can you upload the entire code structure? I want to reproduce your work! Thank you!