brightmart / text_classification

all kinds of text classification models and more with deep learning
MIT License

Model accuracy is very low, I don't know why? #74

Closed kevinsay closed 5 years ago

kevinsay commented 6 years ago

I load zhihu-word2vec-title-desc.bin-100 as the word-vector file and train-zhihu4-only-title-all.txt as the training file, with multi_label_flag=false and use_embedding=true. The models a01_FastText, a03_TextRNN, a04_TextRCNN, a05_HierarchicalAttentionNetwork and a06_Seq2seqWithAttention all run, but the accuracy is very low and I don't know why. For prediction I also set multi_label_flag=false and use_embedding=true, yet more than one label is predicted. I need your help. Thanks.

f20500909 commented 6 years ago

May I ask where you found the zhihu-word2vec-title-desc.bin-100 file? I can't find it in the project. Did you generate it? @kevinsay

kevinsay commented 6 years ago

I got it from the author's Baidu cloud share. You can also generate it with Google's word2vec.

f20500909 commented 6 years ago

I use the create_voabulary function to generate vocab_label.pik as a substitute for the zhihu-word2vec-title-desc.bin-100 file. But I cannot find the Baidu cloud sharing link in README.md, and I think the file would be very helpful for studying this project. I would be very grateful if you could share it. @kevinsay

kevinsay commented 6 years ago

link: https://pan.baidu.com/s/1orPKC0cahrIW0CUvPxts1g pwd: bguc @f20500909 I have shared the file with you, and I hope you can share your training and prediction results with me.

f20500909 commented 6 years ago

Thank you very much. Once I understand the code and can run it correctly, I will share my corpus and my training and prediction results with you. I am very grateful to you for sharing these files.
@kevinsay

Thank you for sharing. But after many days of trying, I found the code too hard for me to understand. I have given up on learning the project, so I cannot share my results with you. However, I found an equally good project that achieves similar functionality, is easy to learn, and has a very complete corpus, so I am sharing it with you. I hope it helps. Thanks again for sharing. link: https://github.com/zhengwsh/text-classification

@kevinsay

lreaderl commented 6 years ago

Hello, my F1 score is very low on single-label classification, as shown below: Epoch 19 Validation Loss: 2.709 F1 Score: 0.282 Precision: 0.169 Recall: 0.846. Have you found any solution to that?
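
A quick arithmetic check on the numbers above, offered as a sketch rather than a diagnosis. F1 is the harmonic mean of precision and recall, and 0.169 and 0.846 do give about 0.282. If the metrics are micro-averaged and every sample has exactly one true label (both are assumptions, not confirmed anywhere in this thread), then recall divided by precision equals the average number of labels predicted per sample. The helper names below are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def avg_predictions_per_sample(precision, recall):
    # Assumes micro-averaging and exactly one true label per sample:
    #   precision = TP / (k * N), recall = TP / N  =>  recall / precision = k
    # where N is the number of samples and k the labels predicted per sample.
    return recall / precision


precision, recall = 0.169, 0.846  # values reported in the comment above
f1 = f1_score(precision, recall)
k = avg_predictions_per_sample(precision, recall)
print(round(f1, 3), round(k, 2))
```

Under those assumptions the ratio comes out close to 5, which would be consistent with the evaluation scoring several predicted labels per sample instead of a single argmax. That is only one possible reading, so it is worth checking how many labels the evaluation code actually keeps.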

brightmart commented 6 years ago

Hi, what kind of data and what training set size are you using?



lreaderl commented 6 years ago

My dataset has 19 classes and about 100,000 training samples. The average length of a training sample is about 150.

kevinsay commented 6 years ago

@f20500909 OK, thanks.

kevinsay commented 6 years ago

@brightmart Does the length of the training samples affect the accuracy of the model? In one of my datasets the average sample length is 10, but I pad the sequences to 20, 50 or 100 with pad_sequences when training, and the accuracy is low.

brightmart commented 5 years ago

If implemented correctly, padding should have minimal impact on performance, since you can mask out the embeddings of the pad tokens.
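
To illustrate the masking idea, here is a minimal pure-Python sketch. It is not this repository's implementation: `PAD_ID`, `pad_sequence`, `masked_mean` and the toy embedding table are all hypothetical names and values chosen for the example. Pad positions are skipped when averaging token embeddings, so the pooled vector stays the same whether a sentence is padded to 2, 20 or 100 tokens. The pad row is deliberately non-zero to show that the mask, not the pad vector, is what removes its influence.

```python
PAD_ID = 0  # hypothetical id reserved for the padding token


def pad_sequence(token_ids, max_len, pad_id=PAD_ID):
    """Truncate or right-pad a list of token ids to exactly max_len."""
    return (token_ids + [pad_id] * max_len)[:max_len]


def masked_mean(embeddings, token_ids, pad_id=PAD_ID):
    """Average embedding vectors over non-pad positions only."""
    dim = len(embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, tok in zip(embeddings, token_ids):
        if tok == pad_id:
            continue  # masked out: pad positions contribute nothing
        count += 1
        for i, value in enumerate(vec):
            total[i] += value
    if count == 0:
        return total  # all-pad input: return the zero vector
    return [t / count for t in total]


# Toy embedding table: token id -> 2-d vector (made-up values).
table = {0: [9.0, 9.0], 1: [1.0, 2.0], 2: [3.0, 4.0]}
sentence = [1, 2]

pooled = {}
for max_len in (2, 20, 100):
    ids = pad_sequence(sentence, max_len)
    vectors = [table[t] for t in ids]
    pooled[max_len] = masked_mean(vectors, ids)

print(pooled)
```

Without the mask, the average would drift toward the pad vector as `max_len` grows, which is one way longer padding can hurt accuracy. Recurrent and attention models need the equivalent treatment, for example passing true sequence lengths to the RNN or masking attention scores at pad positions.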