HadoopIt / rnn-nlu

A TensorFlow implementation of Recurrent Neural Networks for Sequence Classification and Sequence Labeling

how to improve prediction results? #2

Open liaicheng opened 7 years ago

liaicheng commented 7 years ago

Hi, I used your code to train a model. When predicting on my test data, the intent results look good, but the tagging results seem worse than those of other methods I tried. I changed some of the flag parameters, doubling 'batch_size', 'word_embedding_size', 'max_training_steps', and 'num_layers', but it didn't help. Can you give me any other tips? :)

thanks!

HadoopIt commented 7 years ago

Thanks for trying out this sample code. When tuning a model, I typically look first at the training and validation curves to see whether the model overfits or underfits, and then tune accordingly to address the problem.

Moreover, the sample code published here does not include the slot label dependency modeling mentioned in the code comments. If there are strong dependencies among your labels/tags, connecting the label/tag output back to the RNN hidden state might help.
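A minimal sketch of that idea, assuming a TF 1.x graph like this repo's (all sizes and variable names below are illustrative, not the repo's actual code): the embedding of the previously predicted label is concatenated onto the next timestep's input, so the cell can condition on the label history.

```python
import tensorflow as tf  # TF 1.x, as used by this repo

# Illustrative sizes and names only -- not the repo's actual code.
vocab_size, num_labels = 10000, 64
embed_dim, label_embed_dim, hidden_dim = 128, 32, 128
batch_size, max_len = 16, 20

word_ids = tf.placeholder(tf.int32, [batch_size, max_len])
word_emb = tf.get_variable("word_emb", [vocab_size, embed_dim])
label_emb = tf.get_variable("label_emb", [num_labels, label_embed_dim])
softmax_w = tf.get_variable("softmax_w", [hidden_dim, num_labels])
softmax_b = tf.get_variable("softmax_b", [num_labels])

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_dim)
state = cell.zero_state(batch_size, tf.float32)
inputs = tf.nn.embedding_lookup(word_emb, word_ids)   # [batch, time, embed]
prev_label = tf.zeros([batch_size], tf.int32)         # dummy start label id 0

logits_per_step = []
with tf.variable_scope("tagger"):
    for t in range(max_len):
        if t > 0:
            tf.get_variable_scope().reuse_variables()
        # Feed the previous label's embedding back in with the current word,
        # so the cell can model dependencies between consecutive tags.
        step_input = tf.concat(
            [inputs[:, t, :], tf.nn.embedding_lookup(label_emb, prev_label)],
            axis=1)
        output, state = cell(step_input, state)
        logits = tf.matmul(output, softmax_w) + softmax_b
        logits_per_step.append(logits)
        # Greedy prediction feeds the next step; during training you could
        # feed the gold label instead (teacher forcing).
        prev_label = tf.to_int32(tf.argmax(logits, axis=1))
```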

liaicheng commented 7 years ago

The training curves are shown below: [image attachment]

It seems that the test F1 score and accuracy are good. I didn't get your meaning here:

> Moreover, the sample code published here does not include the slot label dependency modeling mentioned in the code comments. If there are strong dependencies among your labels/tags, connecting the label/tag output back to the RNN hidden state might help.

PS: I don't know if you read Chinese; if you do, may I leave messages in Chinese? :)

hariom-yadaw commented 7 years ago

@HadoopIt I have some questions, as below, regarding the dataset and the implementation in the GitHub repo.

  1. Since we build the vocabulary only from the training input sequences (train.seq.in), new values for the two slots (from_City & to_City) will be mapped to _UNK in the vocabulary. Given this, will the RNN be able to detect these slots correctly? (A sketch of this _UNK mapping follows below.)

  2. If YES, how does it actually detect the slots correctly without having the word in the vocabulary built from the training data? What is the logic behind it?

  3. I trained on the dataset in your GitHub repo, but I was not able to get a good F1 score or accuracy. What are the most effective ways to improve them?

If you have free time, please answer my questions above. I want to learn about these things. Thanks for your time and concern.
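On question 1, here is a minimal sketch of the usual _UNK handling at lookup time (names like `vocab` and `UNK_ID` are illustrative, not necessarily the repo's exact identifiers):

```python
UNK_ID = 3  # id reserved for the _UNK token in this hypothetical vocab

def sentence_to_ids(sentence, vocab):
    # Words missing from the training-set vocabulary fall back to UNK_ID,
    # so at test time the model only sees "some unknown word" there.
    return [vocab.get(word, UNK_ID) for word in sentence.split()]

vocab = {"flight": 10, "from": 11, "to": 12}
print(sentence_to_ids("flight from Boston to Denver", vocab))
# -> [10, 11, 3, 12, 3]; the surrounding "flight from ... to ..." pattern
#    is what lets the tagger still label the _UNK positions as city slots.
```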

hariom-yadaw commented 7 years ago

@liaicheng To get the F1 score calculated properly by conlleval.pl, do we need the slot labels in some fixed format? I didn't use the same format for the slots in my dataset, and somehow it prints an F1 score of 0.00 (but the actual F1 score is not zero when I manually check the predicted slots in test_results).

hariom-yadaw commented 7 years ago

@HadoopIt @liaicheng Has anyone tried using pre-trained word embeddings to improve accuracy? How would one do this, and where should the embeddings be added in this project? Thanks!

liaicheng commented 7 years ago

@hariom-yadaw Actually, I processed the slot data as the sample shows, so I can get the F1 score. I didn't use pre-trained embeddings, and the accuracy doesn't seem very good. I think you can try changing the training parameters and running it again; maybe the results will improve.

HadoopIt commented 7 years ago

@hariom-yadaw Slot labels for words that do not appear in the training set might still be inferred from the structure of the sequence, e.g. a pattern like "flight from A to B".

Pre-trained word embeddings can be fed to the graph when you call tf.Session.run, just like feeding other input values. More details: http://stackoverflow.com/questions/35687678/using-a-pre-trained-word-embedding-word2vec-or-glove-in-tensorflow
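A minimal sketch of that pattern (TF 1.x, following the Stack Overflow answer above; the variable names and the `embeddings.npy` file are illustrative):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

vocab_size, embed_dim = 10000, 128  # must match your vocab and vector file

# The trainable embedding matrix the model graph already uses.
W = tf.get_variable("word_embedding", [vocab_size, embed_dim])

# Feed the pre-trained values in at startup via a placeholder + assign op,
# rather than baking a huge constant into the GraphDef.
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embed_dim])
embedding_init = W.assign(embedding_placeholder)

pretrained = np.load("embeddings.npy")  # hypothetical file, rows ordered by vocab id

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embedding_init, feed_dict={embedding_placeholder: pretrained})
    # ...train as usual; W now starts from the pre-trained vectors.
```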

hariom-yadaw commented 7 years ago

Thanks a lot, @HadoopIt. If we use pre-trained embeddings, do we need to change anything else in the code? For example, should the vocabulary be taken from the pre-trained file instead of from train.seq.in?
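One common approach, sketched here as an assumption rather than anything this repo prescribes: keep the vocabulary built from train.seq.in, and align the pre-trained vectors to it, falling back to random initialization for words the embedding file does not cover.

```python
import numpy as np

embed_dim = 128
vocab = {"_PAD": 0, "_UNK": 1, "flight": 2, "boston": 3}  # from train.seq.in (illustrative)
pretrained = {"flight": np.ones(embed_dim),               # e.g. loaded from a GloVe file
              "boston": np.zeros(embed_dim)}

# Random init everywhere, overwritten row-by-row where a pre-trained vector exists.
init = np.random.uniform(-0.1, 0.1, (len(vocab), embed_dim)).astype(np.float32)
for word, idx in vocab.items():
    if word in pretrained:
        init[idx] = pretrained[word]
# `init` can then be fed through the placeholder shown in the sketch above.
```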

hariom-yadaw commented 7 years ago

@HadoopIt I also get weird results, shown below, when I run it on my dataset: the training perplexity (loss) reaches 1.00 even though the accuracy and tagging are NOT 100%. Does the training perplexity include contributions from both the intent (accuracy) and the slots (F1 score), or only from slot tagging (F1 score)?

Also, why doesn't the accuracy behave monotonically? It goes up and down in between; what could be the reason for this?

I get good tagging, but somehow conlleval.pl is not able to calculate the F1 score, maybe because of the way I name my slots, as shown in my dataset below? When I check manually, the score is not zero; the model gives good results, although not 100% correct.

```
global step 300 step-time 0.08. Training perplexity 1.53 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 77.78 14/18 Test f1-score: 0.00
global step 600 step-time 0.09. Training perplexity 1.06 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 900 step-time 0.09. Training perplexity 1.02 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 88.89 16/18 Test f1-score: 0.00
global step 1200 step-time 0.09. Training perplexity 1.01 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 1500 step-time 0.09. Training perplexity 1.01 Eval accuracy: 90.48 19/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 1800 step-time 0.09. Training perplexity 1.00 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 2100 step-time 0.09. Training perplexity 1.00 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 2400 step-time 0.09. Training perplexity 1.00 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 83.33 15/18 Test f1-score: 0.00
global step 2700 step-time 0.09. Training perplexity 1.00 Eval accuracy: 80.95 17/21 Eval f1-score: 0.00 Test accuracy: 88.89 16/18 Test f1-score: 0.00
global step 3000 step-time 0.09. Training perplexity 1.00 Eval accuracy: 85.71 18/21 Eval f1-score: 0.00 Test accuracy: 88.89 16/18 Test f1-score: 0.00
```
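For context, assuming this repo follows the usual TensorFlow tutorial convention, the reported perplexity is just the exponential of the average training cross-entropy, so 1.00 means the training loss is essentially zero; with only 21 eval and 18 test sentences, that strongly suggests the model has memorized the tiny training set.

```python
import math

avg_cross_entropy = 0.003                     # hypothetical near-zero training loss
print(round(math.exp(avg_cross_entropy), 2))  # -> 1.0, reported as "perplexity 1.00"
```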

My test dataset:

test.seq.in

```
can you give me a ppt on Plantation ?
I want your help on Higher Education .
I expect you to help me on Education System .
Show me some data on Mahabharat .
I need your help on Moon .
what can you show me on Sun ?
I would like to have your views on Water Pollution .
What ideas do you have on Traffic System .
Hey
Hey , what is your name ?
you can call me Pankaj .
You can have my name as Gaurav .
what help you can provide ?
what you are good at ?
Entertain me .
Tell me a joke .
Hi !
Thanks for your great help .
```

test.seq.out (slot tagging; note that some of the sentences have no slot tags, only intent detection)

```
0 0 0 0 0 0 0 B_eduTopic 0
0 0 0 0 0 B_eduTopic E_eduTopic 0
0 0 0 0 0 0 0 B_eduTopic E_eduTopic 0
0 0 0 0 0 B_eduTopic 0
0 0 0 0 0 B_eduTopic 0
0 0 0 0 0 0 B_eduTopic 0
0 0 0 0 0 0 0 0 B_eduTopic E_eduTopic 0
0 0 0 0 0 0 B_eduTopic E_eduTopic 0
0
0 0 0 0 0 0 0
0 0 0 0 B_user.name 0
0 0 0 0 0 0 B_user.name 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0
0 0 0 0 0
0 0
0 0 0 0 0 0
```
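A likely cause of the 0.00 F1 scores above: conlleval.pl assumes CoNLL IOB-style tags, i.e. "B-xxx" / "I-xxx" split on a hyphen, and the letter "O" for outside tokens. Tags like "B_eduTopic", "E_eduTopic", and the digit "0" don't match that scheme, so the script counts no chunks at all. The first two test sentences in that style would look roughly like this (showing just the gold tags; conlleval itself expects the gold and predicted tags as the last two columns):

```
can         O
you         O
give        O
me          O
a           O
ppt         O
on          O
Plantation  B-eduTopic
?           O

I           O
want        O
your        O
help        O
on          O
Higher      B-eduTopic
Education   I-eduTopic
.           O
```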